From patchwork Tue Apr 30 12:49:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Ramsay X-Patchwork-Id: 1929498 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=BLG1LOOY; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=BLG1LOOY; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4VTKnJ2ZgPz23jG for ; Tue, 30 Apr 2024 22:51:00 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6A0B0384AB47 for ; Tue, 30 Apr 2024 12:50:58 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from EUR02-VI1-obe.outbound.protection.outlook.com (mail-vi1eur02on2042.outbound.protection.outlook.com [40.107.241.42]) by sourceware.org (Postfix) with ESMTPS id 5BD323858288 for ; Tue, 30 Apr 2024 12:50:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5BD323858288 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 5BD323858288 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=40.107.241.42 ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1714481438; cv=pass; b=k2MtEFEwuWzAqri1wXiGXiRMfwCx4yaCEACf9S0kiHX9uDekBFWK1S1lrtLjc6m+Xu4CcUcfP2gWvpuvKDl+EWFZi5BDfyS5FzrO9i67CrdsLm+d7Dyi1kYLaALJUnUWekiCg28vnc1g0Ff7f2sqwkX6fUMMrVnuHt1VjH1XD40= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1714481438; c=relaxed/simple; bh=BZnIGv6/QqIlU+1Ql+zSVpY73nndHHRHsqHVaTrpYyE=; h=DKIM-Signature:DKIM-Signature:From:To:Subject:Date:Message-ID: MIME-Version; b=Iw7iY9rMva18olelWMHFCh9GmKXctipAw2AXDqi8pgPjzRHNJPhmFDrDDooFJZE3Dny619aftAadkJag8MjfGcryrPU71+traQz92dVl8TKTPMghvoAe+FCkeJSfnZDoFkdqJDIpO/FkuhPDBeHz8ExEmVLTMTixXyn5aPienfg= ARC-Authentication-Results: i=3; server2.sourceware.org ARC-Seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; b=VqMfiZjBGp2Swt5y4BIAamccfEJ/F2PBDZFi/xJbhaO3n9SLbxMeQ1+TUT7Q24huxJ6RTtEFdgJSqGKnJmlRVChvhIInzkjF4T5RWbIrpSGPKqkZMsIVO61wanZNw8qk/fvQKqcbNqf8kQUMBIRnJxSPrFbAzReDJVh3JGd3xOedjnbIfzuXzXAP1968XhEQ5B+szk3cxj6Y9hTb3nj8IdlP7ddX8aChBOTUPQ3Bp2QQILJR9hxlBXLvP5BMCkSwCcERj3eqHumU2j8/JpYUe8KzxHtqala0tgAQHnEVcty3o4bTYTLc+E7bLTvhhAQHZj/taDTzE1jbjZerg1YysA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=rrwp5q5Dq8Fam8hj2a19DeNUlPKTw1fCjFIzZHP4Ewk=; b=LYCKyq6S4ZMAf1rxkWuaND8CR36B9bfjeaDzGniXRtsyU+GRtJbxvBqOxCO1pFyOV+plNNprYnTj+jFXMjj3gYUy582jGp1717puG2pOwJQxXJUj2JA6wSKrWVDTEtn3YksZgnbij9TfXcf+ghjpbDRSlrMrcboy7HiwTG7jAdGCakvEbdH/LSZucGnE+O859SWle0GFVJYRYk1UUdFAIRkXmuxjWFKbGTRpzBziPuBGiwzUkgr2BZsWXF3tZUei7v9jWPzMDE2JdCHqSK4P35s0rLtN4u3zlrY3UanMcuD+fefDmZlQEV4eAU5Mn72VBQPOf9KDdlqWKamYi9Uerg== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=arm.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=rrwp5q5Dq8Fam8hj2a19DeNUlPKTw1fCjFIzZHP4Ewk=; b=BLG1LOOYuvTgidvGeIjWmQBsQ7vIyM66y/ikPlQt5G5vkRAOqVgvUQMIpyVvDOPHcqDyJnifrnmUqQVg+w7GTGZ8PdX0pwMT0m946Pz4KAMsokXCS9fWXcEu9AY5sY380qh7af+XZSwnCWgQ3DaeF+g3Xu3yvvW2DdHqS+oFLfg= Received: from AS4P195CA0049.EURP195.PROD.OUTLOOK.COM (2603:10a6:20b:65a::25) by PAWPR08MB10305.eurprd08.prod.outlook.com (2603:10a6:102:367::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7519.32; Tue, 30 Apr 2024 12:50:20 +0000 Received: from AM3PEPF0000A790.eurprd04.prod.outlook.com (2603:10a6:20b:65a:cafe::f) by AS4P195CA0049.outlook.office365.com (2603:10a6:20b:65a::25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7519.32 via Frontend Transport; Tue, 30 Apr 2024 12:50:20 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=arm.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM3PEPF0000A790.mail.protection.outlook.com (10.167.16.119) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.7544.18 via Frontend Transport; Tue, 30 Apr 2024 12:50:18 +0000 Received: ("Tessian outbound ba75727f6dca:v315"); Tue, 30 Apr 2024 12:50:18 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: eda5998650458668 X-CR-MTA-TID: 64aa7808 Received: from c5be33261b7c.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 46EA6986-B1D2-4FBF-8D0E-683125DED672.1; Tue, 30 Apr 2024 12:50:12 +0000 Received: from EUR02-VI1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id c5be33261b7c.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Tue, 30 Apr 2024 12:50:12 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=mFwzskLDbmmY/97GaXaVQP2Bh4oSR+/hdLOP8sW98ypC5hQOYOZFhjYCS9OUCMTNVvXcOKi2mR26fhsH0iRs9eR7g4yy27Dha8ErbPFYEEwWn1ZeuQw7zJ7LLS+1Ao79hlOSKBStNfUEbLlJUvJ6koae/fZgvY/JnkbtRAH0u8mrpttPuG09wggFOZJUTfwDAMJw59uh4TVIuQHQqdEmDAVGPua0s4XNOI6wtLBymFEmZ20Y0AWWEHPLU7/k5Sp6q9qH+zfRnV0P6ueDwIyCv6FUHVg6Nsx7HRaWit7SSteSRCnGIT7gyMT3/T0QUQPp0H/asspq/DIgdO2Gzs9f6g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=rrwp5q5Dq8Fam8hj2a19DeNUlPKTw1fCjFIzZHP4Ewk=; b=B9QKGlY2ws7gZ+BWxmvSBjZbzGHOXS1O/F/QMl3+3MkEFcUoN48LDU7SfmUht56xlFizi3RzrjNhEf2UeYiXb8x3YzsTQZshCQGviB0qk2kJaDYQNwlwKc+MCiEM6zYuRrB1xEE/TRwRVI3wOmG1NlYfqxqREkw7gEoqnaNyryWuM0xyxXQ1sk0R/yopRRTvhzE2kBvSM+zWHVoVoBm30Y+vVViisYHOc0Q9tFKYO03a4xctzqGlt0jZav0EaGcLJ+BA8L2HzInNq3vfWO4fAC4GDWINCEW0NsXkL/h8GQ7XnqsEhPOQ3oeQnuO89TlCUP4AhtepTD9V0wFhandquQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 40.67.248.234) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=rrwp5q5Dq8Fam8hj2a19DeNUlPKTw1fCjFIzZHP4Ewk=; b=BLG1LOOYuvTgidvGeIjWmQBsQ7vIyM66y/ikPlQt5G5vkRAOqVgvUQMIpyVvDOPHcqDyJnifrnmUqQVg+w7GTGZ8PdX0pwMT0m946Pz4KAMsokXCS9fWXcEu9AY5sY380qh7af+XZSwnCWgQ3DaeF+g3Xu3yvvW2DdHqS+oFLfg= Received: from DU7PR01CA0021.eurprd01.prod.exchangelabs.com (2603:10a6:10:50f::9) by AS8PR08MB8182.eurprd08.prod.outlook.com (2603:10a6:20b:54f::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7519.34; Tue, 30 Apr 2024 12:50:08 +0000 Received: from DU2PEPF00028D0B.eurprd03.prod.outlook.com (2603:10a6:10:50f:cafe::de) by DU7PR01CA0021.outlook.office365.com (2603:10a6:10:50f::9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7519.36 via Frontend Transport; Tue, 30 Apr 2024 12:50:07 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 40.67.248.234) smtp.mailfrom=arm.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 40.67.248.234 as permitted sender) receiver=protection.outlook.com; client-ip=40.67.248.234; helo=nebula.arm.com; pr=C Received: from nebula.arm.com (40.67.248.234) by DU2PEPF00028D0B.mail.protection.outlook.com (10.167.242.171) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7544.18 via Frontend Transport; Tue, 30 Apr 2024 12:50:06 +0000 Received: from AZ-NEU-EX04.Arm.com (10.251.24.32) by AZ-NEU-EX03.Arm.com (10.251.24.31) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Tue, 30 Apr 2024 12:50:02 +0000 Received: from vcn-man-apps.manchester.arm.com (10.32.108.22) by mail.arm.com (10.251.24.32) with Microsoft SMTP Server id 15.1.2507.35 via Frontend Transport; Tue, 30 Apr 2024 12:50:02 +0000 From: Joe Ramsay To: CC: Joe Ramsay Subject: [PATCH 1/3] aarch64/fpu: Add vector variants of hypot Date: Tue, 30 Apr 2024 13:49:58 +0100 Message-ID: <20240430125000.50324-1-Joe.Ramsay@arm.com> X-Mailer: git-send-email 2.27.0 MIME-Version: 1.0 X-EOPAttributedMessage: 1 X-MS-TrafficTypeDiagnostic: DU2PEPF00028D0B:EE_|AS8PR08MB8182:EE_|AM3PEPF0000A790:EE_|PAWPR08MB10305:EE_ X-MS-Office365-Filtering-Correlation-Id: 84010696-28e4-4e53-fac1-08dc69141719 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; ARA:13230031|1800799015|376005|82310400014|36860700004; X-Microsoft-Antispam-Message-Info-Original: b6WpPwdd6ifPOCQ6pq7aaU0N3g/23TM6eZmxvNtuVUsp4CBX9YlXwo4bas13dfOXWd3kYCfzWCGgU0u4KAHMNdYEwrYfozPKqzcC1Gn3UzIwI6WwbfztaG2t7lfAY/bM13I7uCjCe6y1EwwjkLKxjstckBfMAPK49NHypFSCyFeaTM4W64/XfuJfIgku2xKOqeolTp/Gx3D/ZNhs1LEMPW9kt6pcZgqvgkevxi8WJrZjZ5POuxRDcrS2BLhCUbhOImcHQ+W+yCZVTmJ8R3GyR7SfpB42fx7xRpYx47ZIRMCA/DWrcgEmK0rp8LUC8V3hM1TzDAj44KxFdI8gc4CsEQ4ap9HUk0wy2W3B5Nx11//sd9UyIIy1ietLmeJlIlxnDf87G+THI9EyXq10sKEIH2CeG0YsHvjHcbCE40++wvCR1jp0Zq6Nrj/2dY/kUICFyP+tvhXsb+xSLtxXcVUUS5JLVdNo5Cw+bPdhYEwguVMdtB5pS+zZKev+V90yoEBPTC1/UILN0NIbs+6EZBsOPey3UEHbaNKtWM8wrec137PUHBYZJ6BAvCAKEe9+u2ezeBQ2zAHo5/VPOa9FhGfGQuSaz1AEaMXsF8bkEsTnTf8e+RU99tN3RSteEsZy5jqN9NoxEpmsyJx9v14SIeu1HbWNTTDCPMWNM6MURJUkYsz5oB6DGSmUwnUBEbGmCB+VyaHkAQbFo3c/EdAMyEAPRgTUjsJ/eBJKRALhoUEbSLiM9f7A1cEWKdDiOpq7rOy5gT6UVqZxuJtbe8sTmzkOVYsfkVL4RmFSwff4BbbV3brF9tTqAG18/BCDL2dBReWSYH+/8YYSYx63DR04ssDirQAW/0ZXuCh3ZI/tsJ5NM7T9tvZRn62fYiF2MPqRjwro7jjkdDAVEZZEYCjUZGBNmlmv4jDVuOEuOeMWQP6m2xpljZbawlwmj1dnqaVH96VgIOu2tEujMhoR53XziZmrU8C2HOuBHpIAdnSRvFVU/Cu0+puDmhhaZHCbrGBS4GZCfwXhe8lx1XnQALiDX1xC9gGXzYJ6DwYhwGPugdZCQ8Y5pQgTwgZOMYWjf8XNJvom8pOamePqcIMBO+MB7ovhBLnQTEMHzQJdBFFh2IMVmuR81v+/zeHqmyKL9yYR1HOMs5bEOxZCmJbSGtPPY0mD79ssajkhG86VI+N364SKp0BWvYAwYCkkqvmMcDHiAn25KQcS4vtTxrUO7UYB12ivoQeTUznuhF/2jSDRA+/3VjshCOdenNiiJcra18c2G7ZAMgm+n1hBdctrN7pFvY/7w9n9B3im6tSOgyxouAHOSt4J2qcF+TVyEDTRopQQ+/FZ X-Forefront-Antispam-Report-Untrusted: CIP:40.67.248.234; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230031)(1800799015)(376005)(82310400014)(36860700004); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR08MB8182 X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM3PEPF0000A790.eurprd04.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 0968970e-d36b-45d1-7c4f-08dc69140f8e X-Microsoft-Antispam: BCL:0; ARA:13230031|35042699010|1800799015|376005|82310400014|36860700004; X-Microsoft-Antispam-Message-Info: hscfFjIj8+FFIxZxnWuByvygTLZVNq/2TTIbgJ/AX0MWTAXSSlV+idigL1zEapk89K8V88xbGlBSgMxQdH/5Emc8rxPXg4TdWLK5YDgmLeuJiwnbvfRf9LvYwTOPz3w2gwn5ejr2xC5tbKfDVqyxozM6YPn81G1jeVda2t9kgh8EDlkcexbPNlef5TZnpdQble5tUXGPKUOhY1k4de5aD2JWT39lMxiNGkNhJ7Fb3C+/M890lUsKZnQ5ffmEiflCImzCk2aiWVp5ITNzwmA6vBiRKUaNlb67VL+whbbI9qrTVCkWQPiMoSedDL6xaVbIsfYMqn96Xy0X6zBujhXK+VC9eC9q71NeKeJVEiYEIFlRx7UsID5QH2T/nynrmS4mXRKT0hPbsrBB9BxfHr9nH7HgDpy2Xtx2t8KjUr+TTKo43ASru+FRS24wgEVR70jZGBMqJEt7sCiMKOPAxsz5Lldx/06VEFUbJyPloP3LDbQGsn9zG6NirTW6s7dKHIPgSG70WhKZASQk58HAXIzBj4JSaXMDlP1Kpt5dGbkuTMb7ITdPYcDOxIiX0c1nZdHbyP1E6YANqsXwnE4oH2HYN+jvV5yU4DnftLA9TyYHclGeucCEvOfQd5JCk3nkVHxv0+Kt6aTADKcQ9/7JNnoKCpGqc+sWdavbhs81PJegGBbkTQk1MkM4vZNvUq5lzESBtQiv9RRMFUw4/x7Z8UVZRkuWDda9OTngU268puVVO3e9e59aZDv/81ZtHlkDeh3BkvBSAAsSHLSiK5qecoxDePyeAH5M/yhXgLLBmEPsNdzYpbnWl6vUly6AhtEPsf1KoCFkdt/5HdyRqIkhlE28OyUzuWHH5QQ82hMwTXcSCOxX2oFTJ2HUF2+2WaNACetPjd9lcd5FWyvEOKReb1ZuEYLMXU6gtBRb0LQdKrfwGZCqRk51N07JvyXG2eU/lZUZcC/ei5I80EedNhwLJOO5E2azGniIObYaguCQ8zBc8mfZTJ3PFxcRrZTi5gTC5Hz1eu1mwfLY5Z/IqDnoipPcPoZBuRA0o9thMIit5SrfsgeU8lcT9pVSADu9gDKFg066pcQc4dkOWkTA/YST7dfoTI6kBSZMJvWMtq4U+fwUCnCRgnVvXK8EEgGUrvK+3lAVl6+AUfz34dAoICena09h0Te/t9iqqt+Ka3fwDtG5Pk/noWlER4L2jQTzCyHXaX3EgNFTQmh8kq8xScp8hEHLbMmcjY4Qe/R52XmY5UxH0AOI6I9DYkOtucy/ED8VQCLfc1GTJnSvEq1gTWIcyKJeDJL93iMcJcJKhnsFmHcGLfdJUaYkFzmDligdgGCIu0M9 X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230031)(35042699010)(1800799015)(376005)(82310400014)(36860700004); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Apr 2024 12:50:18.6983 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 84010696-28e4-4e53-fac1-08dc69141719 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AM3PEPF0000A790.eurprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAWPR08MB10305 X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FORGED_SPF_HELO, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org --- Thanks, Joe sysdeps/aarch64/fpu/Makefile | 1 + sysdeps/aarch64/fpu/Versions | 5 + sysdeps/aarch64/fpu/advsimd_f32_protos.h | 1 + sysdeps/aarch64/fpu/bits/math-vector.h | 8 ++ sysdeps/aarch64/fpu/hypot_advsimd.c | 97 ++++++++++++++++++ sysdeps/aarch64/fpu/hypot_sve.c | 54 ++++++++++ sysdeps/aarch64/fpu/hypotf_advsimd.c | 98 +++++++++++++++++++ sysdeps/aarch64/fpu/hypotf_sve.c | 48 +++++++++ .../fpu/test-double-advsimd-wrappers.c | 1 + .../aarch64/fpu/test-double-sve-wrappers.c | 1 + .../aarch64/fpu/test-float-advsimd-wrappers.c | 1 + sysdeps/aarch64/fpu/test-float-sve-wrappers.c | 1 + sysdeps/aarch64/libm-test-ulps | 8 ++ .../unix/sysv/linux/aarch64/libmvec.abilist | 5 + 14 files changed, 329 insertions(+) create mode 100644 sysdeps/aarch64/fpu/hypot_advsimd.c create mode 100644 sysdeps/aarch64/fpu/hypot_sve.c create mode 100644 sysdeps/aarch64/fpu/hypotf_advsimd.c create mode 100644 sysdeps/aarch64/fpu/hypotf_sve.c Reviewed-by: Szabolcs Nagy diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile index e8af35099d..06657782a1 100644 --- a/sysdeps/aarch64/fpu/Makefile +++ b/sysdeps/aarch64/fpu/Makefile @@ -13,6 +13,7 @@ libmvec-supported-funcs = acos \ exp10 \ exp2 \ expm1 \ + hypot \ log \ log10 \ log1p \ diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions index 3cb1b82bd2..aedae9457b 100644 --- a/sysdeps/aarch64/fpu/Versions +++ b/sysdeps/aarch64/fpu/Versions @@ -109,6 +109,11 @@ libmvec { _ZGVnN4v_erfcf; _ZGVsMxv_erfc; _ZGVsMxv_erfcf; + _ZGVnN4vv_hypotf; + _ZGVnN2vv_hypotf; + _ZGVnN2vv_hypot; + _ZGVsMxvv_hypotf; + _ZGVsMxvv_hypot; _ZGVnN2v_sinh; _ZGVnN2v_sinhf; _ZGVnN4v_sinhf; diff --git a/sysdeps/aarch64/fpu/advsimd_f32_protos.h b/sysdeps/aarch64/fpu/advsimd_f32_protos.h index 383c436972..a8889a92fd 100644 --- a/sysdeps/aarch64/fpu/advsimd_f32_protos.h +++ b/sysdeps/aarch64/fpu/advsimd_f32_protos.h @@ -31,6 +31,7 @@ libmvec_hidden_proto (V_NAME_F1(exp10)); libmvec_hidden_proto (V_NAME_F1(exp2)); libmvec_hidden_proto (V_NAME_F1(exp)); libmvec_hidden_proto (V_NAME_F1(expm1)); +libmvec_hidden_proto (V_NAME_F2(hypot)); libmvec_hidden_proto (V_NAME_F1(log10)); libmvec_hidden_proto (V_NAME_F1(log1p)); libmvec_hidden_proto (V_NAME_F1(log2)); diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h index e29b2d1c09..ca30177339 100644 --- a/sysdeps/aarch64/fpu/bits/math-vector.h +++ b/sysdeps/aarch64/fpu/bits/math-vector.h @@ -89,6 +89,10 @@ # define __DECL_SIMD_expm1 __DECL_SIMD_aarch64 # undef __DECL_SIMD_expm1f # define __DECL_SIMD_expm1f __DECL_SIMD_aarch64 +# undef __DECL_SIMD_hypot +# define __DECL_SIMD_hypot __DECL_SIMD_aarch64 +# undef __DECL_SIMD_hypotf +# define __DECL_SIMD_hypotf __DECL_SIMD_aarch64 # undef __DECL_SIMD_log # define __DECL_SIMD_log __DECL_SIMD_aarch64 # undef __DECL_SIMD_logf @@ -162,6 +166,7 @@ __vpcs __f32x4_t _ZGVnN4v_expf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_exp10f (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_exp2f (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_expm1f (__f32x4_t); +__vpcs __f32x4_t _ZGVnN4vv_hypotf (__f32x4_t, __f32x4_t); __vpcs __f32x4_t _ZGVnN4v_logf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_log10f (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_log1pf (__f32x4_t); @@ -186,6 +191,7 @@ __vpcs __f64x2_t _ZGVnN2v_exp (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_exp10 (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_exp2 (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_expm1 (__f64x2_t); +__vpcs __f64x2_t _ZGVnN2vv_hypot (__f64x2_t, __f64x2_t); __vpcs __f64x2_t _ZGVnN2v_log (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_log10 (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_log1p (__f64x2_t); @@ -215,6 +221,7 @@ __sv_f32_t _ZGVsMxv_expf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_exp10f (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_exp2f (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_expm1f (__sv_f32_t, __sv_bool_t); +__sv_f32_t _ZGVsMxvv_hypotf (__sv_f32_t, __sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_logf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_log10f (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_log1pf (__sv_f32_t, __sv_bool_t); @@ -239,6 +246,7 @@ __sv_f64_t _ZGVsMxv_exp (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_exp10 (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_exp2 (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_expm1 (__sv_f64_t, __sv_bool_t); +__sv_f64_t _ZGVsMxvv_hypot (__sv_f64_t, __sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_log (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_log10 (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_log1p (__sv_f64_t, __sv_bool_t); diff --git a/sysdeps/aarch64/fpu/hypot_advsimd.c b/sysdeps/aarch64/fpu/hypot_advsimd.c new file mode 100644 index 0000000000..e4e279fa0c --- /dev/null +++ b/sysdeps/aarch64/fpu/hypot_advsimd.c @@ -0,0 +1,97 @@ +/* Double-precision vector (Advanced SIMD) hypot function + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "v_math.h" + +#if WANT_SIMD_EXCEPT +static const struct data +{ + uint64x2_t tiny_bound, thres; +} data = { + .tiny_bound = V2 (0x2000000000000000), /* asuint (0x1p-511). */ + .thres = V2 (0x3fe0000000000000), /* asuint (0x1p511) - tiny_bound. */ +}; +#else +static const struct data +{ + uint64x2_t tiny_bound; + uint32x4_t thres; +} data = { + .tiny_bound = V2 (0x0360000000000000), /* asuint (0x1p-969). */ + .thres = V4 (0x7c900000), /* asuint (inf) - tiny_bound. */ +}; +#endif + +static float64x2_t VPCS_ATTR NOINLINE +special_case (float64x2_t x, float64x2_t y, float64x2_t sqsum, + uint32x2_t special) +{ + return v_call2_f64 (hypot, x, y, vsqrtq_f64 (sqsum), vmovl_u32 (special)); +} + +/* Vector implementation of double-precision hypot. + Maximum error observed is 1.21 ULP: + _ZGVnN2vv_hypot (0x1.6a1b193ff85b5p-204, 0x1.bc50676c2a447p-222) + got 0x1.6a1b19400964ep-204 + want 0x1.6a1b19400964dp-204. */ +#if WANT_SIMD_EXCEPT + +float64x2_t VPCS_ATTR V_NAME_D2 (hypot) (float64x2_t x, float64x2_t y) +{ + const struct data *d = ptr_barrier (&data); + + float64x2_t ax = vabsq_f64 (x); + float64x2_t ay = vabsq_f64 (y); + + uint64x2_t ix = vreinterpretq_u64_f64 (ax); + uint64x2_t iy = vreinterpretq_u64_f64 (ay); + + /* Extreme values, NaNs, and infinities should be handled by the scalar + fallback for correct flag handling. */ + uint64x2_t specialx = vcgeq_u64 (vsubq_u64 (ix, d->tiny_bound), d->thres); + uint64x2_t specialy = vcgeq_u64 (vsubq_u64 (iy, d->tiny_bound), d->thres); + ax = v_zerofy_f64 (ax, specialx); + ay = v_zerofy_f64 (ay, specialy); + uint32x2_t special = vaddhn_u64 (specialx, specialy); + + float64x2_t sqsum = vfmaq_f64 (vmulq_f64 (ax, ax), ay, ay); + + if (__glibc_unlikely (v_any_u32h (special))) + return special_case (x, y, sqsum, special); + + return vsqrtq_f64 (sqsum); +} +#else + +float64x2_t VPCS_ATTR V_NAME_D2 (hypot) (float64x2_t x, float64x2_t y) +{ + const struct data *d = ptr_barrier (&data); + + float64x2_t sqsum = vfmaq_f64 (vmulq_f64 (x, x), y, y); + + uint32x2_t special = vcge_u32 ( + vsubhn_u64 (vreinterpretq_u64_f64 (sqsum), d->tiny_bound), + vget_low_u32 (d->thres)); + + if (__glibc_unlikely (v_any_u32h (special))) + return special_case (x, y, sqsum, special); + + return vsqrtq_f64 (sqsum); +} +#endif diff --git a/sysdeps/aarch64/fpu/hypot_sve.c b/sysdeps/aarch64/fpu/hypot_sve.c new file mode 100644 index 0000000000..74417040ac --- /dev/null +++ b/sysdeps/aarch64/fpu/hypot_sve.c @@ -0,0 +1,54 @@ +/* Double-precision vector (SVE) hypot function + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "sv_math.h" + +static const struct data +{ + uint64_t tiny_bound, thres; +} data = { + .tiny_bound = 0x0c80000000000000, /* asuint (0x1p-102). */ + .thres = 0x7300000000000000, /* asuint (inf) - tiny_bound. */ +}; + +static svfloat64_t NOINLINE +special_case (svfloat64_t sqsum, svfloat64_t x, svfloat64_t y, svbool_t pg, + svbool_t special) +{ + return sv_call2_f64 (hypot, x, y, svsqrt_x (pg, sqsum), special); +} + +/* SVE implementation of double-precision hypot. + Maximum error observed is 1.21 ULP: + _ZGVsMxvv_hypot (-0x1.6a22d0412cdd3p+352, 0x1.d3d89bd66fb1ap+330) + got 0x1.6a22d0412cfp+352 + want 0x1.6a22d0412cf01p+352. */ +svfloat64_t SV_NAME_D2 (hypot) (svfloat64_t x, svfloat64_t y, svbool_t pg) +{ + const struct data *d = ptr_barrier (&data); + + svfloat64_t sqsum = svmla_x (pg, svmul_x (pg, x, x), y, y); + + svbool_t special = svcmpge ( + pg, svsub_x (pg, svreinterpret_u64 (sqsum), d->tiny_bound), d->thres); + + if (__glibc_unlikely (svptest_any (pg, special))) + return special_case (sqsum, x, y, pg, special); + return svsqrt_x (pg, sqsum); +} diff --git a/sysdeps/aarch64/fpu/hypotf_advsimd.c b/sysdeps/aarch64/fpu/hypotf_advsimd.c new file mode 100644 index 0000000000..34818b021a --- /dev/null +++ b/sysdeps/aarch64/fpu/hypotf_advsimd.c @@ -0,0 +1,98 @@ +/* Single-precision vector (Advanced SIMD) hypot function + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "v_math.h" + +#if WANT_SIMD_EXCEPT +static const struct data +{ + uint32x4_t tiny_bound, thres; +} data = { + .tiny_bound = V4 (0x20000000), /* asuint (0x1p-63). */ + .thres = V4 (0x3f000000), /* asuint (0x1p63) - tiny_bound. */ +}; +#else +static const struct data +{ + uint32x4_t tiny_bound; + uint16x8_t thres; +} data = { + .tiny_bound = V4 (0x0C800000), /* asuint (0x1p-102). */ + .thres = V8 (0x7300), /* asuint (inf) - tiny_bound. */ +}; +#endif + +static float32x4_t VPCS_ATTR NOINLINE +special_case (float32x4_t x, float32x4_t y, float32x4_t sqsum, + uint16x4_t special) +{ + return v_call2_f32 (hypotf, x, y, vsqrtq_f32 (sqsum), vmovl_u16 (special)); +} + +/* Vector implementation of single-precision hypot. + Maximum error observed is 1.21 ULP: + _ZGVnN4vv_hypotf (0x1.6a419cp-13, 0x1.82a852p-22) got 0x1.6a41d2p-13 + want 0x1.6a41dp-13. */ +#if WANT_SIMD_EXCEPT + +float32x4_t VPCS_ATTR V_NAME_F2 (hypot) (float32x4_t x, float32x4_t y) +{ + const struct data *d = ptr_barrier (&data); + + float32x4_t ax = vabsq_f32 (x); + float32x4_t ay = vabsq_f32 (y); + + uint32x4_t ix = vreinterpretq_u32_f32 (ax); + uint32x4_t iy = vreinterpretq_u32_f32 (ay); + + /* Extreme values, NaNs, and infinities should be handled by the scalar + fallback for correct flag handling. */ + uint32x4_t specialx = vcgeq_u32 (vsubq_u32 (ix, d->tiny_bound), d->thres); + uint32x4_t specialy = vcgeq_u32 (vsubq_u32 (iy, d->tiny_bound), d->thres); + ax = v_zerofy_f32 (ax, specialx); + ay = v_zerofy_f32 (ay, specialy); + uint16x4_t special = vaddhn_u32 (specialx, specialy); + + float32x4_t sqsum = vfmaq_f32 (vmulq_f32 (ax, ax), ay, ay); + + if (__glibc_unlikely (v_any_u16h (special))) + return special_case (x, y, sqsum, special); + + return vsqrtq_f32 (sqsum); +} +#else + +float32x4_t VPCS_ATTR V_NAME_F2 (hypot) (float32x4_t x, float32x4_t y) +{ + const struct data *d = ptr_barrier (&data); + + float32x4_t sqsum = vfmaq_f32 (vmulq_f32 (x, x), y, y); + + uint16x4_t special = vcge_u16 ( + vsubhn_u32 (vreinterpretq_u32_f32 (sqsum), d->tiny_bound), + vget_low_u16 (d->thres)); + + if (__glibc_unlikely (v_any_u16h (special))) + return special_case (x, y, sqsum, special); + + return vsqrtq_f32 (sqsum); +} +#endif +libmvec_hidden_def (V_NAME_F2 (hypot)) +HALF_WIDTH_ALIAS_F2(hypot) diff --git a/sysdeps/aarch64/fpu/hypotf_sve.c b/sysdeps/aarch64/fpu/hypotf_sve.c new file mode 100644 index 0000000000..3a403de66e --- /dev/null +++ b/sysdeps/aarch64/fpu/hypotf_sve.c @@ -0,0 +1,48 @@ +/* Single-precision vector (SVE) hypot function + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "sv_math.h" + +#define TinyBound 0x0c800000 /* asuint (0x1p-102). */ +#define Thres 0x73000000 /* 0x70000000 - TinyBound. */ + +static svfloat32_t NOINLINE +special_case (svfloat32_t sqsum, svfloat32_t x, svfloat32_t y, svbool_t pg, + svbool_t special) +{ + return sv_call2_f32 (hypotf, x, y, svsqrt_x (pg, sqsum), special); +} + +/* SVE implementation of single-precision hypot. + Maximum error observed is 1.21 ULP: + _ZGVsMxvv_hypotf (0x1.6a213cp-19, -0x1.32b982p-26) got 0x1.6a2346p-19 + want 0x1.6a2344p-19. */ +svfloat32_t SV_NAME_F2 (hypot) (svfloat32_t x, svfloat32_t y, + const svbool_t pg) +{ + svfloat32_t sqsum = svmla_x (pg, svmul_x (pg, x, x), y, y); + + svbool_t special = svcmpge ( + pg, svsub_x (pg, svreinterpret_u32 (sqsum), TinyBound), Thres); + + if (__glibc_unlikely (svptest_any (pg, special))) + return special_case (sqsum, x, y, pg, special); + + return svsqrt_x (pg, sqsum); +} diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c index f2d8714075..417125be47 100644 --- a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c @@ -38,6 +38,7 @@ VPCS_VECTOR_WRAPPER (exp_advsimd, _ZGVnN2v_exp) VPCS_VECTOR_WRAPPER (exp10_advsimd, _ZGVnN2v_exp10) VPCS_VECTOR_WRAPPER (exp2_advsimd, _ZGVnN2v_exp2) VPCS_VECTOR_WRAPPER (expm1_advsimd, _ZGVnN2v_expm1) +VPCS_VECTOR_WRAPPER_ff (hypot_advsimd, _ZGVnN2vv_hypot) VPCS_VECTOR_WRAPPER (log_advsimd, _ZGVnN2v_log) VPCS_VECTOR_WRAPPER (log10_advsimd, _ZGVnN2v_log10) VPCS_VECTOR_WRAPPER (log1p_advsimd, _ZGVnN2v_log1p) diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c index 37873d5e43..31ebf18705 100644 --- a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c @@ -57,6 +57,7 @@ SVE_VECTOR_WRAPPER (exp_sve, _ZGVsMxv_exp) SVE_VECTOR_WRAPPER (exp10_sve, _ZGVsMxv_exp10) SVE_VECTOR_WRAPPER (exp2_sve, _ZGVsMxv_exp2) SVE_VECTOR_WRAPPER (expm1_sve, _ZGVsMxv_expm1) +SVE_VECTOR_WRAPPER_ff (hypot_sve, _ZGVsMxvv_hypot) SVE_VECTOR_WRAPPER (log_sve, _ZGVsMxv_log) SVE_VECTOR_WRAPPER (log10_sve, _ZGVsMxv_log10) SVE_VECTOR_WRAPPER (log1p_sve, _ZGVsMxv_log1p) diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c index 08e33115b9..dab0f1cfcb 100644 --- a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c @@ -38,6 +38,7 @@ VPCS_VECTOR_WRAPPER (expf_advsimd, _ZGVnN4v_expf) VPCS_VECTOR_WRAPPER (exp10f_advsimd, _ZGVnN4v_exp10f) VPCS_VECTOR_WRAPPER (exp2f_advsimd, _ZGVnN4v_exp2f) VPCS_VECTOR_WRAPPER (expm1f_advsimd, _ZGVnN4v_expm1f) +VPCS_VECTOR_WRAPPER_ff (hypotf_advsimd, _ZGVnN4vv_hypotf) VPCS_VECTOR_WRAPPER (logf_advsimd, _ZGVnN4v_logf) VPCS_VECTOR_WRAPPER (log10f_advsimd, _ZGVnN4v_log10f) VPCS_VECTOR_WRAPPER (log1pf_advsimd, _ZGVnN4v_log1pf) diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c index 025daa662e..2aa6cbcc28 100644 --- a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c @@ -57,6 +57,7 @@ SVE_VECTOR_WRAPPER (expf_sve, _ZGVsMxv_expf) SVE_VECTOR_WRAPPER (exp10f_sve, _ZGVsMxv_exp10f) SVE_VECTOR_WRAPPER (exp2f_sve, _ZGVsMxv_exp2f) SVE_VECTOR_WRAPPER (expm1f_sve, _ZGVsMxv_expm1f) +SVE_VECTOR_WRAPPER_ff (hypotf_sve, _ZGVsMxvv_hypotf) SVE_VECTOR_WRAPPER (logf_sve, _ZGVsMxv_logf) SVE_VECTOR_WRAPPER (log10f_sve, _ZGVsMxv_log10f) SVE_VECTOR_WRAPPER (log1pf_sve, _ZGVsMxv_log1pf) diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps index 116a5404f5..e7463d30bc 100644 --- a/sysdeps/aarch64/libm-test-ulps +++ b/sysdeps/aarch64/libm-test-ulps @@ -1174,10 +1174,18 @@ double: 1 float: 1 ldouble: 1 +Function: "hypot_advsimd": +double: 1 +float: 1 + Function: "hypot_downward": double: 1 ldouble: 1 +Function: "hypot_sve": +double: 1 +float: 1 + Function: "hypot_towardzero": double: 1 ldouble: 1 diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist index 26c3fbf18b..1184374efd 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist @@ -89,6 +89,8 @@ GLIBC_2.40 _ZGVnN2v_sinh F GLIBC_2.40 _ZGVnN2v_sinhf F GLIBC_2.40 _ZGVnN2v_tanh F GLIBC_2.40 _ZGVnN2v_tanhf F +GLIBC_2.40 _ZGVnN2vv_hypot F +GLIBC_2.40 _ZGVnN2vv_hypotf F GLIBC_2.40 _ZGVnN4v_acoshf F GLIBC_2.40 _ZGVnN4v_asinhf F GLIBC_2.40 _ZGVnN4v_atanhf F @@ -97,6 +99,7 @@ GLIBC_2.40 _ZGVnN4v_erfcf F GLIBC_2.40 _ZGVnN4v_erff F GLIBC_2.40 _ZGVnN4v_sinhf F GLIBC_2.40 _ZGVnN4v_tanhf F +GLIBC_2.40 _ZGVnN4vv_hypotf F GLIBC_2.40 _ZGVsMxv_acosh F GLIBC_2.40 _ZGVsMxv_acoshf F GLIBC_2.40 _ZGVsMxv_asinh F @@ -113,3 +116,5 @@ GLIBC_2.40 _ZGVsMxv_sinh F GLIBC_2.40 _ZGVsMxv_sinhf F GLIBC_2.40 _ZGVsMxv_tanh F GLIBC_2.40 _ZGVsMxv_tanhf F +GLIBC_2.40 _ZGVsMxvv_hypot F +GLIBC_2.40 _ZGVsMxvv_hypotf F From patchwork Tue Apr 30 12:49:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Ramsay X-Patchwork-Id: 1929497 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=sA9Sgo1q; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=sA9Sgo1q; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4VTKn26wwqz20fY for ; Tue, 30 Apr 2024 22:50:46 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3964B384AB50 for ; Tue, 30 Apr 2024 12:50:45 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from EUR05-AM6-obe.outbound.protection.outlook.com (mail-am6eur05on2046.outbound.protection.outlook.com [40.107.22.46]) by sourceware.org (Postfix) with ESMTPS id 13F463858D20 for ; Tue, 30 Apr 2024 12:50:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 13F463858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 13F463858D20 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=40.107.22.46 ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1714481424; cv=pass; b=MWxLyX5NB7K/5WTAHswdR51DCxIdblT8kh6NnobVUOKffSE2Re7Jv3u2dXl6nkV5tCFYO9QrJoHTUlmP0G/CArzHJGTMZpI4yAGUiWlwPhdbFBxHKmg94VWx366Gv3fk4M5xnN2BeEJf5WoT8SeFv4aLpUYaYTkYIesoykZIt9g= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1714481424; c=relaxed/simple; bh=qCv5N7Oqtf6h7ibq2lCYNudaodS3JOG/uVoIM1LGktY=; h=DKIM-Signature:DKIM-Signature:From:To:Subject:Date:Message-ID: MIME-Version; b=PO5u1aBPl3peiCldvoaLT29BprGfLCosbpUPtFFMhvG44Yv4P96gGZwf/9RY9l8kW15QotgH/094umtp1w54Ny7YQt2IuiRreEPfftpjcyRBZfoFb+lbVHe4iGpCn95BukIonmsH+g7WSGoH1yeK7Wwa++XAeiH5fpvo3dsn7Yk= ARC-Authentication-Results: i=3; server2.sourceware.org ARC-Seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; b=bLiZ3u7tmyOsmwDPfsNIuydV2U+ihCjeFVReZw83USCMIqADKYCcLhpv8i8v/56WVEh2+Oe/zQSrEhPoo4Zd97N+2Z2JQ1ciQkral3euMSyoFNmhNFnPoKym+2Jmx24Xkcaj2pcozNUi5K6qKGD/jjYBbvTAfqxnIaGILIgAiY4/tQK6GTUz1EIR5XMYMRkau+R5bUNQZdcr7yQj6VCidNsi+VTkHLp9q1hVH0jXSnKO7/ao2CVYpqlmLq1U8JP5UynahJcbkM1NyD9hGdjroB2p3y0M9LsQxff8OAmWZmNGtXc9I6LOezS642hJFWpoUS5y01TORYzCRwBmjQ/H6w== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=+4PUhsAbvHPU47Ilyav/hU9Rf9i1/hlbjhP/xjE3JDw=; b=RUsg367tjFM3F0XpaP7ZuNcSZhwLAD7kHssq/CVXRq7szIWBSctVIcIrGoQw8VPsDmyfU6bfxCrylGl9JQKNO9bISFXdlBPmrYOKtjcj8hR0EzhuvhICwxuzwTirg4QDJzJirWY1Cm9yKdMApIPJfAuEy10s3tTpOIKZsJjVdLsdQH9OtH8j92m/QJCY3jBSkj5xMlBnj87yxbagOmsHduNGcqKghqQm/1yKa/Ctpmr4ARwTQWDZ9fEbZZpewT6OjYoBWdIc7zoHk3u2YfqEAqq5hhYGXzLdozluqSDGd0YG9U9qZCxvvrffjl1yovQ/Zf753Oj355ulVMBS1l9b5g== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=arm.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=+4PUhsAbvHPU47Ilyav/hU9Rf9i1/hlbjhP/xjE3JDw=; b=sA9Sgo1q4l8ggMYtd2q/GiJKesw11jjPhk8ObGJSZZNYWKcse8dszH7LtQp5aJPUb91PBps8k+cgdnZCrJFS3JzXmTD2wAVv2wUuzsBUoSLvAeqOeWlpIHLypzeiabcPGMh38ujEdNJ1WKRm7XSP5kXc0SWM1xVVN89RiMERX1Q= Received: from DB8P191CA0005.EURP191.PROD.OUTLOOK.COM (2603:10a6:10:130::15) by AS8PR08MB9868.eurprd08.prod.outlook.com (2603:10a6:20b:5ac::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7519.34; Tue, 30 Apr 2024 12:50:16 +0000 Received: from DB3PEPF0000885F.eurprd02.prod.outlook.com (2603:10a6:10:130:cafe::6f) by DB8P191CA0005.outlook.office365.com (2603:10a6:10:130::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7472.34 via Frontend Transport; Tue, 30 Apr 2024 12:50:16 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=arm.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DB3PEPF0000885F.mail.protection.outlook.com (10.167.242.10) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.7544.18 via Frontend Transport; Tue, 30 Apr 2024 12:50:16 +0000 Received: ("Tessian outbound e46bb127ed3d:v315"); Tue, 30 Apr 2024 12:50:16 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 6a7af5f129c33850 X-CR-MTA-TID: 64aa7808 Received: from 76d77542700f.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 314061F7-C9E4-4022-A211-A4CACC50A90B.1; Tue, 30 Apr 2024 12:50:09 +0000 Received: from EUR02-DB5-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 76d77542700f.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Tue, 30 Apr 2024 12:50:09 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=CovJdTL9a64sL1TjEbSwwkyEZEiXOkx8mKF1BNOJmK5iAZ2hGzzg4nx2VTXczCS4Tgfs3iUEeRQUbFSDQzKvP4CHma+f1GdymimbjdZ+BD5OrLNLqtsvknQDOdvejCCnk8h/SQFhZYHlG1eKfAdMm1BW99R2OmFDvMOQ+Q2oNkDoCZq9vQfanZJW/BCO/+eL++wEnxHdmBJECEZYAFXnwla07C4BLx5udYkdb5ZHUF6kUbE09zUduD9cqABv9PnkRsDZ8QhF4ZA+0Oo3yfJk0nKa1OuU9M4CJTLV8bDnVwftTH9SXWEobn2/+vEIzDmp/Df3P8loc2vgz6I9PPeT3w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=+4PUhsAbvHPU47Ilyav/hU9Rf9i1/hlbjhP/xjE3JDw=; b=K8gZO+gOmA4/9ujoDfKkMy9vCwazEECdx9EYXIS88DEO3UkKzRlxVWAtMXLhqjT0ec1340OVepXKlCQGW8rieaOUFyqo5tgRFBLvVpltOJ4xannmllHDs/a6oV7TE7YG/mjGZP7agmC/v6lDTg0KF/EA6bSBr5R+9+mp40aTKtDAUsbsfYANe/7pt6l7AwPpBROrQH+i0yDo4mtJC/a5qgzJ5vjt1Vi5xmynYf3rg1UH8V0ueR/h7WLcFqmK7ZVBw7AEUvt9nDjJ5I5VnUikZJwFo9jKb4bSYL45VoJW718CPWx4do6F+dbflQgnUti/Xjfgm4Ot1xDHWnkAeogeFA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 40.67.248.234) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=+4PUhsAbvHPU47Ilyav/hU9Rf9i1/hlbjhP/xjE3JDw=; b=sA9Sgo1q4l8ggMYtd2q/GiJKesw11jjPhk8ObGJSZZNYWKcse8dszH7LtQp5aJPUb91PBps8k+cgdnZCrJFS3JzXmTD2wAVv2wUuzsBUoSLvAeqOeWlpIHLypzeiabcPGMh38ujEdNJ1WKRm7XSP5kXc0SWM1xVVN89RiMERX1Q= Received: from DB9PR01CA0022.eurprd01.prod.exchangelabs.com (2603:10a6:10:1d8::27) by AM7PR08MB5352.eurprd08.prod.outlook.com (2603:10a6:20b:10e::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7519.36; Tue, 30 Apr 2024 12:50:07 +0000 Received: from DU2PEPF00028D08.eurprd03.prod.outlook.com (2603:10a6:10:1d8:cafe::e2) by DB9PR01CA0022.outlook.office365.com (2603:10a6:10:1d8::27) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7519.34 via Frontend Transport; Tue, 30 Apr 2024 12:50:07 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 40.67.248.234) smtp.mailfrom=arm.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 40.67.248.234 as permitted sender) receiver=protection.outlook.com; client-ip=40.67.248.234; helo=nebula.arm.com; pr=C Received: from nebula.arm.com (40.67.248.234) by DU2PEPF00028D08.mail.protection.outlook.com (10.167.242.168) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7544.18 via Frontend Transport; Tue, 30 Apr 2024 12:50:07 +0000 Received: from AZ-NEU-EX02.Emea.Arm.com (10.251.26.5) by AZ-NEU-EX03.Arm.com (10.251.24.31) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Tue, 30 Apr 2024 12:50:03 +0000 Received: from AZ-NEU-EX04.Arm.com (10.251.24.32) by AZ-NEU-EX02.Emea.Arm.com (10.251.26.5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Tue, 30 Apr 2024 12:50:03 +0000 Received: from vcn-man-apps.manchester.arm.com (10.32.108.22) by mail.arm.com (10.251.24.32) with Microsoft SMTP Server id 15.1.2507.35 via Frontend Transport; Tue, 30 Apr 2024 12:50:03 +0000 From: Joe Ramsay To: CC: Joe Ramsay Subject: [PATCH 2/3] aarch64/fpu: Add vector variants of cbrt Date: Tue, 30 Apr 2024 13:49:59 +0100 Message-ID: <20240430125000.50324-2-Joe.Ramsay@arm.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20240430125000.50324-1-Joe.Ramsay@arm.com> References: <20240430125000.50324-1-Joe.Ramsay@arm.com> MIME-Version: 1.0 X-EOPAttributedMessage: 1 X-MS-TrafficTypeDiagnostic: DU2PEPF00028D08:EE_|AM7PR08MB5352:EE_|DB3PEPF0000885F:EE_|AS8PR08MB9868:EE_ X-MS-Office365-Filtering-Correlation-Id: b345ef4f-400f-44eb-79c9-08dc6914159a x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; ARA:13230031|36860700004|376005|82310400014|1800799015; X-Microsoft-Antispam-Message-Info-Original: 6nfKEkgWHHQs8tBtkiaVnvEkzrIkrlqYQJHhajeNyn2uTycIYlH+mo2cdJ1KX4qu/6L4a/VGdFaSanu7KbYgmysByySpeaqQ4/7Amgjytr0YHgDhMK/xSAifAq3HfG0S6FsGjH+drxJ96cQ3N7L4Li3DBMtVp0bQNRvqCxHsJLu35WPVQaPydEDzQN5xf7Ejh98VaiKVZhjcfWZt5XVFEKnu1Fe/ri6YGgbwqximKfwpL4Nwk5EIcu3MXgEP+DhhoQ6EtZRIBzrxwDkubIW4k2lSira6W3EuQP5lzqabW0CBT0UjvvYIITRdPwZwd7bWTcDVW7kL+teW9FB7irnbqzr13XU0oOYbnrdk4cQ/jdURSVF6M2UtU/4Jly5+CgPYpFaLzPWiGqofF+H4/f3V7KMOKC6olpeHxPveYS0fqtk3IxlZbRDvdlIyrPpmJ53itIFr3NVnaigjrerequ0EFmwIP2kEx5dYFrtZoN46KFF/nCJ7DDYnMo7IcP/Ct39DpN6sk7X27oAVYQCJa15ipAi4yO8EgdSFuKDRMzox2TOFl6AbHv3fPujY6WHFWSb5WbzGWSm5QD+9AguJFiBSHuEZNHm4FOxUdU7xdHTD3iGlSjMm1NxH9cFmTV89f10jgGb8/FdASpnBwvevh6hm6pM23yRkVmSFTSrfqnmNnrHJw8AL8rliSmrakXpY9oZOtqKc7gZevGk2yMTnFtnqgSjyJ1GIQOuVLzv5Hr5V9r8ZeUx00U2wWMx7+JaAnuEsn2gFDUP8pVpA2zJF8znSpbCzu0Es3AfSSjqZM879eaiFSbvyNxsR1V57pD1LZ/sLltGRW2jUlqYpPnuybu4dMHos+z83yKB4H58VSDrPGp8tltKQ6JQC1LL4lFXNcdTye+3zoyGWoAP7u9SnZD/gXAm0LwH8Awqz2NnJMFGJShTvsiD0nIAZV0Ny+16RWNdg23UgaaFAyHvw7wv/MHz1ox07yIiuGufW4Ep0UUi0JqJpmKKaUaoay3Kyzigq11mgmsCYkbq78vo632X3Iw7UyrtQvFUW40FCOJPi7VZqEUZXxoQAtmFispboM/pTLW9Ee7mIDemjjzPORLp4yXdMsf9hSSYY617xh8KnasMqvI4MnbKSaptWG+zLEfUS5LVhuPiZDcPAF2WIHMNe0PhUqStLGaxk0VrhhuzQIYOLlkGVcF3wE57EWvSToPNctk0JigUVzWDlHvAfk6p322YKdVv4G4m551rAUgJsL+hFO5nWGhuGTEvhcXZ9MFEQfNM6AKpU0+tq2xK4TCe/5xiIqtXpW4Jo7EwkcotO6Nh4RIKddIAnm/XvCBGEhiWDAi6s X-Forefront-Antispam-Report-Untrusted: CIP:40.67.248.234; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230031)(36860700004)(376005)(82310400014)(1800799015); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM7PR08MB5352 X-MS-Exchange-Transport-CrossTenantHeadersStripped: DB3PEPF0000885F.eurprd02.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: d52fba91-a6d6-411e-a85e-08dc69141022 X-Microsoft-Antispam: BCL:0; ARA:13230031|82310400014|1800799015|36860700004|376005|35042699010; X-Microsoft-Antispam-Message-Info: gGWTvKGQBFRaMZhhFOvg5H15J5Dg3yyi337bEmuhb0FixbnR55ehPIFFxu1I3YUv+d9PczxALtGXOWujnznhhfthjNjjSrNnylpfasQL2rQsVojwso/wrJsN9A3IjmDlZovmjSPUi77o2bsIBgQ3r3iQynzoUtfTTD9CTHgCXlyKe/To9Y2XRMcWLKFsgytCPb9M/TaqFfdGSdr3tlGb1Phw+LpVoMdm+9y/4N0pOMSe8PJe/ENDeSYATuziUCWb/hzVUdMDCMt27xSBK1WDkzr55i1wk2/jvgn9Pa7mTCVyK/NCFsAl5u9xk4cp+wFIK83jqnZYe9POu4galE4Ia4of3JNfX2EnEWsF4LByMSqx9aU6+sLSUSr4msppHPtTAytl7G+/+lnbQ1iOlQA24UkRdkVng4YqLVSEDnkKiM4/Vz1YCeMfehO8R+L8VdDqkUnzIANsh6erL8GahaGaaGGbvVncMr6Cb0YcxwbzX9xpb3SheGllSrjROreevg/uqdvlNLIvvSwDsxaCzL17o202Hxr9DKe++HE+vgfGLJg4pi90N2m2tKCMNChCgER8LcpZWLrVoYfZBoXmhijDXznwYZbgI3Lc+ZgDf/rzI60ra29Ug9HNijUL+n9EFxtIUAnDTiuXFgXeYsej6y3RQVYhvWZ9NxjESOL6IH3DrrU86YtUd+pUAbzJIdWtZtYrQPuSXNwwGvL6p/QNjKk2YDR174v0lsmEihJkpvffCFR2S3Y0cP/NwObiX6DE7YaweZNGR18sOvrlvlMhoB4rZT1OANcc416mZWIIVrsDAEkWNK10FHSvA4hzpzKyRDqzvUsYtVElx1SmVldWJ2MY66hOZycBSmp6g2OsuF5Eqk1H8UAUCFPy6hALQMuLYjMWTRReSz0B0VdS7MO60DL6p2EMFUmEJx6EKGf3Xi3Thasn/UCFanSwEl+tGjRtLjj8WfCVCTwQ+7GRk03w5Wv2Wj8Hbw7yI/Nt176Bd9dzNC7szF80FJGARPPQbqHlLZ+/29bJPf56IljvaEsZG7QgveQZkS93ZyhbcUfHUEdO32jqToxv5s6rj3HbEjzUSbH4LtHSBN/ThNmUnp0HOuHD/HN4i3kHIM4ljEbSdB6KqPzxIl1lxJne7z2WjR62J2UlGmyafS9kuBUV7mqhClqkAd2c9zxLIzdRRvPSIcaL0/+JbK5J3Mjh9t9AiaDWqpgTEiUuy5W49vh0xUmx/Mls+BN45xjzk6zut+g4Oyp/+9IZevRMoBnJrCb4RwHRhvfrwd8wU/Vl3BekweXPmS4sYxSH8+iizaFQyaU9SZPhSgxKB27MjIpGcRsjyyzckqoo X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230031)(82310400014)(1800799015)(36860700004)(376005)(35042699010); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Apr 2024 12:50:16.2319 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: b345ef4f-400f-44eb-79c9-08dc6914159a X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DB3PEPF0000885F.eurprd02.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR08MB9868 X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FORGED_SPF_HELO, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org --- Thanks, Joe sysdeps/aarch64/fpu/Makefile | 1 + sysdeps/aarch64/fpu/Versions | 5 + sysdeps/aarch64/fpu/advsimd_f32_protos.h | 1 + sysdeps/aarch64/fpu/bits/math-vector.h | 8 ++ sysdeps/aarch64/fpu/cbrt_advsimd.c | 121 +++++++++++++++++ sysdeps/aarch64/fpu/cbrt_sve.c | 128 ++++++++++++++++++ sysdeps/aarch64/fpu/cbrtf_advsimd.c | 123 +++++++++++++++++ sysdeps/aarch64/fpu/cbrtf_sve.c | 122 +++++++++++++++++ .../fpu/test-double-advsimd-wrappers.c | 1 + .../aarch64/fpu/test-double-sve-wrappers.c | 1 + .../aarch64/fpu/test-float-advsimd-wrappers.c | 1 + sysdeps/aarch64/fpu/test-float-sve-wrappers.c | 1 + sysdeps/aarch64/libm-test-ulps | 8 ++ .../unix/sysv/linux/aarch64/libmvec.abilist | 5 + 14 files changed, 526 insertions(+) create mode 100644 sysdeps/aarch64/fpu/cbrt_advsimd.c create mode 100644 sysdeps/aarch64/fpu/cbrt_sve.c create mode 100644 sysdeps/aarch64/fpu/cbrtf_advsimd.c create mode 100644 sysdeps/aarch64/fpu/cbrtf_sve.c Reviewed-by: Szabolcs Nagy diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile index 06657782a1..990d1135b9 100644 --- a/sysdeps/aarch64/fpu/Makefile +++ b/sysdeps/aarch64/fpu/Makefile @@ -5,6 +5,7 @@ libmvec-supported-funcs = acos \ atan \ atanh \ atan2 \ + cbrt \ cos \ cosh \ erf \ diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions index aedae9457b..36a9e4df1e 100644 --- a/sysdeps/aarch64/fpu/Versions +++ b/sysdeps/aarch64/fpu/Versions @@ -94,6 +94,11 @@ libmvec { _ZGVnN4v_atanhf; _ZGVsMxv_atanh; _ZGVsMxv_atanhf; + _ZGVnN2v_cbrt; + _ZGVnN2v_cbrtf; + _ZGVnN4v_cbrtf; + _ZGVsMxv_cbrt; + _ZGVsMxv_cbrtf; _ZGVnN2v_cosh; _ZGVnN2v_coshf; _ZGVnN4v_coshf; diff --git a/sysdeps/aarch64/fpu/advsimd_f32_protos.h b/sysdeps/aarch64/fpu/advsimd_f32_protos.h index a8889a92fd..54858efd8a 100644 --- a/sysdeps/aarch64/fpu/advsimd_f32_protos.h +++ b/sysdeps/aarch64/fpu/advsimd_f32_protos.h @@ -23,6 +23,7 @@ libmvec_hidden_proto (V_NAME_F1(asin)); libmvec_hidden_proto (V_NAME_F1(asinh)); libmvec_hidden_proto (V_NAME_F1(atan)); libmvec_hidden_proto (V_NAME_F1(atanh)); +libmvec_hidden_proto (V_NAME_F1(cbrt)); libmvec_hidden_proto (V_NAME_F1(cos)); libmvec_hidden_proto (V_NAME_F1(cosh)); libmvec_hidden_proto (V_NAME_F1(erf)); diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h index ca30177339..b1c024fe13 100644 --- a/sysdeps/aarch64/fpu/bits/math-vector.h +++ b/sysdeps/aarch64/fpu/bits/math-vector.h @@ -57,6 +57,10 @@ # define __DECL_SIMD_atan2 __DECL_SIMD_aarch64 # undef __DECL_SIMD_atan2f # define __DECL_SIMD_atan2f __DECL_SIMD_aarch64 +# undef __DECL_SIMD_cbrt +# define __DECL_SIMD_cbrt __DECL_SIMD_aarch64 +# undef __DECL_SIMD_cbrtf +# define __DECL_SIMD_cbrtf __DECL_SIMD_aarch64 # undef __DECL_SIMD_cos # define __DECL_SIMD_cos __DECL_SIMD_aarch64 # undef __DECL_SIMD_cosf @@ -158,6 +162,7 @@ __vpcs __f32x4_t _ZGVnN4v_asinf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_asinhf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_atanf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_atanhf (__f32x4_t); +__vpcs __f32x4_t _ZGVnN4v_cbrtf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_cosf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_coshf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_erff (__f32x4_t); @@ -183,6 +188,7 @@ __vpcs __f64x2_t _ZGVnN2v_asin (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_asinh (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_atan (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_atanh (__f64x2_t); +__vpcs __f64x2_t _ZGVnN2v_cbrt (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_cosh (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_erf (__f64x2_t); @@ -213,6 +219,7 @@ __sv_f32_t _ZGVsMxv_asinf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_asinhf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_atanf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_atanhf (__sv_f32_t, __sv_bool_t); +__sv_f32_t _ZGVsMxv_cbrtf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_coshf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_erff (__sv_f32_t, __sv_bool_t); @@ -238,6 +245,7 @@ __sv_f64_t _ZGVsMxv_asin (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_asinh (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_atan (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_atanh (__sv_f64_t, __sv_bool_t); +__sv_f64_t _ZGVsMxv_cbrt (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_cos (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_cosh (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_erf (__sv_f64_t, __sv_bool_t); diff --git a/sysdeps/aarch64/fpu/cbrt_advsimd.c b/sysdeps/aarch64/fpu/cbrt_advsimd.c new file mode 100644 index 0000000000..adfbb60cd3 --- /dev/null +++ b/sysdeps/aarch64/fpu/cbrt_advsimd.c @@ -0,0 +1,121 @@ +/* Double-precision vector (AdvSIMD) cbrt function + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "v_math.h" +#include "poly_advsimd_f64.h" + +const static struct data +{ + float64x2_t poly[4], one_third, shift; + int64x2_t exp_bias; + uint64x2_t abs_mask, tiny_bound; + uint32x4_t thresh; + double table[5]; +} data = { + .shift = V2 (0x1.8p52), + .poly = { /* Generated with fpminimax in [0.5, 1]. */ + V2 (0x1.c14e8ee44767p-2), V2 (0x1.dd2d3f99e4c0ep-1), + V2 (-0x1.08e83026b7e74p-1), V2 (0x1.2c74eaa3ba428p-3) }, + .exp_bias = V2 (1022), + .abs_mask = V2(0x7fffffffffffffff), + .tiny_bound = V2(0x0010000000000000), /* Smallest normal. */ + .thresh = V4(0x7fe00000), /* asuint64 (infinity) - tiny_bound. */ + .one_third = V2(0x1.5555555555555p-2), + .table = { /* table[i] = 2^((i - 2) / 3). */ + 0x1.428a2f98d728bp-1, 0x1.965fea53d6e3dp-1, 0x1p0, + 0x1.428a2f98d728bp0, 0x1.965fea53d6e3dp0 } +}; + +#define MantissaMask v_u64 (0x000fffffffffffff) + +static float64x2_t NOINLINE VPCS_ATTR +special_case (float64x2_t x, float64x2_t y, uint32x2_t special) +{ + return v_call_f64 (cbrt, x, y, vmovl_u32 (special)); +} + +/* Approximation for double-precision vector cbrt(x), using low-order polynomial + and two Newton iterations. Greatest observed error is 1.79 ULP. Errors repeat + according to the exponent, for instance an error observed for double value + m * 2^e will be observed for any input m * 2^(e + 3*i), where i is an + integer. + __v_cbrt(0x1.fffff403f0bc6p+1) got 0x1.965fe72821e9bp+0 + want 0x1.965fe72821e99p+0. */ +VPCS_ATTR float64x2_t V_NAME_D1 (cbrt) (float64x2_t x) +{ + const struct data *d = ptr_barrier (&data); + uint64x2_t iax = vreinterpretq_u64_f64 (vabsq_f64 (x)); + + /* Subnormal, +/-0 and special values. */ + uint32x2_t special + = vcge_u32 (vsubhn_u64 (iax, d->tiny_bound), vget_low_u32 (d->thresh)); + + /* Decompose |x| into m * 2^e, where m is in [0.5, 1.0]. This is a vector + version of frexp, which gets subnormal values wrong - these have to be + special-cased as a result. */ + float64x2_t m = vbslq_f64 (MantissaMask, x, v_f64 (0.5)); + int64x2_t exp_bias = d->exp_bias; + uint64x2_t ia12 = vshrq_n_u64 (iax, 52); + int64x2_t e = vsubq_s64 (vreinterpretq_s64_u64 (ia12), exp_bias); + + /* Calculate rough approximation for cbrt(m) in [0.5, 1.0], starting point for + Newton iterations. */ + float64x2_t p = v_pairwise_poly_3_f64 (m, vmulq_f64 (m, m), d->poly); + float64x2_t one_third = d->one_third; + /* Two iterations of Newton's method for iteratively approximating cbrt. */ + float64x2_t m_by_3 = vmulq_f64 (m, one_third); + float64x2_t two_thirds = vaddq_f64 (one_third, one_third); + float64x2_t a + = vfmaq_f64 (vdivq_f64 (m_by_3, vmulq_f64 (p, p)), two_thirds, p); + a = vfmaq_f64 (vdivq_f64 (m_by_3, vmulq_f64 (a, a)), two_thirds, a); + + /* Assemble the result by the following: + + cbrt(x) = cbrt(m) * 2 ^ (e / 3). + + We can get 2 ^ round(e / 3) using ldexp and integer divide, but since e is + not necessarily a multiple of 3 we lose some information. + + Let q = 2 ^ round(e / 3), then t = 2 ^ (e / 3) / q. + + Then we know t = 2 ^ (i / 3), where i is the remainder from e / 3, which is + an integer in [-2, 2], and can be looked up in the table T. Hence the + result is assembled as: + + cbrt(x) = cbrt(m) * t * 2 ^ round(e / 3) * sign. */ + + float64x2_t ef = vcvtq_f64_s64 (e); + float64x2_t eb3f = vrndnq_f64 (vmulq_f64 (ef, one_third)); + int64x2_t em3 = vcvtq_s64_f64 (vfmsq_f64 (ef, eb3f, v_f64 (3))); + int64x2_t ey = vcvtq_s64_f64 (eb3f); + + float64x2_t my = (float64x2_t){ d->table[em3[0] + 2], d->table[em3[1] + 2] }; + my = vmulq_f64 (my, a); + + /* Vector version of ldexp. */ + float64x2_t y = vreinterpretq_f64_s64 ( + vshlq_n_s64 (vaddq_s64 (ey, vaddq_s64 (exp_bias, v_s64 (1))), 52)); + y = vmulq_f64 (y, my); + + if (__glibc_unlikely (v_any_u32h (special))) + return special_case (x, vbslq_f64 (d->abs_mask, y, x), special); + + /* Copy sign. */ + return vbslq_f64 (d->abs_mask, y, x); +} diff --git a/sysdeps/aarch64/fpu/cbrt_sve.c b/sysdeps/aarch64/fpu/cbrt_sve.c new file mode 100644 index 0000000000..fc976eda2a --- /dev/null +++ b/sysdeps/aarch64/fpu/cbrt_sve.c @@ -0,0 +1,128 @@ +/* Double-precision vector (SVE) cbrt function + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "sv_math.h" +#include "poly_sve_f64.h" + +const static struct data +{ + float64_t poly[4]; + float64_t table[5]; + float64_t one_third, two_thirds, shift; + int64_t exp_bias; + uint64_t tiny_bound, thresh; +} data = { + /* Generated with FPMinimax in [0.5, 1]. */ + .poly = { 0x1.c14e8ee44767p-2, 0x1.dd2d3f99e4c0ep-1, -0x1.08e83026b7e74p-1, + 0x1.2c74eaa3ba428p-3, }, + /* table[i] = 2^((i - 2) / 3). */ + .table = { 0x1.428a2f98d728bp-1, 0x1.965fea53d6e3dp-1, 0x1p0, + 0x1.428a2f98d728bp0, 0x1.965fea53d6e3dp0, }, + .one_third = 0x1.5555555555555p-2, + .two_thirds = 0x1.5555555555555p-1, + .shift = 0x1.8p52, + .exp_bias = 1022, + .tiny_bound = 0x0010000000000000, /* Smallest normal. */ + .thresh = 0x7fe0000000000000, /* asuint64 (infinity) - tiny_bound. */ +}; + +#define MantissaMask 0x000fffffffffffff +#define HalfExp 0x3fe0000000000000 + +static svfloat64_t NOINLINE +special_case (svfloat64_t x, svfloat64_t y, svbool_t special) +{ + return sv_call_f64 (cbrt, x, y, special); +} + +static inline svfloat64_t +shifted_lookup (const svbool_t pg, const float64_t *table, svint64_t i) +{ + return svld1_gather_index (pg, table, svadd_x (pg, i, 2)); +} + +/* Approximation for double-precision vector cbrt(x), using low-order + polynomial and two Newton iterations. Greatest observed error is 1.79 ULP. + Errors repeat according to the exponent, for instance an error observed for + double value m * 2^e will be observed for any input m * 2^(e + 3*i), where i + is an integer. + _ZGVsMxv_cbrt (0x0.3fffb8d4413f3p-1022) got 0x1.965f53b0e5d97p-342 + want 0x1.965f53b0e5d95p-342. */ +svfloat64_t SV_NAME_D1 (cbrt) (svfloat64_t x, const svbool_t pg) +{ + const struct data *d = ptr_barrier (&data); + + svfloat64_t ax = svabs_x (pg, x); + svuint64_t iax = svreinterpret_u64 (ax); + svuint64_t sign = sveor_x (pg, svreinterpret_u64 (x), iax); + + /* Subnormal, +/-0 and special values. */ + svbool_t special = svcmpge (pg, svsub_x (pg, iax, d->tiny_bound), d->thresh); + + /* Decompose |x| into m * 2^e, where m is in [0.5, 1.0]. This is a vector + version of frexp, which gets subnormal values wrong - these have to be + special-cased as a result. */ + svfloat64_t m = svreinterpret_f64 (svorr_x ( + pg, svand_x (pg, svreinterpret_u64 (x), MantissaMask), HalfExp)); + svint64_t e + = svsub_x (pg, svreinterpret_s64 (svlsr_x (pg, iax, 52)), d->exp_bias); + + /* Calculate rough approximation for cbrt(m) in [0.5, 1.0], starting point + for Newton iterations. */ + svfloat64_t p + = sv_pairwise_poly_3_f64_x (pg, m, svmul_x (pg, m, m), d->poly); + + /* Two iterations of Newton's method for iteratively approximating cbrt. */ + svfloat64_t m_by_3 = svmul_x (pg, m, d->one_third); + svfloat64_t a = svmla_x (pg, svdiv_x (pg, m_by_3, svmul_x (pg, p, p)), p, + d->two_thirds); + a = svmla_x (pg, svdiv_x (pg, m_by_3, svmul_x (pg, a, a)), a, d->two_thirds); + + /* Assemble the result by the following: + + cbrt(x) = cbrt(m) * 2 ^ (e / 3). + + We can get 2 ^ round(e / 3) using ldexp and integer divide, but since e is + not necessarily a multiple of 3 we lose some information. + + Let q = 2 ^ round(e / 3), then t = 2 ^ (e / 3) / q. + + Then we know t = 2 ^ (i / 3), where i is the remainder from e / 3, which + is an integer in [-2, 2], and can be looked up in the table T. Hence the + result is assembled as: + + cbrt(x) = cbrt(m) * t * 2 ^ round(e / 3) * sign. */ + svfloat64_t eb3f = svmul_x (pg, svcvt_f64_x (pg, e), d->one_third); + svint64_t ey = svcvt_s64_x (pg, eb3f); + svint64_t em3 = svmls_x (pg, e, ey, 3); + + svfloat64_t my = shifted_lookup (pg, d->table, em3); + my = svmul_x (pg, my, a); + + /* Vector version of ldexp. */ + svfloat64_t y = svscale_x (pg, my, ey); + + if (__glibc_unlikely (svptest_any (pg, special))) + return special_case ( + x, svreinterpret_f64 (svorr_x (pg, svreinterpret_u64 (y), sign)), + special); + + /* Copy sign. */ + return svreinterpret_f64 (svorr_x (pg, svreinterpret_u64 (y), sign)); +} diff --git a/sysdeps/aarch64/fpu/cbrtf_advsimd.c b/sysdeps/aarch64/fpu/cbrtf_advsimd.c new file mode 100644 index 0000000000..27debb8b57 --- /dev/null +++ b/sysdeps/aarch64/fpu/cbrtf_advsimd.c @@ -0,0 +1,123 @@ +/* Single-precision vector (AdvSIMD) cbrt function + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "v_math.h" +#include "poly_advsimd_f32.h" + +const static struct data +{ + float32x4_t poly[4], one_third; + float table[5]; +} data = { + .poly = { /* Very rough approximation of cbrt(x) in [0.5, 1], generated with + FPMinimax. */ + V4 (0x1.c14e96p-2), V4 (0x1.dd2d3p-1), V4 (-0x1.08e81ap-1), + V4 (0x1.2c74c2p-3) }, + .table = { /* table[i] = 2^((i - 2) / 3). */ + 0x1.428a3p-1, 0x1.965feap-1, 0x1p0, 0x1.428a3p0, 0x1.965feap0 }, + .one_third = V4 (0x1.555556p-2f), +}; + +#define SignMask v_u32 (0x80000000) +#define SmallestNormal v_u32 (0x00800000) +#define Thresh vdup_n_u16 (0x7f00) /* asuint(INFINITY) - SmallestNormal. */ +#define MantissaMask v_u32 (0x007fffff) +#define HalfExp v_u32 (0x3f000000) + +static float32x4_t VPCS_ATTR NOINLINE +special_case (float32x4_t x, float32x4_t y, uint16x4_t special) +{ + return v_call_f32 (cbrtf, x, y, vmovl_u16 (special)); +} + +static inline float32x4_t +shifted_lookup (const float *table, int32x4_t i) +{ + return (float32x4_t){ table[i[0] + 2], table[i[1] + 2], table[i[2] + 2], + table[i[3] + 2] }; +} + +/* Approximation for vector single-precision cbrt(x) using Newton iteration + with initial guess obtained by a low-order polynomial. Greatest error + is 1.64 ULP. This is observed for every value where the mantissa is + 0x1.85a2aa and the exponent is a multiple of 3, for example: + _ZGVnN4v_cbrtf(0x1.85a2aap+3) got 0x1.267936p+1 + want 0x1.267932p+1. */ +VPCS_ATTR float32x4_t V_NAME_F1 (cbrt) (float32x4_t x) +{ + const struct data *d = ptr_barrier (&data); + uint32x4_t iax = vreinterpretq_u32_f32 (vabsq_f32 (x)); + + /* Subnormal, +/-0 and special values. */ + uint16x4_t special = vcge_u16 (vsubhn_u32 (iax, SmallestNormal), Thresh); + + /* Decompose |x| into m * 2^e, where m is in [0.5, 1.0]. This is a vector + version of frexpf, which gets subnormal values wrong - these have to be + special-cased as a result. */ + float32x4_t m = vbslq_f32 (MantissaMask, x, v_f32 (0.5)); + int32x4_t e + = vsubq_s32 (vreinterpretq_s32_u32 (vshrq_n_u32 (iax, 23)), v_s32 (126)); + + /* p is a rough approximation for cbrt(m) in [0.5, 1.0]. The better this is, + the less accurate the next stage of the algorithm needs to be. An order-4 + polynomial is enough for one Newton iteration. */ + float32x4_t p = v_pairwise_poly_3_f32 (m, vmulq_f32 (m, m), d->poly); + + float32x4_t one_third = d->one_third; + float32x4_t two_thirds = vaddq_f32 (one_third, one_third); + + /* One iteration of Newton's method for iteratively approximating cbrt. */ + float32x4_t m_by_3 = vmulq_f32 (m, one_third); + float32x4_t a + = vfmaq_f32 (vdivq_f32 (m_by_3, vmulq_f32 (p, p)), two_thirds, p); + + /* Assemble the result by the following: + + cbrt(x) = cbrt(m) * 2 ^ (e / 3). + + We can get 2 ^ round(e / 3) using ldexp and integer divide, but since e is + not necessarily a multiple of 3 we lose some information. + + Let q = 2 ^ round(e / 3), then t = 2 ^ (e / 3) / q. + + Then we know t = 2 ^ (i / 3), where i is the remainder from e / 3, which + is an integer in [-2, 2], and can be looked up in the table T. Hence the + result is assembled as: + + cbrt(x) = cbrt(m) * t * 2 ^ round(e / 3) * sign. */ + float32x4_t ef = vmulq_f32 (vcvtq_f32_s32 (e), one_third); + int32x4_t ey = vcvtq_s32_f32 (ef); + int32x4_t em3 = vsubq_s32 (e, vmulq_s32 (ey, v_s32 (3))); + + float32x4_t my = shifted_lookup (d->table, em3); + my = vmulq_f32 (my, a); + + /* Vector version of ldexpf. */ + float32x4_t y + = vreinterpretq_f32_s32 (vshlq_n_s32 (vaddq_s32 (ey, v_s32 (127)), 23)); + y = vmulq_f32 (y, my); + + if (__glibc_unlikely (v_any_u16h (special))) + return special_case (x, vbslq_f32 (SignMask, x, y), special); + + /* Copy sign. */ + return vbslq_f32 (SignMask, x, y); +} +libmvec_hidden_def (V_NAME_F1 (cbrt)) +HALF_WIDTH_ALIAS_F1 (cbrt) diff --git a/sysdeps/aarch64/fpu/cbrtf_sve.c b/sysdeps/aarch64/fpu/cbrtf_sve.c new file mode 100644 index 0000000000..23c220c202 --- /dev/null +++ b/sysdeps/aarch64/fpu/cbrtf_sve.c @@ -0,0 +1,122 @@ +/* Single-precision vector (SVE) cbrt function + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "sv_math.h" +#include "poly_sve_f32.h" + +const static struct data +{ + float32_t poly[4]; + float32_t table[5]; + float32_t one_third, two_thirds; +} data = { + /* Very rough approximation of cbrt(x) in [0.5, 1], generated with FPMinimax. + */ + .poly = { 0x1.c14e96p-2, 0x1.dd2d3p-1, -0x1.08e81ap-1, + 0x1.2c74c2p-3, }, + /* table[i] = 2^((i - 2) / 3). */ + .table = { 0x1.428a3p-1, 0x1.965feap-1, 0x1p0, 0x1.428a3p0, 0x1.965feap0 }, + .one_third = 0x1.555556p-2f, + .two_thirds = 0x1.555556p-1f, +}; + +#define SmallestNormal 0x00800000 +#define Thresh 0x7f000000 /* asuint(INFINITY) - SmallestNormal. */ +#define MantissaMask 0x007fffff +#define HalfExp 0x3f000000 + +static svfloat32_t NOINLINE +special_case (svfloat32_t x, svfloat32_t y, svbool_t special) +{ + return sv_call_f32 (cbrtf, x, y, special); +} + +static inline svfloat32_t +shifted_lookup (const svbool_t pg, const float32_t *table, svint32_t i) +{ + return svld1_gather_index (pg, table, svadd_x (pg, i, 2)); +} + +/* Approximation for vector single-precision cbrt(x) using Newton iteration + with initial guess obtained by a low-order polynomial. Greatest error + is 1.64 ULP. This is observed for every value where the mantissa is + 0x1.85a2aa and the exponent is a multiple of 3, for example: + _ZGVsMxv_cbrtf (0x1.85a2aap+3) got 0x1.267936p+1 + want 0x1.267932p+1. */ +svfloat32_t SV_NAME_F1 (cbrt) (svfloat32_t x, const svbool_t pg) +{ + const struct data *d = ptr_barrier (&data); + + svfloat32_t ax = svabs_x (pg, x); + svuint32_t iax = svreinterpret_u32 (ax); + svuint32_t sign = sveor_x (pg, svreinterpret_u32 (x), iax); + + /* Subnormal, +/-0 and special values. */ + svbool_t special = svcmpge (pg, svsub_x (pg, iax, SmallestNormal), Thresh); + + /* Decompose |x| into m * 2^e, where m is in [0.5, 1.0]. This is a vector + version of frexpf, which gets subnormal values wrong - these have to be + special-cased as a result. */ + svfloat32_t m = svreinterpret_f32 (svorr_x ( + pg, svand_x (pg, svreinterpret_u32 (x), MantissaMask), HalfExp)); + svint32_t e = svsub_x (pg, svreinterpret_s32 (svlsr_x (pg, iax, 23)), 126); + + /* p is a rough approximation for cbrt(m) in [0.5, 1.0]. The better this is, + the less accurate the next stage of the algorithm needs to be. An order-4 + polynomial is enough for one Newton iteration. */ + svfloat32_t p + = sv_pairwise_poly_3_f32_x (pg, m, svmul_x (pg, m, m), d->poly); + + /* One iteration of Newton's method for iteratively approximating cbrt. */ + svfloat32_t m_by_3 = svmul_x (pg, m, d->one_third); + svfloat32_t a = svmla_x (pg, svdiv_x (pg, m_by_3, svmul_x (pg, p, p)), p, + d->two_thirds); + + /* Assemble the result by the following: + + cbrt(x) = cbrt(m) * 2 ^ (e / 3). + + We can get 2 ^ round(e / 3) using ldexp and integer divide, but since e is + not necessarily a multiple of 3 we lose some information. + + Let q = 2 ^ round(e / 3), then t = 2 ^ (e / 3) / q. + + Then we know t = 2 ^ (i / 3), where i is the remainder from e / 3, which + is an integer in [-2, 2], and can be looked up in the table T. Hence the + result is assembled as: + + cbrt(x) = cbrt(m) * t * 2 ^ round(e / 3) * sign. */ + svfloat32_t ef = svmul_x (pg, svcvt_f32_x (pg, e), d->one_third); + svint32_t ey = svcvt_s32_x (pg, ef); + svint32_t em3 = svmls_x (pg, e, ey, 3); + + svfloat32_t my = shifted_lookup (pg, d->table, em3); + my = svmul_x (pg, my, a); + + /* Vector version of ldexpf. */ + svfloat32_t y = svscale_x (pg, my, ey); + + if (__glibc_unlikely (svptest_any (pg, special))) + return special_case ( + x, svreinterpret_f32 (svorr_x (pg, svreinterpret_u32 (y), sign)), + special); + + /* Copy sign. */ + return svreinterpret_f32 (svorr_x (pg, svreinterpret_u32 (y), sign)); +} diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c index 417125be47..1877db3ac6 100644 --- a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c @@ -30,6 +30,7 @@ VPCS_VECTOR_WRAPPER (asinh_advsimd, _ZGVnN2v_asinh) VPCS_VECTOR_WRAPPER (atan_advsimd, _ZGVnN2v_atan) VPCS_VECTOR_WRAPPER (atanh_advsimd, _ZGVnN2v_atanh) VPCS_VECTOR_WRAPPER_ff (atan2_advsimd, _ZGVnN2vv_atan2) +VPCS_VECTOR_WRAPPER (cbrt_advsimd, _ZGVnN2v_cbrt) VPCS_VECTOR_WRAPPER (cos_advsimd, _ZGVnN2v_cos) VPCS_VECTOR_WRAPPER (cosh_advsimd, _ZGVnN2v_cosh) VPCS_VECTOR_WRAPPER (erf_advsimd, _ZGVnN2v_erf) diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c index 31ebf18705..b702f942de 100644 --- a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c @@ -49,6 +49,7 @@ SVE_VECTOR_WRAPPER (asinh_sve, _ZGVsMxv_asinh) SVE_VECTOR_WRAPPER (atan_sve, _ZGVsMxv_atan) SVE_VECTOR_WRAPPER (atanh_sve, _ZGVsMxv_atanh) SVE_VECTOR_WRAPPER_ff (atan2_sve, _ZGVsMxvv_atan2) +SVE_VECTOR_WRAPPER (cbrt_sve, _ZGVsMxv_cbrt) SVE_VECTOR_WRAPPER (cos_sve, _ZGVsMxv_cos) SVE_VECTOR_WRAPPER (cosh_sve, _ZGVsMxv_cosh) SVE_VECTOR_WRAPPER (erf_sve, _ZGVsMxv_erf) diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c index dab0f1cfcb..9cb451b4f0 100644 --- a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c @@ -30,6 +30,7 @@ VPCS_VECTOR_WRAPPER (asinhf_advsimd, _ZGVnN4v_asinhf) VPCS_VECTOR_WRAPPER (atanf_advsimd, _ZGVnN4v_atanf) VPCS_VECTOR_WRAPPER (atanhf_advsimd, _ZGVnN4v_atanhf) VPCS_VECTOR_WRAPPER_ff (atan2f_advsimd, _ZGVnN4vv_atan2f) +VPCS_VECTOR_WRAPPER (cbrtf_advsimd, _ZGVnN4v_cbrtf) VPCS_VECTOR_WRAPPER (cosf_advsimd, _ZGVnN4v_cosf) VPCS_VECTOR_WRAPPER (coshf_advsimd, _ZGVnN4v_coshf) VPCS_VECTOR_WRAPPER (erff_advsimd, _ZGVnN4v_erff) diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c index 2aa6cbcc28..5b3dd22916 100644 --- a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c @@ -49,6 +49,7 @@ SVE_VECTOR_WRAPPER (asinhf_sve, _ZGVsMxv_asinhf) SVE_VECTOR_WRAPPER (atanf_sve, _ZGVsMxv_atanf) SVE_VECTOR_WRAPPER (atanhf_sve, _ZGVsMxv_atanhf) SVE_VECTOR_WRAPPER_ff (atan2f_sve, _ZGVsMxvv_atan2f) +SVE_VECTOR_WRAPPER (cbrtf_sve, _ZGVsMxv_cbrtf) SVE_VECTOR_WRAPPER (cosf_sve, _ZGVsMxv_cosf) SVE_VECTOR_WRAPPER (coshf_sve, _ZGVsMxv_coshf) SVE_VECTOR_WRAPPER (erff_sve, _ZGVsMxv_erff) diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps index e7463d30bc..6d083c4e32 100644 --- a/sysdeps/aarch64/libm-test-ulps +++ b/sysdeps/aarch64/libm-test-ulps @@ -477,11 +477,19 @@ double: 4 float: 1 ldouble: 1 +Function: "cbrt_advsimd": +double: 1 +float: 1 + Function: "cbrt_downward": double: 4 float: 1 ldouble: 1 +Function: "cbrt_sve": +double: 1 +float: 1 + Function: "cbrt_towardzero": double: 3 float: 1 diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist index 1184374efd..89ac1dfa36 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist @@ -79,6 +79,8 @@ GLIBC_2.40 _ZGVnN2v_asinh F GLIBC_2.40 _ZGVnN2v_asinhf F GLIBC_2.40 _ZGVnN2v_atanh F GLIBC_2.40 _ZGVnN2v_atanhf F +GLIBC_2.40 _ZGVnN2v_cbrt F +GLIBC_2.40 _ZGVnN2v_cbrtf F GLIBC_2.40 _ZGVnN2v_cosh F GLIBC_2.40 _ZGVnN2v_coshf F GLIBC_2.40 _ZGVnN2v_erf F @@ -94,6 +96,7 @@ GLIBC_2.40 _ZGVnN2vv_hypotf F GLIBC_2.40 _ZGVnN4v_acoshf F GLIBC_2.40 _ZGVnN4v_asinhf F GLIBC_2.40 _ZGVnN4v_atanhf F +GLIBC_2.40 _ZGVnN4v_cbrtf F GLIBC_2.40 _ZGVnN4v_coshf F GLIBC_2.40 _ZGVnN4v_erfcf F GLIBC_2.40 _ZGVnN4v_erff F @@ -106,6 +109,8 @@ GLIBC_2.40 _ZGVsMxv_asinh F GLIBC_2.40 _ZGVsMxv_asinhf F GLIBC_2.40 _ZGVsMxv_atanh F GLIBC_2.40 _ZGVsMxv_atanhf F +GLIBC_2.40 _ZGVsMxv_cbrt F +GLIBC_2.40 _ZGVsMxv_cbrtf F GLIBC_2.40 _ZGVsMxv_cosh F GLIBC_2.40 _ZGVsMxv_coshf F GLIBC_2.40 _ZGVsMxv_erf F From patchwork Tue Apr 30 12:50:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Ramsay X-Patchwork-Id: 1929499 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=o6FE5bWb; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=o6FE5bWb; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4VTKnJ1kWnz20fY for ; Tue, 30 Apr 2024 22:51:00 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 69199385842D for ; Tue, 30 Apr 2024 12:50:58 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from EUR04-HE1-obe.outbound.protection.outlook.com (mail-he1eur04on2052.outbound.protection.outlook.com [40.107.7.52]) by sourceware.org (Postfix) with ESMTPS id D8CE03858C56 for ; Tue, 30 Apr 2024 12:50:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D8CE03858C56 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D8CE03858C56 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=40.107.7.52 ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1714481436; cv=pass; b=PyjPy2bsu5EZI4kbNMLZZ0FdTDjWTySOnOpBA3bunf23JVKUACknJehZB4LON9xtm8EBeLOvnr4LJm9FTdfHWGDq3OPy5I5vY37Kcfkz22MXDf3UrvfIOFgsdCnSm8Y13BXJdrk4MXu2/TMLVaDaVD3HP2UV+FQ5PgQ5wvAyteQ= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1714481436; c=relaxed/simple; bh=chDQ/lFcMEI8gWOA2CSVhwcpYlaNd1pf6k+JX0bHv7k=; h=DKIM-Signature:DKIM-Signature:From:To:Subject:Date:Message-ID: MIME-Version; b=ZuutHtgE19Z9Yop9w/XRGPjvypGtJ9BlA4zQK5NJGwJ22LkuOfkV7LTIpWVPjCW6pVee/7hyEBDabagEyhF0b6kzcxHvcYyOaV76AsS5sarDBt/ywMa9fLaCjpD6zhlfa8r7tgVu79qW7645mOb/9u2McjPSf2Pg1HQ6Lb4PFDU= ARC-Authentication-Results: i=3; server2.sourceware.org ARC-Seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; b=eWIx+Ak3/F/wvZeNm1+ujxkec1bsLErs3PK0PVZJIubb4eI2A2BcWTDV3/VXJ/2Z2UQBhqUOQ2L69p347Y9yI0jSbWQYzQSxZnhpFO1XrBo6JQxKedvuds2gOaUiBgMxMSyJIc1Giwh5fq4xp/zq6M8g1yQ7I+pY8FBQBWdeqgzzvs1hj2GGt4J+a96DBGJvS4r+s20/kUZ5xqMC9wZ/3dDWNgxpO+FPFWm6ix4Ym1C27mPgJmA5g0acAcdE+LYTA30ls1NF3D6s0DIsHDms55FnSJ2w31w4jp85Y4j20y7hpnRduJ55K2lew/0AM7siZtH4yoAOi2bkWb4fJZeo0Q== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=CK7HWpYpyFmJOiCUL+0EKkal9XfsOy9s5iq2CbH6z0M=; b=IwjFvH9vb3jVf5qamiR0rDhGkBOyVLTx1yOhPB63jeCHwUClCIdTNalGXKtK1H+gCekrTr7HnBwRWE/7DZAA/rPIjI5CwDZAELE+lwXkkY0fsU0Q6oGDBaOBKhqkG3LdXtKGJrdZur4e4eINFsqm6rwwa3gfqqVWZAkbW2ipN3kQvALo2drZROCW3nOG2K53b0D90pNeWLPVyjT8YxLuntOsChSnT8hTxaN6lJAHlMvc96d52LSI45CQAnhhfXQPKhH7ROWlXWrrCjw7grcvjwNgsvy6Id1ODCtVml+MT0QqySQNoLhnZnV3h7SrKUoZXJk5N+eGPRhvjINvzbIODA== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=arm.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=CK7HWpYpyFmJOiCUL+0EKkal9XfsOy9s5iq2CbH6z0M=; b=o6FE5bWbZbyFlSlCKVFUf5Vi5k70qa2tePXSKMEFYryd6Pc+KrPk26QFSGf4Hk2E6uOq73azivvg2sT9PcEwqlKBuVfaqF5iz3b4T3QVjNBs7YYSJKQ6SPu3xW97cyqzipJvNOcBnMDBdFJ22wwQwZmRIlKy8A3cHg0ZhBFZwI0= Received: from AS8PR04CA0007.eurprd04.prod.outlook.com (2603:10a6:20b:310::12) by AS8PR08MB6277.eurprd08.prod.outlook.com (2603:10a6:20b:23d::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7519.36; Tue, 30 Apr 2024 12:50:23 +0000 Received: from AMS0EPF000001B7.eurprd05.prod.outlook.com (2603:10a6:20b:310:cafe::56) by AS8PR04CA0007.outlook.office365.com (2603:10a6:20b:310::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7519.35 via Frontend Transport; Tue, 30 Apr 2024 12:50:23 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=arm.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AMS0EPF000001B7.mail.protection.outlook.com (10.167.16.171) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.7544.18 via Frontend Transport; Tue, 30 Apr 2024 12:50:21 +0000 Received: ("Tessian outbound ba75727f6dca:v315"); Tue, 30 Apr 2024 12:50:21 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 8e3f59bcafd39b8b X-CR-MTA-TID: 64aa7808 Received: from 8d95d1ad307e.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id F792E464-C7B8-4801-97B9-3B8CE5FDEDE3.1; Tue, 30 Apr 2024 12:50:15 +0000 Received: from EUR03-AM7-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 8d95d1ad307e.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Tue, 30 Apr 2024 12:50:15 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=iz+0vUoGTHFlDtWnYXE8mzGg48cQSzwWCY0wRvEhFJMeJtO1m1a1nR6HgBSs+FuyoTexqD0xM4ruj6Te3RCJKv2FKPJRiZMEkM4gW2z4IGtdmRDeKXLo8YjhGPJoXzj2UruXxghOc0W8PtU2D82cRKYBSxNTBf8ire6sBHmySzZWlKNquWArgCsVnUO79FvDBSJPdeFddGT1VoiGIcTNJGBWjLo5q5HB4nje282HC1s6CCawgygt4VLOFxKk3UU8RsLafLEnpe7cx3JkcFwgnK29DG68wXOx2LEnknh8wPCCk6Hf3vWGLGJJz6qiiUKE0SVEFZ131whswaMukW6hMQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=CK7HWpYpyFmJOiCUL+0EKkal9XfsOy9s5iq2CbH6z0M=; b=jENvjSw7TSjSfXHye/CJ13MG/RNClXxWZr4ecqjuuYtHwKKLiqlEVZ9Z3uNwcfB/TE3xlzHLiIMvtMjlgxCnbhpjRuxXL+ekhyUAbQ/wTK4/Cr6f5UoIodz98ritOcWVyC+CYQw9d7yuVlK9xOoM0kjJgsAosauTbHdr9hXyjZnl4HiUAdpbv28OjbUGEXDLhi/0r/hg7MmPieykb7NhkOHIo9dA+efaGd7iBQ043ahiPA0Q9Sag4MMd4GgVirMLUUW6uALnd3y8o+3JvHvQAk6HUfL1HTDPjITb8aqqUlOFlu/zF6F05UI3jiikLWjUBlmOD1I7xH5LxlHS0mPgZQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 40.67.248.234) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=CK7HWpYpyFmJOiCUL+0EKkal9XfsOy9s5iq2CbH6z0M=; b=o6FE5bWbZbyFlSlCKVFUf5Vi5k70qa2tePXSKMEFYryd6Pc+KrPk26QFSGf4Hk2E6uOq73azivvg2sT9PcEwqlKBuVfaqF5iz3b4T3QVjNBs7YYSJKQ6SPu3xW97cyqzipJvNOcBnMDBdFJ22wwQwZmRIlKy8A3cHg0ZhBFZwI0= Received: from DB9PR01CA0025.eurprd01.prod.exchangelabs.com (2603:10a6:10:1d8::30) by GV1PR08MB7377.eurprd08.prod.outlook.com (2603:10a6:150:21::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7519.34; Tue, 30 Apr 2024 12:50:07 +0000 Received: from DU2PEPF00028D08.eurprd03.prod.outlook.com (2603:10a6:10:1d8:cafe::75) by DB9PR01CA0025.outlook.office365.com (2603:10a6:10:1d8::30) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7519.36 via Frontend Transport; Tue, 30 Apr 2024 12:50:07 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 40.67.248.234) smtp.mailfrom=arm.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 40.67.248.234 as permitted sender) receiver=protection.outlook.com; client-ip=40.67.248.234; helo=nebula.arm.com; pr=C Received: from nebula.arm.com (40.67.248.234) by DU2PEPF00028D08.mail.protection.outlook.com (10.167.242.168) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7544.18 via Frontend Transport; Tue, 30 Apr 2024 12:50:07 +0000 Received: from AZ-NEU-EX02.Emea.Arm.com (10.251.26.5) by AZ-NEU-EX03.Arm.com (10.251.24.31) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Tue, 30 Apr 2024 12:50:04 +0000 Received: from AZ-NEU-EX04.Arm.com (10.251.24.32) by AZ-NEU-EX02.Emea.Arm.com (10.251.26.5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Tue, 30 Apr 2024 12:50:03 +0000 Received: from vcn-man-apps.manchester.arm.com (10.32.108.22) by mail.arm.com (10.251.24.32) with Microsoft SMTP Server id 15.1.2507.35 via Frontend Transport; Tue, 30 Apr 2024 12:50:03 +0000 From: Joe Ramsay To: CC: Joe Ramsay Subject: [PATCH 3/3] aarch64/fpu: Add vector variants of pow Date: Tue, 30 Apr 2024 13:50:00 +0100 Message-ID: <20240430125000.50324-3-Joe.Ramsay@arm.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20240430125000.50324-1-Joe.Ramsay@arm.com> References: <20240430125000.50324-1-Joe.Ramsay@arm.com> MIME-Version: 1.0 X-EOPAttributedMessage: 1 X-MS-TrafficTypeDiagnostic: DU2PEPF00028D08:EE_|GV1PR08MB7377:EE_|AMS0EPF000001B7:EE_|AS8PR08MB6277:EE_ X-MS-Office365-Filtering-Correlation-Id: 5da8feb7-8cc5-495d-ced8-08dc691418d5 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; ARA:13230031|82310400014|36860700004|376005|1800799015; X-Microsoft-Antispam-Message-Info-Original: iGqwDnErrSBdzhoBQsXgshz/Em5QIZEoJ7jiybRKPAZvqXzg7TNbXpSnAobXSa4WgHowQ5GhMTt6iqz4MDIW2I2asLSZG2mu/daResrezxR+s/Udg4HIzgO4hWxPJvQta/NISxloA8n0L4k0fRcPXE6B6TL9iTkSc/jRDQHOhHRD7PLfbF1U3LWxe00txXFufOjzSmc6pifdWNKfxAX0lIFsSUxPnboNhQAj69I6jkN89AKZlZwiFjz6XuFLlQ/PNV5jZXdcrvhmC1f0w0Vn3VNc/I5nhR5zLmXKzftIg32C0Vq1NTu54dqpknJSJcH4VR61zKxn0P6I5kGkhXvR39Rw1f+7L0Jo/xqDXAj0wrffK9ICh1S30ECU6RLxbfPnkRnohvFHnTbCuimxE5AIrjqk/wUmMLL0e5wgVdp+FgQaEkIYIYFAVygFHw6aVruqfvtqX8ZTeoHGsBf2KAAxGWxuw7MAZYZ7/0B96W/5RWehhY6YFe1ppT/7thWzlY3rbDFXNtahzYizFlSCoX2hJHqAynE0ZDVPGfPOciXw4ZVZjlPozQe3dgP+V46O//U1zvP29ucscOtKQSCXIUqyb+iRVUoo1P5RqkpoOOfh70gaZA0WhESBmhFiq5jnOrjRPJ/qe1a++tjVGlbc8qTFIWcDUpbUL6Z4Oz46gp8a6QN24p/InSppZzpFwSJV7pe1IwPyVZJwMf4gLm8yUURDrAPA7+x1oLaS5r/7zzALPd4PBAtH1c9aA/VObmlRaTTMZ1H/NSDEMEmw7/YPC9AXXuduTPYafK4jcTbaK0u2W2w1IWALMjKp/9sUW5h+P5a6dQLO2B8Fa16sTPqUuOAWCykyiiKZM+UWXF0rx+mOT+R2MlDzn1a68av0+gbJmywKRGB0pJWHlljqWHKMSF5t3f5408s0uJ/1kEb4KpY0nJD8bxJeR9i8IYwnOJWHFy5CiXOoiQApvAddvQ5G3YwrROvLO2MUmsWIs0xJdImADKmiQ2t01W1PZEEie8tAgtiR77P1zV2R9QfwPFiMV/eMvMLDrtudz9CQnolz5upxRVUngZRC1AUSOCsrqE+inCIONgitwSi/ztUZZ1QbZ/+RTnPonSb0yKr1Pj3pvX2thawAhmT+bIfBEQ13ABqGrkBiGJk3pHMvuQtpQXMqECyRQfwSYjOSSbGNm3TVabgjhWHc+Ybr5Spsh9Gi+l9G3LfEFczMv7UCu/B8zVzm7Qu4IpCIYFEUsfoBCWQYI54rjZzdSioxm24cO0LnefeTsVU5GUULbfbDtq2BEpqTvDT2ul/u3UGt/o78YBccl3fiA1CoSa8puq2QadFhQHEWnzdH X-Forefront-Antispam-Report-Untrusted: CIP:40.67.248.234; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230031)(82310400014)(36860700004)(376005)(1800799015); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: GV1PR08MB7377 X-MS-Exchange-Transport-CrossTenantHeadersStripped: AMS0EPF000001B7.eurprd05.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 45695592-82da-4996-6928-08dc69141084 X-Microsoft-Antispam: BCL:0; ARA:13230031|1800799015|82310400014|376005|36860700004|35042699010; X-Microsoft-Antispam-Message-Info: mojBOSuBuMYyOd6jPZbKablxTFX65C0qN9XVRfGhhheX9KOM418yBJRH/Jl1gFJq9dagxqkRolTrNpU0mFcr7NCySD3jV84qTJaikE90m/Uz4rxxQofu2r4N/QLyBBSSeaERvyWLuxx/j3cS6R4FQHJg5ue5mvY+ymZhb4kwObzx0OldyY4YXIU84WWUopzdxX5rgNlqZLig1W6kTyNnOXhzLPQ9Tp+kgggBtFPDEY/wtR7y2iVlnJuNAvKgThlHrYa/R+wIyO+pDzlQ9AF29nXIaq+obQqDByIFRZYGIuElWpJSRp96sj23XpZ37+IIyxdp6/iHdFPO74x9t73GVLZ7BVd+D7Fg3AQWOZLmyEARvcC1DmbfOm75fzyr0wQ2LRXCoJ1eXn28iAaYmiCHtak8KkvAbLcAMlDjPMXk1RWpNYyfEMWs3DH/OXivWMUqKSFiPOEygTYs0/UvnI1XFSdWLq0EUHKvR5SWRdJCZLLrirA2ml7yTI3UfS5LhLjePn8XeTY48IcVOHjaJnItECG6nZ3CZHqDQ/rCc/BOHKBjY1PgauhTVQM3mbADtq8HbtsWiKwXE8Urwn1GiOyhGkyR2vmf+e1fA/2Y4CG/HVY53GuFp9ifWjmKLdHkSjqF/lsKOA6bxiM0rT64eA9euQDovEiSJMt1RaDNaGLBnZ1be3jJsscAwNOpBZyHBR4PhRgPZ0Ycx1rZtEi/B092ccbstQfJ62Xx6fJX9JAdNDEJod3vkBgW8V+JUx4gukpjDbGLzb39ksR74coWC5JLO6WY7uEOwWVoCdmyhPeNpV80ucbc/qr11G/LHq3Fy6U2RImU0vsZ6v/fqxPBNBAbPhDN53TXu85UYmWfdedHY/BdgTQDYYsLoQyIOk3WaMsT8TE9EBB6kqgbdD7YhtIdd4+nixmuRNoP8R85mvWiNP0k5MjOIfj8RPMzH+F9SLbSw/FXOSPY6uX2IFnKGx5tSjwSBMDRj8/hZfk87Omgb9EPXHHZO/2MFVDtd3ujmoqcSUxf/uNA6iNwhMvsaWM3t9olGrhGIQ6Zs2gmpUNedrlSZl2bICYvZbRTLDrY7g3s2rBs/zEu83/07Ee1WIUAn3TNq8M6arNJoJAePV/IzXRMnGxDGP8H0q0laXP0ZhQHggVfhquIE9oHwNlo2zB3m7b0bSYPEpQ8LJG/xJd5z8+rzVa5nFYC4MRc1wEGxl77Jvg60zHWQumYdQXkoLX9UvT+LcQP902M46aw3WIOMFKwEPakWXFCD34S7tf11wvKHcsyPE8aLM+7I/TR6njwjfh/H3tsRfGb+X+5csipA9TQwQe6dZQELu7rB9Pa/0Wv X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230031)(1800799015)(82310400014)(376005)(36860700004)(35042699010); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Apr 2024 12:50:21.5914 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 5da8feb7-8cc5-495d-ced8-08dc691418d5 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AMS0EPF000001B7.eurprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR08MB6277 X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FORGED_SPF_HELO, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Plus a small amount of moving includes around in order to be able to remove duplicate definition of asuint64. --- Thanks, Joe sysdeps/aarch64/fpu/Makefile | 6 +- sysdeps/aarch64/fpu/Versions | 5 + sysdeps/aarch64/fpu/advsimd_f32_protos.h | 1 + sysdeps/aarch64/fpu/atan2_advsimd.c | 1 + sysdeps/aarch64/fpu/atan2_sve.c | 1 + sysdeps/aarch64/fpu/bits/math-vector.h | 8 + sysdeps/aarch64/fpu/finite_pow.h | 373 ++++++++++++++++ sysdeps/aarch64/fpu/pow_advsimd.c | 249 +++++++++++ sysdeps/aarch64/fpu/pow_sve.c | 411 ++++++++++++++++++ sysdeps/aarch64/fpu/powf_advsimd.c | 210 +++++++++ sysdeps/aarch64/fpu/powf_sve.c | 335 ++++++++++++++ .../fpu/test-double-advsimd-wrappers.c | 1 + .../aarch64/fpu/test-double-sve-wrappers.c | 1 + .../aarch64/fpu/test-float-advsimd-wrappers.c | 1 + sysdeps/aarch64/fpu/test-float-sve-wrappers.c | 1 + sysdeps/aarch64/fpu/v_pow_exp_data.c | 301 +++++++++++++ sysdeps/aarch64/fpu/v_pow_log_data.c | 186 ++++++++ sysdeps/aarch64/fpu/v_powf_data.c | 102 +++++ sysdeps/aarch64/fpu/vecmath_config.h | 42 +- sysdeps/aarch64/libm-test-ulps | 8 + .../unix/sysv/linux/aarch64/libmvec.abilist | 5 + 21 files changed, 2236 insertions(+), 12 deletions(-) create mode 100644 sysdeps/aarch64/fpu/finite_pow.h create mode 100644 sysdeps/aarch64/fpu/pow_advsimd.c create mode 100644 sysdeps/aarch64/fpu/pow_sve.c create mode 100644 sysdeps/aarch64/fpu/powf_advsimd.c create mode 100644 sysdeps/aarch64/fpu/powf_sve.c create mode 100644 sysdeps/aarch64/fpu/v_pow_exp_data.c create mode 100644 sysdeps/aarch64/fpu/v_pow_log_data.c create mode 100644 sysdeps/aarch64/fpu/v_powf_data.c diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile index 990d1135b9..234a6c457c 100644 --- a/sysdeps/aarch64/fpu/Makefile +++ b/sysdeps/aarch64/fpu/Makefile @@ -19,6 +19,7 @@ libmvec-supported-funcs = acos \ log10 \ log1p \ log2 \ + pow \ sin \ sinh \ tan \ @@ -44,7 +45,10 @@ libmvec-support = $(addsuffix f_advsimd,$(float-advsimd-funcs)) \ sv_erff_data \ v_exp_tail_data \ erfc_data \ - erfcf_data + erfcf_data \ + v_pow_exp_data \ + v_pow_log_data \ + v_powf_data endif sve-cflags = -march=armv8-a+sve diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions index 36a9e4df1e..cc15ce2d1e 100644 --- a/sysdeps/aarch64/fpu/Versions +++ b/sysdeps/aarch64/fpu/Versions @@ -119,6 +119,11 @@ libmvec { _ZGVnN2vv_hypot; _ZGVsMxvv_hypotf; _ZGVsMxvv_hypot; + _ZGVnN4vv_powf; + _ZGVnN2vv_powf; + _ZGVnN2vv_pow; + _ZGVsMxvv_powf; + _ZGVsMxvv_pow; _ZGVnN2v_sinh; _ZGVnN2v_sinhf; _ZGVnN4v_sinhf; diff --git a/sysdeps/aarch64/fpu/advsimd_f32_protos.h b/sysdeps/aarch64/fpu/advsimd_f32_protos.h index 54858efd8a..097d403ffe 100644 --- a/sysdeps/aarch64/fpu/advsimd_f32_protos.h +++ b/sysdeps/aarch64/fpu/advsimd_f32_protos.h @@ -37,6 +37,7 @@ libmvec_hidden_proto (V_NAME_F1(log10)); libmvec_hidden_proto (V_NAME_F1(log1p)); libmvec_hidden_proto (V_NAME_F1(log2)); libmvec_hidden_proto (V_NAME_F1(log)); +libmvec_hidden_proto (V_NAME_F2(pow)); libmvec_hidden_proto (V_NAME_F1(sin)); libmvec_hidden_proto (V_NAME_F1(sinh)); libmvec_hidden_proto (V_NAME_F1(tan)); diff --git a/sysdeps/aarch64/fpu/atan2_advsimd.c b/sysdeps/aarch64/fpu/atan2_advsimd.c index 2fd6164134..b1e7a9b8fc 100644 --- a/sysdeps/aarch64/fpu/atan2_advsimd.c +++ b/sysdeps/aarch64/fpu/atan2_advsimd.c @@ -17,6 +17,7 @@ License along with the GNU C Library; if not, see . */ +#include "math_config.h" #include "v_math.h" #include "poly_advsimd_f64.h" diff --git a/sysdeps/aarch64/fpu/atan2_sve.c b/sysdeps/aarch64/fpu/atan2_sve.c index 04fa71fa37..ed9f683436 100644 --- a/sysdeps/aarch64/fpu/atan2_sve.c +++ b/sysdeps/aarch64/fpu/atan2_sve.c @@ -17,6 +17,7 @@ License along with the GNU C Library; if not, see . */ +#include "math_config.h" #include "sv_math.h" #include "poly_sve_f64.h" diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h index b1c024fe13..7484150131 100644 --- a/sysdeps/aarch64/fpu/bits/math-vector.h +++ b/sysdeps/aarch64/fpu/bits/math-vector.h @@ -113,6 +113,10 @@ # define __DECL_SIMD_log2 __DECL_SIMD_aarch64 # undef __DECL_SIMD_log2f # define __DECL_SIMD_log2f __DECL_SIMD_aarch64 +# undef __DECL_SIMD_pow +# define __DECL_SIMD_pow __DECL_SIMD_aarch64 +# undef __DECL_SIMD_powf +# define __DECL_SIMD_powf __DECL_SIMD_aarch64 # undef __DECL_SIMD_sin # define __DECL_SIMD_sin __DECL_SIMD_aarch64 # undef __DECL_SIMD_sinf @@ -176,6 +180,7 @@ __vpcs __f32x4_t _ZGVnN4v_logf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_log10f (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_log1pf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_log2f (__f32x4_t); +__vpcs __f32x4_t _ZGVnN4vv_powf (__f32x4_t, __f32x4_t); __vpcs __f32x4_t _ZGVnN4v_sinf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_sinhf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_tanf (__f32x4_t); @@ -202,6 +207,7 @@ __vpcs __f64x2_t _ZGVnN2v_log (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_log10 (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_log1p (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_log2 (__f64x2_t); +__vpcs __f64x2_t _ZGVnN2vv_pow (__f64x2_t, __f64x2_t); __vpcs __f64x2_t _ZGVnN2v_sin (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_sinh (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_tan (__f64x2_t); @@ -233,6 +239,7 @@ __sv_f32_t _ZGVsMxv_logf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_log10f (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_log1pf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_log2f (__sv_f32_t, __sv_bool_t); +__sv_f32_t _ZGVsMxvv_powf (__sv_f32_t, __sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_sinf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_sinhf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_tanf (__sv_f32_t, __sv_bool_t); @@ -259,6 +266,7 @@ __sv_f64_t _ZGVsMxv_log (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_log10 (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_log1p (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_log2 (__sv_f64_t, __sv_bool_t); +__sv_f64_t _ZGVsMxvv_pow (__sv_f64_t, __sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_sin (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_sinh (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_tan (__sv_f64_t, __sv_bool_t); diff --git a/sysdeps/aarch64/fpu/finite_pow.h b/sysdeps/aarch64/fpu/finite_pow.h new file mode 100644 index 0000000000..f9d5038145 --- /dev/null +++ b/sysdeps/aarch64/fpu/finite_pow.h @@ -0,0 +1,373 @@ +/* Double-precision x^y function. + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "math_config.h" + +/* Scalar version of pow used for fallbacks in vector implementations. */ + +/* Data is defined in v_pow_log_data.c. */ +#define N_LOG (1 << V_POW_LOG_TABLE_BITS) +#define Off 0x3fe6955500000000 +#define As __v_pow_log_data.poly + +/* Data is defined in v_pow_exp_data.c. */ +#define N_EXP (1 << V_POW_EXP_TABLE_BITS) +#define SignBias (0x800 << V_POW_EXP_TABLE_BITS) +#define SmallExp 0x3c9 /* top12(0x1p-54). */ +#define BigExp 0x408 /* top12(512.0). */ +#define ThresExp 0x03f /* BigExp - SmallExp. */ +#define InvLn2N __v_pow_exp_data.n_over_ln2 +#define Ln2HiN __v_pow_exp_data.ln2_over_n_hi +#define Ln2LoN __v_pow_exp_data.ln2_over_n_lo +#define SBits __v_pow_exp_data.sbits +#define Cs __v_pow_exp_data.poly + +/* Constants associated with pow. */ +#define SmallPowX 0x001 /* top12(0x1p-126). */ +#define BigPowX 0x7ff /* top12(INFINITY). */ +#define ThresPowX 0x7fe /* BigPowX - SmallPowX. */ +#define SmallPowY 0x3be /* top12(0x1.e7b6p-65). */ +#define BigPowY 0x43e /* top12(0x1.749p62). */ +#define ThresPowY 0x080 /* BigPowY - SmallPowY. */ + +/* Top 12 bits of a double (sign and exponent bits). */ +static inline uint32_t +top12 (double x) +{ + return asuint64 (x) >> 52; +} + +/* Compute y+TAIL = log(x) where the rounded result is y and TAIL has about + additional 15 bits precision. IX is the bit representation of x, but + normalized in the subnormal range using the sign bit for the exponent. */ +static inline double +log_inline (uint64_t ix, double *tail) +{ + /* x = 2^k z; where z is in range [Off,2*Off) and exact. + The range is split into N subintervals. + The ith subinterval contains z and c is near its center. */ + uint64_t tmp = ix - Off; + int i = (tmp >> (52 - V_POW_LOG_TABLE_BITS)) & (N_LOG - 1); + int k = (int64_t) tmp >> 52; /* arithmetic shift. */ + uint64_t iz = ix - (tmp & 0xfffULL << 52); + double z = asdouble (iz); + double kd = (double) k; + + /* log(x) = k*Ln2 + log(c) + log1p(z/c-1). */ + double invc = __v_pow_log_data.invc[i]; + double logc = __v_pow_log_data.logc[i]; + double logctail = __v_pow_log_data.logctail[i]; + + /* Note: 1/c is j/N or j/N/2 where j is an integer in [N,2N) and + |z/c - 1| < 1/N, so r = z/c - 1 is exactly representible. */ + double r = fma (z, invc, -1.0); + + /* k*Ln2 + log(c) + r. */ + double t1 = kd * __v_pow_log_data.ln2_hi + logc; + double t2 = t1 + r; + double lo1 = kd * __v_pow_log_data.ln2_lo + logctail; + double lo2 = t1 - t2 + r; + + /* Evaluation is optimized assuming superscalar pipelined execution. */ + double ar = As[0] * r; + double ar2 = r * ar; + double ar3 = r * ar2; + /* k*Ln2 + log(c) + r + A[0]*r*r. */ + double hi = t2 + ar2; + double lo3 = fma (ar, r, -ar2); + double lo4 = t2 - hi + ar2; + /* p = log1p(r) - r - A[0]*r*r. */ + double p = (ar3 + * (As[1] + r * As[2] + + ar2 * (As[3] + r * As[4] + ar2 * (As[5] + r * As[6])))); + double lo = lo1 + lo2 + lo3 + lo4 + p; + double y = hi + lo; + *tail = hi - y + lo; + return y; +} + +/* Handle cases that may overflow or underflow when computing the result that + is scale*(1+TMP) without intermediate rounding. The bit representation of + scale is in SBITS, however it has a computed exponent that may have + overflown into the sign bit so that needs to be adjusted before using it as + a double. (int32_t)KI is the k used in the argument reduction and exponent + adjustment of scale, positive k here means the result may overflow and + negative k means the result may underflow. */ +static inline double +special_case (double tmp, uint64_t sbits, uint64_t ki) +{ + double scale, y; + + if ((ki & 0x80000000) == 0) + { + /* k > 0, the exponent of scale might have overflowed by <= 460. */ + sbits -= 1009ull << 52; + scale = asdouble (sbits); + y = 0x1p1009 * (scale + scale * tmp); + return y; + } + /* k < 0, need special care in the subnormal range. */ + sbits += 1022ull << 52; + /* Note: sbits is signed scale. */ + scale = asdouble (sbits); + y = scale + scale * tmp; +#if WANT_SIMD_EXCEPT + if (fabs (y) < 1.0) + { + /* Round y to the right precision before scaling it into the subnormal + range to avoid double rounding that can cause 0.5+E/2 ulp error where + E is the worst-case ulp error outside the subnormal range. So this + is only useful if the goal is better than 1 ulp worst-case error. */ + double hi, lo, one = 1.0; + if (y < 0.0) + one = -1.0; + lo = scale - y + scale * tmp; + hi = one + y; + lo = one - hi + y + lo; + y = (hi + lo) - one; + /* Fix the sign of 0. */ + if (y == 0.0) + y = asdouble (sbits & 0x8000000000000000); + /* The underflow exception needs to be signaled explicitly. */ + force_eval_double (opt_barrier_double (0x1p-1022) * 0x1p-1022); + } +#endif + y = 0x1p-1022 * y; + return y; +} + +/* Computes sign*exp(x+xtail) where |xtail| < 2^-8/N and |xtail| <= |x|. + The sign_bias argument is SignBias or 0 and sets the sign to -1 or 1. */ +static inline double +exp_inline (double x, double xtail, uint32_t sign_bias) +{ + uint32_t abstop = top12 (x) & 0x7ff; + if (__glibc_unlikely (abstop - SmallExp >= ThresExp)) + { + if (abstop - SmallExp >= 0x80000000) + { + /* Avoid spurious underflow for tiny x. */ + /* Note: 0 is common input. */ + return sign_bias ? -1.0 : 1.0; + } + if (abstop >= top12 (1024.0)) + { + /* Note: inf and nan are already handled. */ + /* Skip errno handling. */ +#if WANT_SIMD_EXCEPT + return asuint64 (x) >> 63 ? __math_uflow (sign_bias) + : __math_oflow (sign_bias); +#else + double res_uoflow = asuint64 (x) >> 63 ? 0.0 : INFINITY; + return sign_bias ? -res_uoflow : res_uoflow; +#endif + } + /* Large x is special cased below. */ + abstop = 0; + } + + /* exp(x) = 2^(k/N) * exp(r), with exp(r) in [2^(-1/2N),2^(1/2N)]. */ + /* x = ln2/N*k + r, with int k and r in [-ln2/2N, ln2/2N]. */ + double z = InvLn2N * x; + double kd = round (z); + uint64_t ki = lround (z); + double r = x - kd * Ln2HiN - kd * Ln2LoN; + /* The code assumes 2^-200 < |xtail| < 2^-8/N. */ + r += xtail; + /* 2^(k/N) ~= scale. */ + uint64_t idx = ki & (N_EXP - 1); + uint64_t top = (ki + sign_bias) << (52 - V_POW_EXP_TABLE_BITS); + /* This is only a valid scale when -1023*N < k < 1024*N. */ + uint64_t sbits = SBits[idx] + top; + /* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (exp(r) - 1). */ + /* Evaluation is optimized assuming superscalar pipelined execution. */ + double r2 = r * r; + double tmp = r + r2 * Cs[0] + r * r2 * (Cs[1] + r * Cs[2]); + if (__glibc_unlikely (abstop == 0)) + return special_case (tmp, sbits, ki); + double scale = asdouble (sbits); + /* Note: tmp == 0 or |tmp| > 2^-200 and scale > 2^-739, so there + is no spurious underflow here even without fma. */ + return scale + scale * tmp; +} + +/* Computes exp(x+xtail) where |xtail| < 2^-8/N and |xtail| <= |x|. + A version of exp_inline that is not inlined and for which sign_bias is + equal to 0. */ +static double NOINLINE +exp_nosignbias (double x, double xtail) +{ + uint32_t abstop = top12 (x) & 0x7ff; + if (__glibc_unlikely (abstop - SmallExp >= ThresExp)) + { + /* Avoid spurious underflow for tiny x. */ + if (abstop - SmallExp >= 0x80000000) + return 1.0; + /* Note: inf and nan are already handled. */ + if (abstop >= top12 (1024.0)) +#if WANT_SIMD_EXCEPT + return asuint64 (x) >> 63 ? __math_uflow (0) : __math_oflow (0); +#else + return asuint64 (x) >> 63 ? 0.0 : INFINITY; +#endif + /* Large x is special cased below. */ + abstop = 0; + } + + /* exp(x) = 2^(k/N) * exp(r), with exp(r) in [2^(-1/2N),2^(1/2N)]. */ + /* x = ln2/N*k + r, with k integer and r in [-ln2/2N, ln2/2N]. */ + double z = InvLn2N * x; + double kd = round (z); + uint64_t ki = lround (z); + double r = x - kd * Ln2HiN - kd * Ln2LoN; + /* The code assumes 2^-200 < |xtail| < 2^-8/N. */ + r += xtail; + /* 2^(k/N) ~= scale. */ + uint64_t idx = ki & (N_EXP - 1); + uint64_t top = ki << (52 - V_POW_EXP_TABLE_BITS); + /* This is only a valid scale when -1023*N < k < 1024*N. */ + uint64_t sbits = SBits[idx] + top; + /* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (tail + exp(r) - 1). */ + double r2 = r * r; + double tmp = r + r2 * Cs[0] + r * r2 * (Cs[1] + r * Cs[2]); + if (__glibc_unlikely (abstop == 0)) + return special_case (tmp, sbits, ki); + double scale = asdouble (sbits); + /* Note: tmp == 0 or |tmp| > 2^-200 and scale > 2^-739, so there + is no spurious underflow here even without fma. */ + return scale + scale * tmp; +} + +/* Returns 0 if not int, 1 if odd int, 2 if even int. The argument is + the bit representation of a non-zero finite floating-point value. */ +static inline int +checkint (uint64_t iy) +{ + int e = iy >> 52 & 0x7ff; + if (e < 0x3ff) + return 0; + if (e > 0x3ff + 52) + return 2; + if (iy & ((1ULL << (0x3ff + 52 - e)) - 1)) + return 0; + if (iy & (1ULL << (0x3ff + 52 - e))) + return 1; + return 2; +} + +/* Returns 1 if input is the bit representation of 0, infinity or nan. */ +static inline int +zeroinfnan (uint64_t i) +{ + return 2 * i - 1 >= 2 * asuint64 (INFINITY) - 1; +} + +static double NOINLINE +__pl_finite_pow (double x, double y) +{ + uint32_t sign_bias = 0; + uint64_t ix, iy; + uint32_t topx, topy; + + ix = asuint64 (x); + iy = asuint64 (y); + topx = top12 (x); + topy = top12 (y); + if (__glibc_unlikely (topx - SmallPowX >= ThresPowX + || (topy & 0x7ff) - SmallPowY >= ThresPowY)) + { + /* Note: if |y| > 1075 * ln2 * 2^53 ~= 0x1.749p62 then pow(x,y) = inf/0 + and if |y| < 2^-54 / 1075 ~= 0x1.e7b6p-65 then pow(x,y) = +-1. */ + /* Special cases: (x < 0x1p-126 or inf or nan) or + (|y| < 0x1p-65 or |y| >= 0x1p63 or nan). */ + if (__glibc_unlikely (zeroinfnan (iy))) + { + if (2 * iy == 0) + return issignaling_inline (x) ? x + y : 1.0; + if (ix == asuint64 (1.0)) + return issignaling_inline (y) ? x + y : 1.0; + if (2 * ix > 2 * asuint64 (INFINITY) + || 2 * iy > 2 * asuint64 (INFINITY)) + return x + y; + if (2 * ix == 2 * asuint64 (1.0)) + return 1.0; + if ((2 * ix < 2 * asuint64 (1.0)) == !(iy >> 63)) + return 0.0; /* |x|<1 && y==inf or |x|>1 && y==-inf. */ + return y * y; + } + if (__glibc_unlikely (zeroinfnan (ix))) + { + double x2 = x * x; + if (ix >> 63 && checkint (iy) == 1) + { + x2 = -x2; + sign_bias = 1; + } +#if WANT_SIMD_EXCEPT + if (2 * ix == 0 && iy >> 63) + return __math_divzero (sign_bias); +#endif + return iy >> 63 ? 1 / x2 : x2; + } + /* Here x and y are non-zero finite. */ + if (ix >> 63) + { + /* Finite x < 0. */ + int yint = checkint (iy); + if (yint == 0) +#if WANT_SIMD_EXCEPT + return __math_invalid (x); +#else + return __builtin_nan (""); +#endif + if (yint == 1) + sign_bias = SignBias; + ix &= 0x7fffffffffffffff; + topx &= 0x7ff; + } + if ((topy & 0x7ff) - SmallPowY >= ThresPowY) + { + /* Note: sign_bias == 0 here because y is not odd. */ + if (ix == asuint64 (1.0)) + return 1.0; + /* |y| < 2^-65, x^y ~= 1 + y*log(x). */ + if ((topy & 0x7ff) < SmallPowY) + return 1.0; +#if WANT_SIMD_EXCEPT + return (ix > asuint64 (1.0)) == (topy < 0x800) ? __math_oflow (0) + : __math_uflow (0); +#else + return (ix > asuint64 (1.0)) == (topy < 0x800) ? INFINITY : 0; +#endif + } + if (topx == 0) + { + /* Normalize subnormal x so exponent becomes negative. */ + ix = asuint64 (x * 0x1p52); + ix &= 0x7fffffffffffffff; + ix -= 52ULL << 52; + } + } + + double lo; + double hi = log_inline (ix, &lo); + double ehi = y * hi; + double elo = y * lo + fma (y, hi, -ehi); + return exp_inline (ehi, elo, sign_bias); +} diff --git a/sysdeps/aarch64/fpu/pow_advsimd.c b/sysdeps/aarch64/fpu/pow_advsimd.c new file mode 100644 index 0000000000..decda207a4 --- /dev/null +++ b/sysdeps/aarch64/fpu/pow_advsimd.c @@ -0,0 +1,249 @@ +/* Double-precision vector (AdvSIMD) pow function + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "v_math.h" + +/* Defines parameters of the approximation and scalar fallback. */ +#include "finite_pow.h" + +#define VecSmallExp v_u64 (SmallExp) +#define VecThresExp v_u64 (ThresExp) + +#define VecSmallPowX v_u64 (SmallPowX) +#define VecThresPowX v_u64 (ThresPowX) +#define VecSmallPowY v_u64 (SmallPowY) +#define VecThresPowY v_u64 (ThresPowY) + +static const struct data +{ + float64x2_t log_poly[6]; + float64x2_t exp_poly[3]; + float64x2_t ln2_hi, ln2_lo; + float64x2_t shift, inv_ln2_n, ln2_hi_n, ln2_lo_n, small_powx; + uint64x2_t inf; +} data = { + /* Coefficients copied from v_pow_log_data.c + relative error: 0x1.11922ap-70 in [-0x1.6bp-8, 0x1.6bp-8] + Coefficients are scaled to match the scaling during evaluation. */ + .log_poly + = { V2 (0x1.555555555556p-2 * -2), V2 (-0x1.0000000000006p-2 * -2), + V2 (0x1.999999959554ep-3 * 4), V2 (-0x1.555555529a47ap-3 * 4), + V2 (0x1.2495b9b4845e9p-3 * -8), V2 (-0x1.0002b8b263fc3p-3 * -8) }, + .ln2_hi = V2 (0x1.62e42fefa3800p-1), + .ln2_lo = V2 (0x1.ef35793c76730p-45), + /* Polynomial coefficients: abs error: 1.43*2^-58, ulp error: 0.549 + (0.550 without fma) if |x| < ln2/512. */ + .exp_poly = { V2 (0x1.fffffffffffd4p-2), V2 (0x1.5555571d6ef9p-3), + V2 (0x1.5555576a5adcep-5) }, + .shift = V2 (0x1.8p52), /* round to nearest int. without intrinsics. */ + .inv_ln2_n = V2 (0x1.71547652b82fep8), /* N/ln2. */ + .ln2_hi_n = V2 (0x1.62e42fefc0000p-9), /* ln2/N. */ + .ln2_lo_n = V2 (-0x1.c610ca86c3899p-45), + .small_powx = V2 (0x1p-126), + .inf = V2 (0x7ff0000000000000) +}; + +#define A(i) data.log_poly[i] +#define C(i) data.exp_poly[i] + +/* This version implements an algorithm close to scalar pow but + - does not implement the trick in the exp's specialcase subroutine to avoid + double-rounding, + - does not use a tail in the exponential core computation, + - and pow's exp polynomial order and table bits might differ. + + Maximum measured error is 1.04 ULPs: + _ZGVnN2vv_pow(0x1.024a3e56b3c3p-136, 0x1.87910248b58acp-13) + got 0x1.f71162f473251p-1 + want 0x1.f71162f473252p-1. */ + +static inline float64x2_t +v_masked_lookup_f64 (const double *table, uint64x2_t i) +{ + return (float64x2_t){ + table[(i[0] >> (52 - V_POW_LOG_TABLE_BITS)) & (N_LOG - 1)], + table[(i[1] >> (52 - V_POW_LOG_TABLE_BITS)) & (N_LOG - 1)] + }; +} + +/* Compute y+TAIL = log(x) where the rounded result is y and TAIL has about + additional 15 bits precision. IX is the bit representation of x, but + normalized in the subnormal range using the sign bit for the exponent. */ +static inline float64x2_t +v_log_inline (uint64x2_t ix, float64x2_t *tail, const struct data *d) +{ + /* x = 2^k z; where z is in range [OFF,2*OFF) and exact. + The range is split into N subintervals. + The ith subinterval contains z and c is near its center. */ + uint64x2_t tmp = vsubq_u64 (ix, v_u64 (Off)); + int64x2_t k + = vshrq_n_s64 (vreinterpretq_s64_u64 (tmp), 52); /* arithmetic shift. */ + uint64x2_t iz = vsubq_u64 (ix, vandq_u64 (tmp, v_u64 (0xfffULL << 52))); + float64x2_t z = vreinterpretq_f64_u64 (iz); + float64x2_t kd = vcvtq_f64_s64 (k); + /* log(x) = k*Ln2 + log(c) + log1p(z/c-1). */ + float64x2_t invc = v_masked_lookup_f64 (__v_pow_log_data.invc, tmp); + float64x2_t logc = v_masked_lookup_f64 (__v_pow_log_data.logc, tmp); + float64x2_t logctail = v_masked_lookup_f64 (__v_pow_log_data.logctail, tmp); + /* Note: 1/c is j/N or j/N/2 where j is an integer in [N,2N) and + |z/c - 1| < 1/N, so r = z/c - 1 is exactly representible. */ + float64x2_t r = vfmaq_f64 (v_f64 (-1.0), z, invc); + /* k*Ln2 + log(c) + r. */ + float64x2_t t1 = vfmaq_f64 (logc, kd, d->ln2_hi); + float64x2_t t2 = vaddq_f64 (t1, r); + float64x2_t lo1 = vfmaq_f64 (logctail, kd, d->ln2_lo); + float64x2_t lo2 = vaddq_f64 (vsubq_f64 (t1, t2), r); + /* Evaluation is optimized assuming superscalar pipelined execution. */ + float64x2_t ar = vmulq_f64 (v_f64 (-0.5), r); + float64x2_t ar2 = vmulq_f64 (r, ar); + float64x2_t ar3 = vmulq_f64 (r, ar2); + /* k*Ln2 + log(c) + r + A[0]*r*r. */ + float64x2_t hi = vaddq_f64 (t2, ar2); + float64x2_t lo3 = vfmaq_f64 (vnegq_f64 (ar2), ar, r); + float64x2_t lo4 = vaddq_f64 (vsubq_f64 (t2, hi), ar2); + /* p = log1p(r) - r - A[0]*r*r. */ + float64x2_t a56 = vfmaq_f64 (A (4), r, A (5)); + float64x2_t a34 = vfmaq_f64 (A (2), r, A (3)); + float64x2_t a12 = vfmaq_f64 (A (0), r, A (1)); + float64x2_t p = vfmaq_f64 (a34, ar2, a56); + p = vfmaq_f64 (a12, ar2, p); + p = vmulq_f64 (ar3, p); + float64x2_t lo + = vaddq_f64 (vaddq_f64 (vaddq_f64 (vaddq_f64 (lo1, lo2), lo3), lo4), p); + float64x2_t y = vaddq_f64 (hi, lo); + *tail = vaddq_f64 (vsubq_f64 (hi, y), lo); + return y; +} + +static float64x2_t VPCS_ATTR NOINLINE +exp_special_case (float64x2_t x, float64x2_t xtail) +{ + return (float64x2_t){ exp_nosignbias (x[0], xtail[0]), + exp_nosignbias (x[1], xtail[1]) }; +} + +/* Computes sign*exp(x+xtail) where |xtail| < 2^-8/N and |xtail| <= |x|. */ +static inline float64x2_t +v_exp_inline (float64x2_t x, float64x2_t xtail, const struct data *d) +{ + /* Fallback to scalar exp_inline for all lanes if any lane + contains value of x s.t. |x| <= 2^-54 or >= 512. */ + uint64x2_t abstop + = vshrq_n_u64 (vandq_u64 (vreinterpretq_u64_f64 (x), d->inf), 52); + uint64x2_t uoflowx + = vcgeq_u64 (vsubq_u64 (abstop, VecSmallExp), VecThresExp); + if (__glibc_unlikely (v_any_u64 (uoflowx))) + return exp_special_case (x, xtail); + + /* exp(x) = 2^(k/N) * exp(r), with exp(r) in [2^(-1/2N),2^(1/2N)]. */ + /* x = ln2/N*k + r, with k integer and r in [-ln2/2N, ln2/2N]. */ + float64x2_t z = vmulq_f64 (d->inv_ln2_n, x); + /* z - kd is in [-1, 1] in non-nearest rounding modes. */ + float64x2_t kd = vaddq_f64 (z, d->shift); + uint64x2_t ki = vreinterpretq_u64_f64 (kd); + kd = vsubq_f64 (kd, d->shift); + float64x2_t r = vfmsq_f64 (x, kd, d->ln2_hi_n); + r = vfmsq_f64 (r, kd, d->ln2_lo_n); + /* The code assumes 2^-200 < |xtail| < 2^-8/N. */ + r = vaddq_f64 (r, xtail); + /* 2^(k/N) ~= scale. */ + uint64x2_t idx = vandq_u64 (ki, v_u64 (N_EXP - 1)); + uint64x2_t top = vshlq_n_u64 (ki, 52 - V_POW_EXP_TABLE_BITS); + /* This is only a valid scale when -1023*N < k < 1024*N. */ + uint64x2_t sbits = v_lookup_u64 (SBits, idx); + sbits = vaddq_u64 (sbits, top); + /* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (exp(r) - 1). */ + float64x2_t r2 = vmulq_f64 (r, r); + float64x2_t tmp = vfmaq_f64 (C (1), r, C (2)); + tmp = vfmaq_f64 (C (0), r, tmp); + tmp = vfmaq_f64 (r, r2, tmp); + float64x2_t scale = vreinterpretq_f64_u64 (sbits); + /* Note: tmp == 0 or |tmp| > 2^-200 and scale > 2^-739, so there + is no spurious underflow here even without fma. */ + return vfmaq_f64 (scale, scale, tmp); +} + +static float64x2_t NOINLINE VPCS_ATTR +scalar_fallback (float64x2_t x, float64x2_t y) +{ + return (float64x2_t){ __pl_finite_pow (x[0], y[0]), + __pl_finite_pow (x[1], y[1]) }; +} + +float64x2_t VPCS_ATTR V_NAME_D2 (pow) (float64x2_t x, float64x2_t y) +{ + const struct data *d = ptr_barrier (&data); + /* Case of x <= 0 is too complicated to be vectorised efficiently here, + fallback to scalar pow for all lanes if any x < 0 detected. */ + if (v_any_u64 (vclezq_s64 (vreinterpretq_s64_f64 (x)))) + return scalar_fallback (x, y); + + uint64x2_t vix = vreinterpretq_u64_f64 (x); + uint64x2_t viy = vreinterpretq_u64_f64 (y); + uint64x2_t iay = vandq_u64 (viy, d->inf); + + /* Special cases of x or y. */ +#if WANT_SIMD_EXCEPT + /* Small or large. */ + uint64x2_t vtopx = vshrq_n_u64 (vix, 52); + uint64x2_t vabstopy = vshrq_n_u64 (iay, 52); + uint64x2_t specialx + = vcgeq_u64 (vsubq_u64 (vtopx, VecSmallPowX), VecThresPowX); + uint64x2_t specialy + = vcgeq_u64 (vsubq_u64 (vabstopy, VecSmallPowY), VecThresPowY); +#else + /* The case y==0 does not trigger a special case, since in this case it is + necessary to fix the result only if x is a signalling nan, which already + triggers a special case. We test y==0 directly in the scalar fallback. */ + uint64x2_t iax = vandq_u64 (vix, d->inf); + uint64x2_t specialx = vcgeq_u64 (iax, d->inf); + uint64x2_t specialy = vcgeq_u64 (iay, d->inf); +#endif + uint64x2_t special = vorrq_u64 (specialx, specialy); + /* Fallback to scalar on all lanes if any lane is inf or nan. */ + if (__glibc_unlikely (v_any_u64 (special))) + return scalar_fallback (x, y); + + /* Small cases of x: |x| < 0x1p-126. */ + uint64x2_t smallx = vcaltq_f64 (x, d->small_powx); + if (__glibc_unlikely (v_any_u64 (smallx))) + { + /* Update ix if top 12 bits of x are 0. */ + uint64x2_t sub_x = vceqzq_u64 (vshrq_n_u64 (vix, 52)); + if (__glibc_unlikely (v_any_u64 (sub_x))) + { + /* Normalize subnormal x so exponent becomes negative. */ + uint64x2_t vix_norm = vreinterpretq_u64_f64 ( + vabsq_f64 (vmulq_f64 (x, vcvtq_f64_u64 (v_u64 (1ULL << 52))))); + vix_norm = vsubq_u64 (vix_norm, v_u64 (52ULL << 52)); + vix = vbslq_u64 (sub_x, vix_norm, vix); + } + } + + /* Vector Log(ix, &lo). */ + float64x2_t vlo; + float64x2_t vhi = v_log_inline (vix, &vlo, d); + + /* Vector Exp(y_loghi, y_loglo). */ + float64x2_t vehi = vmulq_f64 (y, vhi); + float64x2_t velo = vmulq_f64 (y, vlo); + float64x2_t vemi = vfmsq_f64 (vehi, y, vhi); + velo = vsubq_f64 (velo, vemi); + return v_exp_inline (vehi, velo, d); +} diff --git a/sysdeps/aarch64/fpu/pow_sve.c b/sysdeps/aarch64/fpu/pow_sve.c new file mode 100644 index 0000000000..4c0bf8956c --- /dev/null +++ b/sysdeps/aarch64/fpu/pow_sve.c @@ -0,0 +1,411 @@ +/* Double-precision vector (SVE) pow function + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* This version share a similar algorithm as AOR scalar pow. + + The core computation consists in computing pow(x, y) as + + exp (y * log (x)). + + The algorithms for exp and log are very similar to scalar exp and log. + The log relies on table lookup for 3 variables and an order 8 polynomial. + It returns a high and a low contribution that are then passed to the exp, + to minimise the loss of accuracy in both routines. + The exp is based on 8-bit table lookup for scale and order-4 polynomial. + The SVE algorithm drops the tail in the exp computation at the price of + a lower accuracy, slightly above 1ULP. + The SVE algorithm also drops the special treatement of small (< 2^-65) and + large (> 2^63) finite values of |y|, as they only affect non-round to nearest + modes. + + Maximum measured error is 1.04 ULPs: + SV_NAME_D2 (pow) (0x1.3d2d45bc848acp+63, -0x1.a48a38b40cd43p-12) + got 0x1.f7116284221fcp-1 + want 0x1.f7116284221fdp-1. */ + +#include "math_config.h" +#include "sv_math.h" + +/* Data is defined in v_pow_log_data.c. */ +#define N_LOG (1 << V_POW_LOG_TABLE_BITS) +#define A __v_pow_log_data.poly +#define Off 0x3fe6955500000000 + +/* Data is defined in v_pow_exp_data.c. */ +#define N_EXP (1 << V_POW_EXP_TABLE_BITS) +#define SignBias (0x800 << V_POW_EXP_TABLE_BITS) +#define C __v_pow_exp_data.poly +#define SmallExp 0x3c9 /* top12(0x1p-54). */ +#define BigExp 0x408 /* top12(512.). */ +#define ThresExp 0x03f /* BigExp - SmallExp. */ +#define HugeExp 0x409 /* top12(1024.). */ + +/* Constants associated with pow. */ +#define SmallPowX 0x001 /* top12(0x1p-126). */ +#define BigPowX 0x7ff /* top12(INFINITY). */ +#define ThresPowX 0x7fe /* BigPowX - SmallPowX. */ +#define SmallPowY 0x3be /* top12(0x1.e7b6p-65). */ +#define BigPowY 0x43e /* top12(0x1.749p62). */ +#define ThresPowY 0x080 /* BigPowY - SmallPowY. */ + +/* Check if x is an integer. */ +static inline svbool_t +sv_isint (svbool_t pg, svfloat64_t x) +{ + return svcmpeq (pg, svrintz_z (pg, x), x); +} + +/* Check if x is real not integer valued. */ +static inline svbool_t +sv_isnotint (svbool_t pg, svfloat64_t x) +{ + return svcmpne (pg, svrintz_z (pg, x), x); +} + +/* Check if x is an odd integer. */ +static inline svbool_t +sv_isodd (svbool_t pg, svfloat64_t x) +{ + svfloat64_t y = svmul_x (pg, x, 0.5); + return sv_isnotint (pg, y); +} + +/* Returns 0 if not int, 1 if odd int, 2 if even int. The argument is + the bit representation of a non-zero finite floating-point value. */ +static inline int +checkint (uint64_t iy) +{ + int e = iy >> 52 & 0x7ff; + if (e < 0x3ff) + return 0; + if (e > 0x3ff + 52) + return 2; + if (iy & ((1ULL << (0x3ff + 52 - e)) - 1)) + return 0; + if (iy & (1ULL << (0x3ff + 52 - e))) + return 1; + return 2; +} + +/* Top 12 bits (sign and exponent of each double float lane). */ +static inline svuint64_t +sv_top12 (svfloat64_t x) +{ + return svlsr_x (svptrue_b64 (), svreinterpret_u64 (x), 52); +} + +/* Returns 1 if input is the bit representation of 0, infinity or nan. */ +static inline int +zeroinfnan (uint64_t i) +{ + return 2 * i - 1 >= 2 * asuint64 (INFINITY) - 1; +} + +/* Returns 1 if input is the bit representation of 0, infinity or nan. */ +static inline svbool_t +sv_zeroinfnan (svbool_t pg, svuint64_t i) +{ + return svcmpge (pg, svsub_x (pg, svmul_x (pg, i, 2), 1), + 2 * asuint64 (INFINITY) - 1); +} + +/* Handle cases that may overflow or underflow when computing the result that + is scale*(1+TMP) without intermediate rounding. The bit representation of + scale is in SBITS, however it has a computed exponent that may have + overflown into the sign bit so that needs to be adjusted before using it as + a double. (int32_t)KI is the k used in the argument reduction and exponent + adjustment of scale, positive k here means the result may overflow and + negative k means the result may underflow. */ +static inline double +specialcase (double tmp, uint64_t sbits, uint64_t ki) +{ + double scale; + if ((ki & 0x80000000) == 0) + { + /* k > 0, the exponent of scale might have overflowed by <= 460. */ + sbits -= 1009ull << 52; + scale = asdouble (sbits); + return 0x1p1009 * (scale + scale * tmp); + } + /* k < 0, need special care in the subnormal range. */ + sbits += 1022ull << 52; + /* Note: sbits is signed scale. */ + scale = asdouble (sbits); + double y = scale + scale * tmp; + return 0x1p-1022 * y; +} + +/* Scalar fallback for special cases of SVE pow's exp. */ +static inline svfloat64_t +sv_call_specialcase (svfloat64_t x1, svuint64_t u1, svuint64_t u2, + svfloat64_t y, svbool_t cmp) +{ + svbool_t p = svpfirst (cmp, svpfalse ()); + while (svptest_any (cmp, p)) + { + double sx1 = svclastb (p, 0, x1); + uint64_t su1 = svclastb (p, 0, u1); + uint64_t su2 = svclastb (p, 0, u2); + double elem = specialcase (sx1, su1, su2); + svfloat64_t y2 = sv_f64 (elem); + y = svsel (p, y2, y); + p = svpnext_b64 (cmp, p); + } + return y; +} + +/* Compute y+TAIL = log(x) where the rounded result is y and TAIL has about + additional 15 bits precision. IX is the bit representation of x, but + normalized in the subnormal range using the sign bit for the exponent. */ +static inline svfloat64_t +sv_log_inline (svbool_t pg, svuint64_t ix, svfloat64_t *tail) +{ + /* x = 2^k z; where z is in range [Off,2*Off) and exact. + The range is split into N subintervals. + The ith subinterval contains z and c is near its center. */ + svuint64_t tmp = svsub_x (pg, ix, Off); + svuint64_t i = svand_x (pg, svlsr_x (pg, tmp, 52 - V_POW_LOG_TABLE_BITS), + sv_u64 (N_LOG - 1)); + svint64_t k = svasr_x (pg, svreinterpret_s64 (tmp), 52); + svuint64_t iz = svsub_x (pg, ix, svand_x (pg, tmp, sv_u64 (0xfffULL << 52))); + svfloat64_t z = svreinterpret_f64 (iz); + svfloat64_t kd = svcvt_f64_x (pg, k); + + /* log(x) = k*Ln2 + log(c) + log1p(z/c-1). */ + /* SVE lookup requires 3 separate lookup tables, as opposed to scalar version + that uses array of structures. We also do the lookup earlier in the code to + make sure it finishes as early as possible. */ + svfloat64_t invc = svld1_gather_index (pg, __v_pow_log_data.invc, i); + svfloat64_t logc = svld1_gather_index (pg, __v_pow_log_data.logc, i); + svfloat64_t logctail = svld1_gather_index (pg, __v_pow_log_data.logctail, i); + + /* Note: 1/c is j/N or j/N/2 where j is an integer in [N,2N) and + |z/c - 1| < 1/N, so r = z/c - 1 is exactly representible. */ + svfloat64_t r = svmad_x (pg, z, invc, -1.0); + /* k*Ln2 + log(c) + r. */ + svfloat64_t t1 = svmla_x (pg, logc, kd, __v_pow_log_data.ln2_hi); + svfloat64_t t2 = svadd_x (pg, t1, r); + svfloat64_t lo1 = svmla_x (pg, logctail, kd, __v_pow_log_data.ln2_lo); + svfloat64_t lo2 = svadd_x (pg, svsub_x (pg, t1, t2), r); + + /* Evaluation is optimized assuming superscalar pipelined execution. */ + svfloat64_t ar = svmul_x (pg, r, -0.5); /* A[0] = -0.5. */ + svfloat64_t ar2 = svmul_x (pg, r, ar); + svfloat64_t ar3 = svmul_x (pg, r, ar2); + /* k*Ln2 + log(c) + r + A[0]*r*r. */ + svfloat64_t hi = svadd_x (pg, t2, ar2); + svfloat64_t lo3 = svmla_x (pg, svneg_x (pg, ar2), ar, r); + svfloat64_t lo4 = svadd_x (pg, svsub_x (pg, t2, hi), ar2); + /* p = log1p(r) - r - A[0]*r*r. */ + /* p = (ar3 * (A[1] + r * A[2] + ar2 * (A[3] + r * A[4] + ar2 * (A[5] + r * + A[6])))). */ + svfloat64_t a56 = svmla_x (pg, sv_f64 (A[5]), r, A[6]); + svfloat64_t a34 = svmla_x (pg, sv_f64 (A[3]), r, A[4]); + svfloat64_t a12 = svmla_x (pg, sv_f64 (A[1]), r, A[2]); + svfloat64_t p = svmla_x (pg, a34, ar2, a56); + p = svmla_x (pg, a12, ar2, p); + p = svmul_x (pg, ar3, p); + svfloat64_t lo = svadd_x ( + pg, svadd_x (pg, svadd_x (pg, svadd_x (pg, lo1, lo2), lo3), lo4), p); + svfloat64_t y = svadd_x (pg, hi, lo); + *tail = svadd_x (pg, svsub_x (pg, hi, y), lo); + return y; +} + +/* Computes sign*exp(x+xtail) where |xtail| < 2^-8/N and |xtail| <= |x|. + The sign_bias argument is SignBias or 0 and sets the sign to -1 or 1. */ +static inline svfloat64_t +sv_exp_inline (svbool_t pg, svfloat64_t x, svfloat64_t xtail, + svuint64_t sign_bias) +{ + /* 3 types of special cases: tiny (uflow and spurious uflow), huge (oflow) + and other cases of large values of x (scale * (1 + TMP) oflow). */ + svuint64_t abstop = svand_x (pg, sv_top12 (x), 0x7ff); + /* |x| is large (|x| >= 512) or tiny (|x| <= 0x1p-54). */ + svbool_t uoflow = svcmpge (pg, svsub_x (pg, abstop, SmallExp), ThresExp); + + /* Conditions special, uflow and oflow are all expressed as uoflow && + something, hence do not bother computing anything if no lane in uoflow is + true. */ + svbool_t special = svpfalse_b (); + svbool_t uflow = svpfalse_b (); + svbool_t oflow = svpfalse_b (); + if (__glibc_unlikely (svptest_any (pg, uoflow))) + { + /* |x| is tiny (|x| <= 0x1p-54). */ + uflow = svcmpge (pg, svsub_x (pg, abstop, SmallExp), 0x80000000); + uflow = svand_z (pg, uoflow, uflow); + /* |x| is huge (|x| >= 1024). */ + oflow = svcmpge (pg, abstop, HugeExp); + oflow = svand_z (pg, uoflow, svbic_z (pg, oflow, uflow)); + /* For large |x| values (512 < |x| < 1024) scale * (1 + TMP) can overflow + or underflow. */ + special = svbic_z (pg, uoflow, svorr_z (pg, uflow, oflow)); + } + + /* exp(x) = 2^(k/N) * exp(r), with exp(r) in [2^(-1/2N),2^(1/2N)]. */ + /* x = ln2/N*k + r, with int k and r in [-ln2/2N, ln2/2N]. */ + svfloat64_t z = svmul_x (pg, x, __v_pow_exp_data.n_over_ln2); + /* z - kd is in [-1, 1] in non-nearest rounding modes. */ + svfloat64_t shift = sv_f64 (__v_pow_exp_data.shift); + svfloat64_t kd = svadd_x (pg, z, shift); + svuint64_t ki = svreinterpret_u64 (kd); + kd = svsub_x (pg, kd, shift); + svfloat64_t r = x; + r = svmls_x (pg, r, kd, __v_pow_exp_data.ln2_over_n_hi); + r = svmls_x (pg, r, kd, __v_pow_exp_data.ln2_over_n_lo); + /* The code assumes 2^-200 < |xtail| < 2^-8/N. */ + r = svadd_x (pg, r, xtail); + /* 2^(k/N) ~= scale. */ + svuint64_t idx = svand_x (pg, ki, N_EXP - 1); + svuint64_t top + = svlsl_x (pg, svadd_x (pg, ki, sign_bias), 52 - V_POW_EXP_TABLE_BITS); + /* This is only a valid scale when -1023*N < k < 1024*N. */ + svuint64_t sbits = svld1_gather_index (pg, __v_pow_exp_data.sbits, idx); + sbits = svadd_x (pg, sbits, top); + /* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (exp(r) - 1). */ + svfloat64_t r2 = svmul_x (pg, r, r); + svfloat64_t tmp = svmla_x (pg, sv_f64 (C[1]), r, C[2]); + tmp = svmla_x (pg, sv_f64 (C[0]), r, tmp); + tmp = svmla_x (pg, r, r2, tmp); + svfloat64_t scale = svreinterpret_f64 (sbits); + /* Note: tmp == 0 or |tmp| > 2^-200 and scale > 2^-739, so there + is no spurious underflow here even without fma. */ + z = svmla_x (pg, scale, scale, tmp); + + /* Update result with special and large cases. */ + if (__glibc_unlikely (svptest_any (pg, special))) + z = sv_call_specialcase (tmp, sbits, ki, z, special); + + /* Handle underflow and overflow. */ + svuint64_t sign_bit = svlsr_x (pg, svreinterpret_u64 (x), 63); + svbool_t x_is_neg = svcmpne (pg, sign_bit, 0); + svuint64_t sign_mask = svlsl_x (pg, sign_bias, 52 - V_POW_EXP_TABLE_BITS); + svfloat64_t res_uoflow = svsel (x_is_neg, sv_f64 (0.0), sv_f64 (INFINITY)); + res_uoflow = svreinterpret_f64 ( + svorr_x (pg, svreinterpret_u64 (res_uoflow), sign_mask)); + z = svsel (oflow, res_uoflow, z); + /* Avoid spurious underflow for tiny x. */ + svfloat64_t res_spurious_uflow + = svreinterpret_f64 (svorr_x (pg, sign_mask, 0x3ff0000000000000)); + z = svsel (uflow, res_spurious_uflow, z); + + return z; +} + +static inline double +pow_sc (double x, double y) +{ + uint64_t ix = asuint64 (x); + uint64_t iy = asuint64 (y); + /* Special cases: |x| or |y| is 0, inf or nan. */ + if (__glibc_unlikely (zeroinfnan (iy))) + { + if (2 * iy == 0) + return issignaling_inline (x) ? x + y : 1.0; + if (ix == asuint64 (1.0)) + return issignaling_inline (y) ? x + y : 1.0; + if (2 * ix > 2 * asuint64 (INFINITY) || 2 * iy > 2 * asuint64 (INFINITY)) + return x + y; + if (2 * ix == 2 * asuint64 (1.0)) + return 1.0; + if ((2 * ix < 2 * asuint64 (1.0)) == !(iy >> 63)) + return 0.0; /* |x|<1 && y==inf or |x|>1 && y==-inf. */ + return y * y; + } + if (__glibc_unlikely (zeroinfnan (ix))) + { + double_t x2 = x * x; + if (ix >> 63 && checkint (iy) == 1) + x2 = -x2; + return (iy >> 63) ? 1 / x2 : x2; + } + return x; +} + +svfloat64_t SV_NAME_D2 (pow) (svfloat64_t x, svfloat64_t y, const svbool_t pg) +{ + /* This preamble handles special case conditions used in the final scalar + fallbacks. It also updates ix and sign_bias, that are used in the core + computation too, i.e., exp( y * log (x) ). */ + svuint64_t vix0 = svreinterpret_u64 (x); + svuint64_t viy0 = svreinterpret_u64 (y); + svuint64_t vtopx0 = svlsr_x (svptrue_b64 (), vix0, 52); + + /* Negative x cases. */ + svuint64_t sign_bit = svlsr_m (pg, vix0, 63); + svbool_t xisneg = svcmpeq (pg, sign_bit, 1); + + /* Set sign_bias and ix depending on sign of x and nature of y. */ + svbool_t yisnotint_xisneg = svpfalse_b (); + svuint64_t sign_bias = sv_u64 (0); + svuint64_t vix = vix0; + svuint64_t vtopx1 = vtopx0; + if (__glibc_unlikely (svptest_any (pg, xisneg))) + { + /* Determine nature of y. */ + yisnotint_xisneg = sv_isnotint (xisneg, y); + svbool_t yisint_xisneg = sv_isint (xisneg, y); + svbool_t yisodd_xisneg = sv_isodd (xisneg, y); + /* ix set to abs(ix) if y is integer. */ + vix = svand_m (yisint_xisneg, vix0, 0x7fffffffffffffff); + vtopx1 = svand_m (yisint_xisneg, vtopx0, 0x7ff); + /* Set to SignBias if x is negative and y is odd. */ + sign_bias = svsel (yisodd_xisneg, sv_u64 (SignBias), sv_u64 (0)); + } + + /* Special cases of x or y: zero, inf and nan. */ + svbool_t xspecial = sv_zeroinfnan (pg, vix0); + svbool_t yspecial = sv_zeroinfnan (pg, viy0); + svbool_t special = svorr_z (pg, xspecial, yspecial); + + /* Small cases of x: |x| < 0x1p-126. */ + svuint64_t vabstopx0 = svand_x (pg, vtopx0, 0x7ff); + svbool_t xsmall = svcmplt (pg, vabstopx0, SmallPowX); + if (__glibc_unlikely (svptest_any (pg, xsmall))) + { + /* Normalize subnormal x so exponent becomes negative. */ + svbool_t topx_is_null = svcmpeq (xsmall, vtopx1, 0); + + svuint64_t vix_norm = svreinterpret_u64 (svmul_m (xsmall, x, 0x1p52)); + vix_norm = svand_m (xsmall, vix_norm, 0x7fffffffffffffff); + vix_norm = svsub_m (xsmall, vix_norm, 52ULL << 52); + vix = svsel (topx_is_null, vix_norm, vix); + } + + /* y_hi = log(ix, &y_lo). */ + svfloat64_t vlo; + svfloat64_t vhi = sv_log_inline (pg, vix, &vlo); + + /* z = exp(y_hi, y_lo, sign_bias). */ + svfloat64_t vehi = svmul_x (pg, y, vhi); + svfloat64_t velo = svmul_x (pg, y, vlo); + svfloat64_t vemi = svmls_x (pg, vehi, y, vhi); + velo = svsub_x (pg, velo, vemi); + svfloat64_t vz = sv_exp_inline (pg, vehi, velo, sign_bias); + + /* Cases of finite y and finite negative x. */ + vz = svsel (yisnotint_xisneg, sv_f64 (__builtin_nan ("")), vz); + + /* Cases of zero/inf/nan x or y. */ + if (__glibc_unlikely (svptest_any (pg, special))) + vz = sv_call2_f64 (pow_sc, x, y, vz, special); + + return vz; +} diff --git a/sysdeps/aarch64/fpu/powf_advsimd.c b/sysdeps/aarch64/fpu/powf_advsimd.c new file mode 100644 index 0000000000..8232e70436 --- /dev/null +++ b/sysdeps/aarch64/fpu/powf_advsimd.c @@ -0,0 +1,210 @@ +/* Single-precision vector (AdvSIMD) pow function + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "math_config.h" +#include "v_math.h" + +#define Min v_u32 (0x00800000) +#define Max v_u32 (0x7f800000) +#define Thresh v_u32 (0x7f000000) /* Max - Min. */ +#define MantissaMask v_u32 (0x007fffff) + +#define A d->log2_poly +#define C d->exp2f_poly + +/* 2.6 ulp ~ 0.5 + 2^24 (128*Ln2*relerr_log2 + relerr_exp2). */ +#define Off v_u32 (0x3f35d000) + +#define V_POWF_LOG2_TABLE_BITS 5 +#define V_EXP2F_TABLE_BITS 5 +#define Log2IdxMask ((1 << V_POWF_LOG2_TABLE_BITS) - 1) +#define Scale ((double) (1 << V_EXP2F_TABLE_BITS)) + +static const struct data +{ + struct + { + double invc, logc; + } log2_tab[1 << V_POWF_LOG2_TABLE_BITS]; + float64x2_t log2_poly[4]; + uint64_t exp2f_tab[1 << V_EXP2F_TABLE_BITS]; + float64x2_t exp2f_poly[3]; +} data = { + .log2_tab = {{0x1.6489890582816p+0, -0x1.e960f97b22702p-2 * Scale}, + {0x1.5cf19b35e3472p+0, -0x1.c993406cd4db6p-2 * Scale}, + {0x1.55aac0e956d65p+0, -0x1.aa711d9a7d0f3p-2 * Scale}, + {0x1.4eb0022977e01p+0, -0x1.8bf37bacdce9bp-2 * Scale}, + {0x1.47fcccda1dd1fp+0, -0x1.6e13b3519946ep-2 * Scale}, + {0x1.418ceabab68c1p+0, -0x1.50cb8281e4089p-2 * Scale}, + {0x1.3b5c788f1edb3p+0, -0x1.341504a237e2bp-2 * Scale}, + {0x1.3567de48e9c9ap+0, -0x1.17eaab624ffbbp-2 * Scale}, + {0x1.2fabc80fd19bap+0, -0x1.f88e708f8c853p-3 * Scale}, + {0x1.2a25200ce536bp+0, -0x1.c24b6da113914p-3 * Scale}, + {0x1.24d108e0152e3p+0, -0x1.8d02ee397cb1dp-3 * Scale}, + {0x1.1facd8ab2fbe1p+0, -0x1.58ac1223408b3p-3 * Scale}, + {0x1.1ab614a03efdfp+0, -0x1.253e6fd190e89p-3 * Scale}, + {0x1.15ea6d03af9ffp+0, -0x1.e5641882c12ffp-4 * Scale}, + {0x1.1147b994bb776p+0, -0x1.81fea712926f7p-4 * Scale}, + {0x1.0ccbf650593aap+0, -0x1.203e240de64a3p-4 * Scale}, + {0x1.0875408477302p+0, -0x1.8029b86a78281p-5 * Scale}, + {0x1.0441d42a93328p+0, -0x1.85d713190fb9p-6 * Scale}, + {0x1p+0, 0x0p+0 * Scale}, + {0x1.f1d006c855e86p-1, 0x1.4c1cc07312997p-5 * Scale}, + {0x1.e28c3341aa301p-1, 0x1.5e1848ccec948p-4 * Scale}, + {0x1.d4bdf9aa64747p-1, 0x1.04cfcb7f1196fp-3 * Scale}, + {0x1.c7b45a24e5803p-1, 0x1.582813d463c21p-3 * Scale}, + {0x1.bb5f5eb2ed60ap-1, 0x1.a936fa68760ccp-3 * Scale}, + {0x1.afb0bff8fe6b4p-1, 0x1.f81bc31d6cc4ep-3 * Scale}, + {0x1.a49badf7ab1f5p-1, 0x1.2279a09fae6b1p-2 * Scale}, + {0x1.9a14a111fc4c9p-1, 0x1.47ec0b6df5526p-2 * Scale}, + {0x1.901131f5b2fdcp-1, 0x1.6c71762280f1p-2 * Scale}, + {0x1.8687f73f6d865p-1, 0x1.90155070798dap-2 * Scale}, + {0x1.7d7067eb77986p-1, 0x1.b2e23b1d3068cp-2 * Scale}, + {0x1.74c2c1cf97b65p-1, 0x1.d4e21b0daa86ap-2 * Scale}, + {0x1.6c77f37cff2a1p-1, 0x1.f61e2a2f67f3fp-2 * Scale},}, + .log2_poly = { /* rel err: 1.5 * 2^-30. */ + V2 (-0x1.6ff5daa3b3d7cp-2 * Scale), + V2 (0x1.ec81d03c01aebp-2 * Scale), + V2 (-0x1.71547bb43f101p-1 * Scale), + V2 (0x1.7154764a815cbp0 * Scale)}, + .exp2f_tab = {0x3ff0000000000000, 0x3fefd9b0d3158574, 0x3fefb5586cf9890f, + 0x3fef9301d0125b51, 0x3fef72b83c7d517b, 0x3fef54873168b9aa, + 0x3fef387a6e756238, 0x3fef1e9df51fdee1, 0x3fef06fe0a31b715, + 0x3feef1a7373aa9cb, 0x3feedea64c123422, 0x3feece086061892d, + 0x3feebfdad5362a27, 0x3feeb42b569d4f82, 0x3feeab07dd485429, + 0x3feea47eb03a5585, 0x3feea09e667f3bcd, 0x3fee9f75e8ec5f74, + 0x3feea11473eb0187, 0x3feea589994cce13, 0x3feeace5422aa0db, + 0x3feeb737b0cdc5e5, 0x3feec49182a3f090, 0x3feed503b23e255d, + 0x3feee89f995ad3ad, 0x3feeff76f2fb5e47, 0x3fef199bdd85529c, + 0x3fef3720dcef9069, 0x3fef5818dcfba487, 0x3fef7c97337b9b5f, + 0x3fefa4afa2a490da, 0x3fefd0765b6e4540,}, + .exp2f_poly = { /* rel err: 1.69 * 2^-34. */ + V2 (0x1.c6af84b912394p-5 / Scale / Scale / Scale), + V2 (0x1.ebfce50fac4f3p-3 / Scale / Scale), + V2 (0x1.62e42ff0c52d6p-1 / Scale)}}; + +static float32x4_t VPCS_ATTR NOINLINE +special_case (float32x4_t x, float32x4_t y, float32x4_t ret, uint32x4_t cmp) +{ + return v_call2_f32 (powf, x, y, ret, cmp); +} + +static inline float64x2_t +ylogx_core (const struct data *d, float64x2_t iz, float64x2_t k, + float64x2_t invc, float64x2_t logc, float64x2_t y) +{ + + /* log2(x) = log1p(z/c-1)/ln2 + log2(c) + k. */ + float64x2_t r = vfmaq_f64 (v_f64 (-1.0), iz, invc); + float64x2_t y0 = vaddq_f64 (logc, k); + + /* Polynomial to approximate log1p(r)/ln2. */ + float64x2_t logx = vfmaq_f64 (A[1], r, A[0]); + logx = vfmaq_f64 (A[2], logx, r); + logx = vfmaq_f64 (A[3], logx, r); + logx = vfmaq_f64 (y0, logx, r); + + return vmulq_f64 (logx, y); +} + +static inline float64x2_t +log2_lookup (const struct data *d, uint32_t i) +{ + return vld1q_f64 ( + &d->log2_tab[(i >> (23 - V_POWF_LOG2_TABLE_BITS)) & Log2IdxMask].invc); +} + +static inline uint64x1_t +exp2f_lookup (const struct data *d, uint64_t i) +{ + return vld1_u64 (&d->exp2f_tab[i % (1 << V_EXP2F_TABLE_BITS)]); +} + +static inline float32x2_t +powf_core (const struct data *d, float64x2_t ylogx) +{ + /* N*x = k + r with r in [-1/2, 1/2]. */ + float64x2_t kd = vrndnq_f64 (ylogx); + int64x2_t ki = vcvtaq_s64_f64 (ylogx); + float64x2_t r = vsubq_f64 (ylogx, kd); + + /* exp2(x) = 2^(k/N) * 2^r ~= s * (C0*r^3 + C1*r^2 + C2*r + 1). */ + uint64x2_t t = vcombine_u64 (exp2f_lookup (d, vgetq_lane_s64 (ki, 0)), + exp2f_lookup (d, vgetq_lane_s64 (ki, 1))); + t = vaddq_u64 ( + t, vreinterpretq_u64_s64 (vshlq_n_s64 (ki, 52 - V_EXP2F_TABLE_BITS))); + float64x2_t s = vreinterpretq_f64_u64 (t); + float64x2_t p = vfmaq_f64 (C[1], r, C[0]); + p = vfmaq_f64 (C[2], r, p); + p = vfmaq_f64 (s, p, vmulq_f64 (s, r)); + return vcvt_f32_f64 (p); +} + +float32x4_t VPCS_ATTR V_NAME_F2 (pow) (float32x4_t x, float32x4_t y) +{ + const struct data *d = ptr_barrier (&data); + uint32x4_t u = vreinterpretq_u32_f32 (x); + uint32x4_t cmp = vcgeq_u32 (vsubq_u32 (u, Min), Thresh); + uint32x4_t tmp = vsubq_u32 (u, Off); + uint32x4_t top = vbicq_u32 (tmp, MantissaMask); + float32x4_t iz = vreinterpretq_f32_u32 (vsubq_u32 (u, top)); + int32x4_t k = vshrq_n_s32 (vreinterpretq_s32_u32 (top), + 23 - V_EXP2F_TABLE_BITS); /* arithmetic shift. */ + + /* Use double precision for each lane: split input vectors into lo and hi + halves and promote. */ + float64x2_t tab0 = log2_lookup (d, vgetq_lane_u32 (tmp, 0)), + tab1 = log2_lookup (d, vgetq_lane_u32 (tmp, 1)), + tab2 = log2_lookup (d, vgetq_lane_u32 (tmp, 2)), + tab3 = log2_lookup (d, vgetq_lane_u32 (tmp, 3)); + + float64x2_t iz_lo = vcvt_f64_f32 (vget_low_f32 (iz)), + iz_hi = vcvt_high_f64_f32 (iz); + + float64x2_t k_lo = vcvtq_f64_s64 (vmovl_s32 (vget_low_s32 (k))), + k_hi = vcvtq_f64_s64 (vmovl_high_s32 (k)); + + float64x2_t invc_lo = vzip1q_f64 (tab0, tab1), + invc_hi = vzip1q_f64 (tab2, tab3), + logc_lo = vzip2q_f64 (tab0, tab1), + logc_hi = vzip2q_f64 (tab2, tab3); + + float64x2_t y_lo = vcvt_f64_f32 (vget_low_f32 (y)), + y_hi = vcvt_high_f64_f32 (y); + + float64x2_t ylogx_lo = ylogx_core (d, iz_lo, k_lo, invc_lo, logc_lo, y_lo); + float64x2_t ylogx_hi = ylogx_core (d, iz_hi, k_hi, invc_hi, logc_hi, y_hi); + + uint32x4_t ylogx_top = vuzp2q_u32 (vreinterpretq_u32_f64 (ylogx_lo), + vreinterpretq_u32_f64 (ylogx_hi)); + + cmp = vorrq_u32 ( + cmp, vcgeq_u32 (vandq_u32 (vshrq_n_u32 (ylogx_top, 15), v_u32 (0xffff)), + vdupq_n_u32 (asuint64 (126.0 * (1 << V_EXP2F_TABLE_BITS)) + >> 47))); + + float32x2_t p_lo = powf_core (d, ylogx_lo); + float32x2_t p_hi = powf_core (d, ylogx_hi); + + if (__glibc_unlikely (v_any_u32 (cmp))) + return special_case (x, y, vcombine_f32 (p_lo, p_hi), cmp); + return vcombine_f32 (p_lo, p_hi); +} +libmvec_hidden_def (V_NAME_F2 (pow)) +HALF_WIDTH_ALIAS_F2(pow) diff --git a/sysdeps/aarch64/fpu/powf_sve.c b/sysdeps/aarch64/fpu/powf_sve.c new file mode 100644 index 0000000000..13d21da58d --- /dev/null +++ b/sysdeps/aarch64/fpu/powf_sve.c @@ -0,0 +1,335 @@ +/* Single-precision vector (SVE) pow function + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "../ieee754/flt-32/math_config.h" +#include "sv_math.h" + +/* The following data is used in the SVE pow core computation + and special case detection. */ +#define Tinvc __v_powf_data.invc +#define Tlogc __v_powf_data.logc +#define Texp __v_powf_data.scale +#define SignBias (1 << (V_POWF_EXP2_TABLE_BITS + 11)) +#define Shift 0x1.8p52 +#define Norm 0x1p23f /* 0x4b000000. */ + +/* Overall ULP error bound for pow is 2.6 ulp + ~ 0.5 + 2^24 (128*Ln2*relerr_log2 + relerr_exp2). */ +static const struct data +{ + double log_poly[4]; + double exp_poly[3]; + float uflow_bound, oflow_bound, small_bound; + uint32_t sign_bias, sign_mask, subnormal_bias, off; +} data = { + /* rel err: 1.5 * 2^-30. Each coefficients is multiplied the value of + V_POWF_EXP2_N. */ + .log_poly = { -0x1.6ff5daa3b3d7cp+3, 0x1.ec81d03c01aebp+3, + -0x1.71547bb43f101p+4, 0x1.7154764a815cbp+5 }, + /* rel err: 1.69 * 2^-34. */ + .exp_poly = { + 0x1.c6af84b912394p-20, /* A0 / V_POWF_EXP2_N^3. */ + 0x1.ebfce50fac4f3p-13, /* A1 / V_POWF_EXP2_N^2. */ + 0x1.62e42ff0c52d6p-6, /* A3 / V_POWF_EXP2_N. */ + }, + .uflow_bound = -0x1.2cp+12f, /* -150.0 * V_POWF_EXP2_N. */ + .oflow_bound = 0x1p+12f, /* 128.0 * V_POWF_EXP2_N. */ + .small_bound = 0x1p-126f, + .off = 0x3f35d000, + .sign_bias = SignBias, + .sign_mask = 0x80000000, + .subnormal_bias = 0x0b800000, /* 23 << 23. */ +}; + +#define A(i) sv_f64 (d->log_poly[i]) +#define C(i) sv_f64 (d->exp_poly[i]) + +/* Check if x is an integer. */ +static inline svbool_t +svisint (svbool_t pg, svfloat32_t x) +{ + return svcmpeq (pg, svrintz_z (pg, x), x); +} + +/* Check if x is real not integer valued. */ +static inline svbool_t +svisnotint (svbool_t pg, svfloat32_t x) +{ + return svcmpne (pg, svrintz_z (pg, x), x); +} + +/* Check if x is an odd integer. */ +static inline svbool_t +svisodd (svbool_t pg, svfloat32_t x) +{ + svfloat32_t y = svmul_x (pg, x, 0.5f); + return svisnotint (pg, y); +} + +/* Check if zero, inf or nan. */ +static inline svbool_t +sv_zeroinfnan (svbool_t pg, svuint32_t i) +{ + return svcmpge (pg, svsub_x (pg, svmul_x (pg, i, 2u), 1), + 2u * 0x7f800000 - 1); +} + +/* Returns 0 if not int, 1 if odd int, 2 if even int. The argument is + the bit representation of a non-zero finite floating-point value. */ +static inline int +checkint (uint32_t iy) +{ + int e = iy >> 23 & 0xff; + if (e < 0x7f) + return 0; + if (e > 0x7f + 23) + return 2; + if (iy & ((1 << (0x7f + 23 - e)) - 1)) + return 0; + if (iy & (1 << (0x7f + 23 - e))) + return 1; + return 2; +} + +/* Check if zero, inf or nan. */ +static inline int +zeroinfnan (uint32_t ix) +{ + return 2 * ix - 1 >= 2u * 0x7f800000 - 1; +} + +/* A scalar subroutine used to fix main power special cases. Similar to the + preamble of finite_powf except that we do not update ix and sign_bias. This + is done in the preamble of the SVE powf. */ +static inline float +powf_specialcase (float x, float y, float z) +{ + uint32_t ix = asuint (x); + uint32_t iy = asuint (y); + /* Either (x < 0x1p-126 or inf or nan) or (y is 0 or inf or nan). */ + if (__glibc_unlikely (zeroinfnan (iy))) + { + if (2 * iy == 0) + return issignalingf_inline (x) ? x + y : 1.0f; + if (ix == 0x3f800000) + return issignalingf_inline (y) ? x + y : 1.0f; + if (2 * ix > 2u * 0x7f800000 || 2 * iy > 2u * 0x7f800000) + return x + y; + if (2 * ix == 2 * 0x3f800000) + return 1.0f; + if ((2 * ix < 2 * 0x3f800000) == !(iy & 0x80000000)) + return 0.0f; /* |x|<1 && y==inf or |x|>1 && y==-inf. */ + return y * y; + } + if (__glibc_unlikely (zeroinfnan (ix))) + { + float_t x2 = x * x; + if (ix & 0x80000000 && checkint (iy) == 1) + x2 = -x2; + return iy & 0x80000000 ? 1 / x2 : x2; + } + /* We need a return here in case x<0 and y is integer, but all other tests + need to be run. */ + return z; +} + +/* Scalar fallback for special case routines with custom signature. */ +static inline svfloat32_t +sv_call_powf_sc (svfloat32_t x1, svfloat32_t x2, svfloat32_t y, svbool_t cmp) +{ + svbool_t p = svpfirst (cmp, svpfalse ()); + while (svptest_any (cmp, p)) + { + float sx1 = svclastb (p, 0, x1); + float sx2 = svclastb (p, 0, x2); + float elem = svclastb (p, 0, y); + elem = powf_specialcase (sx1, sx2, elem); + svfloat32_t y2 = sv_f32 (elem); + y = svsel (p, y2, y); + p = svpnext_b32 (cmp, p); + } + return y; +} + +/* Compute core for half of the lanes in double precision. */ +static inline svfloat64_t +sv_powf_core_ext (const svbool_t pg, svuint64_t i, svfloat64_t z, svint64_t k, + svfloat64_t y, svuint64_t sign_bias, svfloat64_t *pylogx, + const struct data *d) +{ + svfloat64_t invc = svld1_gather_index (pg, Tinvc, i); + svfloat64_t logc = svld1_gather_index (pg, Tlogc, i); + + /* log2(x) = log1p(z/c-1)/ln2 + log2(c) + k. */ + svfloat64_t r = svmla_x (pg, sv_f64 (-1.0), z, invc); + svfloat64_t y0 = svadd_x (pg, logc, svcvt_f64_x (pg, k)); + + /* Polynomial to approximate log1p(r)/ln2. */ + svfloat64_t logx = A (0); + logx = svmla_x (pg, A (1), r, logx); + logx = svmla_x (pg, A (2), r, logx); + logx = svmla_x (pg, A (3), r, logx); + logx = svmla_x (pg, y0, r, logx); + *pylogx = svmul_x (pg, y, logx); + + /* z - kd is in [-1, 1] in non-nearest rounding modes. */ + svfloat64_t kd = svadd_x (pg, *pylogx, Shift); + svuint64_t ki = svreinterpret_u64 (kd); + kd = svsub_x (pg, kd, Shift); + + r = svsub_x (pg, *pylogx, kd); + + /* exp2(x) = 2^(k/N) * 2^r ~= s * (C0*r^3 + C1*r^2 + C2*r + 1). */ + svuint64_t t + = svld1_gather_index (pg, Texp, svand_x (pg, ki, V_POWF_EXP2_N - 1)); + svuint64_t ski = svadd_x (pg, ki, sign_bias); + t = svadd_x (pg, t, svlsl_x (pg, ski, 52 - V_POWF_EXP2_TABLE_BITS)); + svfloat64_t s = svreinterpret_f64 (t); + + svfloat64_t p = C (0); + p = svmla_x (pg, C (1), p, r); + p = svmla_x (pg, C (2), p, r); + p = svmla_x (pg, s, p, svmul_x (pg, s, r)); + + return p; +} + +/* Widen vector to double precision and compute core on both halves of the + vector. Lower cost of promotion by considering all lanes active. */ +static inline svfloat32_t +sv_powf_core (const svbool_t pg, svuint32_t i, svuint32_t iz, svint32_t k, + svfloat32_t y, svuint32_t sign_bias, svfloat32_t *pylogx, + const struct data *d) +{ + const svbool_t ptrue = svptrue_b64 (); + + /* Unpack and promote input vectors (pg, y, z, i, k and sign_bias) into two in + order to perform core computation in double precision. */ + const svbool_t pg_lo = svunpklo (pg); + const svbool_t pg_hi = svunpkhi (pg); + svfloat64_t y_lo = svcvt_f64_x ( + ptrue, svreinterpret_f32 (svunpklo (svreinterpret_u32 (y)))); + svfloat64_t y_hi = svcvt_f64_x ( + ptrue, svreinterpret_f32 (svunpkhi (svreinterpret_u32 (y)))); + svfloat32_t z = svreinterpret_f32 (iz); + svfloat64_t z_lo = svcvt_f64_x ( + ptrue, svreinterpret_f32 (svunpklo (svreinterpret_u32 (z)))); + svfloat64_t z_hi = svcvt_f64_x ( + ptrue, svreinterpret_f32 (svunpkhi (svreinterpret_u32 (z)))); + svuint64_t i_lo = svunpklo (i); + svuint64_t i_hi = svunpkhi (i); + svint64_t k_lo = svunpklo (k); + svint64_t k_hi = svunpkhi (k); + svuint64_t sign_bias_lo = svunpklo (sign_bias); + svuint64_t sign_bias_hi = svunpkhi (sign_bias); + + /* Compute each part in double precision. */ + svfloat64_t ylogx_lo, ylogx_hi; + svfloat64_t lo = sv_powf_core_ext (pg_lo, i_lo, z_lo, k_lo, y_lo, + sign_bias_lo, &ylogx_lo, d); + svfloat64_t hi = sv_powf_core_ext (pg_hi, i_hi, z_hi, k_hi, y_hi, + sign_bias_hi, &ylogx_hi, d); + + /* Convert back to single-precision and interleave. */ + svfloat32_t ylogx_lo_32 = svcvt_f32_x (ptrue, ylogx_lo); + svfloat32_t ylogx_hi_32 = svcvt_f32_x (ptrue, ylogx_hi); + *pylogx = svuzp1 (ylogx_lo_32, ylogx_hi_32); + svfloat32_t lo_32 = svcvt_f32_x (ptrue, lo); + svfloat32_t hi_32 = svcvt_f32_x (ptrue, hi); + return svuzp1 (lo_32, hi_32); +} + +/* Implementation of SVE powf. + Provides the same accuracy as AdvSIMD powf, since it relies on the same + algorithm. The theoretical maximum error is under 2.60 ULPs. + Maximum measured error is 2.56 ULPs: + SV_NAME_F2 (pow) (0x1.004118p+0, 0x1.5d14a4p+16) got 0x1.fd4bp+127 + want 0x1.fd4b06p+127. */ +svfloat32_t SV_NAME_F2 (pow) (svfloat32_t x, svfloat32_t y, const svbool_t pg) +{ + const struct data *d = ptr_barrier (&data); + + svuint32_t vix0 = svreinterpret_u32 (x); + svuint32_t viy0 = svreinterpret_u32 (y); + + /* Negative x cases. */ + svuint32_t sign_bit = svand_m (pg, vix0, d->sign_mask); + svbool_t xisneg = svcmpeq (pg, sign_bit, d->sign_mask); + + /* Set sign_bias and ix depending on sign of x and nature of y. */ + svbool_t yisnotint_xisneg = svpfalse_b (); + svuint32_t sign_bias = sv_u32 (0); + svuint32_t vix = vix0; + if (__glibc_unlikely (svptest_any (pg, xisneg))) + { + /* Determine nature of y. */ + yisnotint_xisneg = svisnotint (xisneg, y); + svbool_t yisint_xisneg = svisint (xisneg, y); + svbool_t yisodd_xisneg = svisodd (xisneg, y); + /* ix set to abs(ix) if y is integer. */ + vix = svand_m (yisint_xisneg, vix0, 0x7fffffff); + /* Set to SignBias if x is negative and y is odd. */ + sign_bias = svsel (yisodd_xisneg, sv_u32 (d->sign_bias), sv_u32 (0)); + } + + /* Special cases of x or y: zero, inf and nan. */ + svbool_t xspecial = sv_zeroinfnan (pg, vix0); + svbool_t yspecial = sv_zeroinfnan (pg, viy0); + svbool_t cmp = svorr_z (pg, xspecial, yspecial); + + /* Small cases of x: |x| < 0x1p-126. */ + svbool_t xsmall = svaclt (pg, x, d->small_bound); + if (__glibc_unlikely (svptest_any (pg, xsmall))) + { + /* Normalize subnormal x so exponent becomes negative. */ + svuint32_t vix_norm = svreinterpret_u32 (svmul_x (xsmall, x, Norm)); + vix_norm = svand_x (xsmall, vix_norm, 0x7fffffff); + vix_norm = svsub_x (xsmall, vix_norm, d->subnormal_bias); + vix = svsel (xsmall, vix_norm, vix); + } + /* Part of core computation carried in working precision. */ + svuint32_t tmp = svsub_x (pg, vix, d->off); + svuint32_t i = svand_x (pg, svlsr_x (pg, tmp, (23 - V_POWF_LOG2_TABLE_BITS)), + V_POWF_LOG2_N - 1); + svuint32_t top = svand_x (pg, tmp, 0xff800000); + svuint32_t iz = svsub_x (pg, vix, top); + svint32_t k + = svasr_x (pg, svreinterpret_s32 (top), (23 - V_POWF_EXP2_TABLE_BITS)); + + /* Compute core in extended precision and return intermediate ylogx results to + handle cases of underflow and underflow in exp. */ + svfloat32_t ylogx; + svfloat32_t ret = sv_powf_core (pg, i, iz, k, y, sign_bias, &ylogx, d); + + /* Handle exp special cases of underflow and overflow. */ + svuint32_t sign = svlsl_x (pg, sign_bias, 20 - V_POWF_EXP2_TABLE_BITS); + svfloat32_t ret_oflow + = svreinterpret_f32 (svorr_x (pg, sign, asuint (INFINITY))); + svfloat32_t ret_uflow = svreinterpret_f32 (sign); + ret = svsel (svcmple (pg, ylogx, d->uflow_bound), ret_uflow, ret); + ret = svsel (svcmpgt (pg, ylogx, d->oflow_bound), ret_oflow, ret); + + /* Cases of finite y and finite negative x. */ + ret = svsel (yisnotint_xisneg, sv_f32 (__builtin_nanf ("")), ret); + + if (__glibc_unlikely (svptest_any (pg, cmp))) + return sv_call_powf_sc (x, y, ret, cmp); + + return ret; +} diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c index 1877db3ac6..8c98161662 100644 --- a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c @@ -44,6 +44,7 @@ VPCS_VECTOR_WRAPPER (log_advsimd, _ZGVnN2v_log) VPCS_VECTOR_WRAPPER (log10_advsimd, _ZGVnN2v_log10) VPCS_VECTOR_WRAPPER (log1p_advsimd, _ZGVnN2v_log1p) VPCS_VECTOR_WRAPPER (log2_advsimd, _ZGVnN2v_log2) +VPCS_VECTOR_WRAPPER_ff (pow_advsimd, _ZGVnN2vv_pow) VPCS_VECTOR_WRAPPER (sin_advsimd, _ZGVnN2v_sin) VPCS_VECTOR_WRAPPER (sinh_advsimd, _ZGVnN2v_sinh) VPCS_VECTOR_WRAPPER (tan_advsimd, _ZGVnN2v_tan) diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c index b702f942de..2583428af5 100644 --- a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c @@ -63,6 +63,7 @@ SVE_VECTOR_WRAPPER (log_sve, _ZGVsMxv_log) SVE_VECTOR_WRAPPER (log10_sve, _ZGVsMxv_log10) SVE_VECTOR_WRAPPER (log1p_sve, _ZGVsMxv_log1p) SVE_VECTOR_WRAPPER (log2_sve, _ZGVsMxv_log2) +SVE_VECTOR_WRAPPER_ff (pow_sve, _ZGVsMxvv_pow) SVE_VECTOR_WRAPPER (sin_sve, _ZGVsMxv_sin) SVE_VECTOR_WRAPPER (sinh_sve, _ZGVsMxv_sinh) SVE_VECTOR_WRAPPER (tan_sve, _ZGVsMxv_tan) diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c index 9cb451b4f0..26679018d6 100644 --- a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c @@ -44,6 +44,7 @@ VPCS_VECTOR_WRAPPER (logf_advsimd, _ZGVnN4v_logf) VPCS_VECTOR_WRAPPER (log10f_advsimd, _ZGVnN4v_log10f) VPCS_VECTOR_WRAPPER (log1pf_advsimd, _ZGVnN4v_log1pf) VPCS_VECTOR_WRAPPER (log2f_advsimd, _ZGVnN4v_log2f) +VPCS_VECTOR_WRAPPER_ff (powf_advsimd, _ZGVnN4vv_powf) VPCS_VECTOR_WRAPPER (sinf_advsimd, _ZGVnN4v_sinf) VPCS_VECTOR_WRAPPER (sinhf_advsimd, _ZGVnN4v_sinhf) VPCS_VECTOR_WRAPPER (tanf_advsimd, _ZGVnN4v_tanf) diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c index 5b3dd22916..0f972b7886 100644 --- a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c @@ -63,6 +63,7 @@ SVE_VECTOR_WRAPPER (logf_sve, _ZGVsMxv_logf) SVE_VECTOR_WRAPPER (log10f_sve, _ZGVsMxv_log10f) SVE_VECTOR_WRAPPER (log1pf_sve, _ZGVsMxv_log1pf) SVE_VECTOR_WRAPPER (log2f_sve, _ZGVsMxv_log2f) +SVE_VECTOR_WRAPPER_ff (powf_sve, _ZGVsMxvv_powf) SVE_VECTOR_WRAPPER (sinf_sve, _ZGVsMxv_sinf) SVE_VECTOR_WRAPPER (sinhf_sve, _ZGVsMxv_sinhf) SVE_VECTOR_WRAPPER (tanf_sve, _ZGVsMxv_tanf) diff --git a/sysdeps/aarch64/fpu/v_pow_exp_data.c b/sysdeps/aarch64/fpu/v_pow_exp_data.c new file mode 100644 index 0000000000..8b7fb83668 --- /dev/null +++ b/sysdeps/aarch64/fpu/v_pow_exp_data.c @@ -0,0 +1,301 @@ +/* Shared data between exp, exp2 and pow. + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "vecmath_config.h" + +#define N (1 << V_POW_EXP_TABLE_BITS) + +const struct v_pow_exp_data __v_pow_exp_data = { +// exp polynomial coefficients. +.poly = { +// abs error: 1.43*2^-58 +// ulp error: 0.549 (0.550 without fma) +// if |x| < ln2/512 +0x1.fffffffffffd4p-2, +0x1.5555571d6ef9p-3, +0x1.5555576a5adcep-5, +}, +// N/ln2 +.n_over_ln2 = 0x1.71547652b82fep0 * N, +// ln2/N +.ln2_over_n_hi = 0x1.62e42fefc0000p-9, +.ln2_over_n_lo = -0x1.c610ca86c3899p-45, +// Used for rounding to nearest integer without using intrinsics. +.shift = 0x1.8p52, +// 2^(k/N) ~= H[k]*(1 + T[k]) for int k in [0,N) +// sbits[k] = asuint64(H[k]) - (k << 52)/N +.sbits = { +0x3ff0000000000000, +0x3feffb1afa5abcbf, +0x3feff63da9fb3335, +0x3feff168143b0281, +0x3fefec9a3e778061, +0x3fefe7d42e11bbcc, +0x3fefe315e86e7f85, +0x3fefde5f72f654b1, +0x3fefd9b0d3158574, +0x3fefd50a0e3c1f89, +0x3fefd06b29ddf6de, +0x3fefcbd42b72a836, +0x3fefc74518759bc8, +0x3fefc2bdf66607e0, +0x3fefbe3ecac6f383, +0x3fefb9c79b1f3919, +0x3fefb5586cf9890f, +0x3fefb0f145e46c85, +0x3fefac922b7247f7, +0x3fefa83b23395dec, +0x3fefa3ec32d3d1a2, +0x3fef9fa55fdfa9c5, +0x3fef9b66affed31b, +0x3fef973028d7233e, +0x3fef9301d0125b51, +0x3fef8edbab5e2ab6, +0x3fef8abdc06c31cc, +0x3fef86a814f204ab, +0x3fef829aaea92de0, +0x3fef7e95934f312e, +0x3fef7a98c8a58e51, +0x3fef76a45471c3c2, +0x3fef72b83c7d517b, +0x3fef6ed48695bbc0, +0x3fef6af9388c8dea, +0x3fef672658375d2f, +0x3fef635beb6fcb75, +0x3fef5f99f8138a1c, +0x3fef5be084045cd4, +0x3fef582f95281c6b, +0x3fef54873168b9aa, +0x3fef50e75eb44027, +0x3fef4d5022fcd91d, +0x3fef49c18438ce4d, +0x3fef463b88628cd6, +0x3fef42be3578a819, +0x3fef3f49917ddc96, +0x3fef3bdda27912d1, +0x3fef387a6e756238, +0x3fef351ffb82140a, +0x3fef31ce4fb2a63f, +0x3fef2e85711ece75, +0x3fef2b4565e27cdd, +0x3fef280e341ddf29, +0x3fef24dfe1f56381, +0x3fef21ba7591bb70, +0x3fef1e9df51fdee1, +0x3fef1b8a66d10f13, +0x3fef187fd0dad990, +0x3fef157e39771b2f, +0x3fef1285a6e4030b, +0x3fef0f961f641589, +0x3fef0cafa93e2f56, +0x3fef09d24abd886b, +0x3fef06fe0a31b715, +0x3fef0432edeeb2fd, +0x3fef0170fc4cd831, +0x3feefeb83ba8ea32, +0x3feefc08b26416ff, +0x3feef96266e3fa2d, +0x3feef6c55f929ff1, +0x3feef431a2de883b, +0x3feef1a7373aa9cb, +0x3feeef26231e754a, +0x3feeecae6d05d866, +0x3feeea401b7140ef, +0x3feee7db34e59ff7, +0x3feee57fbfec6cf4, +0x3feee32dc313a8e5, +0x3feee0e544ede173, +0x3feedea64c123422, +0x3feedc70df1c5175, +0x3feeda4504ac801c, +0x3feed822c367a024, +0x3feed60a21f72e2a, +0x3feed3fb2709468a, +0x3feed1f5d950a897, +0x3feecffa3f84b9d4, +0x3feece086061892d, +0x3feecc2042a7d232, +0x3feeca41ed1d0057, +0x3feec86d668b3237, +0x3feec6a2b5c13cd0, +0x3feec4e1e192aed2, +0x3feec32af0d7d3de, +0x3feec17dea6db7d7, +0x3feebfdad5362a27, +0x3feebe41b817c114, +0x3feebcb299fddd0d, +0x3feebb2d81d8abff, +0x3feeb9b2769d2ca7, +0x3feeb8417f4531ee, +0x3feeb6daa2cf6642, +0x3feeb57de83f4eef, +0x3feeb42b569d4f82, +0x3feeb2e2f4f6ad27, +0x3feeb1a4ca5d920f, +0x3feeb070dde910d2, +0x3feeaf4736b527da, +0x3feeae27dbe2c4cf, +0x3feead12d497c7fd, +0x3feeac0827ff07cc, +0x3feeab07dd485429, +0x3feeaa11fba87a03, +0x3feea9268a5946b7, +0x3feea84590998b93, +0x3feea76f15ad2148, +0x3feea6a320dceb71, +0x3feea5e1b976dc09, +0x3feea52ae6cdf6f4, +0x3feea47eb03a5585, +0x3feea3dd1d1929fd, +0x3feea34634ccc320, +0x3feea2b9febc8fb7, +0x3feea23882552225, +0x3feea1c1c70833f6, +0x3feea155d44ca973, +0x3feea0f4b19e9538, +0x3feea09e667f3bcd, +0x3feea052fa75173e, +0x3feea012750bdabf, +0x3fee9fdcddd47645, +0x3fee9fb23c651a2f, +0x3fee9f9298593ae5, +0x3fee9f7df9519484, +0x3fee9f7466f42e87, +0x3fee9f75e8ec5f74, +0x3fee9f8286ead08a, +0x3fee9f9a48a58174, +0x3fee9fbd35d7cbfd, +0x3fee9feb564267c9, +0x3feea024b1ab6e09, +0x3feea0694fde5d3f, +0x3feea0b938ac1cf6, +0x3feea11473eb0187, +0x3feea17b0976cfdb, +0x3feea1ed0130c132, +0x3feea26a62ff86f0, +0x3feea2f336cf4e62, +0x3feea3878491c491, +0x3feea427543e1a12, +0x3feea4d2add106d9, +0x3feea589994cce13, +0x3feea64c1eb941f7, +0x3feea71a4623c7ad, +0x3feea7f4179f5b21, +0x3feea8d99b4492ed, +0x3feea9cad931a436, +0x3feeaac7d98a6699, +0x3feeabd0a478580f, +0x3feeace5422aa0db, +0x3feeae05bad61778, +0x3feeaf3216b5448c, +0x3feeb06a5e0866d9, +0x3feeb1ae99157736, +0x3feeb2fed0282c8a, +0x3feeb45b0b91ffc6, +0x3feeb5c353aa2fe2, +0x3feeb737b0cdc5e5, +0x3feeb8b82b5f98e5, +0x3feeba44cbc8520f, +0x3feebbdd9a7670b3, +0x3feebd829fde4e50, +0x3feebf33e47a22a2, +0x3feec0f170ca07ba, +0x3feec2bb4d53fe0d, +0x3feec49182a3f090, +0x3feec674194bb8d5, +0x3feec86319e32323, +0x3feeca5e8d07f29e, +0x3feecc667b5de565, +0x3feece7aed8eb8bb, +0x3feed09bec4a2d33, +0x3feed2c980460ad8, +0x3feed503b23e255d, +0x3feed74a8af46052, +0x3feed99e1330b358, +0x3feedbfe53c12e59, +0x3feede6b5579fdbf, +0x3feee0e521356eba, +0x3feee36bbfd3f37a, +0x3feee5ff3a3c2774, +0x3feee89f995ad3ad, +0x3feeeb4ce622f2ff, +0x3feeee07298db666, +0x3feef0ce6c9a8952, +0x3feef3a2b84f15fb, +0x3feef68415b749b1, +0x3feef9728de5593a, +0x3feefc6e29f1c52a, +0x3feeff76f2fb5e47, +0x3fef028cf22749e4, +0x3fef05b030a1064a, +0x3fef08e0b79a6f1f, +0x3fef0c1e904bc1d2, +0x3fef0f69c3f3a207, +0x3fef12c25bd71e09, +0x3fef16286141b33d, +0x3fef199bdd85529c, +0x3fef1d1cd9fa652c, +0x3fef20ab5fffd07a, +0x3fef244778fafb22, +0x3fef27f12e57d14b, +0x3fef2ba88988c933, +0x3fef2f6d9406e7b5, +0x3fef33405751c4db, +0x3fef3720dcef9069, +0x3fef3b0f2e6d1675, +0x3fef3f0b555dc3fa, +0x3fef43155b5bab74, +0x3fef472d4a07897c, +0x3fef4b532b08c968, +0x3fef4f87080d89f2, +0x3fef53c8eacaa1d6, +0x3fef5818dcfba487, +0x3fef5c76e862e6d3, +0x3fef60e316c98398, +0x3fef655d71ff6075, +0x3fef69e603db3285, +0x3fef6e7cd63a8315, +0x3fef7321f301b460, +0x3fef77d5641c0658, +0x3fef7c97337b9b5f, +0x3fef81676b197d17, +0x3fef864614f5a129, +0x3fef8b333b16ee12, +0x3fef902ee78b3ff6, +0x3fef953924676d76, +0x3fef9a51fbc74c83, +0x3fef9f7977cdb740, +0x3fefa4afa2a490da, +0x3fefa9f4867cca6e, +0x3fefaf482d8e67f1, +0x3fefb4aaa2188510, +0x3fefba1bee615a27, +0x3fefbf9c1cb6412a, +0x3fefc52b376bba97, +0x3fefcac948dd7274, +0x3fefd0765b6e4540, +0x3fefd632798844f8, +0x3fefdbfdad9cbe14, +0x3fefe1d802243c89, +0x3fefe7c1819e90d8, +0x3fefedba3692d514, +0x3feff3c22b8f71f1, +0x3feff9d96b2a23d9, +}, +}; diff --git a/sysdeps/aarch64/fpu/v_pow_log_data.c b/sysdeps/aarch64/fpu/v_pow_log_data.c new file mode 100644 index 0000000000..0242fff477 --- /dev/null +++ b/sysdeps/aarch64/fpu/v_pow_log_data.c @@ -0,0 +1,186 @@ +/* Data for the log part of pow. + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "vecmath_config.h" + +#define N (1 << V_POW_LOG_TABLE_BITS) + +/* Algorithm: + + x = 2^k z + log(x) = k ln2 + log(c) + log(z/c) + log(z/c) = poly(z/c - 1) + + where z is in [0x1.69555p-1; 0x1.69555p0] which is split into N subintervals + and z falls into the ith one, then table entries are computed as + + tab[i].invc = 1/c + tab[i].logc = round(0x1p43*log(c))/0x1p43 + tab[i].logctail = (double)(log(c) - logc) + + where c is chosen near the center of the subinterval such that 1/c has only + a few precision bits so z/c - 1 is exactly representible as double: + + 1/c = center < 1 ? round(N/center)/N : round(2*N/center)/N/2 + + Note: |z/c - 1| < 1/N for the chosen c, |log(c) - logc - logctail| < + 0x1p-97, the last few bits of logc are rounded away so k*ln2hi + logc has no + rounding error and the interval for z is selected such that near x == 1, + where log(x) + is tiny, large cancellation error is avoided in logc + poly(z/c - 1). */ +const struct v_pow_log_data __v_pow_log_data = { + /* relative error: 0x1.11922ap-70 in [-0x1.6bp-8, 0x1.6bp-8] + Coefficients are scaled to match the scaling during evaluation. */ + .poly = { -0x1p-1, -0x1.555555555556p-1, 0x1.0000000000006p-1, + 0x1.999999959554ep-1, -0x1.555555529a47ap-1, -0x1.2495b9b4845e9p0, + 0x1.0002b8b263fc3p0, }, + .ln2_hi = 0x1.62e42fefa3800p-1, + .ln2_lo = 0x1.ef35793c76730p-45, + .invc = { 0x1.6a00000000000p+0, 0x1.6800000000000p+0, 0x1.6600000000000p+0, + 0x1.6400000000000p+0, 0x1.6200000000000p+0, 0x1.6000000000000p+0, + 0x1.5e00000000000p+0, 0x1.5c00000000000p+0, 0x1.5a00000000000p+0, + 0x1.5800000000000p+0, 0x1.5600000000000p+0, 0x1.5600000000000p+0, + 0x1.5400000000000p+0, 0x1.5200000000000p+0, 0x1.5000000000000p+0, + 0x1.4e00000000000p+0, 0x1.4c00000000000p+0, 0x1.4a00000000000p+0, + 0x1.4a00000000000p+0, 0x1.4800000000000p+0, 0x1.4600000000000p+0, + 0x1.4400000000000p+0, 0x1.4200000000000p+0, 0x1.4000000000000p+0, + 0x1.4000000000000p+0, 0x1.3e00000000000p+0, 0x1.3c00000000000p+0, + 0x1.3a00000000000p+0, 0x1.3a00000000000p+0, 0x1.3800000000000p+0, + 0x1.3600000000000p+0, 0x1.3400000000000p+0, 0x1.3400000000000p+0, + 0x1.3200000000000p+0, 0x1.3000000000000p+0, 0x1.3000000000000p+0, + 0x1.2e00000000000p+0, 0x1.2c00000000000p+0, 0x1.2c00000000000p+0, + 0x1.2a00000000000p+0, 0x1.2800000000000p+0, 0x1.2600000000000p+0, + 0x1.2600000000000p+0, 0x1.2400000000000p+0, 0x1.2400000000000p+0, + 0x1.2200000000000p+0, 0x1.2000000000000p+0, 0x1.2000000000000p+0, + 0x1.1e00000000000p+0, 0x1.1c00000000000p+0, 0x1.1c00000000000p+0, + 0x1.1a00000000000p+0, 0x1.1a00000000000p+0, 0x1.1800000000000p+0, + 0x1.1600000000000p+0, 0x1.1600000000000p+0, 0x1.1400000000000p+0, + 0x1.1400000000000p+0, 0x1.1200000000000p+0, 0x1.1000000000000p+0, + 0x1.1000000000000p+0, 0x1.0e00000000000p+0, 0x1.0e00000000000p+0, + 0x1.0c00000000000p+0, 0x1.0c00000000000p+0, 0x1.0a00000000000p+0, + 0x1.0a00000000000p+0, 0x1.0800000000000p+0, 0x1.0800000000000p+0, + 0x1.0600000000000p+0, 0x1.0400000000000p+0, 0x1.0400000000000p+0, + 0x1.0200000000000p+0, 0x1.0200000000000p+0, 0x1.0000000000000p+0, + 0x1.0000000000000p+0, 0x1.fc00000000000p-1, 0x1.f800000000000p-1, + 0x1.f400000000000p-1, 0x1.f000000000000p-1, 0x1.ec00000000000p-1, + 0x1.e800000000000p-1, 0x1.e400000000000p-1, 0x1.e200000000000p-1, + 0x1.de00000000000p-1, 0x1.da00000000000p-1, 0x1.d600000000000p-1, + 0x1.d400000000000p-1, 0x1.d000000000000p-1, 0x1.cc00000000000p-1, + 0x1.ca00000000000p-1, 0x1.c600000000000p-1, 0x1.c400000000000p-1, + 0x1.c000000000000p-1, 0x1.be00000000000p-1, 0x1.ba00000000000p-1, + 0x1.b800000000000p-1, 0x1.b400000000000p-1, 0x1.b200000000000p-1, + 0x1.ae00000000000p-1, 0x1.ac00000000000p-1, 0x1.aa00000000000p-1, + 0x1.a600000000000p-1, 0x1.a400000000000p-1, 0x1.a000000000000p-1, + 0x1.9e00000000000p-1, 0x1.9c00000000000p-1, 0x1.9a00000000000p-1, + 0x1.9600000000000p-1, 0x1.9400000000000p-1, 0x1.9200000000000p-1, + 0x1.9000000000000p-1, 0x1.8c00000000000p-1, 0x1.8a00000000000p-1, + 0x1.8800000000000p-1, 0x1.8600000000000p-1, 0x1.8400000000000p-1, + 0x1.8200000000000p-1, 0x1.7e00000000000p-1, 0x1.7c00000000000p-1, + 0x1.7a00000000000p-1, 0x1.7800000000000p-1, 0x1.7600000000000p-1, + 0x1.7400000000000p-1, 0x1.7200000000000p-1, 0x1.7000000000000p-1, + 0x1.6e00000000000p-1, 0x1.6c00000000000p-1, }, + .logc + = { -0x1.62c82f2b9c800p-2, -0x1.5d1bdbf580800p-2, -0x1.5767717455800p-2, + -0x1.51aad872df800p-2, -0x1.4be5f95777800p-2, -0x1.4618bc21c6000p-2, + -0x1.404308686a800p-2, -0x1.3a64c55694800p-2, -0x1.347dd9a988000p-2, + -0x1.2e8e2bae12000p-2, -0x1.2895a13de8800p-2, -0x1.2895a13de8800p-2, + -0x1.22941fbcf7800p-2, -0x1.1c898c1699800p-2, -0x1.1675cababa800p-2, + -0x1.1058bf9ae4800p-2, -0x1.0a324e2739000p-2, -0x1.0402594b4d000p-2, + -0x1.0402594b4d000p-2, -0x1.fb9186d5e4000p-3, -0x1.ef0adcbdc6000p-3, + -0x1.e27076e2af000p-3, -0x1.d5c216b4fc000p-3, -0x1.c8ff7c79aa000p-3, + -0x1.c8ff7c79aa000p-3, -0x1.bc286742d9000p-3, -0x1.af3c94e80c000p-3, + -0x1.a23bc1fe2b000p-3, -0x1.a23bc1fe2b000p-3, -0x1.9525a9cf45000p-3, + -0x1.87fa06520d000p-3, -0x1.7ab890210e000p-3, -0x1.7ab890210e000p-3, + -0x1.6d60fe719d000p-3, -0x1.5ff3070a79000p-3, -0x1.5ff3070a79000p-3, + -0x1.526e5e3a1b000p-3, -0x1.44d2b6ccb8000p-3, -0x1.44d2b6ccb8000p-3, + -0x1.371fc201e9000p-3, -0x1.29552f81ff000p-3, -0x1.1b72ad52f6000p-3, + -0x1.1b72ad52f6000p-3, -0x1.0d77e7cd09000p-3, -0x1.0d77e7cd09000p-3, + -0x1.fec9131dbe000p-4, -0x1.e27076e2b0000p-4, -0x1.e27076e2b0000p-4, + -0x1.c5e548f5bc000p-4, -0x1.a926d3a4ae000p-4, -0x1.a926d3a4ae000p-4, + -0x1.8c345d631a000p-4, -0x1.8c345d631a000p-4, -0x1.6f0d28ae56000p-4, + -0x1.51b073f062000p-4, -0x1.51b073f062000p-4, -0x1.341d7961be000p-4, + -0x1.341d7961be000p-4, -0x1.16536eea38000p-4, -0x1.f0a30c0118000p-5, + -0x1.f0a30c0118000p-5, -0x1.b42dd71198000p-5, -0x1.b42dd71198000p-5, + -0x1.77458f632c000p-5, -0x1.77458f632c000p-5, -0x1.39e87b9fec000p-5, + -0x1.39e87b9fec000p-5, -0x1.f829b0e780000p-6, -0x1.f829b0e780000p-6, + -0x1.7b91b07d58000p-6, -0x1.fc0a8b0fc0000p-7, -0x1.fc0a8b0fc0000p-7, + -0x1.fe02a6b100000p-8, -0x1.fe02a6b100000p-8, 0x0.0000000000000p+0, + 0x0.0000000000000p+0, 0x1.0101575890000p-7, 0x1.0205658938000p-6, + 0x1.8492528c90000p-6, 0x1.0415d89e74000p-5, 0x1.466aed42e0000p-5, + 0x1.894aa149fc000p-5, 0x1.ccb73cdddc000p-5, 0x1.eea31c006c000p-5, + 0x1.1973bd1466000p-4, 0x1.3bdf5a7d1e000p-4, 0x1.5e95a4d97a000p-4, + 0x1.700d30aeac000p-4, 0x1.9335e5d594000p-4, 0x1.b6ac88dad6000p-4, + 0x1.c885801bc4000p-4, 0x1.ec739830a2000p-4, 0x1.fe89139dbe000p-4, + 0x1.1178e8227e000p-3, 0x1.1aa2b7e23f000p-3, 0x1.2d1610c868000p-3, + 0x1.365fcb0159000p-3, 0x1.4913d8333b000p-3, 0x1.527e5e4a1b000p-3, + 0x1.6574ebe8c1000p-3, 0x1.6f0128b757000p-3, 0x1.7898d85445000p-3, + 0x1.8beafeb390000p-3, 0x1.95a5adcf70000p-3, 0x1.a93ed3c8ae000p-3, + 0x1.b31d8575bd000p-3, 0x1.bd087383be000p-3, 0x1.c6ffbc6f01000p-3, + 0x1.db13db0d49000p-3, 0x1.e530effe71000p-3, 0x1.ef5ade4dd0000p-3, + 0x1.f991c6cb3b000p-3, 0x1.07138604d5800p-2, 0x1.0c42d67616000p-2, + 0x1.1178e8227e800p-2, 0x1.16b5ccbacf800p-2, 0x1.1bf99635a6800p-2, + 0x1.214456d0eb800p-2, 0x1.2bef07cdc9000p-2, 0x1.314f1e1d36000p-2, + 0x1.36b6776be1000p-2, 0x1.3c25277333000p-2, 0x1.419b423d5e800p-2, + 0x1.4718dc271c800p-2, 0x1.4c9e09e173000p-2, 0x1.522ae0738a000p-2, + 0x1.57bf753c8d000p-2, 0x1.5d5bddf596000p-2, }, + .logctail + = { 0x1.ab42428375680p-48, -0x1.ca508d8e0f720p-46, -0x1.362a4d5b6506dp-45, + -0x1.684e49eb067d5p-49, -0x1.41b6993293ee0p-47, 0x1.3d82f484c84ccp-46, + 0x1.c42f3ed820b3ap-50, 0x1.0b1c686519460p-45, 0x1.5594dd4c58092p-45, + 0x1.67b1e99b72bd8p-45, 0x1.5ca14b6cfb03fp-46, 0x1.5ca14b6cfb03fp-46, + -0x1.65a242853da76p-46, -0x1.fafbc68e75404p-46, 0x1.f1fc63382a8f0p-46, + -0x1.6a8c4fd055a66p-45, -0x1.c6bee7ef4030ep-47, -0x1.036b89ef42d7fp-48, + -0x1.036b89ef42d7fp-48, 0x1.d572aab993c87p-47, 0x1.b26b79c86af24p-45, + -0x1.72f4f543fff10p-46, 0x1.1ba91bbca681bp-45, 0x1.7794f689f8434p-45, + 0x1.7794f689f8434p-45, 0x1.94eb0318bb78fp-46, 0x1.a4e633fcd9066p-52, + -0x1.58c64dc46c1eap-45, -0x1.58c64dc46c1eap-45, -0x1.ad1d904c1d4e3p-45, + 0x1.bbdbf7fdbfa09p-45, 0x1.bdb9072534a58p-45, 0x1.bdb9072534a58p-45, + -0x1.0e46aa3b2e266p-46, -0x1.e9e439f105039p-46, -0x1.e9e439f105039p-46, + -0x1.0de8b90075b8fp-45, 0x1.70cc16135783cp-46, 0x1.70cc16135783cp-46, + 0x1.178864d27543ap-48, -0x1.48d301771c408p-45, -0x1.e80a41811a396p-45, + -0x1.e80a41811a396p-45, 0x1.a699688e85bf4p-47, 0x1.a699688e85bf4p-47, + -0x1.575545ca333f2p-45, 0x1.a342c2af0003cp-45, 0x1.a342c2af0003cp-45, + -0x1.d0c57585fbe06p-46, 0x1.53935e85baac8p-45, 0x1.53935e85baac8p-45, + 0x1.37c294d2f5668p-46, 0x1.37c294d2f5668p-46, -0x1.69737c93373dap-45, + 0x1.f025b61c65e57p-46, 0x1.f025b61c65e57p-46, 0x1.c5edaccf913dfp-45, + 0x1.c5edaccf913dfp-45, 0x1.47c5e768fa309p-46, 0x1.d599e83368e91p-45, + 0x1.d599e83368e91p-45, 0x1.c827ae5d6704cp-46, 0x1.c827ae5d6704cp-46, + -0x1.cfc4634f2a1eep-45, -0x1.cfc4634f2a1eep-45, 0x1.502b7f526feaap-48, + 0x1.502b7f526feaap-48, -0x1.980267c7e09e4p-45, -0x1.980267c7e09e4p-45, + -0x1.88d5493faa639p-45, -0x1.f1e7cf6d3a69cp-50, -0x1.f1e7cf6d3a69cp-50, + -0x1.9e23f0dda40e4p-46, -0x1.9e23f0dda40e4p-46, 0x0.0000000000000p+0, + 0x0.0000000000000p+0, -0x1.0c76b999d2be8p-46, -0x1.3dc5b06e2f7d2p-45, + -0x1.aa0ba325a0c34p-45, 0x1.111c05cf1d753p-47, -0x1.c167375bdfd28p-45, + -0x1.97995d05a267dp-46, -0x1.a68f247d82807p-46, -0x1.e113e4fc93b7bp-47, + -0x1.5325d560d9e9bp-45, 0x1.cc85ea5db4ed7p-45, -0x1.c69063c5d1d1ep-45, + 0x1.c1e8da99ded32p-49, 0x1.3115c3abd47dap-45, -0x1.390802bf768e5p-46, + 0x1.646d1c65aacd3p-45, -0x1.dc068afe645e0p-45, -0x1.534d64fa10afdp-45, + 0x1.1ef78ce2d07f2p-45, 0x1.ca78e44389934p-45, 0x1.39d6ccb81b4a1p-47, + 0x1.62fa8234b7289p-51, 0x1.5837954fdb678p-45, 0x1.633e8e5697dc7p-45, + 0x1.9cf8b2c3c2e78p-46, -0x1.5118de59c21e1p-45, -0x1.c661070914305p-46, + -0x1.73d54aae92cd1p-47, 0x1.7f22858a0ff6fp-47, -0x1.8724350562169p-45, + -0x1.c358d4eace1aap-47, -0x1.d4bc4595412b6p-45, -0x1.1ec72c5962bd2p-48, + -0x1.aff2af715b035p-45, 0x1.212276041f430p-51, -0x1.a211565bb8e11p-51, + 0x1.bcbecca0cdf30p-46, 0x1.89cdb16ed4e91p-48, 0x1.7188b163ceae9p-45, + -0x1.c210e63a5f01cp-45, 0x1.b9acdf7a51681p-45, 0x1.ca6ed5147bdb7p-45, + 0x1.a87deba46baeap-47, 0x1.a9cfa4a5004f4p-45, -0x1.8e27ad3213cb8p-45, + 0x1.16ecdb0f177c8p-46, 0x1.83b54b606bd5cp-46, 0x1.8e436ec90e09dp-47, + -0x1.f27ce0967d675p-45, -0x1.e20891b0ad8a4p-45, 0x1.ebe708164c759p-45, + 0x1.fadedee5d40efp-46, -0x1.a0b2a08a465dcp-47, }, +}; diff --git a/sysdeps/aarch64/fpu/v_powf_data.c b/sysdeps/aarch64/fpu/v_powf_data.c new file mode 100644 index 0000000000..f789b84850 --- /dev/null +++ b/sysdeps/aarch64/fpu/v_powf_data.c @@ -0,0 +1,102 @@ +/* Coefficients for single-precision SVE pow(x) function. + + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + + +#include "vecmath_config.h" + +const struct v_powf_data __v_powf_data = { + .invc = { 0x1.6489890582816p+0, + 0x1.5cf19b35e3472p+0, + 0x1.55aac0e956d65p+0, + 0x1.4eb0022977e01p+0, + 0x1.47fcccda1dd1fp+0, + 0x1.418ceabab68c1p+0, + 0x1.3b5c788f1edb3p+0, + 0x1.3567de48e9c9ap+0, + 0x1.2fabc80fd19bap+0, + 0x1.2a25200ce536bp+0, + 0x1.24d108e0152e3p+0, + 0x1.1facd8ab2fbe1p+0, + 0x1.1ab614a03efdfp+0, + 0x1.15ea6d03af9ffp+0, + 0x1.1147b994bb776p+0, + 0x1.0ccbf650593aap+0, + 0x1.0875408477302p+0, + 0x1.0441d42a93328p+0, + 0x1p+0, + 0x1.f1d006c855e86p-1, + 0x1.e28c3341aa301p-1, + 0x1.d4bdf9aa64747p-1, + 0x1.c7b45a24e5803p-1, + 0x1.bb5f5eb2ed60ap-1, + 0x1.afb0bff8fe6b4p-1, + 0x1.a49badf7ab1f5p-1, + 0x1.9a14a111fc4c9p-1, + 0x1.901131f5b2fdcp-1, + 0x1.8687f73f6d865p-1, + 0x1.7d7067eb77986p-1, + 0x1.74c2c1cf97b65p-1, + 0x1.6c77f37cff2a1p-1 + }, + .logc = { -0x1.e960f97b22702p+3, + -0x1.c993406cd4db6p+3, + -0x1.aa711d9a7d0f3p+3, + -0x1.8bf37bacdce9bp+3, + -0x1.6e13b3519946ep+3, + -0x1.50cb8281e4089p+3, + -0x1.341504a237e2bp+3, + -0x1.17eaab624ffbbp+3, + -0x1.f88e708f8c853p+2, + -0x1.c24b6da113914p+2, + -0x1.8d02ee397cb1dp+2, + -0x1.58ac1223408b3p+2, + -0x1.253e6fd190e89p+2, + -0x1.e5641882c12ffp+1, + -0x1.81fea712926f7p+1, + -0x1.203e240de64a3p+1, + -0x1.8029b86a78281p0, + -0x1.85d713190fb9p-1, + 0x0p+0, + 0x1.4c1cc07312997p0, + 0x1.5e1848ccec948p+1, + 0x1.04cfcb7f1196fp+2, + 0x1.582813d463c21p+2, + 0x1.a936fa68760ccp+2, + 0x1.f81bc31d6cc4ep+2, + 0x1.2279a09fae6b1p+3, + 0x1.47ec0b6df5526p+3, + 0x1.6c71762280f1p+3, + 0x1.90155070798dap+3, + 0x1.b2e23b1d3068cp+3, + 0x1.d4e21b0daa86ap+3, + 0x1.f61e2a2f67f3fp+3 + }, + .scale = { 0x3ff0000000000000, 0x3fefd9b0d3158574, 0x3fefb5586cf9890f, + 0x3fef9301d0125b51, 0x3fef72b83c7d517b, 0x3fef54873168b9aa, + 0x3fef387a6e756238, 0x3fef1e9df51fdee1, 0x3fef06fe0a31b715, + 0x3feef1a7373aa9cb, 0x3feedea64c123422, 0x3feece086061892d, + 0x3feebfdad5362a27, 0x3feeb42b569d4f82, 0x3feeab07dd485429, + 0x3feea47eb03a5585, 0x3feea09e667f3bcd, 0x3fee9f75e8ec5f74, + 0x3feea11473eb0187, 0x3feea589994cce13, 0x3feeace5422aa0db, + 0x3feeb737b0cdc5e5, 0x3feec49182a3f090, 0x3feed503b23e255d, + 0x3feee89f995ad3ad, 0x3feeff76f2fb5e47, 0x3fef199bdd85529c, + 0x3fef3720dcef9069, 0x3fef5818dcfba487, 0x3fef7c97337b9b5f, + 0x3fefa4afa2a490da, 0x3fefd0765b6e4540, + }, +}; diff --git a/sysdeps/aarch64/fpu/vecmath_config.h b/sysdeps/aarch64/fpu/vecmath_config.h index c8cfc03bc0..7f0a8aa5f2 100644 --- a/sysdeps/aarch64/fpu/vecmath_config.h +++ b/sysdeps/aarch64/fpu/vecmath_config.h @@ -35,17 +35,6 @@ __ptr; \ }) -static inline uint64_t -asuint64 (double f) -{ - union - { - double f; - uint64_t i; - } u = { f }; - return u.i; -} - #define V_LOG_POLY_ORDER 6 #define V_LOG_TABLE_BITS 7 extern const struct v_log_data @@ -130,4 +119,35 @@ extern const struct erfcf_data } tab[645]; } __erfcf_data attribute_hidden; +/* Some data for AdvSIMD and SVE pow's internal exp and log. */ +#define V_POW_EXP_TABLE_BITS 8 +extern const struct v_pow_exp_data +{ + double poly[3]; + double n_over_ln2, ln2_over_n_hi, ln2_over_n_lo, shift; + uint64_t sbits[1 << V_POW_EXP_TABLE_BITS]; +} __v_pow_exp_data attribute_hidden; + +#define V_POW_LOG_TABLE_BITS 7 +extern const struct v_pow_log_data +{ + double poly[7]; /* First coefficient is 1. */ + double ln2_hi, ln2_lo; + double invc[1 << V_POW_LOG_TABLE_BITS]; + double logc[1 << V_POW_LOG_TABLE_BITS]; + double logctail[1 << V_POW_LOG_TABLE_BITS]; +} __v_pow_log_data attribute_hidden; + +/* Some data for SVE powf's internal exp and log. */ +#define V_POWF_EXP2_TABLE_BITS 5 +#define V_POWF_EXP2_N (1 << V_POWF_EXP2_TABLE_BITS) +#define V_POWF_LOG2_TABLE_BITS 5 +#define V_POWF_LOG2_N (1 << V_POWF_LOG2_TABLE_BITS) +extern const struct v_powf_data +{ + double invc[V_POWF_LOG2_N]; + double logc[V_POWF_LOG2_N]; + uint64_t scale[V_POWF_EXP2_N]; +} __v_powf_data attribute_hidden; + #endif diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps index 6d083c4e32..c1b0502160 100644 --- a/sysdeps/aarch64/libm-test-ulps +++ b/sysdeps/aarch64/libm-test-ulps @@ -1397,11 +1397,19 @@ double: 1 float: 1 ldouble: 2 +Function: "pow_advsimd": +double: 1 +float: 2 + Function: "pow_downward": double: 1 float: 1 ldouble: 2 +Function: "pow_sve": +double: 1 +float: 2 + Function: "pow_towardzero": double: 1 float: 1 diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist index 89ac1dfa36..b685106954 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist @@ -93,6 +93,8 @@ GLIBC_2.40 _ZGVnN2v_tanh F GLIBC_2.40 _ZGVnN2v_tanhf F GLIBC_2.40 _ZGVnN2vv_hypot F GLIBC_2.40 _ZGVnN2vv_hypotf F +GLIBC_2.40 _ZGVnN2vv_pow F +GLIBC_2.40 _ZGVnN2vv_powf F GLIBC_2.40 _ZGVnN4v_acoshf F GLIBC_2.40 _ZGVnN4v_asinhf F GLIBC_2.40 _ZGVnN4v_atanhf F @@ -103,6 +105,7 @@ GLIBC_2.40 _ZGVnN4v_erff F GLIBC_2.40 _ZGVnN4v_sinhf F GLIBC_2.40 _ZGVnN4v_tanhf F GLIBC_2.40 _ZGVnN4vv_hypotf F +GLIBC_2.40 _ZGVnN4vv_powf F GLIBC_2.40 _ZGVsMxv_acosh F GLIBC_2.40 _ZGVsMxv_acoshf F GLIBC_2.40 _ZGVsMxv_asinh F @@ -123,3 +126,5 @@ GLIBC_2.40 _ZGVsMxv_tanh F GLIBC_2.40 _ZGVsMxv_tanhf F GLIBC_2.40 _ZGVsMxvv_hypot F GLIBC_2.40 _ZGVsMxvv_hypotf F +GLIBC_2.40 _ZGVsMxvv_pow F +GLIBC_2.40 _ZGVsMxvv_powf F