From patchwork Tue Apr 30 12:49:59 2024
X-Patchwork-Submitter: Joe Ramsay
X-Patchwork-Id: 1929497
From: Joe Ramsay
To:
CC: Joe Ramsay
Subject: [PATCH 2/3] aarch64/fpu: Add vector variants of cbrt
Date: Tue, 30 Apr 2024 13:49:59 +0100
Message-ID: <20240430125000.50324-2-Joe.Ramsay@arm.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20240430125000.50324-1-Joe.Ramsay@arm.com>
References: <20240430125000.50324-1-Joe.Ramsay@arm.com>
List-Id: Libc-alpha mailing list

---
Thanks,
Joe

 sysdeps/aarch64/fpu/Makefile                  |   1 +
 sysdeps/aarch64/fpu/Versions                  |   5 +
 sysdeps/aarch64/fpu/advsimd_f32_protos.h      |   1 +
 sysdeps/aarch64/fpu/bits/math-vector.h        |   8 ++
 sysdeps/aarch64/fpu/cbrt_advsimd.c            | 121 +++++++++++++++++
 sysdeps/aarch64/fpu/cbrt_sve.c                | 128 ++++++++++++++++++
 sysdeps/aarch64/fpu/cbrtf_advsimd.c           | 123 +++++++++++++++++
 sysdeps/aarch64/fpu/cbrtf_sve.c               | 122 +++++++++++++++++
 .../fpu/test-double-advsimd-wrappers.c        |   1 +
 .../aarch64/fpu/test-double-sve-wrappers.c    |   1 +
 .../aarch64/fpu/test-float-advsimd-wrappers.c |   1 +
 sysdeps/aarch64/fpu/test-float-sve-wrappers.c |   1 +
 sysdeps/aarch64/libm-test-ulps                |   8 ++
 .../unix/sysv/linux/aarch64/libmvec.abilist   |   5 +
 14 files changed, 526 insertions(+)
 create mode 100644 sysdeps/aarch64/fpu/cbrt_advsimd.c
 create mode 100644 sysdeps/aarch64/fpu/cbrt_sve.c
 create mode 100644 sysdeps/aarch64/fpu/cbrtf_advsimd.c
 create mode 100644 sysdeps/aarch64/fpu/cbrtf_sve.c

Reviewed-by: Szabolcs Nagy

diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
index 06657782a1..990d1135b9 100644
--- a/sysdeps/aarch64/fpu/Makefile
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -5,6 +5,7 @@ libmvec-supported-funcs = acos \
                           atan \
                           atanh \
                           atan2 \
+                          cbrt \
                           cos \
                           cosh \
                           erf \
diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions
index aedae9457b..36a9e4df1e 100644
--- a/sysdeps/aarch64/fpu/Versions
+++ b/sysdeps/aarch64/fpu/Versions
@@ -94,6 +94,11 @@ libmvec {
     _ZGVnN4v_atanhf;
     _ZGVsMxv_atanh;
     _ZGVsMxv_atanhf;
+    _ZGVnN2v_cbrt;
+    _ZGVnN2v_cbrtf;
+    _ZGVnN4v_cbrtf;
+    _ZGVsMxv_cbrt;
+    _ZGVsMxv_cbrtf;
     _ZGVnN2v_cosh;
     _ZGVnN2v_coshf;
     _ZGVnN4v_coshf;
diff --git a/sysdeps/aarch64/fpu/advsimd_f32_protos.h b/sysdeps/aarch64/fpu/advsimd_f32_protos.h
index a8889a92fd..54858efd8a 100644
--- a/sysdeps/aarch64/fpu/advsimd_f32_protos.h
+++ b/sysdeps/aarch64/fpu/advsimd_f32_protos.h
@@ -23,6 +23,7 @@ libmvec_hidden_proto (V_NAME_F1(asin));
 libmvec_hidden_proto (V_NAME_F1(asinh));
 libmvec_hidden_proto (V_NAME_F1(atan));
 libmvec_hidden_proto (V_NAME_F1(atanh));
+libmvec_hidden_proto (V_NAME_F1(cbrt));
 libmvec_hidden_proto (V_NAME_F1(cos));
 libmvec_hidden_proto (V_NAME_F1(cosh));
 libmvec_hidden_proto (V_NAME_F1(erf));
diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h
index ca30177339..b1c024fe13 100644
--- a/sysdeps/aarch64/fpu/bits/math-vector.h
+++ b/sysdeps/aarch64/fpu/bits/math-vector.h
@@ -57,6 +57,10 @@
 # define __DECL_SIMD_atan2 __DECL_SIMD_aarch64
 # undef __DECL_SIMD_atan2f
 # define __DECL_SIMD_atan2f __DECL_SIMD_aarch64
+# undef __DECL_SIMD_cbrt
+# define __DECL_SIMD_cbrt __DECL_SIMD_aarch64
+# undef __DECL_SIMD_cbrtf
+# define __DECL_SIMD_cbrtf __DECL_SIMD_aarch64
 # undef __DECL_SIMD_cos
 # define __DECL_SIMD_cos __DECL_SIMD_aarch64
 # undef __DECL_SIMD_cosf
@@ -158,6 +162,7 @@ __vpcs __f32x4_t _ZGVnN4v_asinf (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_asinhf (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_atanf (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_atanhf (__f32x4_t);
+__vpcs __f32x4_t _ZGVnN4v_cbrtf (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_coshf (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_erff (__f32x4_t);
@@ -183,6 +188,7 @@ __vpcs __f64x2_t _ZGVnN2v_asin (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_asinh (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_atan (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_atanh (__f64x2_t);
+__vpcs __f64x2_t _ZGVnN2v_cbrt (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_cosh (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_erf (__f64x2_t);
@@ -213,6 +219,7 @@ __sv_f32_t _ZGVsMxv_asinf (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_asinhf (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_atanf (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_atanhf (__sv_f32_t, __sv_bool_t);
+__sv_f32_t _ZGVsMxv_cbrtf (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_coshf (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_erff (__sv_f32_t, __sv_bool_t);
@@ -238,6 +245,7 @@ __sv_f64_t _ZGVsMxv_asin (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_asinh (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_atan (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_atanh (__sv_f64_t, __sv_bool_t);
+__sv_f64_t _ZGVsMxv_cbrt (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_cos (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_cosh (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_erf (__sv_f64_t, __sv_bool_t);
diff --git a/sysdeps/aarch64/fpu/cbrt_advsimd.c b/sysdeps/aarch64/fpu/cbrt_advsimd.c
new file mode 100644
index 0000000000..adfbb60cd3
--- /dev/null
+++ b/sysdeps/aarch64/fpu/cbrt_advsimd.c
@@ -0,0 +1,121 @@
+/* Double-precision vector (AdvSIMD) cbrt function
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include "v_math.h"
+#include "poly_advsimd_f64.h"
+
+const static struct data
+{
+  float64x2_t poly[4], one_third, shift;
+  int64x2_t exp_bias;
+  uint64x2_t abs_mask, tiny_bound;
+  uint32x4_t thresh;
+  double table[5];
+} data = {
+  .shift = V2 (0x1.8p52),
+  .poly = { /* Generated with fpminimax in [0.5, 1].  */
+            V2 (0x1.c14e8ee44767p-2), V2 (0x1.dd2d3f99e4c0ep-1),
+            V2 (-0x1.08e83026b7e74p-1), V2 (0x1.2c74eaa3ba428p-3) },
+  .exp_bias = V2 (1022),
+  .abs_mask = V2(0x7fffffffffffffff),
+  .tiny_bound = V2(0x0010000000000000), /* Smallest normal.  */
+  .thresh = V4(0x7fe00000), /* asuint64 (infinity) - tiny_bound.  */
+  .one_third = V2(0x1.5555555555555p-2),
+  .table = { /* table[i] = 2^((i - 2) / 3).  */
+             0x1.428a2f98d728bp-1, 0x1.965fea53d6e3dp-1, 0x1p0,
+             0x1.428a2f98d728bp0, 0x1.965fea53d6e3dp0 }
+};
+
+#define MantissaMask v_u64 (0x000fffffffffffff)
+
+static float64x2_t NOINLINE VPCS_ATTR
+special_case (float64x2_t x, float64x2_t y, uint32x2_t special)
+{
+  return v_call_f64 (cbrt, x, y, vmovl_u32 (special));
+}
+
+/* Approximation for double-precision vector cbrt(x), using low-order polynomial
+   and two Newton iterations. Greatest observed error is 1.79 ULP. Errors repeat
+   according to the exponent, for instance an error observed for double value
+   m * 2^e will be observed for any input m * 2^(e + 3*i), where i is an
+   integer.
+   __v_cbrt(0x1.fffff403f0bc6p+1) got 0x1.965fe72821e9bp+0
+                                 want 0x1.965fe72821e99p+0.  */
+VPCS_ATTR float64x2_t V_NAME_D1 (cbrt) (float64x2_t x)
+{
+  const struct data *d = ptr_barrier (&data);
+  uint64x2_t iax = vreinterpretq_u64_f64 (vabsq_f64 (x));
+
+  /* Subnormal, +/-0 and special values.  */
+  uint32x2_t special
+      = vcge_u32 (vsubhn_u64 (iax, d->tiny_bound), vget_low_u32 (d->thresh));
+
+  /* Decompose |x| into m * 2^e, where m is in [0.5, 1.0]. This is a vector
+     version of frexp, which gets subnormal values wrong - these have to be
+     special-cased as a result.  */
+  float64x2_t m = vbslq_f64 (MantissaMask, x, v_f64 (0.5));
+  int64x2_t exp_bias = d->exp_bias;
+  uint64x2_t ia12 = vshrq_n_u64 (iax, 52);
+  int64x2_t e = vsubq_s64 (vreinterpretq_s64_u64 (ia12), exp_bias);
+
+  /* Calculate rough approximation for cbrt(m) in [0.5, 1.0], starting point for
+     Newton iterations.  */
+  float64x2_t p = v_pairwise_poly_3_f64 (m, vmulq_f64 (m, m), d->poly);
+  float64x2_t one_third = d->one_third;
+  /* Two iterations of Newton's method for iteratively approximating cbrt.  */
+  float64x2_t m_by_3 = vmulq_f64 (m, one_third);
+  float64x2_t two_thirds = vaddq_f64 (one_third, one_third);
+  float64x2_t a
+      = vfmaq_f64 (vdivq_f64 (m_by_3, vmulq_f64 (p, p)), two_thirds, p);
+  a = vfmaq_f64 (vdivq_f64 (m_by_3, vmulq_f64 (a, a)), two_thirds, a);
+
+  /* Assemble the result by the following:
+
+     cbrt(x) = cbrt(m) * 2 ^ (e / 3).
+
+     We can get 2 ^ round(e / 3) using ldexp and integer divide, but since e is
+     not necessarily a multiple of 3 we lose some information.
+
+     Let q = 2 ^ round(e / 3), then t = 2 ^ (e / 3) / q.
+
+     Then we know t = 2 ^ (i / 3), where i is the remainder from e / 3, which is
+     an integer in [-2, 2], and can be looked up in the table T. Hence the
+     result is assembled as:
+
+     cbrt(x) = cbrt(m) * t * 2 ^ round(e / 3) * sign.  */
+
+  float64x2_t ef = vcvtq_f64_s64 (e);
+  float64x2_t eb3f = vrndnq_f64 (vmulq_f64 (ef, one_third));
+  int64x2_t em3 = vcvtq_s64_f64 (vfmsq_f64 (ef, eb3f, v_f64 (3)));
+  int64x2_t ey = vcvtq_s64_f64 (eb3f);
+
+  float64x2_t my = (float64x2_t){ d->table[em3[0] + 2], d->table[em3[1] + 2] };
+  my = vmulq_f64 (my, a);
+
+  /* Vector version of ldexp.  */
+  float64x2_t y = vreinterpretq_f64_s64 (
+      vshlq_n_s64 (vaddq_s64 (ey, vaddq_s64 (exp_bias, v_s64 (1))), 52));
+  y = vmulq_f64 (y, my);
+
+  if (__glibc_unlikely (v_any_u32h (special)))
+    return special_case (x, vbslq_f64 (d->abs_mask, y, x), special);
+
+  /* Copy sign.  */
+  return vbslq_f64 (d->abs_mask, y, x);
+}
diff --git a/sysdeps/aarch64/fpu/cbrt_sve.c b/sysdeps/aarch64/fpu/cbrt_sve.c
new file mode 100644
index 0000000000..fc976eda2a
--- /dev/null
+++ b/sysdeps/aarch64/fpu/cbrt_sve.c
@@ -0,0 +1,128 @@
+/* Double-precision vector (SVE) cbrt function
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include "sv_math.h"
+#include "poly_sve_f64.h"
+
+const static struct data
+{
+  float64_t poly[4];
+  float64_t table[5];
+  float64_t one_third, two_thirds, shift;
+  int64_t exp_bias;
+  uint64_t tiny_bound, thresh;
+} data = {
+  /* Generated with FPMinimax in [0.5, 1].  */
+  .poly = { 0x1.c14e8ee44767p-2, 0x1.dd2d3f99e4c0ep-1, -0x1.08e83026b7e74p-1,
+            0x1.2c74eaa3ba428p-3, },
+  /* table[i] = 2^((i - 2) / 3).  */
+  .table = { 0x1.428a2f98d728bp-1, 0x1.965fea53d6e3dp-1, 0x1p0,
+             0x1.428a2f98d728bp0, 0x1.965fea53d6e3dp0, },
+  .one_third = 0x1.5555555555555p-2,
+  .two_thirds = 0x1.5555555555555p-1,
+  .shift = 0x1.8p52,
+  .exp_bias = 1022,
+  .tiny_bound = 0x0010000000000000, /* Smallest normal.  */
+  .thresh = 0x7fe0000000000000, /* asuint64 (infinity) - tiny_bound.  */
+};
+
+#define MantissaMask 0x000fffffffffffff
+#define HalfExp 0x3fe0000000000000
+
+static svfloat64_t NOINLINE
+special_case (svfloat64_t x, svfloat64_t y, svbool_t special)
+{
+  return sv_call_f64 (cbrt, x, y, special);
+}
+
+static inline svfloat64_t
+shifted_lookup (const svbool_t pg, const float64_t *table, svint64_t i)
+{
+  return svld1_gather_index (pg, table, svadd_x (pg, i, 2));
+}
+
+/* Approximation for double-precision vector cbrt(x), using low-order
+   polynomial and two Newton iterations. Greatest observed error is 1.79 ULP.
+   Errors repeat according to the exponent, for instance an error observed for
+   double value m * 2^e will be observed for any input m * 2^(e + 3*i), where i
+   is an integer.
+   _ZGVsMxv_cbrt (0x0.3fffb8d4413f3p-1022) got 0x1.965f53b0e5d97p-342
+                                          want 0x1.965f53b0e5d95p-342.  */
+svfloat64_t SV_NAME_D1 (cbrt) (svfloat64_t x, const svbool_t pg)
+{
+  const struct data *d = ptr_barrier (&data);
+
+  svfloat64_t ax = svabs_x (pg, x);
+  svuint64_t iax = svreinterpret_u64 (ax);
+  svuint64_t sign = sveor_x (pg, svreinterpret_u64 (x), iax);
+
+  /* Subnormal, +/-0 and special values.  */
+  svbool_t special = svcmpge (pg, svsub_x (pg, iax, d->tiny_bound), d->thresh);
+
+  /* Decompose |x| into m * 2^e, where m is in [0.5, 1.0]. This is a vector
+     version of frexp, which gets subnormal values wrong - these have to be
+     special-cased as a result.  */
+  svfloat64_t m = svreinterpret_f64 (svorr_x (
+      pg, svand_x (pg, svreinterpret_u64 (x), MantissaMask), HalfExp));
+  svint64_t e
+      = svsub_x (pg, svreinterpret_s64 (svlsr_x (pg, iax, 52)), d->exp_bias);
+
+  /* Calculate rough approximation for cbrt(m) in [0.5, 1.0], starting point
+     for Newton iterations.  */
+  svfloat64_t p
+      = sv_pairwise_poly_3_f64_x (pg, m, svmul_x (pg, m, m), d->poly);
+
+  /* Two iterations of Newton's method for iteratively approximating cbrt.  */
+  svfloat64_t m_by_3 = svmul_x (pg, m, d->one_third);
+  svfloat64_t a = svmla_x (pg, svdiv_x (pg, m_by_3, svmul_x (pg, p, p)), p,
+                           d->two_thirds);
+  a = svmla_x (pg, svdiv_x (pg, m_by_3, svmul_x (pg, a, a)), a, d->two_thirds);
+
+  /* Assemble the result by the following:
+
+     cbrt(x) = cbrt(m) * 2 ^ (e / 3).
+
+     We can get 2 ^ round(e / 3) using ldexp and integer divide, but since e is
+     not necessarily a multiple of 3 we lose some information.
+
+     Let q = 2 ^ round(e / 3), then t = 2 ^ (e / 3) / q.
+
+     Then we know t = 2 ^ (i / 3), where i is the remainder from e / 3, which
+     is an integer in [-2, 2], and can be looked up in the table T. Hence the
+     result is assembled as:
+
+     cbrt(x) = cbrt(m) * t * 2 ^ round(e / 3) * sign.  */
+  svfloat64_t eb3f = svmul_x (pg, svcvt_f64_x (pg, e), d->one_third);
+  svint64_t ey = svcvt_s64_x (pg, eb3f);
+  svint64_t em3 = svmls_x (pg, e, ey, 3);
+
+  svfloat64_t my = shifted_lookup (pg, d->table, em3);
+  my = svmul_x (pg, my, a);
+
+  /* Vector version of ldexp.  */
+  svfloat64_t y = svscale_x (pg, my, ey);
+
+  if (__glibc_unlikely (svptest_any (pg, special)))
+    return special_case (
+        x, svreinterpret_f64 (svorr_x (pg, svreinterpret_u64 (y), sign)),
+        special);
+
+  /* Copy sign.  */
+  return svreinterpret_f64 (svorr_x (pg, svreinterpret_u64 (y), sign));
+}
diff --git a/sysdeps/aarch64/fpu/cbrtf_advsimd.c b/sysdeps/aarch64/fpu/cbrtf_advsimd.c
new file mode 100644
index 0000000000..27debb8b57
--- /dev/null
+++ b/sysdeps/aarch64/fpu/cbrtf_advsimd.c
@@ -0,0 +1,123 @@
+/* Single-precision vector (AdvSIMD) cbrt function
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include "v_math.h"
+#include "poly_advsimd_f32.h"
+
+const static struct data
+{
+  float32x4_t poly[4], one_third;
+  float table[5];
+} data = {
+  .poly = { /* Very rough approximation of cbrt(x) in [0.5, 1], generated with
+               FPMinimax.  */
+            V4 (0x1.c14e96p-2), V4 (0x1.dd2d3p-1), V4 (-0x1.08e81ap-1),
+            V4 (0x1.2c74c2p-3) },
+  .table = { /* table[i] = 2^((i - 2) / 3).  */
+            0x1.428a3p-1, 0x1.965feap-1, 0x1p0, 0x1.428a3p0, 0x1.965feap0 },
+  .one_third = V4 (0x1.555556p-2f),
+};
+
+#define SignMask v_u32 (0x80000000)
+#define SmallestNormal v_u32 (0x00800000)
+#define Thresh vdup_n_u16 (0x7f00) /* asuint(INFINITY) - SmallestNormal.  */
+#define MantissaMask v_u32 (0x007fffff)
+#define HalfExp v_u32 (0x3f000000)
+
+static float32x4_t VPCS_ATTR NOINLINE
+special_case (float32x4_t x, float32x4_t y, uint16x4_t special)
+{
+  return v_call_f32 (cbrtf, x, y, vmovl_u16 (special));
+}
+
+static inline float32x4_t
+shifted_lookup (const float *table, int32x4_t i)
+{
+  return (float32x4_t){ table[i[0] + 2], table[i[1] + 2], table[i[2] + 2],
+                        table[i[3] + 2] };
+}
+
+/* Approximation for vector single-precision cbrt(x) using Newton iteration
+   with initial guess obtained by a low-order polynomial. Greatest error
+   is 1.64 ULP. This is observed for every value where the mantissa is
+   0x1.85a2aa and the exponent is a multiple of 3, for example:
+   _ZGVnN4v_cbrtf(0x1.85a2aap+3) got 0x1.267936p+1
+                                want 0x1.267932p+1.  */
+VPCS_ATTR float32x4_t V_NAME_F1 (cbrt) (float32x4_t x)
+{
+  const struct data *d = ptr_barrier (&data);
+  uint32x4_t iax = vreinterpretq_u32_f32 (vabsq_f32 (x));
+
+  /* Subnormal, +/-0 and special values.  */
+  uint16x4_t special = vcge_u16 (vsubhn_u32 (iax, SmallestNormal), Thresh);
+
+  /* Decompose |x| into m * 2^e, where m is in [0.5, 1.0]. This is a vector
+     version of frexpf, which gets subnormal values wrong - these have to be
+     special-cased as a result.  */
+  float32x4_t m = vbslq_f32 (MantissaMask, x, v_f32 (0.5));
+  int32x4_t e
+      = vsubq_s32 (vreinterpretq_s32_u32 (vshrq_n_u32 (iax, 23)), v_s32 (126));
+
+  /* p is a rough approximation for cbrt(m) in [0.5, 1.0]. The better this is,
+     the less accurate the next stage of the algorithm needs to be. An order-4
+     polynomial is enough for one Newton iteration.  */
+  float32x4_t p = v_pairwise_poly_3_f32 (m, vmulq_f32 (m, m), d->poly);
+
+  float32x4_t one_third = d->one_third;
+  float32x4_t two_thirds = vaddq_f32 (one_third, one_third);
+
+  /* One iteration of Newton's method for iteratively approximating cbrt.  */
+  float32x4_t m_by_3 = vmulq_f32 (m, one_third);
+  float32x4_t a
+      = vfmaq_f32 (vdivq_f32 (m_by_3, vmulq_f32 (p, p)), two_thirds, p);
+
+  /* Assemble the result by the following:
+
+     cbrt(x) = cbrt(m) * 2 ^ (e / 3).
+
+     We can get 2 ^ round(e / 3) using ldexp and integer divide, but since e is
+     not necessarily a multiple of 3 we lose some information.
+
+     Let q = 2 ^ round(e / 3), then t = 2 ^ (e / 3) / q.
+
+     Then we know t = 2 ^ (i / 3), where i is the remainder from e / 3, which
+     is an integer in [-2, 2], and can be looked up in the table T. Hence the
+     result is assembled as:
+
+     cbrt(x) = cbrt(m) * t * 2 ^ round(e / 3) * sign.  */
+  float32x4_t ef = vmulq_f32 (vcvtq_f32_s32 (e), one_third);
+  int32x4_t ey = vcvtq_s32_f32 (ef);
+  int32x4_t em3 = vsubq_s32 (e, vmulq_s32 (ey, v_s32 (3)));
+
+  float32x4_t my = shifted_lookup (d->table, em3);
+  my = vmulq_f32 (my, a);
+
+  /* Vector version of ldexpf.  */
+  float32x4_t y
+      = vreinterpretq_f32_s32 (vshlq_n_s32 (vaddq_s32 (ey, v_s32 (127)), 23));
+  y = vmulq_f32 (y, my);
+
+  if (__glibc_unlikely (v_any_u16h (special)))
+    return special_case (x, vbslq_f32 (SignMask, x, y), special);
+
+  /* Copy sign.  */
+  return vbslq_f32 (SignMask, x, y);
+}
+libmvec_hidden_def (V_NAME_F1 (cbrt))
+HALF_WIDTH_ALIAS_F1 (cbrt)
diff --git a/sysdeps/aarch64/fpu/cbrtf_sve.c b/sysdeps/aarch64/fpu/cbrtf_sve.c
new file mode 100644
index 0000000000..23c220c202
--- /dev/null
+++ b/sysdeps/aarch64/fpu/cbrtf_sve.c
@@ -0,0 +1,122 @@
+/* Single-precision vector (SVE) cbrt function
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "sv_math.h" +#include "poly_sve_f32.h" + +const static struct data +{ + float32_t poly[4]; + float32_t table[5]; + float32_t one_third, two_thirds; +} data = { + /* Very rough approximation of cbrt(x) in [0.5, 1], generated with FPMinimax. + */ + .poly = { 0x1.c14e96p-2, 0x1.dd2d3p-1, -0x1.08e81ap-1, + 0x1.2c74c2p-3, }, + /* table[i] = 2^((i - 2) / 3). */ + .table = { 0x1.428a3p-1, 0x1.965feap-1, 0x1p0, 0x1.428a3p0, 0x1.965feap0 }, + .one_third = 0x1.555556p-2f, + .two_thirds = 0x1.555556p-1f, +}; + +#define SmallestNormal 0x00800000 +#define Thresh 0x7f000000 /* asuint(INFINITY) - SmallestNormal. */ +#define MantissaMask 0x007fffff +#define HalfExp 0x3f000000 + +static svfloat32_t NOINLINE +special_case (svfloat32_t x, svfloat32_t y, svbool_t special) +{ + return sv_call_f32 (cbrtf, x, y, special); +} + +static inline svfloat32_t +shifted_lookup (const svbool_t pg, const float32_t *table, svint32_t i) +{ + return svld1_gather_index (pg, table, svadd_x (pg, i, 2)); +} + +/* Approximation for vector single-precision cbrt(x) using Newton iteration + with initial guess obtained by a low-order polynomial. Greatest error + is 1.64 ULP. This is observed for every value where the mantissa is + 0x1.85a2aa and the exponent is a multiple of 3, for example: + _ZGVsMxv_cbrtf (0x1.85a2aap+3) got 0x1.267936p+1 + want 0x1.267932p+1. 
*/ +svfloat32_t SV_NAME_F1 (cbrt) (svfloat32_t x, const svbool_t pg) +{ + const struct data *d = ptr_barrier (&data); + + svfloat32_t ax = svabs_x (pg, x); + svuint32_t iax = svreinterpret_u32 (ax); + svuint32_t sign = sveor_x (pg, svreinterpret_u32 (x), iax); + + /* Subnormal, +/-0 and special values. */ + svbool_t special = svcmpge (pg, svsub_x (pg, iax, SmallestNormal), Thresh); + + /* Decompose |x| into m * 2^e, where m is in [0.5, 1.0]. This is a vector + version of frexpf, which gets subnormal values wrong - these have to be + special-cased as a result. */ + svfloat32_t m = svreinterpret_f32 (svorr_x ( + pg, svand_x (pg, svreinterpret_u32 (x), MantissaMask), HalfExp)); + svint32_t e = svsub_x (pg, svreinterpret_s32 (svlsr_x (pg, iax, 23)), 126); + + /* p is a rough approximation for cbrt(m) in [0.5, 1.0]. The better this is, + the less accurate the next stage of the algorithm needs to be. An order-4 + polynomial is enough for one Newton iteration. */ + svfloat32_t p + = sv_pairwise_poly_3_f32_x (pg, m, svmul_x (pg, m, m), d->poly); + + /* One iteration of Newton's method for iteratively approximating cbrt. */ + svfloat32_t m_by_3 = svmul_x (pg, m, d->one_third); + svfloat32_t a = svmla_x (pg, svdiv_x (pg, m_by_3, svmul_x (pg, p, p)), p, + d->two_thirds); + + /* Assemble the result by the following: + + cbrt(x) = cbrt(m) * 2 ^ (e / 3). + + We can get 2 ^ round(e / 3) using ldexp and integer divide, but since e is + not necessarily a multiple of 3 we lose some information. + + Let q = 2 ^ round(e / 3), then t = 2 ^ (e / 3) / q. + + Then we know t = 2 ^ (i / 3), where i is the remainder from e / 3, which + is an integer in [-2, 2], and can be looked up in the table T. Hence the + result is assembled as: + + cbrt(x) = cbrt(m) * t * 2 ^ round(e / 3) * sign. 
*/ + svfloat32_t ef = svmul_x (pg, svcvt_f32_x (pg, e), d->one_third); + svint32_t ey = svcvt_s32_x (pg, ef); + svint32_t em3 = svmls_x (pg, e, ey, 3); + + svfloat32_t my = shifted_lookup (pg, d->table, em3); + my = svmul_x (pg, my, a); + + /* Vector version of ldexpf. */ + svfloat32_t y = svscale_x (pg, my, ey); + + if (__glibc_unlikely (svptest_any (pg, special))) + return special_case ( + x, svreinterpret_f32 (svorr_x (pg, svreinterpret_u32 (y), sign)), + special); + + /* Copy sign. */ + return svreinterpret_f32 (svorr_x (pg, svreinterpret_u32 (y), sign)); +} diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c index 417125be47..1877db3ac6 100644 --- a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c @@ -30,6 +30,7 @@ VPCS_VECTOR_WRAPPER (asinh_advsimd, _ZGVnN2v_asinh) VPCS_VECTOR_WRAPPER (atan_advsimd, _ZGVnN2v_atan) VPCS_VECTOR_WRAPPER (atanh_advsimd, _ZGVnN2v_atanh) VPCS_VECTOR_WRAPPER_ff (atan2_advsimd, _ZGVnN2vv_atan2) +VPCS_VECTOR_WRAPPER (cbrt_advsimd, _ZGVnN2v_cbrt) VPCS_VECTOR_WRAPPER (cos_advsimd, _ZGVnN2v_cos) VPCS_VECTOR_WRAPPER (cosh_advsimd, _ZGVnN2v_cosh) VPCS_VECTOR_WRAPPER (erf_advsimd, _ZGVnN2v_erf) diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c index 31ebf18705..b702f942de 100644 --- a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c @@ -49,6 +49,7 @@ SVE_VECTOR_WRAPPER (asinh_sve, _ZGVsMxv_asinh) SVE_VECTOR_WRAPPER (atan_sve, _ZGVsMxv_atan) SVE_VECTOR_WRAPPER (atanh_sve, _ZGVsMxv_atanh) SVE_VECTOR_WRAPPER_ff (atan2_sve, _ZGVsMxvv_atan2) +SVE_VECTOR_WRAPPER (cbrt_sve, _ZGVsMxv_cbrt) SVE_VECTOR_WRAPPER (cos_sve, _ZGVsMxv_cos) SVE_VECTOR_WRAPPER (cosh_sve, _ZGVsMxv_cosh) SVE_VECTOR_WRAPPER (erf_sve, _ZGVsMxv_erf) diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c 
b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c index dab0f1cfcb..9cb451b4f0 100644 --- a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c @@ -30,6 +30,7 @@ VPCS_VECTOR_WRAPPER (asinhf_advsimd, _ZGVnN4v_asinhf) VPCS_VECTOR_WRAPPER (atanf_advsimd, _ZGVnN4v_atanf) VPCS_VECTOR_WRAPPER (atanhf_advsimd, _ZGVnN4v_atanhf) VPCS_VECTOR_WRAPPER_ff (atan2f_advsimd, _ZGVnN4vv_atan2f) +VPCS_VECTOR_WRAPPER (cbrtf_advsimd, _ZGVnN4v_cbrtf) VPCS_VECTOR_WRAPPER (cosf_advsimd, _ZGVnN4v_cosf) VPCS_VECTOR_WRAPPER (coshf_advsimd, _ZGVnN4v_coshf) VPCS_VECTOR_WRAPPER (erff_advsimd, _ZGVnN4v_erff) diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c index 2aa6cbcc28..5b3dd22916 100644 --- a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c @@ -49,6 +49,7 @@ SVE_VECTOR_WRAPPER (asinhf_sve, _ZGVsMxv_asinhf) SVE_VECTOR_WRAPPER (atanf_sve, _ZGVsMxv_atanf) SVE_VECTOR_WRAPPER (atanhf_sve, _ZGVsMxv_atanhf) SVE_VECTOR_WRAPPER_ff (atan2f_sve, _ZGVsMxvv_atan2f) +SVE_VECTOR_WRAPPER (cbrtf_sve, _ZGVsMxv_cbrtf) SVE_VECTOR_WRAPPER (cosf_sve, _ZGVsMxv_cosf) SVE_VECTOR_WRAPPER (coshf_sve, _ZGVsMxv_coshf) SVE_VECTOR_WRAPPER (erff_sve, _ZGVsMxv_erff) diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps index e7463d30bc..6d083c4e32 100644 --- a/sysdeps/aarch64/libm-test-ulps +++ b/sysdeps/aarch64/libm-test-ulps @@ -477,11 +477,19 @@ double: 4 float: 1 ldouble: 1 +Function: "cbrt_advsimd": +double: 1 +float: 1 + Function: "cbrt_downward": double: 4 float: 1 ldouble: 1 +Function: "cbrt_sve": +double: 1 +float: 1 + Function: "cbrt_towardzero": double: 3 float: 1 diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist index 1184374efd..89ac1dfa36 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist @@ -79,6 +79,8 
@@ GLIBC_2.40 _ZGVnN2v_asinh F GLIBC_2.40 _ZGVnN2v_asinhf F GLIBC_2.40 _ZGVnN2v_atanh F GLIBC_2.40 _ZGVnN2v_atanhf F +GLIBC_2.40 _ZGVnN2v_cbrt F +GLIBC_2.40 _ZGVnN2v_cbrtf F GLIBC_2.40 _ZGVnN2v_cosh F GLIBC_2.40 _ZGVnN2v_coshf F GLIBC_2.40 _ZGVnN2v_erf F @@ -94,6 +96,7 @@ GLIBC_2.40 _ZGVnN2vv_hypotf F GLIBC_2.40 _ZGVnN4v_acoshf F GLIBC_2.40 _ZGVnN4v_asinhf F GLIBC_2.40 _ZGVnN4v_atanhf F +GLIBC_2.40 _ZGVnN4v_cbrtf F GLIBC_2.40 _ZGVnN4v_coshf F GLIBC_2.40 _ZGVnN4v_erfcf F GLIBC_2.40 _ZGVnN4v_erff F @@ -106,6 +109,8 @@ GLIBC_2.40 _ZGVsMxv_asinh F GLIBC_2.40 _ZGVsMxv_asinhf F GLIBC_2.40 _ZGVsMxv_atanh F GLIBC_2.40 _ZGVsMxv_atanhf F +GLIBC_2.40 _ZGVsMxv_cbrt F +GLIBC_2.40 _ZGVsMxv_cbrtf F GLIBC_2.40 _ZGVsMxv_cosh F GLIBC_2.40 _ZGVsMxv_coshf F GLIBC_2.40 _ZGVsMxv_erf F