From patchwork Thu May 2 15:43:13 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joe Ramsay
X-Patchwork-Id: 1930670
From: Joe Ramsay
To: libc-alpha@sourceware.org
Subject: [PATCH v2] aarch64: Fix AdvSIMD libmvec routines for big-endian
Date: Thu, 2 May 2024 16:43:13 +0100
Message-ID: <20240502154313.37476-1-Joe.Ramsay@arm.com>
X-Mailer: git-send-email 2.27.0

Previously, many routines used * to load from vector types stored in the
data table.  This load is emitted as ldr, which byte-swaps the entire
vector register and therefore causes bugs on big-endian whenever the lanes
do not all contain the same value.  Where a vector is used this way, it
has been replaced with a plain array, and the load with an explicit ld1
intrinsic, which byte-swaps only within lanes.

In addition, many routines previously used non-standard GCC syntax for
vector operations, such as indexing into vector types with [] and
assembling vectors with {}.  This syntax should not be mixed with ACLE, as
the former does not respect endianness whereas the latter does.  Such
cases have been replaced with, for instance, the vcombine_* and
vgetq_lane* intrinsics.  Helpers which use only the GCC syntax, such as
the v_call helpers, do not need changing, as they do not use intrinsics.
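As an illustration, here is a minimal sketch of the pattern (not code from
this patch; the struct and helper names are made up, though the constants
are the negated-ln2 pair used by the sinh routine):

  #include <arm_neon.h>

  /* Old: a vector-typed member; reading it through the pointer is
     emitted as ldr, which on big-endian byte-swaps the whole 128-bit
     register and so exchanges the two lanes.  */
  static const struct { float64x2_t m_ln2; } old_data
    = { .m_ln2 = (float64x2_t) { -0x1.62e42fefa39efp-1,
                                 -0x1.abc9e3b39803fp-56 } };

  /* New: keep the constants in a plain array and load them with an
     explicit ld1 (vld1q_f64), which byte-swaps only within each 64-bit
     lane, so lane 0 always holds m_ln2[0].  */
  static const struct { double m_ln2[2]; } new_data
    = { .m_ln2 = { -0x1.62e42fefa39efp-1, -0x1.abc9e3b39803fp-56 } };

  static inline float64_t
  m_ln2_hi (void)
  {
    /* Read a lane with an ACLE intrinsic rather than the GCC [] syntax.  */
    return vgetq_lane_f64 (vld1q_f64 (new_data.m_ln2), 0);
  }

The same idea applies to the float32 routines, using vld1q_f32,
vgetq_lane_u32 and vcombine_f32.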

Reviewed-by: Szabolcs Nagy
---
Changes from v1:
- More detailed commit message

Thanks,
Joe

 sysdeps/aarch64/fpu/asinh_advsimd.c   | 15 ++++++++-----
 sysdeps/aarch64/fpu/cosh_advsimd.c    |  9 +++++---
 sysdeps/aarch64/fpu/erf_advsimd.c     |  4 ++--
 sysdeps/aarch64/fpu/erfc_advsimd.c    | 31 ++++++++++++++++-----------
 sysdeps/aarch64/fpu/erfcf_advsimd.c   | 28 ++++++++++++++----------
 sysdeps/aarch64/fpu/erff_advsimd.c    | 12 +++++------
 sysdeps/aarch64/fpu/exp10f_advsimd.c  | 10 +++++----
 sysdeps/aarch64/fpu/expm1_advsimd.c   |  9 +++++---
 sysdeps/aarch64/fpu/expm1f_advsimd.c  | 11 +++++-----
 sysdeps/aarch64/fpu/log10_advsimd.c   |  6 ++++--
 sysdeps/aarch64/fpu/log2_advsimd.c    |  6 ++++--
 sysdeps/aarch64/fpu/log_advsimd.c     |  9 ++------
 sysdeps/aarch64/fpu/sinh_advsimd.c    | 13 ++++++-----
 sysdeps/aarch64/fpu/tan_advsimd.c     |  8 ++++---
 sysdeps/aarch64/fpu/tanf_advsimd.c    | 11 +++++-----
 sysdeps/aarch64/fpu/v_expf_inline.h   | 10 +++++----
 sysdeps/aarch64/fpu/v_expm1f_inline.h | 12 ++++++-----
 17 files changed, 119 insertions(+), 85 deletions(-)

diff --git a/sysdeps/aarch64/fpu/asinh_advsimd.c b/sysdeps/aarch64/fpu/asinh_advsimd.c
index 544a52f651..6207e7da95 100644
--- a/sysdeps/aarch64/fpu/asinh_advsimd.c
+++ b/sysdeps/aarch64/fpu/asinh_advsimd.c
@@ -22,6 +22,7 @@
 
 #define A(i) v_f64 (__v_log_data.poly[i])
 #define N (1 << V_LOG_TABLE_BITS)
+#define IndexMask (N - 1)
 
 const static struct data
 {
@@ -63,11 +64,15 @@ struct entry
 static inline struct entry
 lookup (uint64x2_t i)
 {
-  float64x2_t e0 = vld1q_f64 (
-      &__v_log_data.table[(i[0] >> (52 - V_LOG_TABLE_BITS)) & (N - 1)].invc);
-  float64x2_t e1 = vld1q_f64 (
-      &__v_log_data.table[(i[1] >> (52 - V_LOG_TABLE_BITS)) & (N - 1)].invc);
-  return (struct entry){ vuzp1q_f64 (e0, e1), vuzp2q_f64 (e0, e1) };
+  /* Since N is a power of 2, n % N = n & (N - 1).  */
+  struct entry e;
+  uint64_t i0 = (vgetq_lane_u64 (i, 0) >> (52 - V_LOG_TABLE_BITS)) & IndexMask;
+  uint64_t i1 = (vgetq_lane_u64 (i, 1) >> (52 - V_LOG_TABLE_BITS)) & IndexMask;
+  float64x2_t e0 = vld1q_f64 (&__v_log_data.table[i0].invc);
+  float64x2_t e1 = vld1q_f64 (&__v_log_data.table[i1].invc);
+  e.invc = vuzp1q_f64 (e0, e1);
+  e.logc = vuzp2q_f64 (e0, e1);
+  return e;
 }
 
 static inline float64x2_t
diff --git a/sysdeps/aarch64/fpu/cosh_advsimd.c b/sysdeps/aarch64/fpu/cosh_advsimd.c
index ec7b59637e..4bee734f00 100644
--- a/sysdeps/aarch64/fpu/cosh_advsimd.c
+++ b/sysdeps/aarch64/fpu/cosh_advsimd.c
@@ -22,7 +22,9 @@
 static const struct data
 {
   float64x2_t poly[3];
-  float64x2_t inv_ln2, ln2, shift, thres;
+  float64x2_t inv_ln2;
+  double ln2[2];
+  float64x2_t shift, thres;
   uint64x2_t index_mask, special_bound;
 } data = {
   .poly = { V2 (0x1.fffffffffffd4p-2), V2 (0x1.5555571d6b68cp-3),
@@ -58,8 +60,9 @@ exp_inline (float64x2_t x)
   float64x2_t n = vsubq_f64 (z, d->shift);
 
   /* r = x - n*ln2/N.  */
-  float64x2_t r = vfmaq_laneq_f64 (x, n, d->ln2, 0);
-  r = vfmaq_laneq_f64 (r, n, d->ln2, 1);
+  float64x2_t ln2 = vld1q_f64 (d->ln2);
+  float64x2_t r = vfmaq_laneq_f64 (x, n, ln2, 0);
+  r = vfmaq_laneq_f64 (r, n, ln2, 1);
 
   uint64x2_t e = vshlq_n_u64 (u, 52 - V_EXP_TAIL_TABLE_BITS);
   uint64x2_t i = vandq_u64 (u, d->index_mask);
diff --git a/sysdeps/aarch64/fpu/erf_advsimd.c b/sysdeps/aarch64/fpu/erf_advsimd.c
index 3e70cbc025..19cbb7d0f4 100644
--- a/sysdeps/aarch64/fpu/erf_advsimd.c
+++ b/sysdeps/aarch64/fpu/erf_advsimd.c
@@ -56,8 +56,8 @@ static inline struct entry
 lookup (uint64x2_t i)
 {
   struct entry e;
-  float64x2_t e1 = vld1q_f64 ((float64_t *) (__erf_data.tab + i[0])),
-              e2 = vld1q_f64 ((float64_t *) (__erf_data.tab + i[1]));
+  float64x2_t e1 = vld1q_f64 (&__erf_data.tab[vgetq_lane_u64 (i, 0)].erf),
+              e2 = vld1q_f64 (&__erf_data.tab[vgetq_lane_u64 (i, 1)].erf);
   e.erf = vuzp1q_f64 (e1, e2);
   e.scale = vuzp2q_f64 (e1, e2);
   return e;
diff --git a/sysdeps/aarch64/fpu/erfc_advsimd.c b/sysdeps/aarch64/fpu/erfc_advsimd.c
index 548f21a3d6..f1b3bfe830 100644
--- a/sysdeps/aarch64/fpu/erfc_advsimd.c
+++ b/sysdeps/aarch64/fpu/erfc_advsimd.c
@@ -26,7 +26,7 @@ static const struct data
   float64x2_t max, shift;
   float64x2_t p20, p40, p41, p42;
   float64x2_t p51, p52;
-  float64x2_t qr5, qr6, qr7, qr8, qr9;
+  double qr5[2], qr6[2], qr7[2], qr8[2], qr9[2];
 #if WANT_SIMD_EXCEPT
   float64x2_t uflow_bound;
 #endif
@@ -68,8 +68,10 @@ static inline struct entry
 lookup (uint64x2_t i)
 {
   struct entry e;
-  float64x2_t e1 = vld1q_f64 ((float64_t *) (__erfc_data.tab - Off + i[0])),
-              e2 = vld1q_f64 ((float64_t *) (__erfc_data.tab - Off + i[1]));
+  float64x2_t e1
+      = vld1q_f64 (&__erfc_data.tab[vgetq_lane_u64 (i, 0) - Off].erfc);
+  float64x2_t e2
+      = vld1q_f64 (&__erfc_data.tab[vgetq_lane_u64 (i, 1) - Off].erfc);
   e.erfc = vuzp1q_f64 (e1, e2);
   e.scale = vuzp2q_f64 (e1, e2);
   return e;
@@ -161,16 +163,19 @@ float64x2_t V_NAME_D1 (erfc) (float64x2_t x)
   p5 = vmulq_f64 (r, vfmaq_f64 (vmulq_f64 (v_f64 (0.5), dat->p20), r2, p5));
   /* Compute p_i using recurrence relation:
      p_{i+2} = (p_i + r * Q_{i+1} * p_{i+1}) * R_{i+1}.  */
-  float64x2_t p6 = vfmaq_f64 (p4, p5, vmulq_laneq_f64 (r, dat->qr5, 0));
-  p6 = vmulq_laneq_f64 (p6, dat->qr5, 1);
-  float64x2_t p7 = vfmaq_f64 (p5, p6, vmulq_laneq_f64 (r, dat->qr6, 0));
-  p7 = vmulq_laneq_f64 (p7, dat->qr6, 1);
-  float64x2_t p8 = vfmaq_f64 (p6, p7, vmulq_laneq_f64 (r, dat->qr7, 0));
-  p8 = vmulq_laneq_f64 (p8, dat->qr7, 1);
-  float64x2_t p9 = vfmaq_f64 (p7, p8, vmulq_laneq_f64 (r, dat->qr8, 0));
-  p9 = vmulq_laneq_f64 (p9, dat->qr8, 1);
-  float64x2_t p10 = vfmaq_f64 (p8, p9, vmulq_laneq_f64 (r, dat->qr9, 0));
-  p10 = vmulq_laneq_f64 (p10, dat->qr9, 1);
+  float64x2_t qr5 = vld1q_f64 (dat->qr5), qr6 = vld1q_f64 (dat->qr6),
+              qr7 = vld1q_f64 (dat->qr7), qr8 = vld1q_f64 (dat->qr8),
+              qr9 = vld1q_f64 (dat->qr9);
+  float64x2_t p6 = vfmaq_f64 (p4, p5, vmulq_laneq_f64 (r, qr5, 0));
+  p6 = vmulq_laneq_f64 (p6, qr5, 1);
+  float64x2_t p7 = vfmaq_f64 (p5, p6, vmulq_laneq_f64 (r, qr6, 0));
+  p7 = vmulq_laneq_f64 (p7, qr6, 1);
+  float64x2_t p8 = vfmaq_f64 (p6, p7, vmulq_laneq_f64 (r, qr7, 0));
+  p8 = vmulq_laneq_f64 (p8, qr7, 1);
+  float64x2_t p9 = vfmaq_f64 (p7, p8, vmulq_laneq_f64 (r, qr8, 0));
+  p9 = vmulq_laneq_f64 (p9, qr8, 1);
+  float64x2_t p10 = vfmaq_f64 (p8, p9, vmulq_laneq_f64 (r, qr9, 0));
+  p10 = vmulq_laneq_f64 (p10, qr9, 1);
   /* Compute polynomial in d using pairwise Horner scheme.  */
   float64x2_t p90 = vfmaq_f64 (p9, d, p10);
   float64x2_t p78 = vfmaq_f64 (p7, d, p8);
diff --git a/sysdeps/aarch64/fpu/erfcf_advsimd.c b/sysdeps/aarch64/fpu/erfcf_advsimd.c
index 30b9e48dd4..ca5bc3ab33 100644
--- a/sysdeps/aarch64/fpu/erfcf_advsimd.c
+++ b/sysdeps/aarch64/fpu/erfcf_advsimd.c
@@ -23,7 +23,8 @@ static const struct data
 {
   uint32x4_t offset, table_scale;
   float32x4_t max, shift;
-  float32x4_t coeffs, third, two_over_five, tenth;
+  float coeffs[4];
+  float32x4_t third, two_over_five, tenth;
 #if WANT_SIMD_EXCEPT
   float32x4_t uflow_bound;
 #endif
@@ -37,7 +38,7 @@ static const struct data
   .shift = V4 (0x1p17f),
   /* Store 1/3, 2/3 and 2/15 in a single register for use with indexed muls
      and fmas.  */
-  .coeffs = (float32x4_t){ 0x1.555556p-2f, 0x1.555556p-1f, 0x1.111112p-3f, 0 },
+  .coeffs = { 0x1.555556p-2f, 0x1.555556p-1f, 0x1.111112p-3f, 0 },
   .third = V4 (0x1.555556p-2f),
   .two_over_five = V4 (-0x1.99999ap-2f),
   .tenth = V4 (-0x1.99999ap-4f),
@@ -60,12 +61,16 @@ static inline struct entry
 lookup (uint32x4_t i)
 {
   struct entry e;
-  float64_t t0 = *((float64_t *) (__erfcf_data.tab - Off + i[0]));
-  float64_t t1 = *((float64_t *) (__erfcf_data.tab - Off + i[1]));
-  float64_t t2 = *((float64_t *) (__erfcf_data.tab - Off + i[2]));
-  float64_t t3 = *((float64_t *) (__erfcf_data.tab - Off + i[3]));
-  float32x4_t e1 = vreinterpretq_f32_f64 ((float64x2_t){ t0, t1 });
-  float32x4_t e2 = vreinterpretq_f32_f64 ((float64x2_t){ t2, t3 });
+  float32x2_t t0
+      = vld1_f32 (&__erfcf_data.tab[vgetq_lane_u32 (i, 0) - Off].erfc);
+  float32x2_t t1
+      = vld1_f32 (&__erfcf_data.tab[vgetq_lane_u32 (i, 1) - Off].erfc);
+  float32x2_t t2
+      = vld1_f32 (&__erfcf_data.tab[vgetq_lane_u32 (i, 2) - Off].erfc);
+  float32x2_t t3
+      = vld1_f32 (&__erfcf_data.tab[vgetq_lane_u32 (i, 3) - Off].erfc);
+  float32x4_t e1 = vcombine_f32 (t0, t1);
+  float32x4_t e2 = vcombine_f32 (t2, t3);
   e.erfc = vuzp1q_f32 (e1, e2);
   e.scale = vuzp2q_f32 (e1, e2);
   return e;
@@ -140,10 +145,11 @@ float32x4_t NOINLINE V_NAME_F1 (erfc) (float32x4_t x)
   float32x4_t r2 = vmulq_f32 (r, r);
 
   float32x4_t p1 = r;
-  float32x4_t p2 = vfmsq_laneq_f32 (dat->third, r2, dat->coeffs, 1);
+  float32x4_t coeffs = vld1q_f32 (dat->coeffs);
+  float32x4_t p2 = vfmsq_laneq_f32 (dat->third, r2, coeffs, 1);
   float32x4_t p3
-      = vmulq_f32 (r, vfmaq_laneq_f32 (v_f32 (-0.5), r2, dat->coeffs, 0));
-  float32x4_t p4 = vfmaq_laneq_f32 (dat->two_over_five, r2, dat->coeffs, 2);
+      = vmulq_f32 (r, vfmaq_laneq_f32 (v_f32 (-0.5), r2, coeffs, 0));
+  float32x4_t p4 = vfmaq_laneq_f32 (dat->two_over_five, r2, coeffs, 2);
   p4 = vfmsq_f32 (dat->tenth, r2, p4);
 
   float32x4_t y = vfmaq_f32 (p3, d, p4);
diff --git a/sysdeps/aarch64/fpu/erff_advsimd.c b/sysdeps/aarch64/fpu/erff_advsimd.c
index c44644a71c..f2fe6ff236 100644
--- a/sysdeps/aarch64/fpu/erff_advsimd.c
+++ b/sysdeps/aarch64/fpu/erff_advsimd.c
@@ -47,12 +47,12 @@ static inline struct entry
 lookup (uint32x4_t i)
 {
   struct entry e;
-  float64_t t0 = *((float64_t *) (__erff_data.tab + i[0]));
-  float64_t t1 = *((float64_t *) (__erff_data.tab + i[1]));
-  float64_t t2 = *((float64_t *) (__erff_data.tab + i[2]));
-  float64_t t3 = *((float64_t *) (__erff_data.tab + i[3]));
-  float32x4_t e1 = vreinterpretq_f32_f64 ((float64x2_t){ t0, t1 });
-  float32x4_t e2 = vreinterpretq_f32_f64 ((float64x2_t){ t2, t3 });
+  float32x2_t t0 = vld1_f32 (&__erff_data.tab[vgetq_lane_u32 (i, 0)].erf);
+  float32x2_t t1 = vld1_f32 (&__erff_data.tab[vgetq_lane_u32 (i, 1)].erf);
+  float32x2_t t2 = vld1_f32 (&__erff_data.tab[vgetq_lane_u32 (i, 2)].erf);
+  float32x2_t t3 = vld1_f32 (&__erff_data.tab[vgetq_lane_u32 (i, 3)].erf);
+  float32x4_t e1 = vcombine_f32 (t0, t1);
+  float32x4_t e2 = vcombine_f32 (t2, t3);
   e.erf = vuzp1q_f32 (e1, e2);
   e.scale = vuzp2q_f32 (e1, e2);
   return e;
diff --git a/sysdeps/aarch64/fpu/exp10f_advsimd.c b/sysdeps/aarch64/fpu/exp10f_advsimd.c
index ab117b69da..cf53e73290 100644
--- a/sysdeps/aarch64/fpu/exp10f_advsimd.c
+++ b/sysdeps/aarch64/fpu/exp10f_advsimd.c
@@ -25,7 +25,8 @@
 static const struct data
 {
   float32x4_t poly[5];
-  float32x4_t log10_2_and_inv, shift;
+  float log10_2_and_inv[4];
+  float32x4_t shift;
 
 #if !WANT_SIMD_EXCEPT
   float32x4_t scale_thresh;
@@ -111,10 +112,11 @@ float32x4_t VPCS_ATTR NOINLINE V_NAME_F1 (exp10) (float32x4_t x)
   /* exp10(x) = 2^n * 10^r = 2^n * (1 + poly (r)),
      with poly(r) in [1/sqrt(2), sqrt(2)] and
      x = r + n * log10 (2), with r in [-log10(2)/2, log10(2)/2].  */
-  float32x4_t z = vfmaq_laneq_f32 (d->shift, x, d->log10_2_and_inv, 0);
+  float32x4_t log10_2_and_inv = vld1q_f32 (d->log10_2_and_inv);
+  float32x4_t z = vfmaq_laneq_f32 (d->shift, x, log10_2_and_inv, 0);
   float32x4_t n = vsubq_f32 (z, d->shift);
-  float32x4_t r = vfmsq_laneq_f32 (x, n, d->log10_2_and_inv, 1);
-  r = vfmsq_laneq_f32 (r, n, d->log10_2_and_inv, 2);
+  float32x4_t r = vfmsq_laneq_f32 (x, n, log10_2_and_inv, 1);
+  r = vfmsq_laneq_f32 (r, n, log10_2_and_inv, 2);
 
   uint32x4_t e = vshlq_n_u32 (vreinterpretq_u32_f32 (z), 23);
   float32x4_t scale = vreinterpretq_f32_u32 (vaddq_u32 (e, ExponentBias));
diff --git a/sysdeps/aarch64/fpu/expm1_advsimd.c b/sysdeps/aarch64/fpu/expm1_advsimd.c
index 3628398674..3db3b80c49 100644
--- a/sysdeps/aarch64/fpu/expm1_advsimd.c
+++ b/sysdeps/aarch64/fpu/expm1_advsimd.c
@@ -23,7 +23,9 @@
 static const struct data
 {
   float64x2_t poly[11];
-  float64x2_t invln2, ln2, shift;
+  float64x2_t invln2;
+  double ln2[2];
+  float64x2_t shift;
   int64x2_t exponent_bias;
 #if WANT_SIMD_EXCEPT
   uint64x2_t thresh, tiny_bound;
@@ -92,8 +94,9 @@ float64x2_t VPCS_ATTR V_NAME_D1 (expm1) (float64x2_t x)
      where 2^i is exact because i is an integer.  */
   float64x2_t n = vsubq_f64 (vfmaq_f64 (d->shift, d->invln2, x), d->shift);
   int64x2_t i = vcvtq_s64_f64 (n);
-  float64x2_t f = vfmsq_laneq_f64 (x, n, d->ln2, 0);
-  f = vfmsq_laneq_f64 (f, n, d->ln2, 1);
+  float64x2_t ln2 = vld1q_f64 (&d->ln2[0]);
+  float64x2_t f = vfmsq_laneq_f64 (x, n, ln2, 0);
+  f = vfmsq_laneq_f64 (f, n, ln2, 1);
 
   /* Approximate expm1(f) using polynomial.
      Taylor expansion for expm1(x) has the form:
diff --git a/sysdeps/aarch64/fpu/expm1f_advsimd.c b/sysdeps/aarch64/fpu/expm1f_advsimd.c
index 93db200f61..a0616ec754 100644
--- a/sysdeps/aarch64/fpu/expm1f_advsimd.c
+++ b/sysdeps/aarch64/fpu/expm1f_advsimd.c
@@ -23,7 +23,7 @@
 static const struct data
 {
   float32x4_t poly[5];
-  float32x4_t invln2_and_ln2;
+  float invln2_and_ln2[4];
   float32x4_t shift;
   int32x4_t exponent_bias;
 #if WANT_SIMD_EXCEPT
@@ -88,11 +88,12 @@ float32x4_t VPCS_ATTR NOINLINE V_NAME_F1 (expm1) (float32x4_t x)
      and f = x - i * ln2, then f is in [-ln2/2, ln2/2].
      exp(x) - 1 = 2^i * (expm1(f) + 1) - 1
      where 2^i is exact because i is an integer.  */
-  float32x4_t j = vsubq_f32 (
-      vfmaq_laneq_f32 (d->shift, x, d->invln2_and_ln2, 0), d->shift);
+  float32x4_t invln2_and_ln2 = vld1q_f32 (d->invln2_and_ln2);
+  float32x4_t j
+      = vsubq_f32 (vfmaq_laneq_f32 (d->shift, x, invln2_and_ln2, 0), d->shift);
   int32x4_t i = vcvtq_s32_f32 (j);
-  float32x4_t f = vfmsq_laneq_f32 (x, j, d->invln2_and_ln2, 1);
-  f = vfmsq_laneq_f32 (f, j, d->invln2_and_ln2, 2);
+  float32x4_t f = vfmsq_laneq_f32 (x, j, invln2_and_ln2, 1);
+  f = vfmsq_laneq_f32 (f, j, invln2_and_ln2, 2);
 
   /* Approximate expm1(f) using polynomial.
      Taylor expansion for expm1(x) has the form:
diff --git a/sysdeps/aarch64/fpu/log10_advsimd.c b/sysdeps/aarch64/fpu/log10_advsimd.c
index 1e5ef99e89..c065aaebae 100644
--- a/sysdeps/aarch64/fpu/log10_advsimd.c
+++ b/sysdeps/aarch64/fpu/log10_advsimd.c
@@ -58,8 +58,10 @@ static inline struct entry
 lookup (uint64x2_t i)
 {
   struct entry e;
-  uint64_t i0 = (i[0] >> (52 - V_LOG10_TABLE_BITS)) & IndexMask;
-  uint64_t i1 = (i[1] >> (52 - V_LOG10_TABLE_BITS)) & IndexMask;
+  uint64_t i0
+      = (vgetq_lane_u64 (i, 0) >> (52 - V_LOG10_TABLE_BITS)) & IndexMask;
+  uint64_t i1
+      = (vgetq_lane_u64 (i, 1) >> (52 - V_LOG10_TABLE_BITS)) & IndexMask;
   float64x2_t e0 = vld1q_f64 (&__v_log10_data.table[i0].invc);
   float64x2_t e1 = vld1q_f64 (&__v_log10_data.table[i1].invc);
   e.invc = vuzp1q_f64 (e0, e1);
diff --git a/sysdeps/aarch64/fpu/log2_advsimd.c b/sysdeps/aarch64/fpu/log2_advsimd.c
index a34978f6cf..4057c552d8 100644
--- a/sysdeps/aarch64/fpu/log2_advsimd.c
+++ b/sysdeps/aarch64/fpu/log2_advsimd.c
@@ -55,8 +55,10 @@ static inline struct entry
 lookup (uint64x2_t i)
 {
   struct entry e;
-  uint64_t i0 = (i[0] >> (52 - V_LOG2_TABLE_BITS)) & IndexMask;
-  uint64_t i1 = (i[1] >> (52 - V_LOG2_TABLE_BITS)) & IndexMask;
+  uint64_t i0
+      = (vgetq_lane_u64 (i, 0) >> (52 - V_LOG2_TABLE_BITS)) & IndexMask;
+  uint64_t i1
+      = (vgetq_lane_u64 (i, 1) >> (52 - V_LOG2_TABLE_BITS)) & IndexMask;
   float64x2_t e0 = vld1q_f64 (&__v_log2_data.table[i0].invc);
   float64x2_t e1 = vld1q_f64 (&__v_log2_data.table[i1].invc);
   e.invc = vuzp1q_f64 (e0, e1);
diff --git a/sysdeps/aarch64/fpu/log_advsimd.c b/sysdeps/aarch64/fpu/log_advsimd.c
index 21df61728c..015a6da7d7 100644
--- a/sysdeps/aarch64/fpu/log_advsimd.c
+++ b/sysdeps/aarch64/fpu/log_advsimd.c
@@ -54,17 +54,12 @@
 lookup (uint64x2_t i)
 {
   /* Since N is a power of 2, n % N = n & (N - 1).  */
   struct entry e;
-  uint64_t i0 = (i[0] >> (52 - V_LOG_TABLE_BITS)) & IndexMask;
-  uint64_t i1 = (i[1] >> (52 - V_LOG_TABLE_BITS)) & IndexMask;
+  uint64_t i0 = (vgetq_lane_u64 (i, 0) >> (52 - V_LOG_TABLE_BITS)) & IndexMask;
+  uint64_t i1 = (vgetq_lane_u64 (i, 1) >> (52 - V_LOG_TABLE_BITS)) & IndexMask;
   float64x2_t e0 = vld1q_f64 (&__v_log_data.table[i0].invc);
   float64x2_t e1 = vld1q_f64 (&__v_log_data.table[i1].invc);
-#if __BYTE_ORDER == __LITTLE_ENDIAN
   e.invc = vuzp1q_f64 (e0, e1);
   e.logc = vuzp2q_f64 (e0, e1);
-#else
-  e.invc = vuzp1q_f64 (e1, e0);
-  e.logc = vuzp2q_f64 (e1, e0);
-#endif
   return e;
 }
diff --git a/sysdeps/aarch64/fpu/sinh_advsimd.c b/sysdeps/aarch64/fpu/sinh_advsimd.c
index fa3723b10c..3e3b76c502 100644
--- a/sysdeps/aarch64/fpu/sinh_advsimd.c
+++ b/sysdeps/aarch64/fpu/sinh_advsimd.c
@@ -22,8 +22,9 @@
 
 static const struct data
 {
-  float64x2_t poly[11];
-  float64x2_t inv_ln2, m_ln2, shift;
+  float64x2_t poly[11], inv_ln2;
+  double m_ln2[2];
+  float64x2_t shift;
   uint64x2_t halff;
   int64x2_t onef;
 #if WANT_SIMD_EXCEPT
@@ -40,7 +41,7 @@ static const struct data
     V2 (0x1.af5eedae67435p-26), V2 (0x1.1f143d060a28ap-29), },
   .inv_ln2 = V2 (0x1.71547652b82fep0),
-  .m_ln2 = (float64x2_t) {-0x1.62e42fefa39efp-1, -0x1.abc9e3b39803fp-56},
+  .m_ln2 = {-0x1.62e42fefa39efp-1, -0x1.abc9e3b39803fp-56},
   .shift = V2 (0x1.8p52),
 
   .halff = V2 (0x3fe0000000000000),
@@ -67,8 +68,10 @@ expm1_inline (float64x2_t x)
      and f = x - i * ln2 (f in [-ln2/2, ln2/2]).  */
   float64x2_t j = vsubq_f64 (vfmaq_f64 (d->shift, d->inv_ln2, x), d->shift);
   int64x2_t i = vcvtq_s64_f64 (j);
-  float64x2_t f = vfmaq_laneq_f64 (x, j, d->m_ln2, 0);
-  f = vfmaq_laneq_f64 (f, j, d->m_ln2, 1);
+
+  float64x2_t m_ln2 = vld1q_f64 (d->m_ln2);
+  float64x2_t f = vfmaq_laneq_f64 (x, j, m_ln2, 0);
+  f = vfmaq_laneq_f64 (f, j, m_ln2, 1);
   /* Approximate expm1(f) using polynomial.  */
   float64x2_t f2 = vmulq_f64 (f, f);
   float64x2_t f4 = vmulq_f64 (f2, f2);
diff --git a/sysdeps/aarch64/fpu/tan_advsimd.c b/sysdeps/aarch64/fpu/tan_advsimd.c
index 0459821ab2..d56a102dd1 100644
--- a/sysdeps/aarch64/fpu/tan_advsimd.c
+++ b/sysdeps/aarch64/fpu/tan_advsimd.c
@@ -23,7 +23,8 @@
 static const struct data
 {
   float64x2_t poly[9];
-  float64x2_t half_pi, two_over_pi, shift;
+  double half_pi[2];
+  float64x2_t two_over_pi, shift;
 #if !WANT_SIMD_EXCEPT
   float64x2_t range_val;
 #endif
@@ -81,8 +82,9 @@ float64x2_t VPCS_ATTR V_NAME_D1 (tan) (float64x2_t x)
   /* Use q to reduce x to r in [-pi/4, pi/4], by:
      r = x - q * pi/2, in extended precision.  */
   float64x2_t r = x;
-  r = vfmsq_laneq_f64 (r, q, dat->half_pi, 0);
-  r = vfmsq_laneq_f64 (r, q, dat->half_pi, 1);
+  float64x2_t half_pi = vld1q_f64 (dat->half_pi);
+  r = vfmsq_laneq_f64 (r, q, half_pi, 0);
+  r = vfmsq_laneq_f64 (r, q, half_pi, 1);
   /* Further reduce r to [-pi/8, pi/8], to be reconstructed using double angle
      formula.  */
   r = vmulq_n_f64 (r, 0.5);
diff --git a/sysdeps/aarch64/fpu/tanf_advsimd.c b/sysdeps/aarch64/fpu/tanf_advsimd.c
index 5a7489390a..705586f0c0 100644
--- a/sysdeps/aarch64/fpu/tanf_advsimd.c
+++ b/sysdeps/aarch64/fpu/tanf_advsimd.c
@@ -23,7 +23,7 @@
 static const struct data
 {
   float32x4_t poly[6];
-  float32x4_t pi_consts;
+  float pi_consts[4];
   float32x4_t shift;
 #if !WANT_SIMD_EXCEPT
   float32x4_t range_val;
@@ -95,16 +95,17 @@ float32x4_t VPCS_ATTR NOINLINE V_NAME_F1 (tan) (float32x4_t x)
 #endif
 
   /* n = rint(x/(pi/2)).  */
-  float32x4_t q = vfmaq_laneq_f32 (d->shift, x, d->pi_consts, 3);
+  float32x4_t pi_consts = vld1q_f32 (d->pi_consts);
+  float32x4_t q = vfmaq_laneq_f32 (d->shift, x, pi_consts, 3);
   float32x4_t n = vsubq_f32 (q, d->shift);
   /* Determine if x lives in an interval, where |tan(x)| grows to infinity.  */
   uint32x4_t pred_alt = vtstq_u32 (vreinterpretq_u32_f32 (q), v_u32 (1));
 
   /* r = x - n * (pi/2)  (range reduction into -pi./4 .. pi/4).  */
   float32x4_t r;
-  r = vfmaq_laneq_f32 (x, n, d->pi_consts, 0);
-  r = vfmaq_laneq_f32 (r, n, d->pi_consts, 1);
-  r = vfmaq_laneq_f32 (r, n, d->pi_consts, 2);
+  r = vfmaq_laneq_f32 (x, n, pi_consts, 0);
+  r = vfmaq_laneq_f32 (r, n, pi_consts, 1);
+  r = vfmaq_laneq_f32 (r, n, pi_consts, 2);
 
   /* If x lives in an interval, where |tan(x)|
      - is finite, then use a polynomial approximation of the form
diff --git a/sysdeps/aarch64/fpu/v_expf_inline.h b/sysdeps/aarch64/fpu/v_expf_inline.h
index a3b0e32f9e..08b06e0a6b 100644
--- a/sysdeps/aarch64/fpu/v_expf_inline.h
+++ b/sysdeps/aarch64/fpu/v_expf_inline.h
@@ -25,7 +25,8 @@
 struct v_expf_data
 {
   float32x4_t poly[5];
-  float32x4_t shift, invln2_and_ln2;
+  float32x4_t shift;
+  float invln2_and_ln2[4];
 };
 
 /* maxerr: 1.45358 +0.5 ulp.  */
@@ -50,10 +51,11 @@ v_expf_inline (float32x4_t x, const struct v_expf_data *d)
   /* exp(x) = 2^n (1 + poly(r)), with 1 + poly(r) in [1/sqrt(2),sqrt(2)]
      x = ln2*n + r, with r in [-ln2/2, ln2/2].  */
   float32x4_t n, r, z;
-  z = vfmaq_laneq_f32 (d->shift, x, d->invln2_and_ln2, 0);
+  float32x4_t invln2_and_ln2 = vld1q_f32 (d->invln2_and_ln2);
+  z = vfmaq_laneq_f32 (d->shift, x, invln2_and_ln2, 0);
   n = vsubq_f32 (z, d->shift);
-  r = vfmsq_laneq_f32 (x, n, d->invln2_and_ln2, 1);
-  r = vfmsq_laneq_f32 (r, n, d->invln2_and_ln2, 2);
+  r = vfmsq_laneq_f32 (x, n, invln2_and_ln2, 1);
+  r = vfmsq_laneq_f32 (r, n, invln2_and_ln2, 2);
 
   uint32x4_t e = vshlq_n_u32 (vreinterpretq_u32_f32 (z), 23);
   float32x4_t scale = vreinterpretq_f32_u32 (vaddq_u32 (e, ExponentBias));
diff --git a/sysdeps/aarch64/fpu/v_expm1f_inline.h b/sysdeps/aarch64/fpu/v_expm1f_inline.h
index 337ccfbfab..59b552da6b 100644
--- a/sysdeps/aarch64/fpu/v_expm1f_inline.h
+++ b/sysdeps/aarch64/fpu/v_expm1f_inline.h
@@ -26,7 +26,8 @@
 struct v_expm1f_data
 {
   float32x4_t poly[5];
-  float32x4_t invln2_and_ln2, shift;
+  float invln2_and_ln2[4];
+  float32x4_t shift;
   int32x4_t exponent_bias;
 };
 
@@ -49,11 +50,12 @@ expm1f_inline (float32x4_t x, const struct v_expm1f_data *d)
      calling routine should handle special values if required.  */
 
   /* Reduce argument: f in [-ln2/2, ln2/2], i is exact.  */
-  float32x4_t j = vsubq_f32 (
-      vfmaq_laneq_f32 (d->shift, x, d->invln2_and_ln2, 0), d->shift);
+  float32x4_t invln2_and_ln2 = vld1q_f32 (d->invln2_and_ln2);
+  float32x4_t j
+      = vsubq_f32 (vfmaq_laneq_f32 (d->shift, x, invln2_and_ln2, 0), d->shift);
   int32x4_t i = vcvtq_s32_f32 (j);
-  float32x4_t f = vfmsq_laneq_f32 (x, j, d->invln2_and_ln2, 1);
-  f = vfmsq_laneq_f32 (f, j, d->invln2_and_ln2, 2);
+  float32x4_t f = vfmsq_laneq_f32 (x, j, invln2_and_ln2, 1);
+  f = vfmsq_laneq_f32 (f, j, invln2_and_ln2, 2);
 
   /* Approximate expm1(f) with polynomial P, expm1(f) ~= f + f^2 * P(f).
      Uses Estrin scheme, where the main _ZGVnN4v_expm1f routine uses