From patchwork Mon Nov 20 04:18:42 2023
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 1865858
Date: Sun, 19 Nov 2023 23:18:42 -0500
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, Michael Meissner, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH 0/4] Add vector pair support to PowerPC attribute((vector_size(32)))

This is similar to the patches posted on November 10th:

  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636077.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636078.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636083.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636080.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636081.html

Those patches added a set of built-in functions that use the PowerPC
__vector_pair type to do basic operations on vector pairs.

After I posted those patches, it was decided that it would be better to have a
new type rather than a bunch of new built-in functions. Within the GCC context,
the best way to add this support is to extend the vector modes so that
V4DFmode, V8SFmode, V4DImode, V8SImode, V16HImode, and V32QImode are used.

These patches provide this new implementation.
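
For readers unfamiliar with the type-based approach, here is a minimal sketch of what a 32-byte vector type looks like from the user's side. It uses only GCC's generic vector extension, so it should compile on any GCC target; on a target without native 32-byte support GCC simply splits the operations into smaller pieces. The helper names v4df_muladd and v4df_get are hypothetical and are not taken from the patches.

```c
#include <stddef.h>

/* A 32-byte vector of four doubles, declared with GCC's generic vector
   extension.  With vector pair support enabled this would presumably map
   to V4DFmode; that mapping is an assumption of this sketch.  */
typedef double v4df __attribute__ ((vector_size (32)));

/* Hypothetical helper: *acc += *a * *b, element by element.  Ordinary C
   operators work on the vector type, so no built-in functions are needed.
   Pointers are used to avoid passing 32-byte vectors by value.  */
void
v4df_muladd (v4df *acc, const v4df *a, const v4df *b)
{
  *acc += *a * *b;
}

/* Hypothetical helper: read one element using vector subscripting.  */
double
v4df_get (const v4df *v, size_t i)
{
  return (*v)[i];
}
```

The point of the design is that arithmetic and element access use the normal C operators rather than a separate family of built-in functions.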
While in theory you could add a whole new type that isn't a larger size vector,
my experience with IEEE 128-bit floating point is that GCC really doesn't like
two modes that are the same size but have different implementations (such as we
see with IEEE 128-bit floating point and IBM double-double 128-bit floating
point). So I did not consider adding a new mode for use with vector pairs.

My original intention was to implement just V4DFmode and V8SFmode, since the
primary users asking for vector pair support are people implementing high-end
math libraries like Eigen and BLAS.

However, in implementing this code, I discovered that we will need integer
vector pair support as well as floating point vector pair support. The integer
modes and types are needed to properly implement byte shuffling and vector
comparisons, both of which need integer vector pairs.

With the current patches, vector pair support is not enabled by default. The
main reason is that I have not implemented the support for byte shuffling,
which various tests depend on.

I would also like to implement overloads for the vector built-in functions like
vec_add, vec_sum, etc., so that if you give them a vector pair, they handle it
just as they would a vector type.

In addition, once the various bugs are addressed, I would then implement the
support so that automatic vectorization would consider using vector pairs
instead of vectors.

In terms of benchmarks, I wrote two benchmarks:

1) One benchmark is a saxpy type loop: value[i] += (a[i] * b[i]). That is a
   loop with 3 loads and a store per iteration.

2) Another benchmark produces a scalar sum of an entire vector. This is a loop
   that has just a single load and no store.

For the saxpy type loop, I get the following general numbers for both float and
double:

1) The benchmarks that use __attribute__((vector_size(32))) are roughly 9-10%
   faster than using normal vector processing (both auto vectorize and using
   vector types).
2) The benchmarks that use __attribute__((vector_size(32))) are roughly 19-20%
   faster than if I write the loop using vector pair loads with the existing
   built-ins, and then manually split the values and do the arithmetic and
   single vector stores.

Unfortunately, for floating point, doing the sum of the whole vector is slower
using the new vector pair built-in functions in a simple loop (compared to
using the existing built-ins for disassembling vector pairs). If I write more
complex loops that manually unroll the loop, then the floating point vector
pair built-in functions become like the integer vector pair built-in functions.
So there is some amount of tuning that will need to be done.

There are 4 patches in this set:

The first patch adds support for the types, does moves, and provides some
optimizations for extracting an element and setting an element.

The second patch implements the floating point arithmetic operations.

The third patch implements the integer operations.

The fourth patch provides new tests to test these features.
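
As a rough illustration of the two benchmark kernels described earlier, the sketch below writes both with a 32-byte float vector type. It is not the benchmark source, which is not included in this message; the function names vp_muladd_loop and vp_sum and the choice of float are assumptions made for the example. It uses only GCC's generic vector extension, so it does not require the patches to compile.

```c
#include <stddef.h>

/* A 32-byte vector of eight floats (GCC generic vector extension).  */
typedef float v8sf __attribute__ ((vector_size (32)));

/* Saxpy-type loop: value[i] += a[i] * b[i].  Each iteration does three
   vector loads and one vector store.  N counts v8sf elements, not
   floats.  */
void
vp_muladd_loop (v8sf *value, const v8sf *a, const v8sf *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    value[i] += a[i] * b[i];
}

/* Scalar sum of a whole array: one vector load per iteration and no
   store.  The lanes of the accumulator are added together at the end
   using element extraction.  */
float
vp_sum (const v8sf *a, size_t n)
{
  v8sf acc = { 0.0f };		/* Remaining lanes are zero-initialized.  */

  for (size_t i = 0; i < n; i++)
    acc += a[i];

  float sum = 0.0f;
  for (size_t lane = 0; lane < 8; lane++)
    sum += acc[lane];

  return sum;
}
```

The final reduction in vp_sum relies on extracting individual elements, which is one of the operations the first patch is described as optimizing.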