From patchwork Wed Jul 22 17:01:21 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Bergner
X-Patchwork-Id: 1334043
X-Patchwork-Original-From: Peter Bergner via Gcc-patches
From: Peter Bergner
Reply-To: Peter Bergner
To: Segher Boessenkool
Cc: GCC Patches, Bill Schmidt
Subject: [PATCH] rs6000: __builtin_mma_disassemble_acc() doesn't store elements correctly in LE mode
Date: Wed, 22 Jul 2020 12:01:21 -0500
List-Id: Gcc-patches mailing list

PR96236 shows a problem where we don't store our 512-bit accumulators
correctly in little-endian mode.  The patch below detects when we're doing
a little-endian memory access and stores to the correct memory locations.

This passed bootstrap and regtesting with no regressions.  Raji verified
that the runnable test case changes work with a fixed compiler.  Ok for
trunk and backport to the GCC 10 branch once it reopens?

Peter

gcc/
	PR target/96236
	* config/rs6000/rs6000-call.c (rs6000_gimple_fold_mma_builtin): Handle
	little-endian memory ordering.

gcc/testsuite/
	PR target/96236
	* gcc.target/powerpc/mma-double-test.c: Update storing results for
	correct little-endian ordering.
	* gcc.target/powerpc/mma-single-test.c: Likewise.

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 5ec3f2c55ad..bb0fdf29688 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -11154,11 +11154,12 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
       tree src_array = build1 (VIEW_CONVERT_EXPR, array_type, src);
       for (unsigned i = 0; i < 4; i++)
         {
+          unsigned index = WORDS_BIG_ENDIAN ? i : 3 - i;
           tree ref = build4 (ARRAY_REF, unsigned_V16QI_type_node, src_array,
                              build_int_cst (size_type_node, i),
                              NULL_TREE, NULL_TREE);
           tree dst = build2 (MEM_REF, unsigned_V16QI_type_node, dst_base,
-                             build_int_cst (dst_type, i * 16));
+                             build_int_cst (dst_type, index * 16));
           gimplify_assign (dst, ref, &new_seq);
         }
       pop_gimplify_context (NULL);
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-double-test.c b/gcc/testsuite/gcc.target/powerpc/mma-double-test.c
index ac84ae30004..044a288ebcc 100755
--- a/gcc/testsuite/gcc.target/powerpc/mma-double-test.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-double-test.c
@@ -12,13 +12,13 @@ typedef double v4sf_t __attribute__ ((vector_size (16)));
 #define SAVE_ACC(ACC, ldc, J) \
   __builtin_mma_disassemble_acc (result, ACC); \
   rowC = (v4sf_t *) &CO[0*ldc+J]; \
-  rowC[0] += result[3] ; \
+  rowC[0] += result[0]; \
   rowC = (v4sf_t *) &CO[1*ldc+J]; \
-  rowC[0] += result[2] ; \
+  rowC[0] += result[1]; \
   rowC = (v4sf_t *) &CO[2*ldc+J]; \
-  rowC[0] += result[1] ; \
+  rowC[0] += result[2]; \
   rowC = (v4sf_t *) &CO[3*ldc+J]; \
-  rowC[0] += result[0] ;
+  rowC[0] += result[3];
 
 void
 MMA (int m, int n, int k, double *A, double *B, double *C)
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-single-test.c b/gcc/testsuite/gcc.target/powerpc/mma-single-test.c
index 15369a64025..7e628df45b7 100755
--- a/gcc/testsuite/gcc.target/powerpc/mma-single-test.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-single-test.c
@@ -12,24 +12,24 @@ typedef float v4sf_t __attribute__ ((vector_size (16)));
 #define SAVE_ACC(ACC, ldc,J) \
   __builtin_mma_disassemble_acc (result, ACC); \
   rowC = (v4sf_t *) &CO[0*ldc+J]; \
-  rowC[0] += result[3] ; \
+  rowC[0] += result[0]; \
   rowC = (v4sf_t *) &CO[1*ldc+J]; \
-  rowC[0] += result[2] ; \
+  rowC[0] += result[1]; \
   rowC = (v4sf_t *) &CO[2*ldc+J]; \
-  rowC[0] += result[1] ; \
+  rowC[0] += result[2]; \
   rowC = (v4sf_t *) &CO[3*ldc+J]; \
-  rowC[0] += result[0] ;
+  rowC[0] += result[3];
 
 #define SAVE_ACC1(ACC,ldc, J) \
   __builtin_mma_disassemble_acc (result, ACC); \
   rowC = (v4sf_t *) &CO[4* ldc+J]; \
-  rowC[0] += result[3] ; \
+  rowC[0] += result[0]; \
   rowC = (v4sf_t *) &CO[5*ldc+J]; \
-  rowC[0] += result[2] ; \
+  rowC[0] += result[1]; \
   rowC = (v4sf_t *) &CO[6*ldc+J]; \
-  rowC[0] += result[1] ; \
+  rowC[0] += result[2]; \
   rowC = (v4sf_t *) &CO[7*ldc+J]; \
-  rowC[0] += result[0] ;
+  rowC[0] += result[3];
 void
 MMA (int m, int n, int k, float *A, float *B, float *C)
 {