From patchwork Tue Jun 30 22:32:24 2020
X-Patchwork-Submitter: Aaron Sawdey
X-Patchwork-Id: 1320145
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] rs6000: Add execution tests for mma builtins.
Date: Tue, 30 Jun 2020 17:32:24 -0500
Message-Id: <20200630223224.28131-1-acsawdey@linux.ibm.com>
List-Id: Gcc-patches mailing list
From: Aaron Sawdey
Reply-To: Aaron Sawdey
Cc: wschmidt@linux.ibm.com, segher@kernel.crashing.org, rajis@linux.vnet.ibm.com

This patch adds execution tests that use the MMA builtins, checks for the
right answer, and checks that __builtin_cpu_supports and __builtin_cpu_is
return sane answers given that the code executed correctly.

Tested against P10 sim, should not execute anywhere else due to
requiring power10_hw.
Actually the power10_hw test I think requires current glibc to pick up
the change that lets __builtin_cpu_is("power10") work.

OK for trunk?

Thanks,
    Aaron

2020-06-30  Rajalakshmi Srinivasaraghavan
            Aaron Sawdey

gcc/testsuite/
        * gcc.target/powerpc/mma-single-test.c: New file.
        * gcc.target/powerpc/mma-double-test.c: New file.
---
 .../gcc.target/powerpc/mma-double-test.c      | 211 +++++++++++++++++
 .../gcc.target/powerpc/mma-single-test.c      | 220 ++++++++++++++++++
 2 files changed, 431 insertions(+)
 create mode 100755 gcc/testsuite/gcc.target/powerpc/mma-double-test.c
 create mode 100755 gcc/testsuite/gcc.target/powerpc/mma-single-test.c

diff --git a/gcc/testsuite/gcc.target/powerpc/mma-double-test.c b/gcc/testsuite/gcc.target/powerpc/mma-double-test.c
new file mode 100755
index 00000000000..e3807fa2eab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-double-test.c
@@ -0,0 +1,211 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=power10 -O2" } */
+
+#include
+#include
+#include
+
+typedef unsigned char vec_t __attribute__ ((vector_size (16)));
+typedef double v4sf_t __attribute__ ((vector_size (16)));
+#define SAVE_ACC(ACC, ldc, J)  \
+          __builtin_mma_disassemble_acc (result, ACC); \
+          rowC = (v4sf_t *) &CO[0*ldc+J]; \
+          rowC[0] += result[3] ; \
+          rowC = (v4sf_t *) &CO[1*ldc+J]; \
+          rowC[0] += result[2] ; \
+          rowC = (v4sf_t *) &CO[2*ldc+J]; \
+          rowC[0] += result[1] ; \
+          rowC = (v4sf_t *) &CO[3*ldc+J]; \
+          rowC[0] += result[0] ;
+
+void
+MMA (int m, int n, int k, double *A, double *B, double *C)
+{
+  __vector_quad acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7;
+  v4sf_t result[4];
+  v4sf_t *rowC;
+  for (int l = 0; l < n; l += 4)
+    {
+      double *CO;
+      double *AO;
+      AO = A;
+      CO = C;
+      C += m * 4;
+      for (int j = 0; j < m; j += 16)
+        {
+          double *BO = B;
+          __builtin_mma_xxsetaccz (&acc0);
+          __builtin_mma_xxsetaccz (&acc1);
+          __builtin_mma_xxsetaccz (&acc2);
+          __builtin_mma_xxsetaccz (&acc3);
+          __builtin_mma_xxsetaccz (&acc4);
+          __builtin_mma_xxsetaccz (&acc5);
+          __builtin_mma_xxsetaccz (&acc6);
+          __builtin_mma_xxsetaccz (&acc7);
+          unsigned long i;
+
+          for (i = 0; i < k; i++)
+            {
+              vec_t *rowA = (vec_t *) & AO[i * 16];
+              __vector_pair rowB;
+              vec_t *rb = (vec_t *) & BO[i * 4];
+              __builtin_mma_assemble_pair (&rowB, rb[1], rb[0]);
+              __builtin_mma_xvf64gerpp (&acc0, rowB, rowA[0]);
+              __builtin_mma_xvf64gerpp (&acc1, rowB, rowA[1]);
+              __builtin_mma_xvf64gerpp (&acc2, rowB, rowA[2]);
+              __builtin_mma_xvf64gerpp (&acc3, rowB, rowA[3]);
+              __builtin_mma_xvf64gerpp (&acc4, rowB, rowA[4]);
+              __builtin_mma_xvf64gerpp (&acc5, rowB, rowA[5]);
+              __builtin_mma_xvf64gerpp (&acc6, rowB, rowA[6]);
+              __builtin_mma_xvf64gerpp (&acc7, rowB, rowA[7]);
+            }
+          SAVE_ACC (&acc0, m, 0);
+          SAVE_ACC (&acc2, m, 4);
+          SAVE_ACC (&acc1, m, 2);
+          SAVE_ACC (&acc3, m, 6);
+          SAVE_ACC (&acc4, m, 8);
+          SAVE_ACC (&acc6, m, 12);
+          SAVE_ACC (&acc5, m, 10);
+          SAVE_ACC (&acc7, m, 14);
+          AO += k * 16;
+          BO += k * 4;
+          CO += 16;
+        }
+      B += k * 4;
+    }
+}
+
+void
+init (double *matrix, int row, int column)
+{
+  for (int j = 0; j < column; j++)
+    {
+      for (int i = 0; i < row; i++)
+        {
+          matrix[j * row + i] = (i * 16 + 2 + j) / 0.123;
+        }
+    }
+}
+
+void
+init0 (double *matrix, double *matrix1, int row, int column)
+{
+  for (int j = 0; j < column; j++)
+    for (int i = 0; i < row; i++)
+      matrix[j * row + i] = matrix1[j * row + i] = 0;
+}
+
+
+void
+print (const char *name, const double *matrix, int row, int column)
+{
+  printf ("Matrix %s has %d rows and %d columns:\n", name, row, column);
+  for (int i = 0; i < row; i++)
+    {
+      for (int j = 0; j < column; j++)
+        {
+          printf ("%f ", matrix[j * row + i]);
+        }
+      printf ("\n");
+    }
+  printf ("\n");
+}
+
+int
+main (int argc, char *argv[])
+{
+  int rowsA, colsB, common;
+  int i, j, k;
+  int ret = 0;
+
+  for (int t = 16; t <= 128; t += 16)
+    {
+      for (int t1 = 4; t1 <= 16; t1 += 4)
+        {
+          rowsA = t;
+          colsB = t1;
+          common = 1;
+          /* printf ("Running test for rows = %d,cols = %d\n", t, t1); */
+          double A[rowsA * common];
+          double B[common * colsB];
+          double C[rowsA * colsB];
+          double D[rowsA * colsB];
+
+
+          init (A, rowsA, common);
+          init (B, common, colsB);
+          init0 (C, D, rowsA, colsB);
+          MMA (rowsA, colsB, common, A, B, C);
+
+          for (i = 0; i < colsB; i++)
+            {
+              for (j = 0; j < rowsA; j++)
+                {
+                  D[i * rowsA + j] = 0;
+                  for (k = 0; k < common; k++)
+                    {
+                      D[i * rowsA + j] +=
+                        A[k * rowsA + j] * B[k + common * i];
+                    }
+                }
+            }
+          for (i = 0; i < colsB; i++)
+            {
+              for (j = 0; j < rowsA; j++)
+                {
+                  for (k = 0; k < common; k++)
+                    {
+                      if (D[i * rowsA + j] != C[i * rowsA + j])
+                        {
+                          printf ("Error %d,%d,%d\n",i,j,k);
+                          ret++;
+                        }
+                    }
+                }
+            }
+          /*
+          print ("A", A, rowsA, common);
+          print ("B", B, common, colsB);
+          print ("C", C, rowsA, colsB);
+          print ("D", D, rowsA, colsB);
+          */
+        }
+    }
+
+  if (ret)
+    printf ("MMA double test fail: %d errors\n",ret);
+#ifdef VERBOSE
+#ifdef __BUILTIN_CPU_SUPPORTS__
+  else
+    {
+      printf ("MMA double test success: 0 MMA errors\n");
+
+      if ( __builtin_cpu_is ("power10"))
+        printf ("__builtin_cpu_is reports this is power10\n");
+      else
+        {
+          printf ("Error: __builtin_cpu_is says this is not power10\n");
+          ret++;
+        }
+
+      if ( __builtin_cpu_supports ("arch_3_1"))
+        printf ("__builtin_cpu_supports arch_3_1\n");
+      else
+        {
+          printf ("Error: __builtin_cpu_supports says arch_3_1 not supported.\n");
+          ret++;
+        }
+
+      if ( __builtin_cpu_supports ("mma"))
+        printf ("__builtin_cpu_supports mma\n");
+      else
+        {
+          printf ("Error: __builtin_cpu_supports says mma not supported.\n");
+          ret++;
+        }
+    }
+#endif
+#endif
+  return ret;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-single-test.c b/gcc/testsuite/gcc.target/powerpc/mma-single-test.c
new file mode 100755
index 00000000000..cb5f0ed3771
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-single-test.c
@@ -0,0 +1,220 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=power10 -O2" } */
+
+#include
+#include
+#include
+
+typedef unsigned char vec_t __attribute__ ((vector_size (16)));
+typedef float v4sf_t __attribute__ ((vector_size (16)));
+#define SAVE_ACC(ACC, ldc,J)  \
+          __builtin_mma_disassemble_acc (result, ACC); \
+          rowC = (v4sf_t *) &CO[0*ldc+J]; \
+          rowC[0] += result[3] ; \
+          rowC = (v4sf_t *) &CO[1*ldc+J]; \
+          rowC[0] += result[2] ; \
+          rowC = (v4sf_t *) &CO[2*ldc+J]; \
+          rowC[0] += result[1] ; \
+          rowC = (v4sf_t *) &CO[3*ldc+J]; \
+          rowC[0] += result[0] ;
+
+#define SAVE_ACC1(ACC,ldc, J)  \
+          __builtin_mma_disassemble_acc (result, ACC); \
+          rowC = (v4sf_t *) &CO[4* ldc+J]; \
+          rowC[0] += result[3] ; \
+          rowC = (v4sf_t *) &CO[5*ldc+J]; \
+          rowC[0] += result[2] ; \
+          rowC = (v4sf_t *) &CO[6*ldc+J]; \
+          rowC[0] += result[1] ; \
+          rowC = (v4sf_t *) &CO[7*ldc+J]; \
+          rowC[0] += result[0] ;
+void
+MMA (int m, int n, int k, float *A, float *B, float *C)
+{
+  __vector_quad acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7;
+  v4sf_t result[4];
+  v4sf_t *rowC;
+  for (int l = 0; l < n; l += 8)
+    {
+      float *CO;
+      float *AO;
+      AO = A;
+      CO = C;
+      C += m * 8;
+      for (int j = 0; j < m; j += 16)
+        {
+          float *BO = B;
+          __builtin_mma_xxsetaccz (&acc0);
+          __builtin_mma_xxsetaccz (&acc1);
+          __builtin_mma_xxsetaccz (&acc2);
+          __builtin_mma_xxsetaccz (&acc3);
+          __builtin_mma_xxsetaccz (&acc4);
+          __builtin_mma_xxsetaccz (&acc5);
+          __builtin_mma_xxsetaccz (&acc6);
+          __builtin_mma_xxsetaccz (&acc7);
+          unsigned long i;
+
+          for (i = 0; i < k; i++)
+            {
+              vec_t *rowA = (vec_t *) & AO[i * 16];
+              vec_t *rowB = (vec_t *) & BO[i * 8];
+              __builtin_mma_xvf32gerpp (&acc0, rowB[0], rowA[0]);
+              __builtin_mma_xvf32gerpp (&acc1, rowB[1], rowA[0]);
+              __builtin_mma_xvf32gerpp (&acc2, rowB[0], rowA[1]);
+              __builtin_mma_xvf32gerpp (&acc3, rowB[1], rowA[1]);
+              __builtin_mma_xvf32gerpp (&acc4, rowB[0], rowA[2]);
+              __builtin_mma_xvf32gerpp (&acc5, rowB[1], rowA[2]);
+              __builtin_mma_xvf32gerpp (&acc6, rowB[0], rowA[3]);
+              __builtin_mma_xvf32gerpp (&acc7, rowB[1], rowA[3]);
+            }
+          SAVE_ACC (&acc0, m, 0);
+          SAVE_ACC (&acc2, m, 4);
+          SAVE_ACC1 (&acc1, m, 0);
+          SAVE_ACC1 (&acc3, m, 4);
+          SAVE_ACC (&acc4, m, 8);
+          SAVE_ACC (&acc6, m, 12);
+          SAVE_ACC1 (&acc5, m, 8);
+          SAVE_ACC1 (&acc7, m, 12);
+          AO += k * 16;
+          BO += k * 8;
+          CO += 16;
+        }
+      B += k * 8;
+    }
+}
+
+void
+init (float *matrix, int row, int column)
+{
+  for (int j = 0; j < column; j++)
+    {
+      for (int i = 0; i < row; i++)
+        {
+          matrix[j * row + i] = (i * 16 + 2 + j) / 0.123;
+        }
+    }
+}
+
+void
+init0 (float *matrix, float *matrix1, int row, int column)
+{
+  for (int j = 0; j < column; j++)
+    for (int i = 0; i < row; i++)
+      matrix[j * row + i] = matrix1[j * row + i] = 0;
+}
+
+
+void
+print (const char *name, const float *matrix, int row, int column)
+{
+  printf ("Matrix %s has %d rows and %d columns:\n", name, row, column);
+  for (int i = 0; i < row; i++)
+    {
+      for (int j = 0; j < column; j++)
+        {
+          printf ("%f ", matrix[j * row + i]);
+        }
+      printf ("\n");
+    }
+  printf ("\n");
+}
+
+int
+main (int argc, char *argv[])
+{
+  int rowsA, colsB, common;
+  int i, j, k;
+  int ret = 0;
+
+  for (int t = 16; t <= 128; t += 16)
+    {
+      for (int t1 = 8; t1 <= 16; t1 += 8)
+        {
+          rowsA = t;
+          colsB = t1;
+          common = 1;
+          /* printf ("Running test for rows = %d,cols = %d\n", t, t1); */
+          float A[rowsA * common];
+          float B[common * colsB];
+          float C[rowsA * colsB];
+          float D[rowsA * colsB];
+
+
+          init (A, rowsA, common);
+          init (B, common, colsB);
+          init0 (C, D, rowsA, colsB);
+          MMA (rowsA, colsB, common, A, B, C);
+
+          for (i = 0; i < colsB; i++)
+            {
+              for (j = 0; j < rowsA; j++)
+                {
+                  D[i * rowsA + j] = 0;
+                  for (k = 0; k < common; k++)
+                    {
+                      D[i * rowsA + j] +=
+                        A[k * rowsA + j] * B[k + common * i];
+                    }
+                }
+            }
+          for (i = 0; i < colsB; i++)
+            {
+              for (j = 0; j < rowsA; j++)
+                {
+                  for (k = 0; k < common; k++)
+                    {
+                      if (D[i * rowsA + j] != C[i * rowsA + j])
+                        {
+                          printf ("Error %d,%d,%d\n",i,j,k);
+                          ret++;
+                        }
+                    }
+                }
+            }
+          if (ret)
+            {
+              print ("A", A, rowsA, common);
+              print ("B", B, common, colsB);
+              print ("C", C, rowsA, colsB);
+              print ("D", D, rowsA, colsB);
+            }
+        }
+    }
+
+  if (ret)
+    printf ("MMA single test fail: %d errors\n",ret);
+#ifdef VERBOSE
+#ifdef __BUILTIN_CPU_SUPPORTS__
+  else
+    {
+      printf ("MMA single test success: 0 MMA errors\n");
+
+      if ( __builtin_cpu_is ("power10"))
+        printf ("__builtin_cpu_is reports this is power10\n");
+      else
+        {
+          printf ("Error: __builtin_cpu_is says this is not power10\n");
+          ret++;
+        }
+
+      if ( __builtin_cpu_supports ("arch_3_1"))
+        printf ("__builtin_cpu_supports arch_3_1\n");
+      else
+        {
+          printf ("Error: __builtin_cpu_supports says arch_3_1 not supported.\n");
+          ret++;
+        }
+
+      if ( __builtin_cpu_supports ("mma"))
+        printf ("__builtin_cpu_supports mma\n");
+      else
+        {
+          printf ("Error: __builtin_cpu_supports says mma not supported.\n");
+          ret++;
+        }
+    }
+#endif
+#endif
+  return ret;
+}
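
Note for readers without Power10 hardware or the simulator: the reference
check that both tests run against the MMA result can be sketched in portable
C. This is only an illustration and is not part of the patch. The names
ref_gemm and count_mismatches are hypothetical; the column-major indexing
(element at row j, column i stored at i * rows + j) is copied from the
checking loops in main.

```c
/* Hypothetical helper, not part of the patch: column-major reference
   product D = A * B, where A is rows x common and B is common x cols.
   The indexing mirrors the checking loop in the two tests.  */
void
ref_gemm (int rows, int cols, int common,
          const double *A, const double *B, double *D)
{
  for (int i = 0; i < cols; i++)
    for (int j = 0; j < rows; j++)
      {
        double sum = 0;
        for (int k = 0; k < common; k++)
          sum += A[k * rows + j] * B[k + common * i];
        D[i * rows + j] = sum;
      }
}

/* Hypothetical helper: count the elements where the computed matrix C
   differs from the reference D, using the same exact comparison as
   the tests.  */
int
count_mismatches (int n, const double *C, const double *D)
{
  int errors = 0;
  for (int i = 0; i < n; i++)
    if (C[i] != D[i])
      errors++;
  return errors;
}
```

With common = 1, as in both tests, each output element is a single product,
which is presumably why an exact != comparison is used rather than a
tolerance.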