From patchwork Thu Oct 31 17:22:28 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1187616
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: [committed][AArch64] Split gcc.target/aarch64/sve/reduc_strict_3.c
Date: Thu, 31 Oct 2019 17:22:28 +0000

This patch splits gcc.target/aarch64/sve/reduc_strict_3.c into one
test per function, so that it's easier to see what each scan is matching
and also so that we no longer rely on the number of times that each
dump message is printed.

The patch also generalises the tests to work with scalable vectors.
I think the test probably predates support for variable-length
loop-aware SLP.

Tested on aarch64-linux-gnu and applied as r277681.

Richard


2019-10-31  Richard Sandiford

gcc/testsuite/
	* gcc.target/aarch64/sve/reduc_strict_3.c: Split all but the first
	function out into...
	* gcc.target/aarch64/sve/reduc_strict_4.c,
	* gcc.target/aarch64/sve/reduc_strict_5.c,
	* gcc.target/aarch64/sve/reduc_strict_6.c,
	* gcc.target/aarch64/sve/reduc_strict_7.c,
	* gcc.target/aarch64/sve/reduc_strict_8.c,
	* gcc.target/aarch64/sve/reduc_strict_9.c: ...these new tests.
	Test for scalable vectors instead of 256-bit vectors.

Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_3.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_3.c	2019-10-31 17:15:21.594544316 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_3.c	2019-10-31 17:20:02.404591908 +0000
@@ -1,12 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fno-inline -msve-vector-bits=256 -fdump-tree-vect-details" } */
-/* Disabling epilogues until we find a better way to deal with scans. */
-/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-options "-O2 -ftree-vectorize" } */
 
-double mat[100][4];
-double mat2[100][8];
-double mat3[100][12];
-double mat4[100][3];
+double mat[100][2];
 
 double
 slp_reduc_plus (int n)
@@ -16,116 +11,8 @@ slp_reduc_plus (int n)
     {
       tmp = tmp + mat[i][0];
       tmp = tmp + mat[i][1];
-      tmp = tmp + mat[i][2];
-      tmp = tmp + mat[i][3];
     }
   return tmp;
 }
 
-double
-slp_reduc_plus2 (int n)
-{
-  double tmp = 0.0;
-  for (int i = 0; i < n; i++)
-    {
-      tmp = tmp + mat2[i][0];
-      tmp = tmp + mat2[i][1];
-      tmp = tmp + mat2[i][2];
-      tmp = tmp + mat2[i][3];
-      tmp = tmp + mat2[i][4];
-      tmp = tmp + mat2[i][5];
-      tmp = tmp + mat2[i][6];
-      tmp = tmp + mat2[i][7];
-    }
-  return tmp;
-}
-
-double
-slp_reduc_plus3 (int n)
-{
-  double tmp = 0.0;
-  for (int i = 0; i < n; i++)
-    {
-      tmp = tmp + mat3[i][0];
-      tmp = tmp + mat3[i][1];
-      tmp = tmp + mat3[i][2];
-      tmp = tmp + mat3[i][3];
-      tmp = tmp + mat3[i][4];
-      tmp = tmp + mat3[i][5];
-      tmp = tmp + mat3[i][6];
-      tmp = tmp + mat3[i][7];
-      tmp = tmp + mat3[i][8];
-      tmp = tmp + mat3[i][9];
-      tmp = tmp + mat3[i][10];
-      tmp = tmp + mat3[i][11];
-    }
-  return tmp;
-}
-
-void
-slp_non_chained_reduc (int n, double * restrict out)
-{
-  for (int i = 0; i < 3; i++)
-    out[i] = 0;
-
-  for (int i = 0; i < n; i++)
-    {
-      out[0] = out[0] + mat4[i][0];
-      out[1] = out[1] + mat4[i][1];
-      out[2] = out[2] + mat4[i][2];
-    }
-}
-
-/* Strict FP reductions shouldn't be used for the outer loops, only the
-   inner loops. */
-
-float
-double_reduc1 (float (*restrict i)[16])
-{
-  float l = 0;
-
-#pragma GCC unroll 0
-  for (int a = 0; a < 8; a++)
-    for (int b = 0; b < 8; b++)
-      l += i[b][a];
-  return l;
-}
-
-float
-double_reduc2 (float *restrict i)
-{
-  float l = 0;
-
-  for (int a = 0; a < 8; a++)
-    for (int b = 0; b < 16; b++)
-      {
-        l += i[b * 4];
-        l += i[b * 4 + 1];
-        l += i[b * 4 + 2];
-        l += i[b * 4 + 3];
-      }
-  return l;
-}
-
-float
-double_reduc3 (float *restrict i, float *restrict j)
-{
-  float k = 0, l = 0;
-
-  for (int a = 0; a < 8; a++)
-    for (int b = 0; b < 8; b++)
-      {
-        k += i[b];
-        l += j[b];
-      }
-  return l * k;
-}
-
-/* { dg-final { scan-assembler-times {\tfadda\ts[0-9]+, p[0-7], s[0-9]+, z[0-9]+\.s} 4 } } */
-/* { dg-final { scan-assembler-times {\tfadda\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\.d} 9 } } */
-/* 1 reduction each for double_reduc{1,2} and 2 for double_reduc3. Each one
-   is reported three times, once for SVE, once for 128-bit AdvSIMD and once
-   for 64-bit AdvSIMD. */
-/* { dg-final { scan-tree-dump-times "Detected double reduction" 12 "vect" } } */
-/* double_reduc2 has 2 reductions and slp_non_chained_reduc has 3. */
-/* { dg-final { scan-tree-dump-times "Detected reduction" 10 "vect" } } */
+/* { dg-final { scan-assembler-times {\tfadda\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\.d\n} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_4.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_4.c	2019-10-31 17:20:02.404591908 +0000
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+double mat[100][8];
+
+double
+slp_reduc_plus (int n)
+{
+  double tmp = 0.0;
+  for (int i = 0; i < n; i++)
+    {
+      tmp = tmp + mat[i][0];
+      tmp = tmp + mat[i][1];
+      tmp = tmp + mat[i][2];
+      tmp = tmp + mat[i][3];
+      tmp = tmp + mat[i][4];
+      tmp = tmp + mat[i][5];
+      tmp = tmp + mat[i][6];
+      tmp = tmp + mat[i][7];
+    }
+  return tmp;
+}
+
+/* { dg-final { scan-assembler-times {\tfadda\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\.d} 4 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_5.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_5.c	2019-10-31 17:20:02.404591908 +0000
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+double mat[100][12];
+
+double
+slp_reduc_plus (int n)
+{
+  double tmp = 0.0;
+  for (int i = 0; i < n; i++)
+    {
+      tmp = tmp + mat[i][0];
+      tmp = tmp + mat[i][1];
+      tmp = tmp + mat[i][2];
+      tmp = tmp + mat[i][3];
+      tmp = tmp + mat[i][4];
+      tmp = tmp + mat[i][5];
+      tmp = tmp + mat[i][6];
+      tmp = tmp + mat[i][7];
+      tmp = tmp + mat[i][8];
+      tmp = tmp + mat[i][9];
+      tmp = tmp + mat[i][10];
+      tmp = tmp + mat[i][11];
+    }
+  return tmp;
+}
+
+/* { dg-final { scan-assembler-times {\tfadda\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\.d} 6 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_6.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_6.c	2019-10-31 17:20:02.404591908 +0000
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+double mat[100][4];
+double mat2[100][8];
+double mat3[100][12];
+double mat4[100][3];
+
+void
+slp_non_chained_reduc (int n, double * restrict out)
+{
+  for (int i = 0; i < 3; i++)
+    out[i] = 0;
+
+  for (int i = 0; i < n; i++)
+    {
+      out[0] = out[0] + mat4[i][0];
+      out[1] = out[1] + mat4[i][1];
+      out[2] = out[2] + mat4[i][2];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tld3d\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tfadda\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\.d} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_7.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_7.c	2019-10-31 17:20:02.404591908 +0000
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+
+/* Strict FP reductions shouldn't be used for the outer loop, only the
+   inner loop. */
+
+float
+double_reduc (float (*i)[16])
+{
+  float l = 0;
+
+#pragma GCC unroll 0
+  for (int a = 0; a < 8; a++)
+    for (int b = 0; b < 100; b++)
+      l += i[b][a];
+  return l;
+}
+
+/* { dg-final { scan-assembler-times {\tfadda\ts[0-9]+, p[0-7], s[0-9]+, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-tree-dump "Detected double reduction" "vect" } } */
+/* { dg-final { scan-tree-dump-not "OUTER LOOP VECTORIZED" "vect" } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_8.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_8.c	2019-10-31 17:20:02.404591908 +0000
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+
+float
+double_reduc (float *i)
+{
+  float l = 0;
+
+  for (int a = 0; a < 8; a++)
+    for (int b = 0; b < 16; b++)
+      {
+        l += i[b * 4];
+        l += i[b * 4 + 1];
+        l += i[b * 4 + 2];
+        l += i[b * 4 + 3];
+      }
+  return l;
+}
+
+/* { dg-final { scan-assembler-times {\tfadda\ts[0-9]+, p[0-7], s[0-9]+, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-tree-dump "Detected double reduction" "vect" } } */
+/* { dg-final { scan-tree-dump-not "OUTER LOOP VECTORIZED" "vect" } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_9.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_9.c	2019-10-31 17:20:02.404591908 +0000
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+
+float
+double_reduc (float *i, float *j)
+{
+  float k = 0, l = 0;
+
+  for (int a = 0; a < 8; a++)
+    for (int b = 0; b < 100; b++)
+      {
+        k += i[b];
+        l += j[b];
+      }
+  return l * k;
+}
+
+/* { dg-final { scan-assembler-times {\tld1w\t} 2 } } */
+/* { dg-final { scan-assembler-times {\tfadda\ts[0-9]+, p[0-7], s[0-9]+, z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-tree-dump "Detected double reduction" "vect" } } */
+/* { dg-final { scan-tree-dump-not "OUTER LOOP VECTORIZED" "vect" } } */
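
For readers wondering why these tests scan for the strictly-ordered fadda rather than an ordinary tree reduction: the tests are built without -ffast-math, so the additions have to happen in source order. The sketch below is not part of the patch. sum_in_order and sum_two_lanes are hypothetical helpers written only to show that summing in source order and summing with independent per-lane accumulators can give different results for the same data.

```c
#include <stddef.h>

/* Hypothetical helper, not from the patch: add the elements one after
   another in source order, which is the order a strict FP reduction
   has to preserve.  */
double
sum_in_order (const double *x, size_t n)
{
  double tmp = 0.0;
  for (size_t i = 0; i < n; i++)
    tmp = tmp + x[i];
  return tmp;
}

/* Hypothetical helper, not from the patch: keep two independent
   accumulators ("lanes") and combine them at the end.  This models a
   reassociated reduction, which is only valid when FP reassociation
   is allowed.  */
double
sum_two_lanes (const double *x, size_t n)
{
  double lane0 = 0.0, lane1 = 0.0;
  size_t i;
  for (i = 0; i + 1 < n; i += 2)
    {
      lane0 = lane0 + x[i];
      lane1 = lane1 + x[i + 1];
    }
  /* Odd element count: fold the last element into lane 0.  */
  if (i < n)
    lane0 = lane0 + x[i];
  return lane0 + lane1;
}
```

Assuming IEEE 754 doubles with no excess precision, the input { 1e16, 1.0, -1e16, 1.0 } gives 1.0 from sum_in_order, because the first 1.0 is absorbed by rounding when added to 1e16. sum_two_lanes gives 2.0, because both 1.0 values accumulate in the same lane.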