From patchwork Thu Jun 1 06:45:09 2017
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 769497
From: Richard Sandiford <richard.sandiford@linaro.org>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
Subject: Handle unpropagated assignments in SLP
Date: Thu, 01 Jun 2017 07:45:09 +0100
Message-ID: <87h900yznu.fsf@linaro.org>

Some of the SVE patches extend SLP to predicated operations created by
ifcvt.  However, ifcvt currently forces the mask into a temporary:

    mask = ifc_temp_var (TREE_TYPE (mask), mask, &gsi);

and at the moment SLP doesn't handle simple assignments like:

    SSA_NAME = SSA_NAME
    SSA_NAME = <constant>

(It does of course handle:

    SSA_NAME = SSA_NAME op SSA_NAME
    SSA_NAME = SSA_NAME op <constant>)

I realise copy propagation should usually ensure that these simple
assignments don't occur, but normal loop vectorisation handles them
just fine, and SLP does too once we get over the initial validity
check.  I thought this patch might be useful even if we decide that
we don't want ifcvt to create a temporary mask in such cases.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


2017-06-01  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
    * tree-vect-slp.c (vect_build_slp_tree_1): Allow mixtures of SSA
    names and constants, without treating them as separate operations.
    Explicitly reject stores.

gcc/testsuite/
    * gcc.dg/vect/slp-temp-1.c: New test.
    * gcc.dg/vect/slp-temp-2.c: Likewise.
    * gcc.dg/vect/slp-temp-3.c: Likewise.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c 2017-05-18 07:51:12.387750673 +0100
+++ gcc/tree-vect-slp.c 2017-06-01 07:21:44.094320070 +0100
@@ -671,6 +671,13 @@ vect_build_slp_tree_1 (vec_info *vinfo,
                   first_op1 = gimple_assign_rhs2 (stmt);
                 }
             }
+          else if ((TREE_CODE_CLASS (rhs_code) == tcc_constant
+                    || rhs_code == SSA_NAME)
+                   && (TREE_CODE_CLASS (first_stmt_code) == tcc_constant
+                       || first_stmt_code == SSA_NAME))
+            /* Merging two simple rvalues is OK and doesn't count as two
+               operations.  */
+            ;
           else
             {
               if (first_stmt_code != rhs_code
@@ -800,11 +807,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,
             }
 
           /* Not memory operation.  */
-          if (TREE_CODE_CLASS (rhs_code) != tcc_binary
-              && TREE_CODE_CLASS (rhs_code) != tcc_unary
-              && TREE_CODE_CLASS (rhs_code) != tcc_expression
-              && TREE_CODE_CLASS (rhs_code) != tcc_comparison
-              && rhs_code != CALL_EXPR)
+          if (REFERENCE_CLASS_P (lhs)
+              || (TREE_CODE_CLASS (rhs_code) != tcc_binary
+                  && TREE_CODE_CLASS (rhs_code) != tcc_unary
+                  && TREE_CODE_CLASS (rhs_code) != tcc_expression
+                  && TREE_CODE_CLASS (rhs_code) != tcc_comparison
+                  && TREE_CODE_CLASS (rhs_code) != tcc_constant
+                  && rhs_code != CALL_EXPR
+                  && rhs_code != SSA_NAME))
             {
               if (dump_enabled_p ())
                 {
Index: gcc/testsuite/gcc.dg/vect/slp-temp-1.c
===================================================================
--- /dev/null 2017-06-01 07:09:35.344016119 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-temp-1.c 2017-06-01 07:39:24.406603119 +0100
@@ -0,0 +1,71 @@
+/* { dg-do run { target { lp64 || ilp32 } } } */
+/* { dg-additional-options "-fgimple -fno-tree-copy-prop" } */
+
+void __GIMPLE (startwith ("loop"))
+f (int *x, int n)
+{
+  int i_in;
+  int i_out;
+  int double_i;
+
+  long unsigned int index_0;
+  long unsigned int offset_0;
+  int *addr_0;
+  int temp_0;
+
+  long unsigned int index_1;
+  long unsigned int offset_1;
+  int *addr_1;
+  int temp_1;
+
+ entry:
+  goto loop;
+
+ loop:
+  i_in = __PHI (entry: 0, latch: i_out);
+  double_i = i_in * 2;
+
+  index_0 = (long unsigned int) double_i;
+  offset_0 = index_0 * 4ul;
+  addr_0 = x_1(D) + offset_0;
+  temp_0 = 1;
+  *addr_0 = temp_0;
+
+  index_1 = index_0 + 1ul;
+  offset_1 = index_1 * 4ul;
+  addr_1 = x_1(D) + offset_1;
+  temp_1 = 3;
+  *addr_1 = temp_1;
+
+  i_out = i_in + 1;
+  if (n_2(D) > i_out)
+    goto latch;
+  else
+    goto exit;
+
+ latch:
+  goto loop;
+
+ exit:
+  return;
+}
+
+#define N 1024
+
+int
+main (void)
+{
+  int a[N * 2];
+  f (a, N);
+  for (int i = 0; i < N; ++i)
+    {
+      if (a[i * 2] != 1
+          || a[i * 2 + 1] != 3)
+        __builtin_abort ();
+      asm volatile ("" ::: "memory");
+    }
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_int } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-temp-2.c
===================================================================
--- /dev/null 2017-06-01 07:09:35.344016119 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-temp-2.c 2017-06-01 07:39:30.914219838 +0100
@@ -0,0 +1,71 @@
+/* { dg-do run { target { lp64 || ilp32 } } } */
+/* { dg-additional-options "-fgimple -fno-tree-copy-prop" } */
+
+void __GIMPLE (startwith ("loop"))
+f (int *x, int n, int foo)
+{
+  int i_in;
+  int i_out;
+  int double_i;
+
+  long unsigned int index_0;
+  long unsigned int offset_0;
+  int *addr_0;
+  int temp_0;
+
+  long unsigned int index_1;
+  long unsigned int offset_1;
+  int *addr_1;
+  int temp_1;
+
+ entry:
+  goto loop;
+
+ loop:
+  i_in = __PHI (entry: 0, latch: i_out);
+  double_i = i_in * 2;
+
+  index_0 = (long unsigned int) double_i;
+  offset_0 = index_0 * 4ul;
+  addr_0 = x_1(D) + offset_0;
+  temp_0 = 1;
+  *addr_0 = temp_0;
+
+  index_1 = index_0 + 1ul;
+  offset_1 = index_1 * 4ul;
+  addr_1 = x_1(D) + offset_1;
+  temp_1 = foo_2(D);
+  *addr_1 = temp_1;
+
+  i_out = i_in + 1;
+  if (n_3(D) > i_out)
+    goto latch;
+  else
+    goto exit;
+
+ latch:
+  goto loop;
+
+ exit:
+  return;
+}
+
+#define N 1024
+
+int
+main (void)
+{
+  int a[N * 2];
+  f (a, N, 11);
+  for (int i = 0; i < N; ++i)
+    {
+      if (a[i * 2] != 1
+          || a[i * 2 + 1] != 11)
+        __builtin_abort ();
+      asm volatile ("" ::: "memory");
+    }
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_int } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-temp-3.c
===================================================================
--- /dev/null 2017-06-01 07:09:35.344016119 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-temp-3.c 2017-06-01 07:39:40.925589561 +0100
@@ -0,0 +1,84 @@
+/* { dg-do run { target { lp64 || ilp32 } } } */
+/* { dg-additional-options "-fgimple -fno-tree-copy-prop" } */
+
+void __GIMPLE (startwith ("loop")) __attribute__ ((noinline, noclone))
+f (int *x, int n)
+{
+  int i_in;
+  int i_out;
+  int double_i;
+
+  long unsigned int index_0;
+  long unsigned int offset_0;
+  int *addr_0;
+  int old_0;
+  int new_0;
+  int temp_0;
+
+  long unsigned int index_1;
+  long unsigned int offset_1;
+  int *addr_1;
+  int old_1;
+  int new_1;
+  int temp_1;
+
+ entry:
+  goto loop;
+
+ loop:
+  i_in = __PHI (entry: 0, latch: i_out);
+  double_i = i_in * 2;
+
+  index_0 = (long unsigned int) double_i;
+  offset_0 = index_0 * 4ul;
+  addr_0 = x_1(D) + offset_0;
+  old_0 = *addr_0;
+  new_0 = old_0 + 1;
+  temp_0 = new_0;
+  *addr_0 = temp_0;
+
+  index_1 = index_0 + 1ul;
+  offset_1 = index_1 * 4ul;
+  addr_1 = x_1(D) + offset_1;
+  old_1 = *addr_1;
+  new_1 = old_1 + 3;
+  temp_1 = new_1;
+  *addr_1 = temp_1;
+
+  i_out = i_in + 1;
+  if (n_2(D) > i_out)
+    goto latch;
+  else
+    goto exit;
+
+ latch:
+  goto loop;
+
+ exit:
+  return;
+}
+
+#define N 1024
+
+int
+main (void)
+{
+  int a[N * 2];
+  for (int i = 0; i < N * 2; ++i)
+    {
+      a[i] = i * 4;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, N);
+  for (int i = 0; i < N; ++i)
+    {
+      if (a[i * 2] != i * 8 + 1
+          || a[i * 2 + 1] != i * 8 + 7)
+        __builtin_abort ();
+      asm volatile ("" ::: "memory");
+    }
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_int } } } */