From patchwork Fri Jan 8 11:30:16 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Lawrence X-Patchwork-Id: 564735 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 9577F1401CD for ; Fri, 8 Jan 2016 22:30:53 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=BmAh5WH2; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references; q=dns; s= default; b=W/XiFwZbM/IDzchOgNw9QBBNIRoxRHxKBlEh3uKgt2cnbpU5U1QZb mTt/3HpvUYwin3eiGP9sCAGBWL+1Z1Rizv6YdCbJbVeJsiEF0ENU5np7an2hqD+u ZGfp4T3xq0yI4XR3E3Lv8jjTLG42Vgjg0DuAIIu5iK6U0ucJgTLdCw= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references; s= default; bh=XhxxprSb0tvcXOjAyebWuq/aVdc=; b=BmAh5WH2pIgaf99ATYcI GamyfAY18iKnV+q66Bi7CcvhRP5keNzD4WhjqsISZyMSbWn1ag3PQa2rcis1VDlT 8KMnWKHckKjoQzbUuUuSS7IqGhngyRhSOZiSaaGWZpCyzbL567T+j16fK65M08ls qoabVYszp/mm3QxxVVxb5KU= Received: (qmail 111339 invoked by alias); 8 Jan 2016 11:30:44 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 111312 invoked by uid 89); 8 Jan 2016 11:30:42 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.7 required=5.0 tests=AWL, BAYES_20, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 spammy=725, istarget, cancelled, aarch64*-*-* X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 08 Jan 2016 11:30:40 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B702649; Fri, 8 Jan 2016 03:30:05 -0800 (PST) Received: from arm.com (e105915-lin.emea.arm.com [10.2.206.30]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id E97643F24D; Fri, 8 Jan 2016 03:30:37 -0800 (PST) From: Alan Lawrence To: gcc-patches@gcc.gnu.org Cc: rguenther@suse.de, richard.earnshaw@arm.com, marcus.shawcroft@arm.com, ramana.radhakrishnan@arm.com Subject: [PATCH] Fix PR68707 Date: Fri, 8 Jan 2016 11:30:16 +0000 Message-Id: <1452252616-15125-1-git-send-email-alan.lawrence@arm.com> In-Reply-To: References: X-IsSubscribed: yes Here's an alternative patch, using the hunk from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707#c16, which 'improves' (heh, we think) on the previous by allowing SLP to proceed where the loads are strided, for example, slp-perm-11.c. Also I fix the testsuite failures (looking for SLP where we've now decided to do load/store-lanes instead). I used simple scan-tree-dump (rather than -times) because load_lanes typically appears many times in the log for each instance output, and it felt fragile to look for a specific number. Note, this doesn't fix PR67323 - that's an example where unpermuted SLP is still (a bit) worse than load-lanes, as it needs unrolling * 2. The previously-posted patch doesn't fix this either, however (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01620.html). Bootstrapped + check-gcc + check-g++ on x86_64 (no changes), arm and aarch64 (various new scan-tree-dump tests all passing, and PASS->FAIL for O3-pr36098.c). gcc/ChangeLog: 2016-01-XX Alan Lawrence Richard Biener * tree-vect-slp.c (vect_analyze_slp_instance): Cancel permuted SLP instances that can be handled via vect_load_lanes. gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_vect_load_lanes): New. * gcc.dg/vect/slp-perm-1.c: Look for vect_load_lanes instead of SLP on platforms supporting it. * gcc.dg/vect/slp-perm-2.c: Likewise. * gcc.dg/vect/slp-perm-3.c: Likewise. * gcc.dg/vect/slp-perm-5.c: Likewise. * gcc.dg/vect/slp-perm-7.c: Likewise. * gcc.dg/vect/slp-perm-8.c: Likewise. * gcc.dg/vect/slp-perm-6.c: Look for vect_load_lanes in addition to SLP on platforms supporting it. --- gcc/testsuite/gcc.dg/vect/slp-perm-1.c | 6 +++++- gcc/testsuite/gcc.dg/vect/slp-perm-2.c | 7 +++++-- gcc/testsuite/gcc.dg/vect/slp-perm-3.c | 7 +++++-- gcc/testsuite/gcc.dg/vect/slp-perm-5.c | 7 +++++-- gcc/testsuite/gcc.dg/vect/slp-perm-6.c | 7 +++++-- gcc/testsuite/gcc.dg/vect/slp-perm-7.c | 8 +++++--- gcc/testsuite/gcc.dg/vect/slp-perm-8.c | 7 +++++-- gcc/testsuite/lib/target-supports.exp | 19 +++++++++++++++++++ gcc/tree-vect-slp.c | 30 ++++++++++++++++++++++++++++++ 9 files changed, 84 insertions(+), 14 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c index 2b6c134..6d0d66f 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c @@ -58,5 +58,9 @@ int main (int argc, const char* argv[]) } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm && {! vect_load_lanes } } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c index da50e16..34b413d 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c @@ -54,5 +54,8 @@ int main (int argc, const char* argv[]) } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm } } } */ - +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm && {! vect_load_lanes } } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c index 1d33f9b..ec13e0f 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c @@ -67,6 +67,9 @@ int main (int argc, const char* argv[]) } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm } } } */ - +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm && {! vect_load_lanes } } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-5.c b/gcc/testsuite/gcc.dg/vect/slp-perm-5.c index 9ed4d724..f44f44f 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-perm-5.c +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-5.c @@ -73,6 +73,9 @@ int main (int argc, const char* argv[]) } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_perm } } } */ - +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm && {! vect_load_lanes } } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-6.c b/gcc/testsuite/gcc.dg/vect/slp-perm-6.c index 392d36f..551734a 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-perm-6.c +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-6.c @@ -72,5 +72,8 @@ int main (int argc, const char* argv[]) } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_perm } } } */ - +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm && {! vect_load_lanes } } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c index 4352334..8b66142 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c @@ -71,6 +71,8 @@ int main (int argc, const char* argv[]) } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_perm } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm } } } */ - - +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm && {! vect_load_lanes } } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c index 3bdd150..b1e9a10 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c @@ -54,5 +54,8 @@ int main (int argc, const char* argv[]) /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_perm_byte && vect_char_mult } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_perm_byte && {! vect_char_mult } } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm_byte } } } */ - +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm_byte && {! vect_load_lanes } } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" { target { vect_perm_byte && vect_load_lanes } } } } */ +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */ +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 4e349e9..0e6d2b6 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4823,6 +4823,25 @@ proc check_effective_target_vect_element_align { } { return $et_vect_element_align } +# Return 1 if the target supports vector LOAD_LANES operations, 0 otherwise. + +proc check_effective_target_vect_load_lanes { } { + global et_vect_load_lanes + + if [info exists et_vect_load_lanes] { + verbose "check_effective_target_vect_load_lanes: using cached result" 2 + } else { + set et_vect_load_lanes 0 + if { ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]) + || [istarget aarch64*-*-*] } { + set et_vect_load_lanes 1 + } + } + + verbose "check_effective_target_vect_load_lanes: returning $et_vect_load_lanes" 2 + return $et_vect_load_lanes +} + # Return 1 if the target supports vector conditional operations, 0 otherwise. proc check_effective_target_vect_condition { } { diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index b893682..b53767f 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -1793,6 +1793,36 @@ vect_analyze_slp_instance (vec_info *vinfo, } } + /* If the loads and stores can be handled with load/store-lane + instructions do not generate this SLP instance. */ + if (is_a (vinfo) + && loads_permuted + && dr && vect_store_lanes_supported (vectype, group_size)) + { + slp_tree load_node; + FOR_EACH_VEC_ELT (loads, i, load_node) + { + gimple *first_stmt = GROUP_FIRST_ELEMENT + (vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (load_node)[0])); + stmt_vec_info stmt_vinfo = vinfo_for_stmt (first_stmt); + /* Use SLP for strided accesses (or if we can't load-lanes). */ + if (STMT_VINFO_STRIDED_P (stmt_vinfo) + || ! vect_load_lanes_supported + (STMT_VINFO_VECTYPE (stmt_vinfo), + GROUP_SIZE (stmt_vinfo))) + break; + } + if (i == loads.length ()) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "Built SLP cancelled: can use " + "load/store-lanes\n"); + vect_free_slp_instance (new_instance); + return false; + } + } + vinfo->slp_instances.safe_push (new_instance); if (dump_enabled_p ())