From patchwork Fri May 17 06:20:55 2013
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 244504
Date: Fri, 17 May 2013 08:20:55 +0200
From: Jakub Jelinek
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] Pattern recognizer rotate improvement
Message-ID: <20130517062055.GN1377@tucnak.redhat.com>
References: <20130510144535.GO1377@tucnak.redhat.com>

On Wed, May 15, 2013 at 03:24:37PM +0200, Richard Biener wrote:
> We have the same issue in some other places where we insert invariant
> code into the loop body - one reason there is another LIM pass
> after vectorization.

Well, in this case it causes the shift amount to be loaded into a vector
instead of a scalar, therefore even when LIM moves it before the loop, it
will only work with vector/vector shifts and be more expensive that way
(the value needs to be broadcast into a vector).
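
To make the case concrete, here is a small sketch of the kind of loop in question. It is not copied from the testsuite; the function names, N and check_rotate are made up for illustration. The first function keeps the count computation inside the loop body as written in the source, the second computes the masked and negated counts once, as scalars, before the loop, which is the shape a vector/scalar shift can consume directly.

```c
#include <assert.h>

#define N 1024

/* Rotate each element left by n bits; n is loop invariant, but the
   masking and negation of the count are written inside the loop.  */
static void
rotate_in_loop (unsigned int *a, unsigned int n)
{
  int i;
  for (i = 0; i < N; i++)
    a[i] = (a[i] << (n & 31)) | (a[i] >> (-n & 31));
}

/* Same rotate, with the two shift counts computed once before the
   loop and kept as scalars (hand-hoisted, for illustration only).  */
static void
rotate_hoisted (unsigned int *a, unsigned int n)
{
  unsigned int left = n & 31;
  unsigned int right = -n & 31;
  int i;
  for (i = 0; i < N; i++)
    a[i] = (a[i] << left) | (a[i] >> right);
}

/* Hypothetical helper: checks that both variants agree for count n.  */
static void
check_rotate (unsigned int n)
{
  static unsigned int x[N], y[N];
  int i;
  for (i = 0; i < N; i++)
    x[i] = y[i] = (unsigned int) i * 2654435761u;
  rotate_in_loop (x, n);
  rotate_hoisted (y, n);
  for (i = 0; i < N; i++)
    assert (x[i] == y[i]);
}
```

Both variants compute the same values; the only difference is where the count arithmetic lives, and that is what decides whether the vectorizer sees a scalar or a vector shift operand.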

The following patch improves it slightly, at least for loops, by just
emitting the shift amount stmts on the loop preheader edge.  rotate-4.c
used to be vectorizable only with -mavx2 (which has vector/vector shifts);
now -mavx (which doesn't) vectorizes it too.

Unfortunately this trick doesn't work for SLP vectorization.  Emitting the
stmts at the start of the current bb doesn't help, because every stmt emits
its own copy and thus it is vectorized with vector/vector shifts only anyway.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-05-17  Jakub Jelinek

	* tree-vect-patterns.c (vect_recog_rotate_pattern): For
	vect_external_def oprnd1 with loop_vinfo, try to emit
	optional cast, negation and and stmts on the loop preheader
	edge instead of into the pattern def seq.

	* gcc.target/i386/rotate-4.c: Compile only with -mavx
	instead of -mavx2, require only avx instead of avx2.
	* gcc.target/i386/rotate-4a.c: Include avx-check.h instead of
	avx2-check.h and turn into an avx runtime test instead of
	avx2 runtime test.

	Jakub

--- gcc/tree-vect-patterns.c.jj	2013-05-16 13:56:08.000000000 +0200
+++ gcc/tree-vect-patterns.c	2013-05-16 15:27:00.565143478 +0200
@@ -1494,6 +1494,7 @@ vect_recog_rotate_pattern (vec *
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
   enum vect_def_type dt;
   optab optab1, optab2;
+  edge ext_def = NULL;
 
   if (!is_gimple_assign (last_stmt))
     return NULL;
@@ -1574,6 +1575,21 @@ vect_recog_rotate_pattern (vec *
   if (*type_in == NULL_TREE)
     return NULL;
 
+  if (dt == vect_external_def
+      && TREE_CODE (oprnd1) == SSA_NAME
+      && loop_vinfo)
+    {
+      struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      ext_def = loop_preheader_edge (loop);
+      if (!SSA_NAME_IS_DEFAULT_DEF (oprnd1))
+        {
+          basic_block bb = gimple_bb (SSA_NAME_DEF_STMT (oprnd1));
+          if (bb == NULL
+              || !dominated_by_p (CDI_DOMINATORS, ext_def->dest, bb))
+            ext_def = NULL;
+        }
+    }
+
   def = NULL_TREE;
   if (TREE_CODE (oprnd1) == INTEGER_CST
       || TYPE_MODE (TREE_TYPE (oprnd1)) == TYPE_MODE (type))
@@ -1593,7 +1609,14 @@ vect_recog_rotate_pattern (vec *
       def = vect_recog_temp_ssa_var (type, NULL);
       def_stmt = gimple_build_assign_with_ops (NOP_EXPR, def, oprnd1,
                                                NULL_TREE);
-      append_pattern_def_seq (stmt_vinfo, def_stmt);
+      if (ext_def)
+        {
+          basic_block new_bb
+            = gsi_insert_on_edge_immediate (ext_def, def_stmt);
+          gcc_assert (!new_bb);
+        }
+      else
+        append_pattern_def_seq (stmt_vinfo, def_stmt);
     }
   stype = TREE_TYPE (def);
 
@@ -1618,11 +1641,19 @@ vect_recog_rotate_pattern (vec *
       def2 = vect_recog_temp_ssa_var (stype, NULL);
       def_stmt = gimple_build_assign_with_ops (NEGATE_EXPR, def2, def,
                                                NULL_TREE);
-      def_stmt_vinfo
-        = new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
-      set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
-      STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
-      append_pattern_def_seq (stmt_vinfo, def_stmt);
+      if (ext_def)
+        {
+          basic_block new_bb
+            = gsi_insert_on_edge_immediate (ext_def, def_stmt);
+          gcc_assert (!new_bb);
+        }
+      else
+        {
+          def_stmt_vinfo = new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
+          set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
+          STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
+          append_pattern_def_seq (stmt_vinfo, def_stmt);
+        }
 
       def2 = vect_recog_temp_ssa_var (stype, NULL);
       tree mask
@@ -1630,11 +1661,19 @@ vect_recog_rotate_pattern (vec *
       def_stmt = gimple_build_assign_with_ops (BIT_AND_EXPR, def2,
                                                gimple_assign_lhs (def_stmt),
                                                mask);
-      def_stmt_vinfo
-        = new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
-      set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
-      STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
-      append_pattern_def_seq (stmt_vinfo, def_stmt);
+      if (ext_def)
+        {
+          basic_block new_bb
+            = gsi_insert_on_edge_immediate (ext_def, def_stmt);
+          gcc_assert (!new_bb);
+        }
+      else
+        {
+          def_stmt_vinfo = new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
+          set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
+          STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
+          append_pattern_def_seq (stmt_vinfo, def_stmt);
+        }
     }
 
   var1 = vect_recog_temp_ssa_var (type, NULL);
--- gcc/testsuite/gcc.target/i386/rotate-4.c.jj	2013-05-16 13:50:14.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/rotate-4.c	2013-05-16 15:23:32.729313026 +0200
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target avx2 } */
-/* { dg-options "-O3 -mavx2 -fdump-tree-vect-details" } */
+/* { dg-require-effective-target avx } */
+/* { dg-options "-O3 -mavx -fdump-tree-vect-details" } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
--- gcc/testsuite/gcc.target/i386/rotate-4a.c.jj	2013-05-16 14:00:33.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/rotate-4a.c	2013-05-16 15:23:44.791247428 +0200
@@ -1,14 +1,14 @@
 /* { dg-do run } */
-/* { dg-require-effective-target avx2 } */
-/* { dg-options "-O3 -mavx2" } */
+/* { dg-require-effective-target avx } */
+/* { dg-options "-O3 -mavx" } */
 
-#include "avx2-check.h"
+#include "avx-check.h"
 
 #include "rotate-4.c"
 
 static void
 __attribute__((noinline))
-avx2_test (void)
+avx_test (void)
 {
   int i;
   for (i = 0; i < 1024; i++)