From patchwork Wed Nov 7 14:59:02 2018
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 994307
Date: Wed, 7 Nov 2018 15:59:02 +0100 (CET)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Fix PR87914
MIME-Version: 1.0

This PR shows one example (IIRC I've seen others recently) where we
fail to handle outer loop vectorization because we do a poor job
identifying "safe" nested cycles.  This improves the situation.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
I've also built SPEC CPU 2006 with and without LTO on a Haswell machine.

I do expect fallout since the reduction code is still incredibly
fragile...

Richard.

From 854d80f1822ae6b37afa865ae49d64ceaee68b26 Mon Sep 17 00:00:00 2001
From: Richard Guenther
Date: Wed, 7 Nov 2018 12:19:45 +0100
Subject: [PATCH] fix-pr87914

2018-11-07  Richard Biener

	PR tree-optimization/87914
	* tree-vect-loop.c (vect_is_simple_reduction): Improve detection
	of nested cycles.
	(vectorizable_reduction): Handle shifts and rotates by dispatching
	to vectorizable_shift.
	* tree-vect-stmts.c (vect_get_vec_def_for_operand_1): Handle
	in-loop uses of vect_nested_cycle defs.  Merge cycle and
	internal def cases.
	(vectorizable_shift): Export and handle being called as
	vect_nested_cycle.
	(vect_analyze_stmt): Call vectorizable_shift after
	vectorizable_reduction.
	* tree-vectorizer.h (vectorizable_shift): Declare.

	* lib/target-supports.exp (check_effective_target_vect_var_shift): New.
	(check_avx2_available): Likewise.
	* g++.dg/vect/pr87914.cc: New testcase.

diff --git a/gcc/testsuite/g++.dg/vect/pr87914.cc b/gcc/testsuite/g++.dg/vect/pr87914.cc
new file mode 100644
index 00000000000..12fbba3af2f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr87914.cc
@@ -0,0 +1,49 @@
+// { dg-do run }
+// { dg-additional-options "-fopenmp-simd" }
+// { dg-additional-options "-mavx2" { target { avx2_runtime } } }
+
+extern "C" int memcmp(const void *s1, const void *s2, __SIZE_TYPE__ n);
+extern "C" void abort(void);
+
+template <class T>
+T reverseBits(T x)
+{
+  unsigned int s = sizeof(x) * 8;
+  T mask = ~T(0);
+  while ((s >>= 1) > 0)
+    {
+      mask ^= (mask << s);
+      x = ((x >> s) & mask) | ((x << s) & ~mask); // unsupported use in stmt
+    }
+  return x;
+}
+
+void __attribute__((noinline,noipa))
+test_reverseBits(unsigned* x)
+{
+#pragma omp simd aligned(x:32)
+  for (int i = 0; i < 16; ++i)
+    x[i] = reverseBits(x[i]); // couldn't vectorize loop
+}
+
+int main()
+{
+  unsigned arr[16] __attribute__((aligned(32)))
+    = { 0x01020304, 0x05060708, 0x0a0b0c0d, 0x0e0f1011,
+	0x11121314, 0x45065708, 0xfa0b3c0du, 0x0e0f1211,
+	0x21222324, 0x55066708, 0xfa0b2c0du, 0x1e0f1011,
+	0x31323334, 0x65067708, 0xfa0b5c0du, 0x0e3f1011 };
+  unsigned arr2[16]
+    = { 0x20c04080, 0x10e060a0, 0xb030d050, 0x8808f070u,
+	0x28c84888, 0x10ea60a2, 0xb03cd05f, 0x8848f070u,
+	0x24c44484, 0x10e660aa, 0xb034d05f, 0x8808f078u,
+	0x2ccc4c8c, 0x10ee60a6, 0xb03ad05f, 0x8808fc70u };
+
+  test_reverseBits (arr);
+
+  if (memcmp (arr, arr2, sizeof (arr)) != 0)
+    abort ();
+  return 0;
+}
+
+// { dg-final { scan-tree-dump "OUTER LOOP VECTORIZED" "vect" { target { vect_var_shift && vect_int } } } }
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 9780e53dfc0..1d5ad9abdca 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5316,6 +5316,15 @@ proc check_effective_target_vect_shift { } {
	     && [check_effective_target_s390_vx]) }}]
 }
 
+# Return 1 if the target supports hardware vector shift by register operation.
+
+proc check_effective_target_vect_var_shift { } {
+    return [check_cached_effective_target_indexed vect_var_shift {
+	expr {(([istarget i?86-*-*] || [istarget x86_64-*-*])
+	       && [check_avx2_available])
+	}}]
+}
+
 proc check_effective_target_whole_vector_shift { } {
     if { [istarget i?86-*-*] || [istarget x86_64-*-*]
	 || [istarget ia64-*-*]
@@ -7150,6 +7159,19 @@ proc check_avx_available { } {
   return 0;
 }
 
+# Return true if we are compiling for AVX2 target.
+
+proc check_avx2_available { } {
+  if { [check_no_compiler_messages avx2_available assembly {
+    #ifndef __AVX2__
+    #error unsupported
+    #endif
+  } ""] } {
+    return 1;
+  }
+  return 0;
+}
+
 # Return true if we are compiling for SSSE3 target.
 
 proc check_ssse3_available { } {
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 51be405b5a0..e392aab1d52 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2880,6 +2880,11 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
	      return NULL;
	    }
 
+	  /* For inner loop reductions in nested vectorization there are no
+	     constraints on the number of uses in the inner loop.  */
+	  if (loop == vect_loop->inner)
+	    continue;
+
	  nloop_uses++;
	  if (nloop_uses > 1)
	    {
@@ -2938,13 +2943,19 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
	else
	  /* We can have more than one loop-closed PHI.  */
	  lcphis.safe_push (as_a <gphi *> (use_stmt));
-      if (nloop_uses > 1)
-	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "reduction used in loop.\n");
-	  return NULL;
-	}
+    }
+
+  /* If this isn't a nested cycle or if the nested cycle reduction value
+     is used outside of the inner loop we cannot handle uses of the reduction
+     value.  */
+  bool nested_in_vect_loop = flow_loop_nested_p (vect_loop, loop);
+  if ((!nested_in_vect_loop || !lcphis.is_empty ())
+      && nloop_uses > 1)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "reduction used in loop.\n");
+      return NULL;
     }
 
   /* If DEF_STMT is a phi node itself, we expect it to have a single argument
@@ -3005,9 +3016,15 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
     }
 
   gassign *def_stmt = as_a <gassign *> (def_stmt_info->stmt);
-  bool nested_in_vect_loop = flow_loop_nested_p (vect_loop, loop);
   code = orig_code = gimple_assign_rhs_code (def_stmt);
 
+  if (nested_in_vect_loop && !check_reduction)
+    {
+      if (dump_enabled_p ())
+	report_vect_op (MSG_NOTE, def_stmt, "detected nested cycle: ");
+      return def_stmt_info;
+    }
+
   /* We can handle "res -= x[i]", which is non-associative by
      simply rewriting this into "res += -x[i]".  Avoid changing
      gimple instruction for the first simple tests and only do this
@@ -6488,6 +6505,19 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
   vec_mode = TYPE_MODE (vectype_in);
   poly_uint64 nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
 
+  if (nested_cycle)
+    {
+      def_bb = gimple_bb (reduc_def_phi);
+      def_stmt_loop = def_bb->loop_father;
+      def_arg = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi,
+				       loop_preheader_edge (def_stmt_loop));
+      stmt_vec_info def_arg_stmt_info = loop_vinfo->lookup_def (def_arg);
+      if (def_arg_stmt_info
+	  && (STMT_VINFO_DEF_TYPE (def_arg_stmt_info)
+	      == vect_double_reduction_def))
+	double_reduc = true;
+    }
+
   if (code == COND_EXPR)
     {
       /* Only call during the analysis stage, otherwise we'll lose
@@ -6502,20 +6532,26 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
	  return false;
	}
     }
-  else
+  else if (code == LSHIFT_EXPR || code == RSHIFT_EXPR
+	   || code == LROTATE_EXPR || code == RROTATE_EXPR)
     {
-      /* 4. Supportable by target?  */
-
-      if (code == LSHIFT_EXPR || code == RSHIFT_EXPR
-	  || code == LROTATE_EXPR || code == RROTATE_EXPR)
+      /* Only call during the analysis stage, otherwise we'll lose
+	 STMT_VINFO_TYPE.  We only support this for nested cycles
+	 without double reductions at the moment.  */
+      if (!nested_cycle
+	  || double_reduc
+	  || (!vec_stmt && !vectorizable_shift (stmt_info, gsi, NULL,
+						NULL, cost_vec)))
	{
-	  /* Shifts and rotates are only supported by vectorizable_shifts,
-	     not vectorizable_reduction.  */
	  if (dump_enabled_p ())
	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "unsupported shift or rotation.\n");
+			     "unsupported shift or rotation in reduction\n");
	  return false;
	}
+    }
+  else
+    {
+      /* 4. Supportable by target?  */
 
       /* 4.1. check support for the operation in the loop  */
       optab = optab_for_tree_code (code, vectype_in, optab_default);
@@ -6620,19 +6656,6 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
       orig_code = cond_reduc_op_code;
     }
 
-  if (nested_cycle)
-    {
-      def_bb = gimple_bb (reduc_def_phi);
-      def_stmt_loop = def_bb->loop_father;
-      def_arg = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi,
-				       loop_preheader_edge (def_stmt_loop));
-      stmt_vec_info def_arg_stmt_info = loop_vinfo->lookup_def (def_arg);
-      if (def_arg_stmt_info
-	  && (STMT_VINFO_DEF_TYPE (def_arg_stmt_info)
-	      == vect_double_reduction_def))
-	double_reduc = true;
-    }
-
   reduc_fn = IFN_LAST;
 
   if (reduction_type == TREE_CODE_REDUCTION
@@ -7003,6 +7026,12 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
	  /* Multiple types are not supported for condition.  */
	  break;
	}
+      if (code == LSHIFT_EXPR
+	  || code == RSHIFT_EXPR)
+	{
+	  vectorizable_shift (stmt_info, gsi, vec_stmt, slp_node, NULL);
+	  break;
+	}
 
       /* Handle uses.  */
       if (j == 0)
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 7127c17c788..8133149b2dc 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1461,6 +1461,16 @@ vect_get_vec_def_for_operand_1 (stmt_vec_info def_stmt_info,
       /* Code should use vect_get_vec_def_for_operand.  */
       gcc_unreachable ();
 
+    /* Operand is defined by a loop header phi.  In case of nested
+       cycles we also may have uses of the backedge def.  */
+    case vect_reduction_def:
+    case vect_double_reduction_def:
+    case vect_nested_cycle:
+    case vect_induction_def:
+      gcc_assert (gimple_code (def_stmt_info->stmt) == GIMPLE_PHI
+		  || dt == vect_nested_cycle);
+      /* Fallthru.  */
+
     /* operand is defined inside the loop.  */
     case vect_internal_def:
       {
@@ -1480,23 +1490,6 @@ vect_get_vec_def_for_operand_1 (stmt_vec_info def_stmt_info,
	return vec_oprnd;
       }
 
-    /* operand is defined by a loop header phi.  */
-    case vect_reduction_def:
-    case vect_double_reduction_def:
-    case vect_nested_cycle:
-    case vect_induction_def:
-      {
-	gcc_assert (gimple_code (def_stmt_info->stmt) == GIMPLE_PHI);
-
-	/* Get the def from the vectorized stmt.  */
-	vec_stmt_info = STMT_VINFO_VEC_STMT (def_stmt_info);
-	if (gphi *phi = dyn_cast <gphi *> (vec_stmt_info->stmt))
-	  vec_oprnd = PHI_RESULT (phi);
-	else
-	  vec_oprnd = gimple_get_lhs (vec_stmt_info->stmt);
-	return vec_oprnd;
-      }
-
     default:
       gcc_unreachable ();
     }
@@ -5363,7 +5356,7 @@ vect_supportable_shift (enum tree_code code, tree scalar_type)
    stmt to replace it, put it in VEC_STMT, and insert it at GSI.
    Return true if STMT_INFO is vectorizable in this way.  */
 
-static bool
+bool
 vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
		    stmt_vec_info *vec_stmt, slp_tree slp_node,
		    stmt_vector_for_cost *cost_vec)
@@ -5401,6 +5394,7 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
     return false;
 
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
+      && STMT_VINFO_DEF_TYPE (stmt_info) != vect_nested_cycle
       && ! vec_stmt)
     return false;
 
@@ -5480,7 +5474,8 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
     shift/rotate amount is a vector, use the vector/vector shift optabs.  */
 
   if ((dt[1] == vect_internal_def
-       || dt[1] == vect_induction_def)
+       || dt[1] == vect_induction_def
+       || dt[1] == vect_nested_cycle)
       && !slp_node)
     scalar_shift_arg = false;
   else if (dt[1] == vect_constant_def
@@ -9540,7 +9535,6 @@ vect_analyze_stmt (stmt_vec_info stmt_info, bool *need_to_vectorize,
	  || vectorizable_simd_clone_call (stmt_info, NULL, NULL, node,
					   cost_vec)
	  || vectorizable_conversion (stmt_info, NULL, NULL, node, cost_vec)
-	  || vectorizable_shift (stmt_info, NULL, NULL, node, cost_vec)
	  || vectorizable_operation (stmt_info, NULL, NULL, node, cost_vec)
	  || vectorizable_assignment (stmt_info, NULL, NULL, node, cost_vec)
	  || vectorizable_load (stmt_info, NULL, NULL, node, node_instance,
@@ -9549,6 +9543,7 @@ vect_analyze_stmt (stmt_vec_info stmt_info, bool *need_to_vectorize,
	  || vectorizable_reduction (stmt_info, NULL, NULL, node,
				     node_instance, cost_vec)
	  || vectorizable_induction (stmt_info, NULL, NULL, node, cost_vec)
+	  || vectorizable_shift (stmt_info, NULL, NULL, node, cost_vec)
	  || vectorizable_condition (stmt_info, NULL, NULL, NULL, 0, node,
				     cost_vec)
	  || vectorizable_comparison (stmt_info, NULL, NULL, NULL, node,
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 1434eeaf270..72a12aea8f3 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1487,6 +1487,9 @@ extern opt_result vect_analyze_stmt (stmt_vec_info, bool *, slp_tree,
 extern bool vectorizable_condition (stmt_vec_info, gimple_stmt_iterator *,
				    stmt_vec_info *, tree, int, slp_tree,
				    stmt_vector_for_cost *);
+extern bool vectorizable_shift (stmt_vec_info, gimple_stmt_iterator *,
+				stmt_vec_info *, slp_tree,
+				stmt_vector_for_cost *);
 extern void vect_get_load_cost (stmt_vec_info, int, bool,
				unsigned int *, unsigned int *,
				stmt_vector_for_cost *,