From patchwork Fri Feb 9 14:01:33 2018
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 871396
Date: Fri, 9 Feb 2018 15:01:33 +0100 (CET)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Fix PR84037 some more

This improves SLP detection when swapping of operands is needed to match
up stmts.  Formerly we only considered swapping operands in the set of
stmts that did not "match", but for a +/- pair that may not have worked
(also for other reasons).  The following patch makes us instead see
whether we can swap the operands in the set of matched stmts when the
non-matched ones cannot be swapped.  This allows us to handle the + in
the +/- case as it happens in capacita.

This doesn't get us to fully SLP the important loop, but we now detect
three out of four instances compared to just one before.  This would
result in a speedup of 9.5% when using AVX2 ... if the cost model were
not now rejecting the hybrid SLP vectorization (like it does in the SSE
case even without this patch).  This means we're now using interleaving
for this loop, which improves runtime by "only" 7%.

One of the reasons the vectorization is not profitable is the hybrid SLP,
which has stmts shared between the SLP instances and the interleaving
instances - so tackling the last missed SLP group is next on my list.
But using interleaving is also an improvement here.
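To make the +/- case concrete, here is a minimal C++ sketch of the kind of
two-stmt group involved.  It is a hypothetical reduction for illustration
only (capacita itself is Fortran); the function and array names are made up:

    // Hypothetical example of a two-stmt SLP group mixing '+' and '-'.
    // The '+' operands appear in the opposite order to the '-' operands,
    // so matching the group requires swapping the operands of the '+'
    // stmt, which is legal because addition is commutative while
    // subtraction is not.
    void
    plus_minus_group (double *out, const double *a, const double *b, int n)
    {
      for (int i = 0; i < n; ++i)
        {
          out[2 * i]     = b[i] + a[i];   // commutative: operands may be swapped
          out[2 * i + 1] = a[i] - b[i];   // non-commutative: order is fixed
        }
    }

In a group like this, if the stmt that fails to match happens to be the '-',
its operands cannot be swapped, and with the patch we instead retry the match
with the operands of the matched '+' swapped.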
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2018-02-09  Richard Biener

	PR tree-optimization/84037
	* tree-vect-slp.c (vect_build_slp_tree_2): Try swapping the
	matched stmts if we cannot swap the non-matched ones.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	(revision 257520)
+++ gcc/tree-vect-slp.c	(working copy)
@@ -1308,37 +1308,65 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 	  && nops == 2
 	  && oprnds_info[1]->first_dt == vect_internal_def
 	  && is_gimple_assign (stmt)
-	  && commutative_tree_code (gimple_assign_rhs_code (stmt))
-	  && ! two_operators
 	  /* Do so only if the number of not successful permutes was nor
 	     more than a cut-ff as re-trying the recursive match on
 	     possibly each level of the tree would expose exponential
 	     behavior.  */
 	  && *npermutes < 4)
 	{
-	  /* Verify if we can safely swap or if we committed to a specific
-	     operand order already.  */
-	  for (j = 0; j < group_size; ++j)
-	    if (!matches[j]
-		&& (swap[j] != 0
-		    || STMT_VINFO_NUM_SLP_USES (vinfo_for_stmt (stmts[j]))))
-	      {
-		if (dump_enabled_p ())
-		  {
-		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				     "Build SLP failed: cannot swap operands "
-				     "of shared stmt ");
-		    dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
-				      stmts[j], 0);
-		  }
-		goto fail;
-	      }
+	  /* See whether we can swap the matching or the non-matching
+	     stmt operands.  */
+	  bool swap_not_matching = true;
+	  do
+	    {
+	      for (j = 0; j < group_size; ++j)
+		{
+		  if (matches[j] != !swap_not_matching)
+		    continue;
+		  gimple *stmt = stmts[j];
+		  /* Verify if we can swap operands of this stmt.  */
+		  if (!is_gimple_assign (stmt)
+		      || !commutative_tree_code (gimple_assign_rhs_code (stmt)))
+		    {
+		      if (!swap_not_matching)
+			goto fail;
+		      swap_not_matching = false;
+		      break;
+		    }
+		  /* Verify if we can safely swap or if we committed to a
+		     specific operand order already.
+		     ??? Instead of modifying GIMPLE stmts here we could
+		     record whether we want to swap operands in the SLP
+		     node and temporarily do that when processing it
+		     (or wrap operand accessors in a helper).  */
+		  else if (swap[j] != 0
+			   || STMT_VINFO_NUM_SLP_USES (vinfo_for_stmt (stmt)))
+		    {
+		      if (!swap_not_matching)
+			{
+			  if (dump_enabled_p ())
+			    {
+			      dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+					       vect_location,
+					       "Build SLP failed: cannot swap "
+					       "operands of shared stmt ");
+			      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,
+						TDF_SLIM, stmts[j], 0);
+			    }
+			  goto fail;
+			}
+		      swap_not_matching = false;
+		      break;
+		    }
+		}
+	    }
+	  while (j != group_size);
 
 	  /* Swap mismatched definition stmts.  */
 	  dump_printf_loc (MSG_NOTE, vect_location,
 			   "Re-trying with swapped operands of stmts ");
 	  for (j = 0; j < group_size; ++j)
-	    if (!matches[j])
+	    if (matches[j] == !swap_not_matching)
 	      {
 		std::swap (oprnds_info[0]->def_stmts[j],
 			   oprnds_info[1]->def_stmts[j]);
@@ -1367,7 +1395,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 	  for (j = 0; j < group_size; ++j)
 	    {
 	      gimple *stmt = stmts[j];
-	      if (!matches[j])
+	      if (matches[j] == !swap_not_matching)
 		{
 		  /* Avoid swapping operands twice.  */
 		  if (gimple_plf (stmt, GF_PLF_1))
@@ -1382,7 +1410,8 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 	  for (j = 0; j < group_size; ++j)
 	    {
 	      gimple *stmt = stmts[j];
-	      gcc_assert (gimple_plf (stmt, GF_PLF_1) == ! matches[j]);
+	      gcc_assert (gimple_plf (stmt, GF_PLF_1)
+			  == (matches[j] == !swap_not_matching));
 	    }
 
 	  /* If we have all children of child built up from scalars then
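For reference, the decision the new do-while loop makes can be summarised by
the following standalone C++ sketch.  It is a simplified illustration only,
not the GCC implementation: the SlpStmt type, its fields and choose_swap_set
are made up for this example, standing in for the real checks on
commutativity, swap[] and STMT_VINFO_NUM_SLP_USES.

    #include <cstddef>
    #include <vector>

    // Hypothetical stand-ins for the GCC internals; illustrative only.
    struct SlpStmt
    {
      bool commutative;       // operands may legally be swapped (e.g. a '+')
      bool order_committed;   // operand order already fixed / stmt is shared
    };

    enum class SwapSet { NonMatching, Matching, None };

    // Simplified model of the new selection logic: first try to swap the
    // operands of the stmts that did NOT match; if one of those cannot be
    // swapped, retry with the stmts that DID match; if that fails too, give up.
    static SwapSet
    choose_swap_set (const std::vector<SlpStmt> &group,
                     const std::vector<bool> &matches)
    {
      bool swap_not_matching = true;   // start with the non-matching set
      for (;;)
        {
          bool need_retry = false;
          for (std::size_t j = 0; j < group.size (); ++j)
            {
              // Skip stmts outside the set we currently want to swap.
              if (matches[j] != !swap_not_matching)
                continue;
              if (!group[j].commutative || group[j].order_committed)
                {
                  if (!swap_not_matching)
                    return SwapSet::None;      // matching set fails too
                  swap_not_matching = false;   // fall back to the matching set
                  need_retry = true;
                  break;
                }
            }
          if (!need_retry)
            return swap_not_matching ? SwapSet::NonMatching : SwapSet::Matching;
        }
    }

The caller would then swap operand definitions only for stmts in the chosen
set, which is what the matches[j] == !swap_not_matching tests in the patch do.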