From patchwork Wed Jul 17 10:24:42 2019
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1133232
Date: Wed, 17 Jul 2019 12:24:42 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Fix PR91178

This is the vectorizer part of the fix - currently, when we need to
permute a load in contiguous accesses, we load the "gap" between two
instances of a group as well.  When the gap is large that can cause
quite excessive code generation (fixed up by DCE / forwprop later,
but confusing intermediate passes compile-time-wise).

The following addresses this in the SLP case by simply skipping code
generation for such loads.  This avoids the huge IV increment chain
which causes all of the followup issues.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-07-17  Richard Biener

	PR tree-optimization/91178
	* tree-vect-stmts.c (get_group_load_store_type): For SLP
	loads with a gap larger than the vector size always use
	VMAT_STRIDED_SLP.
	(vectorizable_load): For VMAT_STRIDED_SLP with a permutation
	avoid loading vectors that are only contained in the gap
	and thus are not needed.

	* gcc.dg/torture/pr91178.c: New testcase.
Index: gcc/testsuite/gcc.dg/torture/pr91178.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/pr91178.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr91178.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+
+int a;
+extern int f[10][91125];
+int b[50];
+void c()
+{
+  for (int d = 6; d <= a; d++)
+    for (int e = 16; e <= 24; e++)
+      b[e] -= f[d][d];
+}
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	(revision 273520)
+++ gcc/tree-vect-stmts.c	(working copy)
@@ -2267,6 +2267,14 @@ get_group_load_store_type (stmt_vec_info
 		   / vect_get_scalar_dr_size (first_dr_info)))
 	    overrun_p = false;
 
+	  /* If the gap at the end of the group exceeds a whole vector
+	     in size use the strided SLP code which can skip
+	     code-generation for the gap.  */
+	  if (vls_type == VLS_LOAD && known_gt (gap, nunits))
+	    *memory_access_type = VMAT_STRIDED_SLP;
+	  else
+	    *memory_access_type = VMAT_CONTIGUOUS;
+
 	  /* If the gap splits the vector in half and the target
 	     can do half-vector operations avoid the epilogue
 	     peeling by simply loading half of the vector only.  Usually
@@ -2274,7 +2282,8 @@ get_group_load_store_type (stmt_vec_info
 	  dr_alignment_support alignment_support_scheme;
 	  scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
 	  machine_mode vmode;
-	  if (overrun_p
+	  if (*memory_access_type == VMAT_CONTIGUOUS
+	      && overrun_p
 	      && !masked_p
 	      && (((alignment_support_scheme
 		      = vect_supportable_dr_alignment (first_dr_info, false)))
@@ -2297,7 +2306,6 @@ get_group_load_store_type (stmt_vec_info
 				 "Peeling for outer loop is not supported\n");
 	      return false;
 	    }
-	  *memory_access_type = VMAT_CONTIGUOUS;
 	}
     }
   else
@@ -8732,6 +8740,7 @@ vectorizable_load (stmt_vec_info stmt_in
       /* Checked by get_load_store_type.  */
       unsigned int const_nunits = nunits.to_constant ();
       unsigned HOST_WIDE_INT cst_offset = 0;
+      unsigned int group_gap = 0;
 
       gcc_assert (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
       gcc_assert (!nested_in_vect_loop);
@@ -8749,6 +8758,7 @@ vectorizable_load (stmt_vec_info stmt_in
       if (slp && grouped_load)
 	{
 	  group_size = DR_GROUP_SIZE (first_stmt_info);
+	  group_gap = DR_GROUP_GAP (first_stmt_info);
 	  ref_type = get_group_alias_ptr_type (first_stmt_info);
 	}
       else
@@ -8892,6 +8902,14 @@ vectorizable_load (stmt_vec_info stmt_in
 	  if (nloads > 1)
 	    vec_alloc (v, nloads);
 	  stmt_vec_info new_stmt_info = NULL;
+	  if (slp && slp_perm
+	      && (group_el % group_size) > group_size - group_gap
+	      && (group_el % group_size) + nloads * lnel < group_size)
+	    {
+	      dr_chain.quick_push (NULL_TREE);
+	      group_el += nloads * lnel;
+	      continue;
+	    }
 	  for (i = 0; i < nloads; i++)
 	    {
 	      tree this_off = build_int_cst (TREE_TYPE (alias_off),