From patchwork Wed Sep 28 11:41:54 2016
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 676154
Date: Wed, 28 Sep 2016 13:41:54 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Avoid store forwarding issue in vectorizing strided SLP loads

Currently strided SLP vectorization creates vector constructors
composed of vector elements.  The expander does not handle this
constructor form specially; it expands it via piecewise stores to
scratch memory followed by a load of that scratch memory.  This is
obviously bad on any modern CPU that relies on store forwarding,
because forwarding from several smaller stores to one larger load
does not work on any CPU I know of.

The following patch simply avoids the issue by making the vectorizer
create integer loads, compose a vector of those integers and then pun
that to the desired vector type.  Thus (V4SF){V2SF, V2SF} becomes
(V4SF)(V2DI){DI, DI} and everybody is happy.  x264 in particular gets
a 5-10% improvement (depending on vector size and x86
sub-architecture).

Handling the vector-vector constructors on the expander side would
require either similar punning or making vec_init parametric on the
element mode plus supporting vector elements in all targets (which in
the end will probably simply pun them similarly).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

2016-09-28  Richard Biener

	* tree-vect-stmts.c (vectorizable_load): Avoid emitting vector
	constructors with vector elements.

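
For illustration only, the (V4SF)(V2DI){DI, DI} form described above can
be sketched in user-level GNU C.  This is not what the vectorizer emits
(it works on GIMPLE); the typedefs and the function names
load_group_as_di and strided_load_v4sf are made up for this sketch, and
it assumes a compiler with GNU vector extensions (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GNU C vector extensions; the type names are illustrative only.  */
typedef float    v4sf __attribute__ ((vector_size (16)));
typedef uint64_t v2di __attribute__ ((vector_size (16)));

/* Load one 64-bit chunk covering a group of two adjacent floats.
   memcpy keeps the access free of aliasing and alignment problems.  */
static uint64_t
load_group_as_di (const float *p)
{
  uint64_t chunk;
  memcpy (&chunk, p, sizeof chunk);
  return chunk;
}

/* Build a V4SF from two groups of two floats that start STRIDE
   elements apart: an integer vector is constructed from the two
   64-bit loads and then re-interpreted, i.e. (V4SF)(V2DI){DI, DI}
   instead of (V4SF){V2SF, V2SF}.  */
static v4sf
strided_load_v4sf (const float *base, size_t stride)
{
  assert (base != NULL);
  v2di tmp = { load_group_as_di (base), load_group_as_di (base + stride) };
  /* A cast between same-sized vector types re-interprets the bits,
     comparable to the VIEW_CONVERT_EXPR used in the patch.  */
  return (v4sf) tmp;
}
```

With float a[8] = {0, 1, 2, 3, 4, 5, 6, 7}, a call such as
strided_load_v4sf (a, 4) yields the lanes a[0], a[1], a[4], a[5].  The
constructor only ever sees scalar (integer) elements, which is the form
vec_init was designed for.
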
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	(revision 240565)
+++ gcc/tree-vect-stmts.c	(working copy)
@@ -6862,17 +6925,40 @@ vectorizable_load (gimple *stmt, gimple_
       int nloads = nunits;
       int lnel = 1;
       tree ltype = TREE_TYPE (vectype);
+      tree lvectype = vectype;
       auto_vec<tree> dr_chain;
       if (memory_access_type == VMAT_STRIDED_SLP)
         {
-          nloads = nunits / group_size;
           if (group_size < nunits)
             {
-              lnel = group_size;
-              ltype = build_vector_type (TREE_TYPE (vectype), group_size);
+              /* Avoid emitting a constructor of vector elements by performing
+                 the loads using an integer type of the same size,
+                 constructing a vector of those and then re-interpreting it
+                 as the original vector type.  This works around the fact
+                 that the vec_init optab was only designed for scalar
+                 element modes and thus expansion goes through memory.
+                 This avoids a huge runtime penalty due to the general
+                 inability to perform store forwarding from smaller stores
+                 to a larger load.  */
+              unsigned lsize
+                = group_size * TYPE_PRECISION (TREE_TYPE (vectype));
+              enum machine_mode elmode = mode_for_size (lsize, MODE_INT, 0);
+              enum machine_mode vmode = mode_for_vector (elmode,
+                                                         nunits / group_size);
+              /* If we can't construct such a vector fall back to
+                 element loads of the original vector type.  */
+              if (VECTOR_MODE_P (vmode)
+                  && optab_handler (vec_init_optab, vmode) != CODE_FOR_nothing)
+                {
+                  nloads = nunits / group_size;
+                  lnel = group_size;
+                  ltype = build_nonstandard_integer_type (lsize, 1);
+                  lvectype = build_vector_type (ltype, nloads);
+                }
             }
           else
             {
+              nloads = 1;
               lnel = nunits;
               ltype = vectype;
             }
@@ -6925,9 +7011,17 @@ vectorizable_load (gimple *stmt, gimple_
                 }
               if (nloads > 1)
                 {
-                  tree vec_inv = build_constructor (vectype, v);
-                  new_temp = vect_init_vector (stmt, vec_inv, vectype, gsi);
+                  tree vec_inv = build_constructor (lvectype, v);
+                  new_temp = vect_init_vector (stmt, vec_inv, lvectype, gsi);
                   new_stmt = SSA_NAME_DEF_STMT (new_temp);
+                  if (lvectype != vectype)
+                    {
+                      new_stmt = gimple_build_assign (make_ssa_name (vectype),
+                                                      VIEW_CONVERT_EXPR,
+                                                      build1 (VIEW_CONVERT_EXPR,
+                                                              vectype, new_temp));
+                      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+                    }
                 }
 
               if (slp)
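
The load-shape selection in the first hunk can be modelled outside of
GCC as a small stand-alone function.  This is a simplified sketch, not
GCC code: struct load_plan, plan_strided_slp_load and the callback
have_int_vector are hypothetical names, with the callback standing in
for the VECTOR_MODE_P / optab_handler (vec_init_optab, ...) check, and
toy_have_int_vector describing an imaginary target.

```c
#include <stdbool.h>

/* Hypothetical summary of how one result vector is loaded.  */
struct load_plan
{
  unsigned nloads;     /* loads per result vector */
  unsigned lnel;       /* original elements covered by each load */
  unsigned load_bits;  /* width of each load in bits */
  bool pun;            /* integer vector built, then re-interpreted */
};

static struct load_plan
plan_strided_slp_load (unsigned nunits, unsigned group_size,
                       unsigned elem_bits,
                       bool (*have_int_vector) (unsigned bits, unsigned n))
{
  /* Default and fallback: one scalar load per vector element.  */
  struct load_plan plan = { nunits, 1, elem_bits, false };

  if (group_size < nunits)
    {
      /* Try one integer load per group and a vector of those.  */
      unsigned lsize = group_size * elem_bits;
      unsigned n = nunits / group_size;
      if (have_int_vector (lsize, n))
        plan = (struct load_plan) { n, group_size, lsize, true };
    }
  else
    /* The group fills a whole vector: a single full-vector load.  */
    plan = (struct load_plan) { 1, nunits, nunits * elem_bits, false };

  return plan;
}

/* Imaginary target: only 128-bit vectors of 32- or 64-bit integers
   can be initialized from scalar elements.  */
static bool
toy_have_int_vector (unsigned bits, unsigned n)
{
  return (bits == 32 || bits == 64) && bits * n == 128;
}
```

On this imaginary target, a V4SF load with group_size 2 is planned as
two 64-bit loads plus a pun, while a V8SF load with group_size 2 falls
back to eight scalar loads because no suitable integer vector exists.
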