From patchwork Fri Sep 15 21:04:52 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kelvin Nilsen X-Patchwork-Id: 814399 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-462298-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="TKc1rhQk"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3xv7G13M9lz9s3w for ; Sat, 16 Sep 2017 07:05:10 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:date:mime-version:content-type :content-transfer-encoding:message-id; q=dns; s=default; b=ncmZj TnVJHzYCJvyJA7Yp7bZpL13N1s01PILUdJnj55xL/3YEogPtWFsdPFwXlNvmDDH9 BWz/UakbSZfpLWGlyKLlsvhgtXYg91JzHD7ls+tJQxe/tsMiBiuo8LbbsJ6NW0P+ VrJylYrKVw6QoHDKEY/o+o6/RPZDSTuzeVzrbM= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:date:mime-version:content-type :content-transfer-encoding:message-id; s=default; bh=sf6K9vRADlz aw0iDi1FnULJj+n0=; b=TKc1rhQkqPHMceYFZKj4qUoUiVqrlZ/UiK85H6eBWqx yDOMDOY0czCHx9TdJgLPiaSQkugZt26k+oGHNTguCfJnqAqwstiuP0gaKQlAkb9r nMSWgRaUB6a9Ue19K2IpqGEktgGJ/7FfAfxaAZMzRZw/1sWJCTzIHGc4Nq/Xu5pk = Received: (qmail 53326 invoked by alias); 15 Sep 2017 21:05:01 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 53315 invoked by uid 89); 15 Sep 2017 21:05:01 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-10.8 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.2 spammy= X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.158.5) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 15 Sep 2017 21:04:58 +0000 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id v8FKxH6G046166 for ; Fri, 15 Sep 2017 17:04:57 -0400 Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.153]) by mx0b-001b2d01.pphosted.com with ESMTP id 2d0gqmqsw0-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Fri, 15 Sep 2017 17:04:56 -0400 Received: from localhost by e35.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 15 Sep 2017 15:04:55 -0600 Received: from b03cxnp08025.gho.boulder.ibm.com (9.17.130.17) by e35.co.us.ibm.com (192.168.1.135) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Fri, 15 Sep 2017 15:04:54 -0600 Received: from b03ledav006.gho.boulder.ibm.com (b03ledav006.gho.boulder.ibm.com [9.17.130.237]) by b03cxnp08025.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id v8FL4s0F1376702 for ; Fri, 15 Sep 2017 14:04:54 -0700 Received: from b03ledav006.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 178A8C604A for ; Fri, 15 Sep 2017 15:04:54 -0600 (MDT) Received: from oc6462846008.ibm.com (unknown [9.80.215.16]) by b03ledav006.gho.boulder.ibm.com (Postfix) with ESMTP id 9B32DC603E for ; Fri, 15 Sep 2017 15:04:53 -0600 (MDT) To: gcc-patches@gcc.gnu.org From: Kelvin Nilsen Subject: [PATCH, rs6000] Replace swap of a loaded vector constant with load of a swapped vector constant Date: Fri, 15 Sep 2017 15:04:52 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 17091521-0012-0000-0000-00001503597C X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00007746; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000229; SDB=6.00917545; UDB=6.00460860; IPR=6.00697772; BA=6.00005589; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00017172; XFM=3.00000015; UTC=2017-09-15 21:04:55 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17091521-0013-0000-0000-00004F80D071 Message-Id: <73f793bf-9725-5036-42df-b33842c8dba9@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2017-09-15_07:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=3 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1707230000 definitions=main-1709150308 X-IsSubscribed: yes On Power8 little endian, two instructions are needed to load from the natural in-memory representation of a vector into a vector register: a load followed by a swap. When the vector value to be loaded is a constant, more efficient code can be achieved by swapping the representation of the constant in memory so that only a load instruction is required. This patch has been bootstrapped and tested without regressions on powerpc64le-unknown-linux (P8) and on powerpc-unknown-linux (P8, big-endian, with both -m32 and -m64 target options). Is this ok for trunk? gcc/ChangeLog: 2017-09-14 Kelvin Nilsen * config/rs6000/rs6000-p8swap.c (const_load_sequence_p): Revise this function to return false if the definition used by the swap instruction is artificial, or if the memory address from which the constant value is loaded is not represented by a base address held in a register or if the base address register is a frame or stack pointer. Additionally, return false if the base address of the loaded constant is a SYMBOL_REF but is not considered to be a constant. (replace_swapped_load_constant): New function. (rs6000_analyze_swaps): Add a new pass to replace a swap of a loaded constant vector with a load of a swapped constant vector. gcc/testsuite/ChangeLog: 2017-09-14 Kelvin Nilsen * gcc.target/powerpc/swaps-p8-28.c: New test. * gcc.target/powerpc/swaps-p8-29.c: New test. * gcc.target/powerpc/swps-p8-30.c: New test. Index: gcc/config/rs6000/rs6000-p8swap.c =================================================================== --- gcc/config/rs6000/rs6000-p8swap.c (revision 252768) +++ gcc/config/rs6000/rs6000-p8swap.c (working copy) @@ -342,7 +342,8 @@ const_load_sequence_p (swap_web_entry *insn_entry, FOR_EACH_INSN_INFO_USE (use, insn_info) { struct df_link *def_link = DF_REF_CHAIN (use); - if (!def_link || def_link->next) + if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref) + || def_link->next) return false; rtx def_insn = DF_REF_INSN (def_link->ref); @@ -358,6 +359,8 @@ const_load_sequence_p (swap_web_entry *insn_entry, rtx mem = XEXP (SET_SRC (body), 0); rtx base_reg = XEXP (mem, 0); + if (!REG_P (base_reg)) + return false; df_ref base_use; insn_info = DF_INSN_INFO_GET (def_insn); @@ -370,6 +373,14 @@ const_load_sequence_p (swap_web_entry *insn_entry, if (!base_def_link || base_def_link->next) return false; + /* Constants held on the stack are not "true" constants + * because their values are not part of the static load + * image. If this constant's base reference is a stack + * or frame pointer, it is seen as an artificial + * reference. */ + if (DF_REF_IS_ARTIFICIAL (base_def_link->ref)) + return false; + rtx tocrel_insn = DF_REF_INSN (base_def_link->ref); rtx tocrel_body = PATTERN (tocrel_insn); rtx base, offset; @@ -385,6 +396,25 @@ const_load_sequence_p (swap_web_entry *insn_entry, split_const (XVECEXP (tocrel_base, 0, 0), &base, &offset); if (GET_CODE (base) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P (base)) return false; + else + { + /* FIXME: The conditions under which + * ((GET_CODE (const_vector) == SYMBOL_REF) && + * !CONSTANT_POOL_ADDRESS_P (const_vector)) + * are not well understood. This code prevents + * an internal compiler error which will occur in + * replace_swapped_load_constant () if we were to return + * true. Some day, we should figure out how to properly + * handle this condition in + * replace_swapped_load_constant () and then we can + * remove this special test. */ + rtx const_vector = get_pool_constant (base); + if (GET_CODE (const_vector) == SYMBOL_REF) + { + if (!CONSTANT_POOL_ADDRESS_P (const_vector)) + return false; + } + } } } return true; @@ -1281,6 +1311,189 @@ replace_swap_with_copy (swap_web_entry *insn_entry insn->set_deleted (); } +/* Given that swap_insn represents a swap of a load of a constant + vector value, replace with a single instruction that loads a + swapped variant of the original constant. + + The "natural" representation of a byte array in memory is the same + for big endian and little endian. + + unsigned char byte_array[] = + { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f }; + + However, when loaded into a vector register, the representation + depends on endian conventions. + + In big-endian mode, the register holds: + + MSB LSB + [ f, e, d, c, b, a, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 ] + + In little-endian mode, the register holds: + + MSB LSB + [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f ] + + + Word arrays require different handling. Consider the word array: + + unsigned int word_array[] = + { 0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f }; + + The in-memory representation depends on endian configuration. The + equivalent array, declared as a byte array, in memory would be: + + unsigned char big_endian_word_array_data[] = + { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f } + + unsigned char little_endian_word_array_data[] = + { 3, 2, 1, 0, 7, 6, 5, 4, b, a, 9, 8, f, e, d, c } + + In big-endian mode, the register holds: + + MSB LSB + [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f ] + + In little-endian mode, the register holds: + + MSB LSB + [ c, d, e, f, 8, 9, a, b, 4, 5, 6, 7, 0, 1, 2, 3 ] + + + Similar transformations apply to the vector of half-word and vector + of double-word representations. + + For now, don't handle vectors of quad-precision values. Just return. + A better solution is to fix the code generator to emit lvx/stvx for + those. */ +static void +replace_swapped_load_constant (swap_web_entry *insn_entry, rtx swap_insn) +{ + /* Find the load. */ + struct df_insn_info *insn_info = DF_INSN_INFO_GET (swap_insn); + rtx_insn *load_insn = 0; + df_ref use; + + FOR_EACH_INSN_INFO_USE (use, insn_info) + { + struct df_link *def_link = DF_REF_CHAIN (use); + gcc_assert (def_link && !def_link->next); + load_insn = DF_REF_INSN (def_link->ref); + break; + } + gcc_assert (load_insn); + + /* Find the TOC-relative symbol access. */ + insn_info = DF_INSN_INFO_GET (load_insn); + rtx_insn *tocrel_insn = 0; + FOR_EACH_INSN_INFO_USE (use, insn_info) + { + struct df_link *def_link = DF_REF_CHAIN (use); + gcc_assert (def_link && !def_link->next); + tocrel_insn = DF_REF_INSN (def_link->ref); + break; + } + gcc_assert (tocrel_insn); + + /* Find the embedded CONST_VECTOR. We have to call toc_relative_expr_p + to set tocrel_base; otherwise it would be unnecessary as we've + already established it will return true. */ + rtx base, offset; + rtx tocrel_expr = SET_SRC (PATTERN (tocrel_insn)); + const_rtx tocrel_base; + /* There is an extra level of indirection for small/large code models. */ + if (GET_CODE (tocrel_expr) == MEM) + tocrel_expr = XEXP (tocrel_expr, 0); + if (!toc_relative_expr_p (tocrel_expr, false, &tocrel_base, NULL)) + gcc_unreachable (); + split_const (XVECEXP (tocrel_base, 0, 0), &base, &offset); + rtx const_vector = get_pool_constant (base); + /* With the extra indirection, get_pool_constant will produce the + real constant from the reg_equal expression, so get the real + constant. */ + if (GET_CODE (const_vector) == SYMBOL_REF) + const_vector = get_pool_constant (const_vector); + gcc_assert (GET_CODE (const_vector) == CONST_VECTOR); + + rtx new_mem; + enum machine_mode mode = GET_MODE (const_vector); + /* Create an adjusted constant from the original constant. */ + + if (mode == V1TImode) + /* Leave this code as is. */ + return; + else if (mode == V16QImode) + { + rtx vals = gen_rtx_PARALLEL (mode, rtvec_alloc (16)); + int i; + for (i = 0; i < 16; i++) + XVECEXP (vals, 0, ((i+8) % 16)) = XVECEXP (const_vector, 0, i); + rtx new_const_vector = gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0)); + new_mem = force_const_mem (mode, new_const_vector); + } + else if ((mode == V4SImode) || (mode == V4SFmode)) + { + rtx vals = gen_rtx_PARALLEL (mode, rtvec_alloc (4)); + int i; + for (i = 0; i < 4; i++) + XVECEXP (vals, 0, ((i+2) % 4)) = XVECEXP (const_vector, 0, i); + rtx new_const_vector = gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0)); + new_mem = force_const_mem (mode, new_const_vector); + } + else if ((mode == V8HImode) +#ifdef HAVE_V8HFmode + || (mode == V8HFmode) +#endif + ) + { + rtx vals = gen_rtx_PARALLEL (mode, rtvec_alloc (8)); + int i; + for (i = 0; i < 8; i++) + XVECEXP (vals, 0, ((i+4) % 8)) = XVECEXP (const_vector, 0, i); + rtx new_const_vector = gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0)); + new_mem = force_const_mem (mode, new_const_vector); + } + else if ((mode == V2DImode) || (mode == V2DFmode)) + { + rtx vals = gen_rtx_PARALLEL (mode, rtvec_alloc (2)); + int i; + /* Sometimes, load of a V1TImode vector is represented as a load + of two double words with a swap on top of it. */ + for (i = 0; i < 2; i++) + XVECEXP (vals, 0, ((i+1) % 2)) = XVECEXP (const_vector, 0, i); + rtx new_const_vector = gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0)); + new_mem = force_const_mem (mode, new_const_vector); + } + else + { + /* We do not expect other modes to be constant-load-swapped. */ + gcc_assert (0); + } + + /* This gives us a MEM whose base operand is a SYMBOL_REF, which we + can't recognize. Force the SYMBOL_REF into a register. */ + if (!REG_P (XEXP (new_mem, 0))) { + rtx base_reg = force_reg (Pmode, XEXP (new_mem, 0)); + XEXP (new_mem, 0) = base_reg; + /* Move the newly created insn ahead of the load insn. */ + rtx_insn *force_insn = get_last_insn (); + remove_insn (force_insn); + rtx_insn *before_load_insn = PREV_INSN (load_insn); + add_insn_after (force_insn, before_load_insn, BLOCK_FOR_INSN (load_insn)); + df_insn_rescan (before_load_insn); + df_insn_rescan (force_insn); + } + + /* Replace the MEM in the load instruction and rescan it. */ + XEXP (SET_SRC (PATTERN (load_insn)), 0) = new_mem; + INSN_CODE (load_insn) = -1; /* Force re-recognition. */ + df_insn_rescan (load_insn); + + unsigned int uid = INSN_UID (swap_insn); + mark_swaps_for_removal (insn_entry, uid); + replace_swap_with_copy (insn_entry, uid); +} + /* Dump the swap table to DUMP_FILE. */ static void dump_swap_insn_table (swap_web_entry *insn_entry) @@ -1873,6 +2086,47 @@ rs6000_analyze_swaps (function *fun) /* Clean up. */ free (insn_entry); + + /* Use additional pass over rtl to replace swap(load(vector constant)) + with load(swapped vector constant). */ + swap_web_entry *pass2_insn_entry; + pass2_insn_entry = XCNEWVEC (swap_web_entry, get_max_uid ()); + + /* Walk the insns to gather basic data. */ + FOR_ALL_BB_FN (bb, fun) + FOR_BB_INSNS_SAFE (bb, insn, curr_insn) + { + unsigned int uid = INSN_UID (insn); + if (NONDEBUG_INSN_P (insn)) + { + pass2_insn_entry[uid].insn = insn; + + pass2_insn_entry[uid].is_relevant = 1; + pass2_insn_entry[uid].is_load = insn_is_load_p (insn); + pass2_insn_entry[uid].is_store = insn_is_store_p (insn); + + /* Determine if this is a doubleword swap. If not, + determine whether it can legally be swapped. */ + if (insn_is_swap_p (insn)) + pass2_insn_entry[uid].is_swap = 1; + } + } + + { + unsigned e = get_max_uid (), i; + for (i = 0; i < e; ++i) + { + if (pass2_insn_entry[i].is_swap && !pass2_insn_entry[i].is_load) + { + insn = pass2_insn_entry[i].insn; + if (const_load_sequence_p (pass2_insn_entry, insn)) + replace_swapped_load_constant (pass2_insn_entry, insn); + } + } + } + + /* Clean up. */ + free (pass2_insn_entry); return 0; } Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-28.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/swaps-p8-28.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/swaps-p8-28.c (working copy) @@ -0,0 +1,30 @@ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-do run { target { powerpc64*-*-* } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-mcpu=power8 -O3 " } */ + +#include + +extern void abort (void); + +vector char y = { 0, 1, 2, 3, + 4, 5, 6, 7, + 8, 9, 10, 11, + 12, 13, 14, 15 }; + +vector char +foo (void) +{ + vector char x = vec_xl (0, &y); + return y; +} + +int +main (int argc, char *argv[]) +{ + vector char fetched_value = foo (); + if (fetched_value[0] != 0 || fetched_value[15] != 15) + abort (); + else + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-29.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/swaps-p8-29.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/swaps-p8-29.c (working copy) @@ -0,0 +1,30 @@ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-do run { target { powerpc64*-*-* } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-mcpu=power8 -O3 " } */ + +#include + +extern void abort (void); + +const vector char y = { 0, 1, 2, 3, + 4, 5, 6, 7, + 8, 9, 10, 11, + 12, 13, 14, 15 }; + +vector char +foo (void) +{ + vector char x = vec_xl (0, &y); + return y; +} + +int +main (int argc, char *argv[]) +{ + vector char fetched_value = foo (); + if (fetched_value[0] != 0 || fetched_value[15] != 15) + abort (); + else + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/swps-p8-30.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/swps-p8-30.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/swps-p8-30.c (working copy) @@ -0,0 +1,36 @@ +/* This file's name was changed from swaps-p8-30.c so that I could + search the assembler for "not swap", since swap is an alias for + xxpermdi. With the original file name, the assembler search would + get a false positive on the name of the file. */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-do compile { target { powerpc64le-*-* } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-mcpu=power8 -O3 " } */ +/* { dg-final { scan-assembler-not "xxpermdi" } } */ +/* { dg-final { scan-assembler-not "swap" } } */ + +#include + +extern void abort (void); + +const vector char y = { 0, 1, 2, 3, + 4, 5, 6, 7, + 8, 9, 10, 11, + 12, 13, 14, 15 }; + +vector char +foo (void) +{ + vector char x = vec_xl (0, &y); + return y; +} + +int +main (int argc, char *argv[]) +{ + vector char fetched_value = foo (); + if (fetched_value[0] != 0 || fetched_value[15] != 15) + abort (); + else + return 0; +}