From patchwork Thu Dec 31 18:30:02 2015
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 561907
Date: Thu, 31 Dec 2015 13:30:02 -0500
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
Subject: [PATCH], PowerPC, add ISA 3.0 xxperm (power9 patch #12)
Message-ID:
<20151231183002.GA6687@ibm-tiger.the-meissners.org>

This patch adds support for the ISA 3.0 XXPERM instruction, which is like
VPERM except that it can operate on any VSX register.  Since XXPERM has
only three operands (the target register must also hold one of the input
vectors), I made the patterns prefer VPERM.

I also added XXPERM fusion support, in which an XXLOR move immediately
preceding the XXPERM instruction is fused with it.

I have bootstrapped the compiler and run make check on a big endian power7
system and on a little endian power8 system.  In addition, I built all of
SPEC 2006 with power9 support enabled, and all of the benchmarks that
previously built still build, with XXPERM being generated.  (The OMNETPP
benchmark currently does not build on little endian for either power8 or
power9.)

Are these patches OK to check in?

[gcc]
2015-12-31  Michael Meissner

	* config/rs6000/constraints.md (wo constraint): New constraint for
	ISA 3.0 (power9).

	* config/rs6000/rs6000.c (rs6000_debug_reg_global): Add support
	for the wo constraint.
	(rs6000_init_hard_regno_mode_ok): Likewise.

	* config/rs6000/rs6000.h (r6000_reg_class_enum): Add support for
	the wo constraint.

	* config/rs6000/altivec.md (altivec_vperm_): Clean up the vperm
	expanders so they do not have constraints.  Add support for the
	ISA 3.0 xxperm instruction.  Add support for fusing xxlor with
	xxperm.
	(altivec_vperm__internal): Likewise.
	(altivec_vperm_v8hiv16qi): Likewise.
	(altivec_vperm_v16q): Likewise.
	(altivec_vperm__uns): Likewise.
	(vperm_v8hiv4si): Likewise.
	(vperm_v16qiv8hi): Likewise.

	* doc/md.texi (RS/6000 constraints): Document the wo constraint.

[gcc/testsuite]
2015-12-31  Michael Meissner

	* gcc.target/powerpc/p9-permute.c: New test for xxperm code
	generation.
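A small scalar C model makes the operand roles, and the reason the fused alternative marks its output early-clobber ("&"), easier to see. It is only a sketch that assumes the ISA 3.0 definition of XXPERM: the 32-byte source is VSR[XA] followed by VSR[XT], and the low five bits of each byte of VSR[XB] select a source byte (VPERM does the same with VRA followed by VRB). The names vreg, vsr, model_vperm, model_xxlor, model_xxperm and model_fused_vperm are hypothetical and exist only for this illustration; they are not GCC or altivec.h names. Bytes use the ISA's big-endian numbering.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit vector register: 16 bytes,
   byte 0 first (big-endian element numbering, as in the ISA text).  */
typedef struct { uint8_t b[16]; } vreg;

/* Hypothetical model of the 64 VSX registers.  */
static vreg vsr[64];

/* VPERM VRT,VRA,VRB,VRC: result byte i is byte (VRC[i] & 0x1f) of the
   32-byte concatenation VRA || VRB.  The target is a separate operand.
   Arguments are passed by value, so every input is read before the
   caller stores the result.  */
static vreg model_vperm (vreg a, vreg b, vreg c)
{
  uint8_t src[32];
  vreg t;

  memcpy (src, a.b, 16);
  memcpy (src + 16, b.b, 16);
  for (int i = 0; i < 16; i++)
    t.b[i] = src[c.b[i] & 0x1f];
  return t;
}

/* xxlor t,s,s: plain register copy.  */
static void model_xxlor (int t, int s)
{
  assert (t >= 0 && t < 64 && s >= 0 && s < 64);
  vsr[t] = vsr[s];
}

/* xxperm t,a,c, assuming the ISA 3.0 operand roles: the source is
   vsr[a] followed by vsr[t], so t is both an input and the result.  */
static void model_xxperm (int t, int a, int c)
{
  assert (t >= 0 && t < 64 && a >= 0 && a < 64 && c >= 0 && c < 64);
  vsr[t] = model_vperm (vsr[a], vsr[t], vsr[c]);
}

/* Two-instruction sequence computing vperm (a, b, c) into register t:
     xxlor  t,b,b
     xxperm t,a,c
   The copy writes t before the permute reads a and c.  */
static void model_fused_vperm (int t, int a, int b, int c)
{
  model_xxlor (t, b);
  model_xxperm (t, a, c);
}
```

In this model the sequence matches VPERM only when t is a different register from a and c: with a control that selects the first input, model_fused_vperm (1, 1, 2, 3) returns bytes of b instead of a, because the copy has already overwritten a.  An early-clobber output keeps the register allocator from choosing such an overlap.  Under this reading of the ISA, the value placed in the XXPERM target must be the second VPERM input (b), with the first input (a) passed as XA.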
Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md (revision 232008)
+++ gcc/config/rs6000/constraints.md (working copy)
@@ -99,7 +99,8 @@ (define_register_constraint "wm" "rs6000
 ;; There is a mode_attr that resolves to wm for SDmode and wn for SFmode
 (define_register_constraint "wn" "NO_REGS" "No register (NO_REGS).")
 
-;; wo is not currently used
+(define_register_constraint "wo" "rs6000_constraints[RS6000_CONSTRAINT_wo]"
+  "VSX register if the -mpower9-vector option was used or NO_REGS.")
 
 (define_register_constraint "wp" "rs6000_constraints[RS6000_CONSTRAINT_wp]"
   "VSX register to use for IEEE 128-bit fp TFmode, or NO_REGS.")
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 232008)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -2284,6 +2284,7 @@ rs6000_debug_reg_global (void)
            "wk reg_class = %s\n"
            "wl reg_class = %s\n"
            "wm reg_class = %s\n"
+           "wo reg_class = %s\n"
            "wp reg_class = %s\n"
            "wq reg_class = %s\n"
            "wr reg_class = %s\n"
@@ -2311,6 +2312,7 @@ rs6000_debug_reg_global (void)
            reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wk]],
            reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wl]],
            reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wm]],
+           reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wo]],
            reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wp]],
            reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wq]],
            reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]],
@@ -3019,7 +3021,11 @@ rs6000_init_hard_regno_mode_ok (bool glo
       if (TARGET_P9_DFORM)
         rs6000_constraints[RS6000_CONSTRAINT_wb] = ALTIVEC_REGS;
 
-      /* Support for new direct moves.  */
+      /* Support for ISA 3.0 (power9) vectors.  */
+      if (TARGET_P9_VECTOR)
+        rs6000_constraints[RS6000_CONSTRAINT_wo] = VSX_REGS;
+
+      /* Support for new direct moves (ISA 3.0 + 64bit).  */
       if (TARGET_DIRECT_MOVE_128)
         rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
 
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h (revision 232008)
+++ gcc/config/rs6000/rs6000.h (working copy)
@@ -1535,6 +1535,7 @@ enum r6000_reg_class_enum {
   RS6000_CONSTRAINT_wk,     /* FPR/VSX register for DFmode direct moves. */
   RS6000_CONSTRAINT_wl,     /* FPR register for LFIWAX */
   RS6000_CONSTRAINT_wm,     /* VSX register for direct move */
+  RS6000_CONSTRAINT_wo,     /* VSX register for power9 vector.  */
   RS6000_CONSTRAINT_wp,     /* VSX reg for IEEE 128-bit fp TFmode. */
   RS6000_CONSTRAINT_wq,     /* VSX reg for IEEE 128-bit fp KFmode.  */
   RS6000_CONSTRAINT_wr,     /* GPR register if 64-bit  */
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md (revision 232008)
+++ gcc/config/rs6000/altivec.md (working copy)
@@ -1933,10 +1933,10 @@ (define_insn "*altivec_vrfiz"
   [(set_attr "type" "vecfloat")])
 
 (define_expand "altivec_vperm_"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (unspec:VM [(match_operand:VM 1 "register_operand" "v")
-                    (match_operand:VM 2 "register_operand" "v")
-                    (match_operand:V16QI 3 "register_operand" "v")]
+  [(set (match_operand:VM 0 "register_operand" "")
+        (unspec:VM [(match_operand:VM 1 "register_operand" "")
+                    (match_operand:VM 2 "register_operand" "")
+                    (match_operand:V16QI 3 "register_operand" "")]
                    UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
 {
@@ -1947,31 +1947,40 @@ (define_expand "altivec_vperm_"
     }
 })
 
+;; Slightly prefer vperm, since the target does not overlap the source
 (define_insn "*altivec_vperm__internal"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (unspec:VM [(match_operand:VM 1 "register_operand" "v")
-                    (match_operand:VM 2 "register_operand" "v")
-                    (match_operand:V16QI 3 "register_operand" "v")]
+  [(set (match_operand:VM 0 "register_operand" "=v,?wo,?&wo")
+        (unspec:VM [(match_operand:VM 1 "register_operand" "v,0,wo")
+                    (match_operand:VM 2 "register_operand" "v,wo,wo")
+                    (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
                    UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
-  "vperm %0,%1,%2,%3"
-  [(set_attr "type" "vecperm")])
+  "@
+   vperm %0,%1,%2,%3
+   xxperm %x0,%x2,%x3
+   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "4,4,8")])
 
 (define_insn "altivec_vperm_v8hiv16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=v")
-        (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v")
-                       (match_operand:V8HI 2 "register_operand" "v")
-                       (match_operand:V16QI 3 "register_operand" "v")]
+  [(set (match_operand:V16QI 0 "register_operand" "=v,?wo,?&wo")
+        (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v,0,wo")
+                       (match_operand:V8HI 2 "register_operand" "v,wo,wo")
+                       (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
                    UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
-  "vperm %0,%1,%2,%3"
-  [(set_attr "type" "vecperm")])
+  "@
+   vperm %0,%1,%2,%3
+   xxperm %x0,%x2,%x3
+   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "4,4,8")])
 
 (define_expand "altivec_vperm__uns"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (unspec:VM [(match_operand:VM 1 "register_operand" "v")
-                    (match_operand:VM 2 "register_operand" "v")
-                    (match_operand:V16QI 3 "register_operand" "v")]
+  [(set (match_operand:VM 0 "register_operand" "")
+        (unspec:VM [(match_operand:VM 1 "register_operand" "")
+                    (match_operand:VM 2 "register_operand" "")
+                    (match_operand:V16QI 3 "register_operand" "")]
                    UNSPEC_VPERM_UNS))]
   "TARGET_ALTIVEC"
 {
@@ -1983,14 +1992,18 @@ (define_expand "altivec_vperm__uns
 })
 
 (define_insn "*altivec_vperm__uns_internal"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (unspec:VM [(match_operand:VM 1 "register_operand" "v")
-                    (match_operand:VM 2 "register_operand" "v")
-                    (match_operand:V16QI 3 "register_operand" "v")]
+  [(set (match_operand:VM 0 "register_operand" "=v,?wo,?&wo")
+        (unspec:VM [(match_operand:VM 1 "register_operand" "v,0,wo")
+                    (match_operand:VM 2 "register_operand" "v,wo,wo")
+                    (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
                    UNSPEC_VPERM_UNS))]
   "TARGET_ALTIVEC"
-  "vperm %0,%1,%2,%3"
-  [(set_attr "type" "vecperm")])
+  "@
+   vperm %0,%1,%2,%3
+   xxperm %x0,%x2,%x3
+   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "4,4,8")])
 
 (define_expand "vec_permv16qi"
   [(set (match_operand:V16QI 0 "register_operand" "")
@@ -2778,24 +2791,32 @@ (define_expand "vec_unpacks_lo_} in the template so that the correct register is used.  Otherwise the register number output in the assembly file will be incorrect if an Altivec register
@@ -3175,6 +3175,9 @@ VSX register if direct move instructions
 @item wn
 No register (NO_REGS).
 
+@item wo
+VSX register to use for ISA 3.0 vector instructions, or NO_REGS.
+
 @item wp
 VSX register to use for IEEE 128-bit floating point TFmode, or NO_REGS.
 
Index: gcc/testsuite/gcc.target/powerpc/p9-permute.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p9-permute.c (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p9-permute.c (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include 
+
+vector long long
+permute (vector long long *p, vector long long *q, vector unsigned char mask)
+{
+  vector long long a = *p;
+  vector long long b = *q;
+
+  /* Force a, b to be in FPR registers.  */
+  __asm__ (" # a: %x0, b: %x1" : "+d" (a), "+d" (b));
+
+  return vec_perm (a, b, mask);
+}
+
+/* { dg-final { scan-assembler "xxperm" } } */
+/* { dg-final { scan-assembler-not "vperm" } } */