From patchwork Tue Jun 22 17:58:37 2010
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 56557
Date: Tue, 22 Jun 2010 10:58:37 -0700
From: "H.J. Lu"
To: gcc-patches@gcc.gnu.org
Cc: Uros Bizjak
Subject: PATCH: Simplify AVX cast and extract lower 128bit patterns
Message-ID: <20100622175837.GA3734@intel.com>

Hi,

The AVX cast from 256 bits to 128 bits and the extraction of the lower 128 bits from a 256-bit vector are the same operation. This patch replaces the AVX cast with the lower 128-bit extraction.
It also uses define_insn_and_split for the lower 128-bit extractions. Tested on Linux/x86-64. OK for trunk?

Thanks.


H.J.
---
2010-06-22  H.J. Lu

	* config/i386/i386.c (bdesc_args): Replace CODE_FOR_avx_si_si256,
	CODE_FOR_avx_ps_ps256 and CODE_FOR_avx_pd_pd256 with
	CODE_FOR_vec_extract_lo_v8si, CODE_FOR_vec_extract_lo_v8sf and
	CODE_FOR_vec_extract_lo_v4df.

	* config/i386/sse.md (vec_extract_lo_:AVX256MODE4P): Changed to
	define_insn_and_split.
	(vec_extract_lo_:AVX256MODE8P): Likewise.
	(vec_extract_lo_v16hi): Likewise.
	(vec_extract_lo_v32qi): Likewise.
	(avx__): Likewise.
	(avx__): Removed.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 657e55a..268be3b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22427,9 +22427,9 @@ static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_si256_si, "__builtin_ia32_si256_si", IX86_BUILTIN_SI256_SI, UNKNOWN, (int) V8SI_FTYPE_V4SI },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_ps256_ps, "__builtin_ia32_ps256_ps", IX86_BUILTIN_PS256_PS, UNKNOWN, (int) V8SF_FTYPE_V4SF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_pd256_pd, "__builtin_ia32_pd256_pd", IX86_BUILTIN_PD256_PD, UNKNOWN, (int) V4DF_FTYPE_V2DF },
-  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_si_si256, "__builtin_ia32_si_si256", IX86_BUILTIN_SI_SI256, UNKNOWN, (int) V4SI_FTYPE_V8SI },
-  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_ps_ps256, "__builtin_ia32_ps_ps256", IX86_BUILTIN_PS_PS256, UNKNOWN, (int) V4SF_FTYPE_V8SF },
-  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_pd_pd256, "__builtin_ia32_pd_pd256", IX86_BUILTIN_PD_PD256, UNKNOWN, (int) V2DF_FTYPE_V4DF },
+  { OPTION_MASK_ISA_AVX, CODE_FOR_vec_extract_lo_v8si, "__builtin_ia32_si_si256", IX86_BUILTIN_SI_SI256, UNKNOWN, (int) V4SI_FTYPE_V8SI },
+  { OPTION_MASK_ISA_AVX, CODE_FOR_vec_extract_lo_v8sf, "__builtin_ia32_ps_ps256", IX86_BUILTIN_PS_PS256, UNKNOWN, (int) V4SF_FTYPE_V8SF },
+  { OPTION_MASK_ISA_AVX, CODE_FOR_vec_extract_lo_v4df, "__builtin_ia32_pd_pd256", IX86_BUILTIN_PD_PD256, UNKNOWN, (int) V2DF_FTYPE_V4DF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestpd, "__builtin_ia32_vtestzpd", IX86_BUILTIN_VTESTZPD, EQ, (int) INT_FTYPE_V2DF_V2DF_PTEST },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestpd, "__builtin_ia32_vtestcpd", IX86_BUILTIN_VTESTCPD, LTU, (int) INT_FTYPE_V2DF_V2DF_PTEST },
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7625906..ed22675 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4178,19 +4178,24 @@
   DONE;
 })
 
-(define_insn "vec_extract_lo_"
+(define_insn_and_split "vec_extract_lo_"
   [(set (match_operand: 0 "nonimmediate_operand" "=x,m")
         (vec_select:
-          (match_operand:AVX256MODE4P 1 "register_operand" "x,x")
+          (match_operand:AVX256MODE4P 1 "nonimmediate_operand" "xm,x")
           (parallel [(const_int 0) (const_int 1)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
-  [(set_attr "type" "sselog")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "memory" "none,store")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op1 = operands[1];
+  if (REG_P (op1))
+    op1 = gen_rtx_REG (mode, REGNO (op1));
+  else
+    op1 = gen_lowpart (mode, op1);
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
 
 (define_insn "vec_extract_hi_"
   [(set (match_operand: 0 "nonimmediate_operand" "=x,m")
@@ -4206,20 +4211,25 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
 
-(define_insn "vec_extract_lo_"
+(define_insn_and_split "vec_extract_lo_"
   [(set (match_operand: 0 "nonimmediate_operand" "=x,m")
         (vec_select:
-          (match_operand:AVX256MODE8P 1 "register_operand" "x,x")
+          (match_operand:AVX256MODE8P 1 "nonimmediate_operand" "xm,x")
           (parallel [(const_int 0) (const_int 1)
                      (const_int 2) (const_int 3)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
-  [(set_attr "type" "sselog")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "memory" "none,store")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op1 = operands[1];
+  if (REG_P (op1))
+    op1 = gen_rtx_REG (mode, REGNO (op1));
+  else
+    op1 = gen_lowpart (mode, op1);
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
 
 (define_insn "vec_extract_hi_"
   [(set (match_operand: 0 "nonimmediate_operand" "=x,m")
@@ -4236,22 +4246,27 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
 
-(define_insn "vec_extract_lo_v16hi"
+(define_insn_and_split "vec_extract_lo_v16hi"
   [(set (match_operand:V8HI 0 "nonimmediate_operand" "=x,m")
         (vec_select:V8HI
-          (match_operand:V16HI 1 "register_operand" "x,x")
+          (match_operand:V16HI 1 "nonimmediate_operand" "xm,x")
           (parallel [(const_int 0) (const_int 1)
                      (const_int 2) (const_int 3)
                      (const_int 4) (const_int 5)
                      (const_int 6) (const_int 7)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
-  [(set_attr "type" "sselog")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "memory" "none,store")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op1 = operands[1];
+  if (REG_P (op1))
+    op1 = gen_rtx_REG (V8HImode, REGNO (op1));
+  else
+    op1 = gen_lowpart (V8HImode, op1);
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
 
 (define_insn "vec_extract_hi_v16hi"
   [(set (match_operand:V8HI 0 "nonimmediate_operand" "=x,m")
@@ -4270,10 +4285,10 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
 
-(define_insn "vec_extract_lo_v32qi"
+(define_insn_and_split "vec_extract_lo_v32qi"
   [(set (match_operand:V16QI 0 "nonimmediate_operand" "=x,m")
         (vec_select:V16QI
-          (match_operand:V32QI 1 "register_operand" "x,x")
+          (match_operand:V32QI 1 "nonimmediate_operand" "xm,x")
           (parallel [(const_int 0) (const_int 1)
                      (const_int 2) (const_int 3)
                      (const_int 4) (const_int 5)
@@ -4283,13 +4298,18 @@
                      (const_int 12) (const_int 13)
                      (const_int 14) (const_int 15)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
-  [(set_attr "type" "sselog")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "memory" "none,store")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op1 = operands[1];
+  if (REG_P (op1))
+    op1 = gen_rtx_REG (V16QImode, REGNO (op1));
+  else
+    op1 = gen_lowpart (V16QImode, op1);
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
 
 (define_insn "vec_extract_hi_v32qi"
   [(set (match_operand:V16QI 0 "nonimmediate_operand" "=x,m")
@@ -12252,77 +12272,24 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "")])
 
-(define_insn "avx__"
-  [(set (match_operand:AVX256MODE2P 0 "register_operand" "=x,x")
+(define_insn_and_split "avx__"
+  [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
         (unspec:AVX256MODE2P
-          [(match_operand: 1 "nonimmediate_operand" "0,xm")]
-          UNSPEC_CAST))]
-  "TARGET_AVX"
-{
-  switch (which_alternative)
-    {
-    case 0:
-      return "";
-    case 1:
-      switch (get_attr_mode (insn))
-        {
-        case MODE_V8SF:
-          return "vmovaps\t{%1, %x0|%x0, %1}";
-        case MODE_V4DF:
-          return "vmovapd\t{%1, %x0|%x0, %1}";
-        case MODE_OI:
-          return "vmovdqa\t{%1, %x0|%x0, %1}";
-        default:
-          break;
-        }
-    default:
-      break;
-    }
-  gcc_unreachable ();
-}
-  [(set_attr "type" "ssemov")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "")
-   (set (attr "length")
-     (if_then_else (eq_attr "alternative" "0")
-        (const_string "0")
-        (const_string "*")))])
-
-(define_insn "avx__"
-  [(set (match_operand: 0 "register_operand" "=x,x")
-        (unspec:
-          [(match_operand:AVX256MODE2P 1 "nonimmediate_operand" "0,xm")]
+          [(match_operand: 1 "nonimmediate_operand" "xm,x")]
           UNSPEC_CAST))]
   "TARGET_AVX"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
 {
-  switch (which_alternative)
-    {
-    case 0:
-      return "";
-    case 1:
-      switch (get_attr_mode (insn))
-        {
-        case MODE_V8SF:
-          return "vmovaps\t{%x1, %0|%0, %x1}";
-        case MODE_V4DF:
-          return "vmovapd\t{%x1, %0|%0, %x1}";
-        case MODE_OI:
-          return "vmovdqa\t{%x1, %0|%0, %x1}";
-        default:
-          break;
-        }
-    default:
-      break;
-    }
-  gcc_unreachable ();
-}
-  [(set_attr "type" "ssemov")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "")
-   (set (attr "length")
-     (if_then_else (eq_attr "alternative" "0")
-        (const_string "0")
-        (const_string "*")))])
+  rtx op1 = operands[1];
+  if (REG_P (op1))
+    op1 = gen_rtx_REG (mode, REGNO (op1));
+  else
+    op1 = gen_lowpart (mode, op1);
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
 
 (define_expand "vec_init"
   [(match_operand:AVX256MODE 0 "register_operand" "")