From patchwork Mon Oct 31 22:29:21 2011
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 122987
Date: Mon, 31 Oct 2011 23:29:21 +0100
From: Jakub Jelinek
To: Richard Henderson , Uros Bizjak
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] Add fixuns_trunc2
Message-ID: <20111031222920.GH1052@tyan-ft48-01.lab.bos.redhat.com>

Hi!

This allows vectorizing float -> uint conversion.  To convert
V{4,8}SFmode op0 to V{4,8}SImode target, it emits:

  V{4,8}SFmode mask = op0 >= { INT_MAX + 1U + .0f, INT_MAX + 1U + .0f, ... } // non-signalling GE
  V{4,8}SFmode tmp1 = mask & { 2.0f * INT_MIN, 2.0f * INT_MIN, ... }
  V{4,8}SFmode tmp2 = op0 + tmp1
  V{4,8}SImode target = (V{4,8}SImode) tmp2

TARGET_AVX is needed, because pre-AVX we didn't have non-signalling GE
in cmpps and we don't want to raise exceptions if op0 is QNaN (scalar
code uses vucomiss).

Ok for trunk?

2011-10-31  Jakub Jelinek

	* config/i386/sse.md (fixuns_trunc2): New expander.
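
The four-step sequence above can be modelled per lane in scalar C.  This
is only a sketch under stated assumptions: the function names are made up
for illustration and are not part of the patch, the mask AND is modelled
with a conditional, and the explicit range check stands in for the
0x80000000 result that cvttps2dq produces for NaN and out-of-range inputs
(a plain C cast would be undefined there).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one vector lane.  Hypothetical helper, not GCC code. */
static uint32_t fixuns_trunc_lane(float op0)
{
    const float two31  = 2147483648.0f;   /* INT_MAX + 1U + .0f, i.e. 2^31 */
    const float mtwo32 = -4294967296.0f;  /* 2.0f * INT_MIN, i.e. -2^32 */

    /* Step 1: ordered GE compare; the result is false when op0 is NaN. */
    int mask = (op0 >= two31);

    /* Step 2: mask & constant.  An all-zero mask leaves +0.0f. */
    float tmp1 = mask ? mtwo32 : 0.0f;

    /* Step 3: shift large inputs down into signed int range.  This is
       exact, since floats in [2^31, 2^32) are multiples of 256. */
    float tmp2 = op0 + tmp1;

    /* Step 4: signed truncating conversion.  Inputs outside the signed
       range (or NaN) get the value cvttps2dq returns in that case. */
    if (!(tmp2 >= -two31 && tmp2 < two31))
        return 0x80000000u;
    return (uint32_t) (int32_t) tmp2;
}

/* A few spot checks on both sides of the 2^31 boundary. */
static void fixuns_trunc_demo(void)
{
    assert(fixuns_trunc_lane(0.0f) == 0u);
    assert(fixuns_trunc_lane(1.5f) == 1u);
    assert(fixuns_trunc_lane(2147483648.0f) == 2147483648u);
    assert(fixuns_trunc_lane(3000000000.0f) == 3000000000u);
}
```

For inputs below 2^31 the mask is zero and the conversion is the ordinary
signed one; for inputs in [2^31, 2^32) the signed result is negative and
its bit pattern, reinterpreted as unsigned, is the wanted value.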
	Jakub

--- gcc/config/i386/sse.md.jj	2011-10-31 21:05:21.000000000 +0100
+++ gcc/config/i386/sse.md	2011-10-31 22:53:13.000000000 +0100
@@ -2322,6 +2322,35 @@ (define_insn "fix_truncv4sfv4si2"
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
+(define_expand "fixuns_trunc2"
+  [(set (match_dup 4)
+        (unspec:VF1
+          [(match_operand:VF1 1 "register_operand" "")
+           (match_dup 2)
+           (const_int 29)] UNSPEC_PCMP))
+   (set (match_dup 5)
+        (and:VF1 (match_dup 4) (match_dup 3)))
+   (set (match_dup 6)
+        (plus:VF1 (match_dup 1) (match_dup 5)))
+   (set (match_operand: 0 "register_operand" "")
+        (fix: (match_dup 6)))]
+  "TARGET_AVX"
+{
+  REAL_VALUE_TYPE MTWO32r, TWO31r;
+  int i;
+
+  real_ldexp (&TWO31r, &dconst1, 31);
+  operands[2] = const_double_from_real_value (TWO31r, SFmode);
+  operands[2] = ix86_build_const_vector (mode, 1, operands[2]);
+  operands[2] = force_reg (mode, operands[2]);
+  real_ldexp (&MTWO32r, &dconstm1, 32);
+  operands[3] = const_double_from_real_value (MTWO32r, SFmode);
+  operands[3] = ix86_build_const_vector (mode, 1, operands[3]);
+  operands[3] = force_reg (mode, operands[3]);
+  for (i = 4; i < 7; i++)
+    operands[i] = gen_reg_rtx (mode);
+})
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel double-precision floating point conversion operations