From: Uros Bizjak
To: "H.J. Lu"
Cc: gcc-patches@gcc.gnu.org
Date: Mon, 4 Jul 2011 12:18:29 +0200
Subject: Re: PATCH: PR target/49600: Bad SSE2 int->float split in i386.md
References: <20110630225005.GA1839@intel.com>
X-Patchwork-Id: 103080

On Mon, Jul 4, 2011 at 7:13 AM, H.J. Lu wrote:
>>>> In one SSE2 int->float split, when TARGET_USE_VECTOR_CONVERTS is true,
>>>> TARGET_INTER_UNIT_MOVES is false and GENERAL_REG_P (op1) is true. we
>>>> will get gcc_unreachable.  This patch removes TARGET_INTER_UNIT_MOVES
>>>> check.  OK for trunk?
>>>
>>> This will result in register allocation failure. Operand 0 of
>
> That particular sse2_loadld insn matches:
>
> (insn 49 22 50 5 (set (reg:V4SI 21 xmm0 [83])
>         (vec_merge:V4SI (vec_duplicate:V4SI (reg/v:SI 1 dx [orig:64 test ] [64]))
>             (const_vector:V4SI [
>                     (const_int 0 [0])
>                     (const_int 0 [0])
>                     (const_int 0 [0])
>                     (const_int 0 [0])
>                 ])
>             (const_int 1 [0x1]))) x.i:11 1365 {vec_setv4si_0}
>      (nil))
>

Yes, but it should not be generated for !TARGET_INTER_UNIT_MOVES. The
constraint should be Yi, but then we don't shadow other alternatives
correctly.

>>> sse2_loadld pattern has conditional constraint Yi that depends on
>>> TARGET_INTER_UNIT_MOVES, so we can't blindly generate sse2_loadld
>>> after reload.  I'm testing attached patch.
>>>
>>> BTW: Do you perhaps have a testcase for this problem?
>>
>> I have a testcase. But it needs a new x86 optimization we are working on it.
>>
>>> 2011-07-03  Uros Bizjak
>>>
>>>         PR target/49600
>>>         * config/i386/i386.md (SSE2 int->float split): Push operand 1 in
>>>         general register to memory for !TARGET_INTER_UNIT_MOVES.
>>>
>>
>> I will give it a try.
>>
>
> It doesn't work: I still got

Yes, I later noticed that I have changed the wrong pattern (the one with
memory clobber) ;( .

Attached is the correct patch.

Uros.
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 175786)
+++ config/i386/i386.md	(working copy)
@@ -5022,11 +5022,20 @@
   if (GET_CODE (op1) == SUBREG)
     op1 = SUBREG_REG (op1);
 
-  if (GENERAL_REG_P (op1) && TARGET_INTER_UNIT_MOVES)
+  if (GENERAL_REG_P (op1))
     {
       operands[4] = simplify_gen_subreg (V4SImode, operands[0], mode, 0);
-      emit_insn (gen_sse2_loadld (operands[4],
-                                  CONST0_RTX (V4SImode), operands[1]));
+      if (TARGET_INTER_UNIT_MOVES)
+        emit_insn (gen_sse2_loadld (operands[4],
+                                    CONST0_RTX (V4SImode), operands[1]));
+      else
+        {
+          operands[5] = ix86_force_to_memory (GET_MODE (operands[1]),
+                                              operands[1]);
+          emit_insn (gen_sse2_loadld (operands[4],
+                                      CONST0_RTX (V4SImode), operands[5]));
+          ix86_free_from_memory (GET_MODE (operands[1]));
+        }
     }
   /* We can ignore possible trapping value in the high part of SSE
      register for non-trapping math.  */
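The patched GENERAL_REG_P branch keeps the direct sse2_loadld when TARGET_INTER_UNIT_MOVES is set and otherwise routes the general-register operand through a stack slot (ix86_force_to_memory, then sse2_loadld from memory, then ix86_free_from_memory). The plain-C sketch below models only that data flow; it does not call GCC internals. All names in it (v4si, v4sf, model_sse2_loadld, model_cvtdq2ps, model_split_from_gpr) are hypothetical, the loadld model follows the vec_merge RTL quoted in the thread, and treating the follow-on conversion as a cvtdq2ps-style lane-wise convert is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical plain-C stand-ins for 128-bit SSE registers.  */
typedef struct { int32_t lane[4]; } v4si;
typedef struct { float lane[4]; } v4sf;

/* Models the sse2_loadld RTL quoted in the thread:
     (vec_merge:V4SI (vec_duplicate:V4SI src)
                     (const_vector:V4SI [0 0 0 0])
                     (const_int 1))
   Only mask bit 0 is set, so lane 0 takes SRC and lanes 1..3 are zero.  */
static v4si
model_sse2_loadld (int32_t src)
{
  v4si r = { { 0, 0, 0, 0 } };
  r.lane[0] = src;
  return r;
}

/* Lane-wise int->float conversion, assumed to be cvtdq2ps-style.  */
static v4sf
model_cvtdq2ps (v4si x)
{
  v4sf r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = (float) x.lane[i];
  return r;
}

/* Hypothetical model of the patched GENERAL_REG_P branch.  GPR_VALUE is
   the SImode value held in a general register; *SPILLED reports whether
   the value took the detour through memory.  */
static float
model_split_from_gpr (int32_t gpr_value, bool inter_unit_moves, bool *spilled)
{
  v4si xmm;

  if (inter_unit_moves)
    {
      /* Direct GPR -> SSE transfer is allowed: load straight from the GPR.  */
      xmm = model_sse2_loadld (gpr_value);
      *spilled = false;
    }
  else
    {
      /* Direct transfer is not allowed: store the GPR to a stack slot
         (cf. ix86_force_to_memory) and load the SSE register from it.
         The slot goes out of scope at the end of this block, standing in
         for ix86_free_from_memory.  */
      volatile int32_t stack_slot = gpr_value;
      xmm = model_sse2_loadld (stack_slot);
      *spilled = true;
    }

  /* Either way, the upper lanes are zeroed by the loadld.  */
  assert (xmm.lane[1] == 0 && xmm.lane[2] == 0 && xmm.lane[3] == 0);
  return model_cvtdq2ps (xmm).lane[0];
}
```

For example, model_split_from_gpr (42, false, &spilled) goes through the stack slot and still yields 42.0f: the spill changes how the value reaches the SSE register, not the converted result.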