From patchwork Mon Jul 29 11:22:25 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 262725
Date: Mon, 29 Jul 2013 13:22:25 +0200
Subject: [PATCH, i386]: Fix PR57954 - AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: Uros Bizjak
To: "gcc-patches@gcc.gnu.org"
Cc: "H.J. Lu"

Hello!

The attached patch (in fact a variant of HJ's patch) clears the SSE
target register before cvt* instructions (the same approach ICC takes).

While there, the patch also removes the checks for SUBREGs from the
post-reload splitters, since they are not needed there.

2013-07-29  Uros Bizjak

	* config/i386/i386.md (float post-reload splitters): Do not check
	for subregs of SSE registers.

2013-07-29  Uros Bizjak
	    H.J. Lu

	PR target/57954
	PR target/57988
	* config/i386/i386.md (post-reload splitter
	to avoid partial SSE reg dependency stalls): New.

The patch was bootstrapped and regression tested on
x86_64-pc-linux-gnu {,-m32} and committed to mainline.

Uros.
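For illustration, here is a minimal C sketch of the kind of code this affects. The function name is hypothetical and is not taken from the PR testcases. Each int-to-double conversion in the loop typically becomes cvtsi2sd (vcvtsi2sd with -mavx), which writes only the low element of the destination XMM register and therefore depends on whatever last wrote that register. On targets where TARGET_SSE_PARTIAL_REG_DEPENDENCY is set, the new splitter should place a register-clearing instruction before the conversion.

```c
#include <stddef.h>

/* Hypothetical example: sum an int array as double.
   The (double) cast is the int-to-float conversion that the
   post-reload splitter in this patch targets. */
double sum_as_double(const int *values, size_t count)
{
    double total = 0.0;
    for (size_t i = 0; i < count; i++)
        total += (double)values[i];   /* typically cvtsi2sd / vcvtsi2sd */
    return total;
}
```

Comparing the output of something like "gcc -O2 -mavx -S" before and after the patch is one way to see the effect. The exact instructions depend on the selected tuning.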
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 201298)
+++ config/i386/i386.md	(working copy)
@@ -4596,10 +4596,7 @@
    (clobber (match_operand:SWI48 2 "memory_operand"))]
   "SSE_FLOAT_MODE_P (mode) && TARGET_MIX_SSE_I387
    && TARGET_INTER_UNIT_CONVERSIONS
-   && reload_completed
-   && (SSE_REG_P (operands[0])
-       || (GET_CODE (operands[0]) == SUBREG
-           && SSE_REG_P (SUBREG_REG (operands[0]))))"
+   && reload_completed && SSE_REG_P (operands[0])"
   [(set (match_dup 0) (float:MODEF (match_dup 1)))])
 
 (define_split
@@ -4608,10 +4605,7 @@
    (clobber (match_operand:SWI48 2 "memory_operand"))]
   "SSE_FLOAT_MODE_P (mode) && TARGET_MIX_SSE_I387
    && !(TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun))
-   && reload_completed
-   && (SSE_REG_P (operands[0])
-       || (GET_CODE (operands[0]) == SUBREG
-           && SSE_REG_P (SUBREG_REG (operands[0]))))"
+   && reload_completed && SSE_REG_P (operands[0])"
   [(set (match_dup 2) (match_dup 1))
    (set (match_dup 0) (float:MODEF (match_dup 2)))])
 
@@ -4697,10 +4691,7 @@
    (clobber (match_operand:SI 2 "memory_operand"))]
   "TARGET_SSE2 && TARGET_SSE_MATH
    && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
-   && reload_completed
-   && (SSE_REG_P (operands[0])
-       || (GET_CODE (operands[0]) == SUBREG
-           && SSE_REG_P (SUBREG_REG (operands[0]))))"
+   && reload_completed && SSE_REG_P (operands[0])"
   [(const_int 0)]
 {
   rtx op1 = operands[1];
@@ -4740,10 +4731,7 @@
    (clobber (match_operand:SI 2 "memory_operand"))]
   "TARGET_SSE2 && TARGET_SSE_MATH
    && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
-   && reload_completed
-   && (SSE_REG_P (operands[0])
-       || (GET_CODE (operands[0]) == SUBREG
-           && SSE_REG_P (SUBREG_REG (operands[0]))))"
+   && reload_completed && SSE_REG_P (operands[0])"
   [(const_int 0)]
 {
   operands[3] = simplify_gen_subreg (mode, operands[0],
@@ -4764,10 +4752,7 @@
         (float:MODEF (match_operand:SI 1 "register_operand")))]
   "TARGET_SSE2 && TARGET_SSE_MATH
    && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
-   && reload_completed
-   && (SSE_REG_P (operands[0])
-       || (GET_CODE (operands[0]) == SUBREG
-           && SSE_REG_P (SUBREG_REG (operands[0]))))"
+   && reload_completed && SSE_REG_P (operands[0])"
   [(const_int 0)]
 {
   rtx op1 = operands[1];
@@ -4810,10 +4795,7 @@
         (float:MODEF (match_operand:SI 1 "memory_operand")))]
   "TARGET_SSE2 && TARGET_SSE_MATH
    && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
-   && reload_completed
-   && (SSE_REG_P (operands[0])
-       || (GET_CODE (operands[0]) == SUBREG
-           && SSE_REG_P (SUBREG_REG (operands[0]))))"
+   && reload_completed && SSE_REG_P (operands[0])"
   [(const_int 0)]
 {
   operands[3] = simplify_gen_subreg (mode, operands[0],
@@ -4872,10 +4854,7 @@
    (clobber (match_operand:SWI48 2 "memory_operand"))]
   "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
    && (TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun))
-   && reload_completed
-   && (SSE_REG_P (operands[0])
-       || (GET_CODE (operands[0]) == SUBREG
-           && SSE_REG_P (SUBREG_REG (operands[0]))))"
+   && reload_completed && SSE_REG_P (operands[0])"
   [(set (match_dup 0) (float:MODEF (match_dup 1)))])
 
 (define_insn "*float2_sse_nointerunit"
@@ -4905,10 +4884,7 @@
    (clobber (match_operand:SWI48 2 "memory_operand"))]
   "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
    && !(TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun))
-   && reload_completed
-   && (SSE_REG_P (operands[0])
-       || (GET_CODE (operands[0]) == SUBREG
-           && SSE_REG_P (SUBREG_REG (operands[0]))))"
+   && reload_completed && SSE_REG_P (operands[0])"
   [(set (match_dup 2) (match_dup 1))
    (set (match_dup 0) (float:MODEF (match_dup 2)))])
 
@@ -4917,10 +4893,7 @@
         (float:MODEF (match_operand:SWI48 1 "memory_operand")))
    (clobber (match_operand:SWI48 2 "memory_operand"))]
   "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
-   && reload_completed
-   && (SSE_REG_P (operands[0])
-       || (GET_CODE (operands[0]) == SUBREG
-           && SSE_REG_P (SUBREG_REG (operands[0]))))"
+   && reload_completed && SSE_REG_P (operands[0])"
   [(set (match_dup 0) (float:MODEF (match_dup 1)))])
 
 (define_insn "*float2_i387_with_temp"
@@ -4968,6 +4941,46 @@
    && reload_completed"
   [(set (match_dup 0) (float:X87MODEF (match_dup 1)))])
 
+;; Avoid partial SSE register dependency stalls
+
+(define_split
+  [(set (match_operand:MODEF 0 "register_operand")
+        (float:MODEF (match_operand:SI 1 "nonimmediate_operand")))]
+  "TARGET_SSE2 && TARGET_SSE_MATH
+   && TARGET_SSE_PARTIAL_REG_DEPENDENCY
+   && optimize_function_for_speed_p (cfun)
+   && reload_completed && SSE_REG_P (operands[0])"
+  [(set (match_dup 0)
+        (vec_merge:
+          (vec_duplicate:
+            (float:MODEF (match_dup 1)))
+          (match_dup 0)
+          (const_int 1)))]
+{
+  operands[0] = simplify_gen_subreg (mode, operands[0],
+                                     mode, 0);
+  emit_move_insn (operands[0], CONST0_RTX (mode));
+})
+
+(define_split
+  [(set (match_operand:MODEF 0 "register_operand")
+        (float:MODEF (match_operand:DI 1 "nonimmediate_operand")))]
+  "TARGET_64BIT && TARGET_SSE2 && TARGET_SSE_MATH
+   && TARGET_SSE_PARTIAL_REG_DEPENDENCY
+   && optimize_function_for_speed_p (cfun)
+   && reload_completed && SSE_REG_P (operands[0])"
+  [(set (match_dup 0)
+        (vec_merge:
+          (vec_duplicate:
+            (float:MODEF (match_dup 1)))
+          (match_dup 0)
+          (const_int 1)))]
+{
+  operands[0] = simplify_gen_subreg (mode, operands[0],
+                                     mode, 0);
+  emit_move_insn (operands[0], CONST0_RTX (mode));
+})
+
 ;; Avoid store forwarding (partial memory) stall penalty
 ;; by passing DImode value through XMM registers.  */
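
The two new splitters zero the whole destination vector register and then express the conversion as a vec_merge into element 0 of that register. As a rough source-level analogue only, the same idea can be sketched with SSE2 intrinsics. This assumes an x86 target with SSE2; the function name is hypothetical, and the code illustrates the transformation rather than showing what GCC emits.

```c
#include <emmintrin.h>

/* Hypothetical analogue of the new splitter for SImode -> DFmode. */
double int_to_double_zeroed(int x)
{
    /* Clear the full register first, so the conversion below does
       not depend on an earlier write to it. */
    __m128d zero = _mm_setzero_pd();

    /* cvtsi2sd: the low element becomes (double)x, the high element
       is taken from 'zero'.  This mirrors the vec_merge with mask 1. */
    __m128d merged = _mm_cvtsi32_sd(zero, x);

    /* Extract the low element as a scalar. */
    return _mm_cvtsd_f64(merged);
}
```

The DImode splitter corresponds to the 64-bit variant of the same conversion, which is why it is additionally guarded by TARGET_64BIT.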