From patchwork Wed May 9 17:54:12 2012
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 158021
Date: Wed, 9 May 2012 19:54:12 +0200
Subject: [PATCH, i386]: Fix PR 44141, Redundant loads and stores generated for AMD bdver1 target
From: Uros Bizjak
To: gcc-patches@gcc.gnu.org
Cc: "Kumar, Venkataramanan", Ulrich Weigand

Hello!
Attached patch avoids a deficiency in reload, where reload gives up on
handling subregs of pseudos (please see the PR [1] for the explanation by
Ulrich).  The patch simply avoids generating V4SF moves with V4SF subregs
of V2DF values unless really necessary (i.e. when moving SSE2 modes
without SSE2 enabled, which shouldn't happen anyway).

With the patched gcc, the expand pass emits (unaligned) moves in their
original mode, and this mode is kept until the asm is generated.  The asm
instruction is chosen according to the mode of the insn pattern, and the
mode is calculated from the various influencing conditions.

2012-05-09  Uros Bizjak

	PR target/44141
	* config/i386/i386.c (ix86_expand_vector_move_misalign): Do not
	handle 128 bit vectors specially for TARGET_AVX.  Emit sse2_movupd
	and sse_movups RTXes for TARGET_AVX,
	TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL or when optimizing for size.
	* config/i386/sse.md (*mov<mode>_internal): Remove
	TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL handling from asm output
	code.  Calculate "mode" attribute according to
	optimize_function_for_size_p and
	TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL flag.
	(*<sse>_movu<ssemodesuffix><avxsizesuffix>): Choose asm template
	depending on the mode of the instruction.  Calculate "mode"
	attribute according to optimize_function_for_size_p,
	TARGET_SSE_TYPELESS_STORES and
	TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL flags.
	(*<sse2>_movdqu<avxsizesuffix>): Ditto.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32}.  The patch also fixes the testcase from the PR.

Patch will be committed to mainline SVN.

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44141#c16

Uros.
Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md	(revision 187286)
+++ config/i386/sse.md	(working copy)
@@ -449,8 +449,6 @@
 	  && (misaligned_operand (operands[0], <MODE>mode)
 	      || misaligned_operand (operands[1], <MODE>mode)))
 	return "vmovupd\t{%1, %0|%0, %1}";
-      else if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
-	return "%vmovaps\t{%1, %0|%0, %1}";
       else
 	return "%vmovapd\t{%1, %0|%0, %1}";
 
@@ -460,8 +458,6 @@
 	  && (misaligned_operand (operands[0], <MODE>mode)
 	      || misaligned_operand (operands[1], <MODE>mode)))
 	return "vmovdqu\t{%1, %0|%0, %1}";
-      else if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
-	return "%vmovaps\t{%1, %0|%0, %1}";
       else
 	return "%vmovdqa\t{%1, %0|%0, %1}";
 
@@ -475,19 +471,21 @@
   [(set_attr "type" "sselog1,ssemov,ssemov")
    (set_attr "prefix" "maybe_vex")
    (set (attr "mode")
-	(cond [(match_test "TARGET_AVX")
+	(cond [(and (eq_attr "alternative" "1,2")
+		    (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
+		 (if_then_else
+		   (match_test "GET_MODE_SIZE (<MODE>mode) > 16")
+		   (const_string "V8SF")
+		   (const_string "V4SF"))
+	       (match_test "TARGET_AVX")
		 (const_string "<sseinsnmode>")
-	       (ior (ior (match_test "optimize_function_for_size_p (cfun)")
-			 (not (match_test "TARGET_SSE2")))
+	       (ior (and (eq_attr "alternative" "1,2")
+			 (match_test "optimize_function_for_size_p (cfun)"))
		    (and (eq_attr "alternative" "2")
			 (match_test "TARGET_SSE_TYPELESS_STORES")))
		 (const_string "V4SF")
-	       (eq (const_string "<MODE>mode") (const_string "V4SFmode"))
-		 (const_string "V4SF")
-	       (eq (const_string "<MODE>mode") (const_string "V2DFmode"))
-		 (const_string "V2DF")
	      ]
-	      (const_string "TI")))])
+	      (const_string "<sseinsnmode>")))])
 
 (define_insn "sse2_movq128"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
@@ -597,11 +595,33 @@
	  [(match_operand:VF 1 "nonimmediate_operand" "xm,x")]
	  UNSPEC_MOVU))]
   "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
-  "%vmovu<ssemodesuffix>\t{%1, %0|%0, %1}"
+{
+  switch (get_attr_mode (insn))
+    {
+    case MODE_V8SF:
+    case MODE_V4SF:
+      return "%vmovups\t{%1, %0|%0, %1}";
+
+    default:
+      return "%vmovu<ssemodesuffix>\t{%1, %0|%0, %1}";
+    }
+}
   [(set_attr "type" "ssemov")
    (set_attr "movu" "1")
    (set_attr "prefix" "maybe_vex")
-   (set_attr "mode" "<MODE>")])
+   (set (attr "mode")
+	(cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
+		 (if_then_else
+		   (match_test "GET_MODE_SIZE (<MODE>mode) > 16")
+		   (const_string "V8SF")
+		   (const_string "V4SF"))
+	       (match_test "TARGET_AVX")
+		 (const_string "<MODE>")
+	       (ior (match_test "optimize_function_for_size_p (cfun)")
+		    (and (eq_attr "alternative" "1")
+			 (match_test "TARGET_SSE_TYPELESS_STORES")))
+		 (const_string "V4SF")
+	      ]
+	      (const_string "<MODE>")))])
 
 (define_expand "<sse2>_movdqu<avxsizesuffix>"
   [(set (match_operand:VI1 0 "nonimmediate_operand")
@@ -618,7 +638,16 @@
	(unspec:VI1 [(match_operand:VI1 1 "nonimmediate_operand" "xm,x")]
		    UNSPEC_MOVU))]
   "TARGET_SSE2 && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
-  "%vmovdqu\t{%1, %0|%0, %1}"
+{
+  switch (get_attr_mode (insn))
+    {
+    case MODE_V8SF:
+    case MODE_V4SF:
+      return "%vmovups\t{%1, %0|%0, %1}";
+    default:
+      return "%vmovdqu\t{%1, %0|%0, %1}";
+    }
+}
   [(set_attr "type" "ssemov")
    (set_attr "movu" "1")
    (set (attr "prefix_data16")
@@ -627,7 +656,20 @@
	  (const_string "*")
	  (const_string "1")))
    (set_attr "prefix" "maybe_vex")
-   (set_attr "mode" "<sseinsnmode>")])
+   (set (attr "mode")
+	(cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
+		 (if_then_else
+		   (match_test "GET_MODE_SIZE (<MODE>mode) > 16")
+		   (const_string "V8SF")
+		   (const_string "V4SF"))
+	       (match_test "TARGET_AVX")
+		 (const_string "<sseinsnmode>")
+	       (ior (match_test "optimize_function_for_size_p (cfun)")
+		    (and (eq_attr "alternative" "1")
+			 (match_test "TARGET_SSE_TYPELESS_STORES")))
+		 (const_string "V4SF")
+	      ]
+	      (const_string "<sseinsnmode>")))])
 
 (define_insn "<sse3>_lddqu<avxsizesuffix>"
   [(set (match_operand:VI1 0 "register_operand" "=x")
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 187289)
+++ config/i386/i386.c	(working copy)
@@ -15907,60 +15907,19 @@ ix86_expand_vector_move_misalign (enum machine_mod
   op0 = operands[0];
   op1 = operands[1];
 
-  if (TARGET_AVX)
+  if (TARGET_AVX
+      && GET_MODE_SIZE (mode) == 32)
     {
       switch (GET_MODE_CLASS (mode))
	{
	case MODE_VECTOR_INT:
	case MODE_INT:
-	  switch (GET_MODE_SIZE (mode))
-	    {
-	    case 16:
-	      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
-		{
-		  op0 = gen_lowpart (V4SFmode, op0);
-		  op1 = gen_lowpart (V4SFmode, op1);
-		  emit_insn (gen_sse_movups (op0, op1));
-		}
-	      else
-		{
-		  op0 = gen_lowpart (V16QImode, op0);
-		  op1 = gen_lowpart (V16QImode, op1);
-		  emit_insn (gen_sse2_movdqu (op0, op1));
-		}
-	      break;
-	    case 32:
-	      op0 = gen_lowpart (V32QImode, op0);
-	      op1 = gen_lowpart (V32QImode, op1);
-	      ix86_avx256_split_vector_move_misalign (op0, op1);
-	      break;
-	    default:
-	      gcc_unreachable ();
-	    }
-	  break;
+	  op0 = gen_lowpart (V32QImode, op0);
+	  op1 = gen_lowpart (V32QImode, op1);
+	  /* FALLTHRU */
+
	case MODE_VECTOR_FLOAT:
-	  switch (mode)
-	    {
-	    case V4SFmode:
-	      emit_insn (gen_sse_movups (op0, op1));
-	      break;
-	    case V2DFmode:
-	      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
-		{
-		  op0 = gen_lowpart (V4SFmode, op0);
-		  op1 = gen_lowpart (V4SFmode, op1);
-		  emit_insn (gen_sse_movups (op0, op1));
-		}
-	      else
-		emit_insn (gen_sse2_movupd (op0, op1));
-	      break;
-	    case V8SFmode:
-	    case V4DFmode:
-	      ix86_avx256_split_vector_move_misalign (op0, op1);
-	      break;
-	    default:
-	      gcc_unreachable ();
-	    }
+	  ix86_avx256_split_vector_move_misalign (op0, op1);
	  break;
 
	default:
@@ -15972,16 +15931,6 @@ ix86_expand_vector_move_misalign (enum machine_mod
 
   if (MEM_P (op1))
     {
-      /* If we're optimizing for size, movups is the smallest.  */
-      if (optimize_insn_for_size_p ()
-	  || TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
-	{
-	  op0 = gen_lowpart (V4SFmode, op0);
-	  op1 = gen_lowpart (V4SFmode, op1);
-	  emit_insn (gen_sse_movups (op0, op1));
-	  return;
-	}
-
       /* ??? If we have typed data, then it would appear that using
	 movdqu is the only way to get unaligned data loaded with
	 integer type.  */
@@ -15989,16 +15938,19 @@ ix86_expand_vector_move_misalign (enum machine_mod
	{
	  op0 = gen_lowpart (V16QImode, op0);
	  op1 = gen_lowpart (V16QImode, op1);
+	  /* We will eventually emit movups based on insn attributes.  */
	  emit_insn (gen_sse2_movdqu (op0, op1));
-	  return;
	}
-
-      if (TARGET_SSE2 && mode == V2DFmode)
+      else if (TARGET_SSE2 && mode == V2DFmode)
	{
	  rtx zero;
 
-	  if (TARGET_SSE_UNALIGNED_LOAD_OPTIMAL)
+	  if (TARGET_AVX
+	      || TARGET_SSE_UNALIGNED_LOAD_OPTIMAL
+	      || TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL
+	      || optimize_function_for_size_p (cfun))
	    {
+	      /* We will eventually emit movups based on insn attributes.  */
	      emit_insn (gen_sse2_movupd (op0, op1));
	      return;
	    }
@@ -16030,7 +15982,10 @@ ix86_expand_vector_move_misalign (enum machine_mod
	}
       else
	{
-	  if (TARGET_SSE_UNALIGNED_LOAD_OPTIMAL)
+	  if (TARGET_AVX
+	      || TARGET_SSE_UNALIGNED_LOAD_OPTIMAL
+	      || TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL
+	      || optimize_function_for_size_p (cfun))
	    {
	      op0 = gen_lowpart (V4SFmode, op0);
	      op1 = gen_lowpart (V4SFmode, op1);
@@ -16045,6 +16000,7 @@ ix86_expand_vector_move_misalign (enum machine_mod
	  if (mode != V4SFmode)
	    op0 = gen_lowpart (V4SFmode, op0);
+
	  m = adjust_address (op1, V2SFmode, 0);
	  emit_insn (gen_sse_loadlps (op0, op0, m));
	  m = adjust_address (op1, V2SFmode, 8);
@@ -16053,30 +16009,20 @@ ix86_expand_vector_move_misalign (enum machine_mod
     }
   else if (MEM_P (op0))
     {
-      /* If we're optimizing for size, movups is the smallest.  */
-      if (optimize_insn_for_size_p ()
-	  || TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
-	{
-	  op0 = gen_lowpart (V4SFmode, op0);
-	  op1 = gen_lowpart (V4SFmode, op1);
-	  emit_insn (gen_sse_movups (op0, op1));
-	  return;
-	}
-
-      /* ??? Similar to above, only less clear
-	 because of typeless stores.  */
-      if (TARGET_SSE2 && !TARGET_SSE_TYPELESS_STORES
-	  && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+      if (TARGET_SSE2 && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
	{
	  op0 = gen_lowpart (V16QImode, op0);
	  op1 = gen_lowpart (V16QImode, op1);
+	  /* We will eventually emit movups based on insn attributes.  */
	  emit_insn (gen_sse2_movdqu (op0, op1));
-	  return;
	}
-
-      if (TARGET_SSE2 && mode == V2DFmode)
+      else if (TARGET_SSE2 && mode == V2DFmode)
	{
-	  if (TARGET_SSE_UNALIGNED_STORE_OPTIMAL)
+	  if (TARGET_AVX
+	      || TARGET_SSE_UNALIGNED_STORE_OPTIMAL
+	      || TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL
+	      || optimize_function_for_size_p (cfun))
+	    /* We will eventually emit movups based on insn attributes.  */
	    emit_insn (gen_sse2_movupd (op0, op1));
	  else
	    {
@@ -16091,7 +16037,10 @@ ix86_expand_vector_move_misalign (enum machine_mod
	  if (mode != V4SFmode)
	    op1 = gen_lowpart (V4SFmode, op1);
 
-	  if (TARGET_SSE_UNALIGNED_STORE_OPTIMAL)
+	  if (TARGET_AVX
+	      || TARGET_SSE_UNALIGNED_STORE_OPTIMAL
+	      || TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL
+	      || optimize_function_for_size_p (cfun))
	    {
	      op0 = gen_lowpart (V4SFmode, op0);
	      emit_insn (gen_sse_movups (op0, op1));