From patchwork Sun Oct 7 08:45:35 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 189794 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 7FD362C023B for ; Sun, 7 Oct 2012 19:45:53 +1100 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1350204355; h=Comment: DomainKey-Signature:Received:Received:Received:Received:Received: Received:From:To:Mail-Followup-To:Cc:Subject:References:Date: In-Reply-To:Message-ID:User-Agent:MIME-Version:Content-Type: Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:Sender:Delivered-To; bh=YXsONQkRCADsCwlERMax w9NVano=; b=xBlCMzJ7esJrFy7q0GjtiDdm6TEWYfyRueRC7uN+wAwVg7dGK+Yy X60z/XJaykPPIdS4HsF9ydZ96P3PsYGch5Q1NapGCUD8ggOaNvO7u1f2AHl4WvvV OSBRTm6tSt3MehF9jRFdZeW8Zg97bwDiw2r0NDvUoYq171L6Aws1fsw= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Received:Received:From:To:Mail-Followup-To:Cc:Subject:References:Date:In-Reply-To:Message-ID:User-Agent:MIME-Version:Content-Type:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=ZOqhwk2LulhpjWVepjnbkT6DJbqdM1e4JVHkggUpST59fqwHMZOnQCmug9mUMh bpc6Q9wWJ83aMAaok+DR7+r4OX/krPg9XIDxhCWDQOi5zeYWwTwStQWe/7IyZPV4 EnOs81qqN19c8FgwbFpX+56O56cJag33tgdhX8vTHtZJQ=; Received: (qmail 25186 invoked by alias); 7 Oct 2012 08:45:50 -0000 Received: (qmail 25173 invoked by uid 22791); 7 Oct 2012 08:45:49 -0000 X-SWARE-Spam-Status: No, hits=-4.3 required=5.0 tests=AWL, BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, KHOP_RCVD_TRUST, RCVD_IN_DNSWL_LOW, RCVD_IN_HOSTKARMA_YE, TW_MG X-Spam-Check-By: sourceware.org Received: from mail-wg0-f51.google.com (HELO mail-wg0-f51.google.com) (74.125.82.51) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sun, 07 Oct 2012 08:45:42 +0000 Received: by mail-wg0-f51.google.com with SMTP id ed3so1501534wgb.8 for ; Sun, 07 Oct 2012 01:45:37 -0700 (PDT) Received: by 10.216.136.230 with SMTP id w80mr8321834wei.199.1349599537271; Sun, 07 Oct 2012 01:45:37 -0700 (PDT) Received: from localhost ([2.26.188.227]) by mx.google.com with ESMTPS id fb20sm14373500wid.1.2012.10.07.01.45.34 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 07 Oct 2012 01:45:35 -0700 (PDT) From: Richard Sandiford To: "Maciej W. Rozycki" Mail-Followup-To: "Maciej W. Rozycki" , Sandra Loosemore , , rdsandiford@googlemail.com Cc: Sandra Loosemore , Subject: Re: PING Re: [PATCH, MIPS] add new peephole for 74k dspr2 References: <502D0DF4.3070302@codesourcery.com> <87zk5qu783.fsf@talisman.home> <5034EBC6.5080602@codesourcery.com> <87bohw1ecb.fsf@talisman.home> <5058ACDA.5060706@codesourcery.com> <87lig719ig.fsf@talisman.home> <87sja7tadr.fsf@talisman.home> <87obkuttj6.fsf@talisman.home> <87k3vits62.fsf@talisman.home> Date: Sun, 07 Oct 2012 09:45:35 +0100 In-Reply-To: (Maciej W. Rozycki's message of "Tue, 25 Sep 2012 18:57:19 +0100") Message-ID: <87bogeoekw.fsf@talisman.home> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (gnu/linux) MIME-Version: 1.0 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org "Maciej W. Rozycki" writes: > On Tue, 25 Sep 2012, Richard Sandiford wrote: >> Although I see the 4kp with its 32-cycle MULTs and MADDs is one where >> MULT $0,$0 would be a really bad choice. Sigh. The amount of effort >> required for this optimisation is getting a bit ridiculous. > > I have double-checked some documentation, and in fact many MIPS cores, > including the current ones, have a configuration option to include either > a high-performance or an area-efficient MD unit. Take the M14Kc for > example -- its high-performance unit has a one-cycle latency/issue rate > for 16-bit multiplication (two-cycle for full 32 bits; here the width of > rt is explicitly named) and the area-efficient has a 32-cycle > latency/issue rate only regardless of the operand size (obviously > iterating over addition one bit at a time). The latency of MTHI/MTLO is 1 > across both units. > > So I think this can't really be selected automatically for all cores, > some human-supplied knowledge about the MD unit used is required -- that > obviously affects other operations too, e.g. some multiplications > involving a constant that may be cheaper to do either directly or with a > sequence of additions depending on the MD unit present (unless optimising > for size, of course). Yeah, as far as this optimisation goes, I think your original suggestion of using the DFA to work out the best sequence is still right. If the DFA says that multiplications take a handful of cycles when they actually take 32 then we're not going to optimise multiplication very well whatever happens. In practice, code destined for non-4kc cores with iterative multipliers would probably tune well with -mtune=4kp. Anyway, I said upthread that I was looking into removing MD0_REG and MD1_REG because I thought I'd seen a case where that was needed for madd-9.c to pass. I can no longer reproduce that, so it was probably pilot error. The final patch ended up being much more complicated than I'd have liked, but at least it should be easier to add other automatically-derived tuning costs in future. Tested on mipsisa64-elf and mipsisa32-elf. Applied. Richard gcc/ * config/mips/mips-protos.h (mips_split_type): New enum. (mips_split_64bit_move_p, mips_split_doubleword_move): Delete. (mips_split_move_p, mips_split_move, mips_split_move_insn_p) (mips_split_move_insn): Declare. * config/mips/mips.c (mips_tuning_info): New variable. (mips_load_store_insns): Use mips_split_move_insn_p instead of mips_split_64bit_move_p. (mips_emit_move_or_split, mips_mult_move_p): New functions. (mips_split_64bit_move_p): Rename to... (mips_split_move_p): ...this and take a mips_split_type argument. Generalize to all moves. Call mips_mult_move_p. (mips_split_doubleword_move): Rename to... (mips_split_move): ...this and take a mips_split_type argument. Assert that mips_split_move_p holds. (mips_insn_split_type, mips_split_move_insn_p, mips_split_move_insn): New functions. (mips_output_move): Use mips_split_move_p instead of mips_split_64bit_move_p. Handle MULT $0, $0 moves. (mips_save_reg): Use mips_emit_move_or_split. (mips_sim_reset): Assign to curr_state. Call targetm.sched.init and advance_state. (mips_sim_init): Call targetm.sched.init_dfa_pre_cycle_insn and targetm.sched.init_dfa_post_cycle_insn, if defined. (mips_sim_next_cycle): Assign to curr_state. Use advance_state instead of state_transition. (mips_sim_issue_insn): Assign to curr_state. Use targetm.sched.variable_issue to see how many more insns can be issued. (mips_seq_time, mips_mult_zero_zero_cost) (mips_set_fast_mult_zero_zero_p, mips_set_tuning_info) (mips_expand_to_rtl_hook): New functions. (TARGET_EXPAND_TO_RTL_HOOK): Define. * config/mips/mips.md (move_type): Add imul. (type): Map imul move_types to imul. (*movdi_32bit, *movti): Add imul alternatives. Use mips_split_move_insn_p and mips_split_move_insn instead of mips_split_64bit_move_p and mips_split_doubleword_move in move splitters. gcc/testsuite/ 2012-10-07 Richard Sandiford Sandra Loosemore * gcc.target/mips/madd-9.c: Force code to be tuned for the 4kc and test that the accumulator is initialized using MULT. * gcc.target/mips/mips32-dsp-accinit-1.c: New test. * gcc.target/mips/mips32-dsp-accinit-2.c: Likewise. Index: gcc/config/mips/mips-protos.h =================================================================== --- gcc/config/mips/mips-protos.h 2012-10-06 17:26:43.408618293 +0100 +++ gcc/config/mips/mips-protos.h 2012-10-07 08:21:40.892791332 +0100 @@ -173,6 +173,25 @@ enum mips_call_type { MIPS_CALL_EPILOGUE }; +/* Controls the conditions under which certain instructions are split. + + SPLIT_IF_NECESSARY + Only perform splits that are necessary for correctness + (because no unsplit version exists). + + SPLIT_FOR_SPEED + Perform splits that are necessary for correctness or + beneficial for code speed. + + SPLIT_FOR_SIZE + Perform splits that are necessary for correctness or + beneficial for code size. */ +enum mips_split_type { + SPLIT_IF_NECESSARY, + SPLIT_FOR_SPEED, + SPLIT_FOR_SIZE +}; + extern bool mips_symbolic_constant_p (rtx, enum mips_symbol_context, enum mips_symbol_type *); extern int mips_regno_mode_ok_for_base_p (int, enum machine_mode, bool); @@ -212,8 +231,10 @@ extern int m16_simm8_8 (rtx, enum machin extern int m16_nsimm8_8 (rtx, enum machine_mode); extern rtx mips_subword (rtx, bool); -extern bool mips_split_64bit_move_p (rtx, rtx); -extern void mips_split_doubleword_move (rtx, rtx); +extern bool mips_split_move_p (rtx, rtx, enum mips_split_type); +extern void mips_split_move (rtx, rtx, enum mips_split_type); +extern bool mips_split_move_insn_p (rtx, rtx, rtx); +extern void mips_split_move_insn (rtx, rtx, rtx); extern const char *mips_output_move (rtx, rtx); extern bool mips_cfun_has_cprestore_slot_p (void); extern bool mips_cprestore_address_p (rtx, bool); Index: gcc/config/mips/mips.c =================================================================== --- gcc/config/mips/mips.c 2012-10-06 22:24:11.262714074 +0100 +++ gcc/config/mips/mips.c 2012-10-07 09:38:39.656797917 +0100 @@ -265,6 +265,24 @@ static const char *const mips_fp_conditi MIPS_FP_CONDITIONS (STRINGIFY) }; +/* Tuning information that is automatically derived from other sources + (such as the scheduler). */ +static struct { + /* The architecture and tuning settings that this structure describes. */ + enum processor arch; + enum processor tune; + + /* True if this structure describes MIPS16 settings. */ + bool mips16_p; + + /* True if the structure has been initialized. */ + bool initialized_p; + + /* True if "MULT $0, $0" is preferable to "MTLO $0; MTHI $0" + when optimizing for speed. */ + bool fast_mult_zero_zero_p; +} mips_tuning_info; + /* Information about a function's frame layout. */ struct GTY(()) mips_frame_info { /* The size of the frame in bytes. */ @@ -2395,11 +2413,11 @@ mips_load_store_insns (rtx mem, rtx insn mode = GET_MODE (mem); /* Try to prove that INSN does not need to be split. */ - might_split_p = true; - if (GET_MODE_BITSIZE (mode) == 64) + might_split_p = GET_MODE_SIZE (mode) > UNITS_PER_WORD; + if (might_split_p) { set = single_set (insn); - if (set && !mips_split_64bit_move_p (SET_DEST (set), SET_SRC (set))) + if (set && !mips_split_move_insn_p (SET_DEST (set), SET_SRC (set), insn)) might_split_p = false; } @@ -2441,6 +2459,18 @@ mips_emit_move (rtx dest, rtx src) : emit_move_insn_1 (dest, src)); } +/* Emit a move from SRC to DEST, splitting compound moves into individual + instructions. SPLIT_TYPE is the type of split to perform. */ + +static void +mips_emit_move_or_split (rtx dest, rtx src, enum mips_split_type split_type) +{ + if (mips_split_move_p (dest, src, split_type)) + mips_split_move (dest, src, split_type); + else + mips_emit_move (dest, src); +} + /* Emit an instruction of the form (set TARGET (CODE OP0)). */ static void @@ -4119,39 +4149,60 @@ mips_subword (rtx op, bool high_p) return simplify_gen_subreg (word_mode, op, mode, byte); } -/* Return true if a 64-bit move from SRC to DEST should be split into two. */ +/* Return true if SRC should be moved into DEST using "MULT $0, $0". + SPLIT_TYPE is the condition under which moves should be split. */ + +static bool +mips_mult_move_p (rtx dest, rtx src, enum mips_split_type split_type) +{ + return ((split_type != SPLIT_FOR_SPEED + || mips_tuning_info.fast_mult_zero_zero_p) + && src == const0_rtx + && REG_P (dest) + && GET_MODE_SIZE (GET_MODE (dest)) == 2 * UNITS_PER_WORD + && (ISA_HAS_DSP_MULT + ? ACC_REG_P (REGNO (dest)) + : MD_REG_P (REGNO (dest)))); +} + +/* Return true if a move from SRC to DEST should be split into two. + SPLIT_TYPE describes the split condition. */ bool -mips_split_64bit_move_p (rtx dest, rtx src) +mips_split_move_p (rtx dest, rtx src, enum mips_split_type split_type) { - if (TARGET_64BIT) + /* Check whether the move can be done using some variant of MULT $0,$0. */ + if (mips_mult_move_p (dest, src, split_type)) return false; /* FPR-to-FPR moves can be done in a single instruction, if they're allowed at all. */ - if (FP_REG_RTX_P (src) && FP_REG_RTX_P (dest)) + unsigned int size = GET_MODE_SIZE (GET_MODE (dest)); + if (size == 8 && FP_REG_RTX_P (src) && FP_REG_RTX_P (dest)) return false; /* Check for floating-point loads and stores. */ - if (ISA_HAS_LDC1_SDC1) + if (size == 8 && ISA_HAS_LDC1_SDC1) { if (FP_REG_RTX_P (dest) && MEM_P (src)) return false; if (FP_REG_RTX_P (src) && MEM_P (dest)) return false; } - return true; + + /* Otherwise split all multiword moves. */ + return size > UNITS_PER_WORD; } -/* Split a doubleword move from SRC to DEST. On 32-bit targets, - this function handles 64-bit moves for which mips_split_64bit_move_p - holds. For 64-bit targets, this function handles 128-bit moves. */ +/* Split a move from SRC to DEST, given that mips_split_move_p holds. + SPLIT_TYPE describes the split condition. */ void -mips_split_doubleword_move (rtx dest, rtx src) +mips_split_move (rtx dest, rtx src, enum mips_split_type split_type) { rtx low_dest; + gcc_checking_assert (mips_split_move_p (dest, src, split_type)); if (FP_REG_RTX_P (dest) || FP_REG_RTX_P (src)) { if (!TARGET_64BIT && GET_MODE (dest) == DImode) @@ -4206,6 +4257,41 @@ mips_split_doubleword_move (rtx dest, rt } } } + +/* Return the split type for instruction INSN. */ + +static enum mips_split_type +mips_insn_split_type (rtx insn) +{ + basic_block bb = BLOCK_FOR_INSN (insn); + if (bb) + { + if (optimize_bb_for_speed_p (bb)) + return SPLIT_FOR_SPEED; + else + return SPLIT_FOR_SIZE; + } + /* Once CFG information has been removed, we should trust the optimization + decisions made by previous passes and only split where necessary. */ + return SPLIT_IF_NECESSARY; +} + +/* Return true if a move from SRC to DEST in INSN should be split. */ + +bool +mips_split_move_insn_p (rtx dest, rtx src, rtx insn) +{ + return mips_split_move_p (dest, src, mips_insn_split_type (insn)); +} + +/* Split a move from SRC to DEST in INSN, given that mips_split_move_insn_p + holds. */ + +void +mips_split_move_insn (rtx dest, rtx src, rtx insn) +{ + mips_split_move (dest, src, mips_insn_split_type (insn)); +} /* Return the appropriate instructions to move SRC into DEST. Assume that SRC is operand 1 and DEST is operand 0. */ @@ -4223,7 +4309,7 @@ mips_output_move (rtx dest, rtx src) mode = GET_MODE (dest); dbl_p = (GET_MODE_SIZE (mode) == 8); - if (dbl_p && mips_split_64bit_move_p (dest, src)) + if (mips_split_move_p (dest, src, SPLIT_IF_NECESSARY)) return "#"; if ((src_code == REG && GP_REG_P (REGNO (src))) @@ -4234,6 +4320,14 @@ mips_output_move (rtx dest, rtx src) if (GP_REG_P (REGNO (dest))) return "move\t%0,%z1"; + if (mips_mult_move_p (dest, src, SPLIT_IF_NECESSARY)) + { + if (ISA_HAS_DSP_MULT) + return "mult\t%q0,%.,%."; + else + return "mult\t%.,%."; + } + /* Moves to HI are handled by special .md insns. */ if (REGNO (dest) == LO_REGNUM) return "mtlo\t%z1"; @@ -10444,10 +10538,7 @@ mips_save_reg (rtx reg, rtx mem) { rtx x1, x2; - if (mips_split_64bit_move_p (mem, reg)) - mips_split_doubleword_move (mem, reg); - else - mips_emit_move (mem, reg); + mips_emit_move_or_split (mem, reg, SPLIT_IF_NECESSARY); x1 = mips_frame_set (mips_subword (mem, false), mips_subword (reg, false)); @@ -14907,10 +14998,15 @@ struct mips_sim { static void mips_sim_reset (struct mips_sim *state) { + curr_state = state->dfa_state; + state->time = 0; state->insns_left = state->issue_rate; memset (&state->last_set, 0, sizeof (state->last_set)); - state_reset (state->dfa_state); + state_reset (curr_state); + + targetm.sched.init (0, false, 0); + advance_state (curr_state); } /* Initialize STATE before its first use. DFA_STATE points to an @@ -14919,6 +15015,12 @@ mips_sim_reset (struct mips_sim *state) static void mips_sim_init (struct mips_sim *state, state_t dfa_state) { + if (targetm.sched.init_dfa_pre_cycle_insn) + targetm.sched.init_dfa_pre_cycle_insn (); + + if (targetm.sched.init_dfa_post_cycle_insn) + targetm.sched.init_dfa_post_cycle_insn (); + state->issue_rate = mips_issue_rate (); state->dfa_state = dfa_state; mips_sim_reset (state); @@ -14929,9 +15031,11 @@ mips_sim_init (struct mips_sim *state, s static void mips_sim_next_cycle (struct mips_sim *state) { + curr_state = state->dfa_state; + state->time++; state->insns_left = state->issue_rate; - state_transition (state->dfa_state, 0); + advance_state (curr_state); } /* Advance simulation state STATE until instruction INSN can read @@ -15037,8 +15141,11 @@ mips_sim_record_set (rtx x, const_rtx pa static void mips_sim_issue_insn (struct mips_sim *state, rtx insn) { - state_transition (state->dfa_state, insn); - state->insns_left--; + curr_state = state->dfa_state; + + state_transition (curr_state, insn); + state->insns_left = targetm.sched.variable_issue (0, false, insn, + state->insns_left); mips_sim_insn = insn; note_stores (PATTERN (insn), mips_sim_record_set, state); @@ -15089,6 +15196,109 @@ mips_sim_finish_insn (struct mips_sim *s break; } } + +/* Use simulator state STATE to calculate the execution time of + instruction sequence SEQ. */ + +static unsigned int +mips_seq_time (struct mips_sim *state, rtx seq) +{ + mips_sim_reset (state); + for (rtx insn = seq; insn; insn = NEXT_INSN (insn)) + { + mips_sim_wait_insn (state, insn); + mips_sim_issue_insn (state, insn); + } + return state->time; +} + +/* Return the execution-time cost of mips_tuning_info.fast_mult_zero_zero_p + setting SETTING, using STATE to simulate instruction sequences. */ + +static unsigned int +mips_mult_zero_zero_cost (struct mips_sim *state, bool setting) +{ + mips_tuning_info.fast_mult_zero_zero_p = setting; + start_sequence (); + + enum machine_mode dword_mode = TARGET_64BIT ? TImode : DImode; + rtx hilo = gen_rtx_REG (dword_mode, MD_REG_FIRST); + mips_emit_move_or_split (hilo, const0_rtx, SPLIT_FOR_SPEED); + + /* If the target provides mulsidi3_32bit then that's the most likely + consumer of the result. Test for bypasses. */ + if (dword_mode == DImode && HAVE_maddsidi4) + { + rtx gpr = gen_rtx_REG (SImode, GP_REG_FIRST + 4); + emit_insn (gen_maddsidi4 (hilo, gpr, gpr, hilo)); + } + + unsigned int time = mips_seq_time (state, get_insns ()); + end_sequence (); + return time; +} + +/* Check the relative speeds of "MULT $0,$0" and "MTLO $0; MTHI $0" + and set up mips_tuning_info.fast_mult_zero_zero_p accordingly. + Prefer MULT -- which is shorter -- in the event of a tie. */ + +static void +mips_set_fast_mult_zero_zero_p (struct mips_sim *state) +{ + if (TARGET_MIPS16) + /* No MTLO or MTHI available. */ + mips_tuning_info.fast_mult_zero_zero_p = true; + else + { + unsigned int true_time = mips_mult_zero_zero_cost (state, true); + unsigned int false_time = mips_mult_zero_zero_cost (state, false); + mips_tuning_info.fast_mult_zero_zero_p = (true_time <= false_time); + } +} + +/* Set up costs based on the current architecture and tuning settings. */ + +static void +mips_set_tuning_info (void) +{ + if (mips_tuning_info.initialized_p + && mips_tuning_info.arch == mips_arch + && mips_tuning_info.tune == mips_tune + && mips_tuning_info.mips16_p == TARGET_MIPS16) + return; + + mips_tuning_info.arch = mips_arch; + mips_tuning_info.tune = mips_tune; + mips_tuning_info.mips16_p = TARGET_MIPS16; + mips_tuning_info.initialized_p = true; + + dfa_start (); + + struct mips_sim state; + mips_sim_init (&state, alloca (state_size ())); + + mips_set_fast_mult_zero_zero_p (&state); + + dfa_finish (); +} + +/* Implement TARGET_EXPAND_TO_RTL_HOOK. */ + +static void +mips_expand_to_rtl_hook (void) +{ + /* We need to call this at a point where we can safely create sequences + of instructions, so TARGET_OVERRIDE_OPTIONS is too early. We also + need to call it at a point where the DFA infrastructure is not + already in use, so we can't just call it lazily on demand. + + At present, mips_tuning_info is only needed during post-expand + RTL passes such as split_insns, so this hook should be early enough. + We may need to move the call elsewhere if mips_tuning_info starts + to be used for other things (such as rtx_costs, or expanders that + could be called during gimple optimization). */ + mips_set_tuning_info (); +} /* The VR4130 pipeline issues aligned pairs of instructions together, but it stalls the second instruction if it depends on the first. @@ -17760,6 +17970,8 @@ #define TARGET_MACHINE_DEPENDENT_REORG m #undef TARGET_PREFERRED_RELOAD_CLASS #define TARGET_PREFERRED_RELOAD_CLASS mips_preferred_reload_class +#undef TARGET_EXPAND_TO_RTL_HOOK +#define TARGET_EXPAND_TO_RTL_HOOK mips_expand_to_rtl_hook #undef TARGET_ASM_FILE_START #define TARGET_ASM_FILE_START mips_file_start #undef TARGET_ASM_FILE_START_FILE_DIRECTIVE Index: gcc/config/mips/mips.md =================================================================== --- gcc/config/mips/mips.md 2012-10-06 21:08:44.902431555 +0100 +++ gcc/config/mips/mips.md 2012-10-06 22:24:24.854713317 +0100 @@ -204,7 +204,7 @@ (define_attr "jal_macro" "no,yes" ;; the split instructions; in some cases, it is more appropriate for the ;; scheduling type to be "multi" instead. (define_attr "move_type" - "unknown,load,fpload,store,fpstore,mtc,mfc,mtlo,mflo,move,fmove, + "unknown,load,fpload,store,fpstore,mtc,mfc,mtlo,mflo,imul,move,fmove, const,constN,signext,ext_ins,logical,arith,sll0,andi,loadpool, shift_shift" (const_string "unknown")) @@ -369,6 +369,7 @@ (define_attr "type" (eq_attr "move_type" "mflo") (const_string "mflo") ;; These types of move are always single insns. + (eq_attr "move_type" "imul") (const_string "imul") (eq_attr "move_type" "fmove") (const_string "fmove") (eq_attr "move_type" "loadpool") (const_string "load") (eq_attr "move_type" "signext") (const_string "signext") @@ -4242,14 +4243,17 @@ (define_insn "*mov_ra" (set_attr "mode" "")]) (define_insn "*movdi_32bit" - [(set (match_operand:DI 0 "nonimmediate_operand" "=d,d,d,m,*a,*d,*f,*f,*d,*m,*B*C*D,*B*C*D,*d,*m") - (match_operand:DI 1 "move_operand" "d,i,m,d,*J*d,*a,*J*d,*m,*f,*f,*d,*m,*B*C*D,*B*C*D"))] + [(set (match_operand:DI 0 "nonimmediate_operand" "=d,d,d,m,*a,*a,*d,*f,*f,*d,*m,*B*C*D,*B*C*D,*d,*m") + (match_operand:DI 1 "move_operand" "d,i,m,d,*J,*d,*a,*J*d,*m,*f,*f,*d,*m,*B*C*D,*B*C*D"))] "!TARGET_64BIT && !TARGET_MIPS16 && (register_operand (operands[0], DImode) || reg_or_0_operand (operands[1], DImode))" { return mips_output_move (operands[0], operands[1]); } - [(set_attr "move_type" "move,const,load,store,mtlo,mflo,mtc,fpload,mfc,fpstore,mtc,fpload,mfc,fpstore") - (set_attr "mode" "DI")]) + [(set_attr "move_type" "move,const,load,store,imul,mtlo,mflo,mtc,fpload,mfc,fpstore,mtc,fpload,mfc,fpstore") + (set (attr "mode") + (if_then_else (eq_attr "move_type" "imul") + (const_string "SI") + (const_string "DI")))]) (define_insn "*movdi_32bit_mips16" [(set (match_operand:DI 0 "nonimmediate_operand" "=d,y,d,d,d,d,m,*d") @@ -4695,15 +4699,18 @@ (define_expand "movti" }) (define_insn "*movti" - [(set (match_operand:TI 0 "nonimmediate_operand" "=d,d,d,m,*a,*d") - (match_operand:TI 1 "move_operand" "d,i,m,dJ,*d*J,*a"))] + [(set (match_operand:TI 0 "nonimmediate_operand" "=d,d,d,m,*a,*a,*d") + (match_operand:TI 1 "move_operand" "d,i,m,dJ,*J,*d,*a"))] "TARGET_64BIT && !TARGET_MIPS16 && (register_operand (operands[0], TImode) || reg_or_0_operand (operands[1], TImode))" - "#" - [(set_attr "move_type" "move,const,load,store,mtlo,mflo") - (set_attr "mode" "TI")]) + { return mips_output_move (operands[0], operands[1]); } + [(set_attr "move_type" "move,const,load,store,imul,mtlo,mflo") + (set (attr "mode") + (if_then_else (eq_attr "move_type" "imul") + (const_string "SI") + (const_string "TI")))]) (define_insn "*movti_mips16" [(set (match_operand:TI 0 "nonimmediate_operand" "=d,y,d,d,d,d,m,*d") @@ -4753,21 +4760,20 @@ (define_insn "*movtf_mips16" (define_split [(set (match_operand:MOVE64 0 "nonimmediate_operand") (match_operand:MOVE64 1 "move_operand"))] - "reload_completed && !TARGET_64BIT - && mips_split_64bit_move_p (operands[0], operands[1])" + "reload_completed && mips_split_move_insn_p (operands[0], operands[1], insn)" [(const_int 0)] { - mips_split_doubleword_move (operands[0], operands[1]); + mips_split_move_insn (operands[0], operands[1], curr_insn); DONE; }) (define_split [(set (match_operand:MOVE128 0 "nonimmediate_operand") (match_operand:MOVE128 1 "move_operand"))] - "TARGET_64BIT && reload_completed" + "reload_completed && mips_split_move_insn_p (operands[0], operands[1], insn)" [(const_int 0)] { - mips_split_doubleword_move (operands[0], operands[1]); + mips_split_move_insn (operands[0], operands[1], curr_insn); DONE; }) Index: gcc/testsuite/gcc.target/mips/madd-9.c =================================================================== --- gcc/testsuite/gcc.target/mips/madd-9.c 2012-10-06 17:26:43.407618293 +0100 +++ gcc/testsuite/gcc.target/mips/madd-9.c 2012-10-06 22:24:24.855713317 +0100 @@ -1,7 +1,13 @@ /* { dg-do compile } */ -/* { dg-options "isa_rev>=1 -mgp32" } */ -/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */ +/* { dg-options "isa_rev>=1 -mgp32 -mtune=4kc" } */ +/* References to X within the loop need to have a higher frequency than + references to X outside the loop, otherwise there is no reason + to prefer multiply/accumulator registers over GPRs. */ +/* { dg-skip-if "requires register frequencies" { *-*-* } { "-O0" "-Os" } { "" } } */ /* { dg-final { scan-assembler-not "\tmul\t" } } */ +/* { dg-final { scan-assembler-not "\tmthi" } } */ +/* { dg-final { scan-assembler-not "\tmtlo" } } */ +/* { dg-final { scan-assembler "\tmult\t" } } */ /* { dg-final { scan-assembler "\tmadd\t" } } */ NOMIPS16 long long Index: gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-1.c =================================================================== --- /dev/null 2012-10-06 09:26:21.400998949 +0100 +++ gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-1.c 2012-10-06 22:24:24.856713316 +0100 @@ -0,0 +1,22 @@ +/* { dg-options "-mdspr2 -mgp32 -mtune=74kc" } */ +/* References to RESULT within the loop need to have a higher frequency than + references to RESULT outside the loop, otherwise there is no reason + to prefer multiply/accumulator registers over GPRs. */ +/* { dg-skip-if "requires register frequencies" { *-*-* } { "-O0" "-Os" } { "" } } */ + +/* Check that the zero-initialization of the accumulator feeding into + the madd is done by means of a mult instruction instead of mthi/mtlo. */ + +NOMIPS16 long long f (int n, int *v, int m) +{ + long long result = 0; + int i; + + for (i = 0; i < n; i++) + result = __builtin_mips_madd (result, v[i], m); + return result; +} + +/* { dg-final { scan-assembler "\tmult\t\\\$ac.,\\\$0,\\\$0" } } */ +/* { dg-final { scan-assembler-not "mthi\t" } } */ +/* { dg-final { scan-assembler-not "mtlo\t" } } */ Index: gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c =================================================================== --- /dev/null 2012-10-06 09:26:21.400998949 +0100 +++ gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c 2012-10-06 22:24:24.857713316 +0100 @@ -0,0 +1,22 @@ +/* { dg-options "-mdspr2 -mgp32 -mtune=4kp" } */ +/* References to RESULT within the loop need to have a higher frequency than + references to RESULT outside the loop, otherwise there is no reason + to prefer multiply/accumulator registers over GPRs. */ +/* { dg-skip-if "requires register frequencies" { *-*-* } { "-O0" "-Os" } { "" } } */ + +/* Check that the zero-initialization of the accumulator feeding into + the madd is done by means of a mult instruction instead of mthi/mtlo. */ + +NOMIPS16 long long f (int n, int *v, int m) +{ + long long result = 0; + int i; + + for (i = 0; i < n; i++) + result = __builtin_mips_madd (result, v[i], m); + return result; +} + +/* { dg-final { scan-assembler-not "mult\t\[^\n\]*\\\$0" } } */ +/* { dg-final { scan-assembler "\tmthi\t" } } */ +/* { dg-final { scan-assembler "\tmtlo\t" } } */