From patchwork Thu Dec 13 17:03:30 2012
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 206174
Subject: Re: [PATCH,x86] Fix combine for conditional instructions.
From: Uros Bizjak
To: Yuri Rumyantsev
Cc: Richard Henderson, Richard Biener, gcc-patches, Igor Zamyatin
Date: Thu, 13 Dec 2012 18:03:30 +0100
References: <50C8DBB5.7000503@redhat.com>
Delivered-To: mailing list gcc-patches@gcc.gnu.org

On Thu, Dec 13, 2012 at 4:02 PM, Yuri Rumyantsev wrote:

> We did not see any performance improvement on Atom in 32-bit mode at
> routelookup from eembc_2_0 (eembc_1_1).

I assume that for x86_64 the patch works as expected.

Let's take a bigger hammer for 32-bit targets - a splitter that
effectively does the same as your proposed patch. Please note that this
splitter uses the optimize_function_for_speed_p predicate, so the
transformation applies to the whole function.
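As a concrete illustration (not part of the patch; the function below is
made up and is not the EEMBC routelookup kernel), the splitter below
targets cases where if-conversion leaves one arm of a conditional move as
a memory reference:

/* Illustrative sketch only.  With -O2 on an x86 target with cmov, GCC
   may if-convert this select into a cmov whose unused arm is still a
   memory operand; the splitter below instead forces such memory inputs
   into pseudo registers before reload.  Whether a cmov is emitted at
   all depends on the options and the cost model.  */
int
pick (const int *a, const int *b, int i, int use_a)
{
  return use_a ? a[i] : b[i];
}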
Can you perhaps investigate what happens if this predicate is changed
back to optimize_insn_for_speed_p ()? This is what we would like to
have here.

;; Don't do conditional moves with memory inputs.  This splitter helps
;; register starved x86_32 by forcing inputs into registers before reload.
(define_split
  [(set (match_operand:SWI248 0 "register_operand")
        (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
                               [(reg FLAGS_REG) (const_int 0)])
          (match_operand:SWI248 2 "nonimmediate_operand")
          (match_operand:SWI248 3 "nonimmediate_operand")))]
  "!TARGET_64BIT && TARGET_CMOVE
   && (MEM_P (operands[2]) || MEM_P (operands[3]))
   && can_create_pseudo_p ()
   && optimize_function_for_speed_p (cfun)"
  [(set (match_dup 0)
        (if_then_else:SWI248 (match_dup 1) (match_dup 2) (match_dup 3)))]
{
  if (MEM_P (operands[2]))
    operands[2] = force_reg (<MODE>mode, operands[2]);
  if (MEM_P (operands[3]))
    operands[3] = force_reg (<MODE>mode, operands[3]);
})

Attached is the complete patch, including peephole2s.

Uros.

Index: i386.md
===================================================================
--- i386.md	(revision 194451)
+++ i386.md	(working copy)
@@ -16093,6 +16093,27 @@
   [(set_attr "type" "icmov")
    (set_attr "mode" "<MODE>")])
 
+;; Don't do conditional moves with memory inputs.  This splitter helps
+;; register starved x86_32 by forcing inputs into registers before reload.
+(define_split
+  [(set (match_operand:SWI248 0 "register_operand")
+        (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
+                               [(reg FLAGS_REG) (const_int 0)])
+          (match_operand:SWI248 2 "nonimmediate_operand")
+          (match_operand:SWI248 3 "nonimmediate_operand")))]
+  "!TARGET_64BIT && TARGET_CMOVE
+   && (MEM_P (operands[2]) || MEM_P (operands[3]))
+   && can_create_pseudo_p ()
+   && optimize_function_for_speed_p (cfun)"
+  [(set (match_dup 0)
+        (if_then_else:SWI248 (match_dup 1) (match_dup 2) (match_dup 3)))]
+{
+  if (MEM_P (operands[2]))
+    operands[2] = force_reg (<MODE>mode, operands[2]);
+  if (MEM_P (operands[3]))
+    operands[3] = force_reg (<MODE>mode, operands[3]);
+})
+
 (define_insn "*movqicc_noc"
   [(set (match_operand:QI 0 "register_operand" "=r,r")
         (if_then_else:QI (match_operator 1 "ix86_comparison_operator"
@@ -16105,14 +16126,12 @@
    (set_attr "mode" "QI")])
 
 (define_split
-  [(set (match_operand 0 "register_operand")
-        (if_then_else (match_operator 1 "ix86_comparison_operator"
-                        [(reg FLAGS_REG) (const_int 0)])
-          (match_operand 2 "register_operand")
-          (match_operand 3 "register_operand")))]
+  [(set (match_operand:SWI12 0 "register_operand")
+        (if_then_else:SWI12 (match_operator 1 "ix86_comparison_operator"
+                              [(reg FLAGS_REG) (const_int 0)])
+          (match_operand:SWI12 2 "register_operand")
+          (match_operand:SWI12 3 "register_operand")))]
   "TARGET_CMOVE && !TARGET_PARTIAL_REG_STALL
-   && (GET_MODE (operands[0]) == QImode
-       || GET_MODE (operands[0]) == HImode)
    && reload_completed"
   [(set (match_dup 0)
         (if_then_else:SI (match_dup 1) (match_dup 2) (match_dup 3)))]
@@ -16122,6 +16141,31 @@
   operands[3] = gen_lowpart (SImode, operands[3]);
 })
 
+;; Don't do conditional moves with memory inputs
+(define_peephole2
+  [(match_scratch:SWI248 2 "r")
+   (set (match_operand:SWI248 0 "register_operand")
+        (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
+                               [(reg FLAGS_REG) (const_int 0)])
+          (match_dup 0)
+          (match_operand:SWI248 3 "memory_operand")))]
+  "TARGET_CMOVE && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+        (if_then_else:SWI248 (match_dup 1) (match_dup 0) (match_dup 2)))])
+
+(define_peephole2
+  [(match_scratch:SWI248 2 "r")
+   (set (match_operand:SWI248 0 "register_operand")
+        (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
+                               [(reg FLAGS_REG) (const_int 0)])
+          (match_operand:SWI248 3 "memory_operand")
+          (match_dup 0)))]
+  "TARGET_CMOVE && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+        (if_then_else:SWI248 (match_dup 1) (match_dup 2) (match_dup 0)))])
+
 (define_expand "mov<mode>cc"
   [(set (match_operand:X87MODEF 0 "register_operand")
         (if_then_else:X87MODEF
@@ -16209,6 +16253,56 @@
   [(set_attr "type" "fcmov,fcmov,icmov,icmov")
    (set_attr "mode" "SF,SF,SI,SI")])
 
+;; Don't do conditional moves with memory inputs.  This splitter helps
+;; register starved x86_32 by forcing inputs into registers before reload.
+(define_split
+  [(set (match_operand:MODEF 0 "register_operand")
+        (if_then_else:MODEF (match_operator 1 "ix86_comparison_operator"
+                              [(reg FLAGS_REG) (const_int 0)])
+          (match_operand:MODEF 2 "nonimmediate_operand")
+          (match_operand:MODEF 3 "nonimmediate_operand")))]
+  "!TARGET_64BIT && TARGET_80387 && TARGET_CMOVE
+   && (MEM_P (operands[2]) || MEM_P (operands[3]))
+   && can_create_pseudo_p ()
+   && optimize_function_for_speed_p (cfun)"
+  [(set (match_dup 0)
+        (if_then_else:MODEF (match_dup 1) (match_dup 2) (match_dup 3)))]
+{
+  if (MEM_P (operands[2]))
+    operands[2] = force_reg (<MODE>mode, operands[2]);
+  if (MEM_P (operands[3]))
+    operands[3] = force_reg (<MODE>mode, operands[3]);
+})
+
+;; Don't do conditional moves with memory inputs
+(define_peephole2
+  [(match_scratch:MODEF 2 "r")
+   (set (match_operand:MODEF 0 "register_and_not_any_fp_reg_operand")
+        (if_then_else:MODEF (match_operator 1 "fcmov_comparison_operator"
+                              [(reg FLAGS_REG) (const_int 0)])
+          (match_dup 0)
+          (match_operand:MODEF 3 "memory_operand")))]
+  "(<MODE>mode != DFmode || TARGET_64BIT)
+   && TARGET_80387 && TARGET_CMOVE
+   && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+        (if_then_else:MODEF (match_dup 1) (match_dup 0) (match_dup 2)))])
+
+(define_peephole2
+  [(match_scratch:MODEF 2 "r")
+   (set (match_operand:MODEF 0 "register_and_not_any_fp_reg_operand")
+        (if_then_else:MODEF (match_operator 1 "fcmov_comparison_operator"
+                              [(reg FLAGS_REG) (const_int 0)])
+          (match_operand:MODEF 3 "memory_operand")
+          (match_dup 0)))]
+  "(<MODE>mode != DFmode || TARGET_64BIT)
+   && TARGET_80387 && TARGET_CMOVE
+   && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+        (if_then_else:MODEF (match_dup 1) (match_dup 2) (match_dup 0)))])
+
 ;; All moves in XOP pcmov instructions are 128 bits and hence we restrict
 ;; the scalar versions to have only XMM registers as operands.
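For reference, the practical difference between the two predicates shows
up when only part of a function is cold. A minimal sketch (illustrative
only, not a test case from the patch; the names are made up):

/* optimize_function_for_speed_p (cfun) answers per function, so at -O2
   the new splitters fire even in the __builtin_expect-cold block below.
   optimize_insn_for_speed_p () consults the current basic block's
   profile instead, so insns in the unlikely block may be treated as
   optimize-for-size and keep a memory-operand cmov there.  */
extern int table[256];

int
lookup (int i, int fallback, int rare)
{
  int r = fallback;
  if (__builtin_expect (rare, 0))
    r = table[i & 255];   /* cold path: possible cmov with a memory input */
  return r;
}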