From patchwork Wed Dec 16 07:08:00 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Senthil Kumar Selvaraj X-Patchwork-Id: 557328 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 7A65B1402DE for ; Wed, 16 Dec 2015 18:09:59 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=Q9nnU9s7; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; q=dns; s=default; b=fTwXsJA0UgNxI3gRoKXSZSLpnZRUpGpuqideFFf3564IE3oia/ XS4tGngzHbd5AYYP3bCUWWmptvRNaakVni/nHFHlQ9G6sQXM+HYOeNxbSuIasL1O rHQZFglglaLzhMRz7pZz6ggQX9w9wPwISIQJMUjluhE0PTdlKa7v1p56o= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; s= default; bh=HzzOlFl1IW8Lf6Ft4DWXaIQNQ2w=; b=Q9nnU9s7gtbdinbEf47g nDnnQ8RZe6gCh4t49JupEfuMkzQ9LbZwnSjhZMGxmuq7q4ypw9Ri5WffTkulsIfR risgyKczQZ34mcFC+IkyPzpW050/NDDH4s91Tl9f7nUUGqLvE8OhJIymO2tYSHyd WZE+34rNNYM4+ZadNTmOgLY= Received: (qmail 43454 invoked by alias); 16 Dec 2015 07:09:52 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 43433 invoked by uid 89); 16 Dec 2015 07:09:50 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.6 required=5.0 tests=AWL, BAYES_00, SPF_PASS, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: eusmtp01.atmel.com Received: from eusmtp01.atmel.com (HELO eusmtp01.atmel.com) (212.144.249.243) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Wed, 16 Dec 2015 07:09:49 +0000 Received: from HNOCHT01.corp.atmel.com (10.161.30.161) by eusmtp01.atmel.com (10.161.101.31) with Microsoft SMTP Server (TLS) id 14.3.235.1; Wed, 16 Dec 2015 08:09:40 +0100 Received: from jaguar.corp.atmel.com (10.161.30.18) by HNOCHT01.corp.atmel.com (10.161.30.161) with Microsoft SMTP Server (TLS) id 14.3.235.1; Wed, 16 Dec 2015 08:09:44 +0100 Date: Wed, 16 Dec 2015 12:38:00 +0530 From: Senthil Kumar Selvaraj To: CC: , Subject: [Patch, avr] Provide correct memory move costs Message-ID: <20151216070800.GA17952@jaguar.corp.atmel.com> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.24 (2015-08-30) X-IsSubscribed: yes Hi, When analyzing code size regressions for AVR for top-of-trunk, I found a few cases where aggresive inlining (by the middle-end) of functions containing calls to memcpy was bloating up the code. Turns out that the AVR backend has MOVE_MAX set to 4 (unchanged from the original commit), when it really should be 1, as the AVRs can only move a single byte between reg and memory in a single instruction. Setting it to 4 causes the middle end to underestimate the cost of memcopys with a compile time constant length parameter, as it thinks a 4 byte copy's cost is only a single instruction. Just setting MOVE_MAX to 1 makes the middle end too conservative though, and causes a bunch of regression tests to fail, as lots of optimizations fail to pass the code size increase threshold check, even when not optimizing for size. Instead, the below patch sets MOVE_MAX_PIECES to 2, and implements a target hook that tells the middle-end to use load/store insns for memory moves upto two bytes. Also, the patch sets MOVE_RATIO to 3 when optimizing for speed, so that moves upto 4 bytes will occur through load/store sequences, like it does now. With this, only a couple of regression tests fail. uninit-19.c fails because it thinks only non-pic code won't inline a function, but the cost computation prevents inlining for AVRs. The test passes if the optimization level is increased to -O3. strlenopt-8.c has an XPASS and a FAIL because a previous pass issued a builtin_memcpy instead of a MEM assignment. Execution still passes. I'll continue running more tests to see if there are other performance related consequences. Is this ok? If ok, could someone commit please? I don't have commit access. Regards Senthil gcc/ChangeLog 2015-12-16 Senthil Kumar Selvaraj * config/avr/avr.h (MOVE_MAX): Set value to 1. (MOVE_MAX_PIECES): Define. (MOVE_RATIO): Define. * config/avr/avr.c (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Provide target hook. (avr_use_by_pieces_infrastructure_p): New function. diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c index 609a42b..9cc95db 100644 --- gcc/config/avr/avr.c +++ gcc/config/avr/avr.c @@ -2431,6 +2431,27 @@ avr_print_operand (FILE *file, rtx x, int code) } +/* Implement TARGET_USE_BY_PIECES_INFRASTRUCTURE_P. */ + +/* Prefer sequence of loads/stores for moves of size upto + two - two pairs of load/store instructions are always better + than the 5 instruction sequence for a loop (1 instruction + for loop counter setup, and 4 for the body of the loop). */ + +static bool +avr_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size, + unsigned int align ATTRIBUTE_UNUSED, + enum by_pieces_operation op, + bool speed_p) +{ + + if (op != MOVE_BY_PIECES || (speed_p && (size > (MOVE_MAX_PIECES)))) + return default_use_by_pieces_infrastructure_p (size, align, op, speed_p); + + return size <= (MOVE_MAX_PIECES); +} + + /* Worker function for `NOTICE_UPDATE_CC'. */ /* Update the condition code in the INSN. */ @@ -13763,6 +13784,10 @@ avr_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *arg, #undef TARGET_PRINT_OPERAND_PUNCT_VALID_P #define TARGET_PRINT_OPERAND_PUNCT_VALID_P avr_print_operand_punct_valid_p +#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P +#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ + avr_use_by_pieces_infrastructure_p + struct gcc_target targetm = TARGET_INITIALIZER; diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h index 7439964..ebfb8ed 100644 --- gcc/config/avr/avr.h +++ gcc/config/avr/avr.h @@ -453,7 +453,22 @@ typedef struct avr_args #undef WORD_REGISTER_OPERATIONS -#define MOVE_MAX 4 +/* Can move only a single byte from memory to reg in a + single instruction. */ + +#define MOVE_MAX 1 + +/* Allow upto two bytes moves to occur using by_pieces + infrastructure */ + +#define MOVE_MAX_PIECES 2 + +/* Set MOVE_RATIO to 3 to allow memory moves upto 4 bytes to happen + by pieces when optimizing for speed, like it did when MOVE_MAX_PIECES + was 4. When optimizing for size, allow memory moves upto 2 bytes. + Also see avr_use_by_pieces_infrastructure_p. */ + +#define MOVE_RATIO(speed) ((speed) ? 3 : 2) #define TRULY_NOOP_TRUNCATION(OUTPREC, INPREC) 1