[avr] Provide correct memory move costs

Submitted by Senthil Kumar Selvaraj on Dec. 16, 2015, 7:08 a.m.

Details

Message ID 20151216070800.GA17952@jaguar.corp.atmel.com
State New
Headers show

Commit Message

Senthil Kumar Selvaraj Dec. 16, 2015, 7:08 a.m.
Hi,

  When analyzing code size regressions for AVR for top-of-trunk, I
  found a few cases where aggresive inlining (by the middle-end)
  of functions containing calls to memcpy was bloating up the code.

  Turns out that the AVR backend has MOVE_MAX set to 4 (unchanged from the 
  original commit), when it really should be 1, as the AVRs can only 
  move a single byte between reg and memory in a single instruction. 
  Setting it to 4 causes the middle end to underestimate the
  cost of memcopys with a compile time constant length parameter, as it 
  thinks a 4 byte copy's cost is only a single instruction.

  Just setting MOVE_MAX to 1 makes the middle end too conservative
  though, and causes a bunch of regression tests to fail, as lots of
  optimizations fail to pass the code size increase threshold check,
	even when not optimizing for size.

  Instead, the below patch sets MOVE_MAX_PIECES to 2, and implements a
  target hook that tells the middle-end to use load/store insns for
  memory moves upto two bytes. Also, the patch sets MOVE_RATIO to 3 when
  optimizing for speed, so that moves upto 4 bytes will occur through
  load/store sequences, like it does now.

  With this, only a couple of regression tests fail. uninit-19.c fails
  because it thinks only non-pic code won't inline a function, but the
  cost computation prevents inlining for AVRs. The test passes if
  the optimization level is increased to -O3. 

	strlenopt-8.c has an XPASS and a FAIL because a previous pass issued 
	a builtin_memcpy instead of a MEM assignment. Execution still passes.

  I'll continue running more tests to see if there are other performance
  related consequences.

  Is this ok? If ok, could someone commit please? I don't have commit
  access.

Regards
Senthil

gcc/ChangeLog

2015-12-16  Senthil Kumar Selvaraj  <senthil_kumar.selvaraj@atmel.com>

	* config/avr/avr.h (MOVE_MAX): Set value to 1. 
	(MOVE_MAX_PIECES): Define.
	(MOVE_RATIO): Define.
	* config/avr/avr.c (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P):
	Provide target hook.
	(avr_use_by_pieces_infrastructure_p): New function.

Comments

Denis Chertykov Dec. 17, 2015, 4:56 p.m.
2015-12-16 10:08 GMT+03:00 Senthil Kumar Selvaraj
<senthil_kumar.selvaraj@atmel.com>:
> Hi,
>
>   When analyzing code size regressions for AVR for top-of-trunk, I
>   found a few cases where aggresive inlining (by the middle-end)
>   of functions containing calls to memcpy was bloating up the code.
>
>   Turns out that the AVR backend has MOVE_MAX set to 4 (unchanged from the
>   original commit), when it really should be 1, as the AVRs can only
>   move a single byte between reg and memory in a single instruction.
>   Setting it to 4 causes the middle end to underestimate the
>   cost of memcopys with a compile time constant length parameter, as it
>   thinks a 4 byte copy's cost is only a single instruction.
>
>   Just setting MOVE_MAX to 1 makes the middle end too conservative
>   though, and causes a bunch of regression tests to fail, as lots of
>   optimizations fail to pass the code size increase threshold check,
>         even when not optimizing for size.
>
>   Instead, the below patch sets MOVE_MAX_PIECES to 2, and implements a
>   target hook that tells the middle-end to use load/store insns for
>   memory moves upto two bytes. Also, the patch sets MOVE_RATIO to 3 when
>   optimizing for speed, so that moves upto 4 bytes will occur through
>   load/store sequences, like it does now.
>
>   With this, only a couple of regression tests fail. uninit-19.c fails
>   because it thinks only non-pic code won't inline a function, but the
>   cost computation prevents inlining for AVRs. The test passes if
>   the optimization level is increased to -O3.
>
>         strlenopt-8.c has an XPASS and a FAIL because a previous pass issued
>         a builtin_memcpy instead of a MEM assignment. Execution still passes.
>
>   I'll continue running more tests to see if there are other performance
>   related consequences.
>
>   Is this ok? If ok, could someone commit please? I don't have commit
>   access.
>
> Regards
> Senthil
>
> gcc/ChangeLog
>
> 2015-12-16  Senthil Kumar Selvaraj  <senthil_kumar.selvaraj@atmel.com>
>
>         * config/avr/avr.h (MOVE_MAX): Set value to 1.
>         (MOVE_MAX_PIECES): Define.
>         (MOVE_RATIO): Define.
>         * config/avr/avr.c (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P):
>         Provide target hook.
>         (avr_use_by_pieces_infrastructure_p): New function.

Committed.

Denis.

Patch hide | download patch | download mbox

diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index 609a42b..9cc95db 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -2431,6 +2431,27 @@  avr_print_operand (FILE *file, rtx x, int code)
 }
 
 
+/* Implement TARGET_USE_BY_PIECES_INFRASTRUCTURE_P.  */
+
+/* Prefer sequence of loads/stores for moves of size upto
+   two - two pairs of load/store instructions are always better
+   than the 5 instruction sequence for a loop (1 instruction
+   for loop counter setup, and 4 for the body of the loop). */
+
+static bool
+avr_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
+				     unsigned int align ATTRIBUTE_UNUSED,
+				     enum by_pieces_operation op,
+				     bool speed_p)
+{
+
+  if (op != MOVE_BY_PIECES || (speed_p && (size > (MOVE_MAX_PIECES))))
+    return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
+
+  return size <= (MOVE_MAX_PIECES);
+}
+
+
 /* Worker function for `NOTICE_UPDATE_CC'.  */
 /* Update the condition code in the INSN.  */
 
@@ -13763,6 +13784,10 @@  avr_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *arg,
 #undef  TARGET_PRINT_OPERAND_PUNCT_VALID_P
 #define TARGET_PRINT_OPERAND_PUNCT_VALID_P avr_print_operand_punct_valid_p
 
+#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
+#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
+  avr_use_by_pieces_infrastructure_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 
diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
index 7439964..ebfb8ed 100644
--- gcc/config/avr/avr.h
+++ gcc/config/avr/avr.h
@@ -453,7 +453,22 @@  typedef struct avr_args
 
 #undef WORD_REGISTER_OPERATIONS
 
-#define MOVE_MAX 4
+/* Can move only a single byte from memory to reg in a
+   single instruction. */
+
+#define MOVE_MAX 1
+
+/* Allow upto two bytes moves to occur using by_pieces
+   infrastructure */
+
+#define MOVE_MAX_PIECES 2
+
+/* Set MOVE_RATIO to 3 to allow memory moves upto 4 bytes to happen
+   by pieces when optimizing for speed, like it did when MOVE_MAX_PIECES
+   was 4. When optimizing for size, allow memory moves upto 2 bytes. 
+   Also see avr_use_by_pieces_infrastructure_p. */
+
+#define MOVE_RATIO(speed) ((speed) ? 3 : 2)
 
 #define TRULY_NOOP_TRUNCATION(OUTPREC, INPREC) 1