MIPS16/GCC: Emit bounds checking as RTL in `casesi'
diff mbox

Message ID alpine.DEB.2.00.1706060554150.21750@tp.orcam.me.uk
State Accepted
Headers show

Commit Message

Maciej W. Rozycki June 7, 2017, 7:37 a.m. UTC
Hi,

 Further to my changes made last November here is an update to the MIPS16 
`casesi' pattern making it emit bounds checking as RTL rather than having 
it as hardcoded assembly within the `casesi_internal_mips16_<mode>' 
dispatcher.  See below for how PR tree-optimization/51513 has prevented me 
from proceeding back then.

 This new arrangement has several advantages:

1. There is no hardcoded BTEQZ branch instruction that has a limited span
   and can overflow causing an assembly failure if the target label is too
   distant.  GCC is able to relax out of range MIPS16 branches these days, 
   but obviously they have to be expressed in RTL rather than buried in 
   assembly code.  This overflow can be easily reproduced; please enquire 
   for a boring example if interested.

2. The `casesi_internal_mips16_<mode>' pattern now has an accurate length
   (aka instruction count) recorded as all its remaining emitted assembly 
   instructions are known in advance to fit in their regular (16-bit) 
   machine encodings.  Previously there was uncertainty about the SLTU and
   BTEQZ instructions used for the bounds check, which depending on their 
   exact arguments could have each resulted in their either regular 
   (16-bit) or extended (32-bit) encoding.  Consequently the worst length 
   estimate was recorded instead, possibly causing worse code generation
   (e.g. premature out-of-range branch relaxation or jump table expansion 
   from half-words to full words).

3. GCC now has freedom to schedule code around bounds checking as it sees 
   fit rather than having to adapt to the fixed assembly block.  In fact 
   it tends to make use of it right away swapping BTEQZ for the BTNEZ
   instruction and rescheduling code such that the out-of-bounds (default) 
   case executes linearly.

There are probably more benefits, but these are what has immediately come 
to my mind.

 As noted above I meant to propose this change along with the rest so as 
to have it in GCC 7, however emitting the bounds check as a separate RTL 
pattern triggered an unrelated bug, then unknown to me, causing a fatal 
code generation regression, and the lack of time did not allow me to 
investigate it further.  This was easily reproduced with a piece of code 
(reduced from actual Linux kernel code) like this:

$ cat frob.c
int
frob (int i)
{
  switch (i)
    {
    case -5:
      return -2;
    case -3:
      return -1;
    case 0:
      return 0;
    case 3:
      return 1;
    case 5:
      break;
    default:
      __builtin_unreachable ();
    }
  return i;
}

producing truncated assembly like this:

$ gcc -O2 -mips16 -mcode-readable=yes -dp -S frob.c
$ cat frob.s
	.file	1 "frob.c"
	.section .mdebug.abi32
	.previous
	.nan	legacy
	.module	fp=32
	.module	nooddspreg
	.abicalls
	.option	pic0
	.text
	.align	2
	.weak	frob
	.set	mips16
	.set	nomicromips
	.ent	frob
	.type	frob, @function
frob:
	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
	.mask	0x00000000,0
	.fmask	0x00000000,0
	addiu	$2,$4,5	 # 11	*addsi3_mips16/7	[length = 2]
	.end	frob
	.size	frob, .-frob
	.ident	"GCC: (GNU) 7.0.0 20161117 (experimental)"
$ 

-- where as you can see the whole switch statement has vanished along with 
any return path from the function, and only the preparatory addition 
emitted from `casesi' with:

      emit_insn (gen_addsi3 (reg, operands[0], offset));

remained.

 The causing bug has turned out to be what was filed as PR 
tree-optimization/51513 and has been kindly fixed by Peter recently 
(thanks, Peter!) with r247844 ("Fix PR51513, switch statement with default 
case containing __builtin_unreachable leads to wild branch"), 
<https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00739.html>, enabling me to 
proceed with this change without having to investigate the cause of code 
breakage -- which for the MIPS16 target has clearly turned out to be 
graver than a mere silly branch never to be executed.

 Given the previous troubles with this change I have decided to add
MIPS16 test cases to verify that code truncation has not happened, 
complementing gcc.target/powerpc/pr51513.c, in case further tweaks in this 
area might do something bad.  This would be caught by 
gcc.target/mips/insn-casesi.c added with r242424, but that test case does 
not refer to PR tree-optimization/51513, so let's make it explicit.  With 
the PR tree-optimization/51513 fix removed the two new cases indeed cause:

FAIL: gcc.target/mips/pr51513-1.c   -O2   scan-assembler \tjrc?\t\\$31\n
FAIL: gcc.target/mips/pr51513-1.c   -O3 -g   scan-assembler \tjrc?\t\\$31\n
FAIL: gcc.target/mips/pr51513-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler \tjrc?\t\\$31\n
FAIL: gcc.target/mips/pr51513-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler \tjrc?\t\\$31\n
FAIL: gcc.target/mips/pr51513-2.c   -O2  execution test
FAIL: gcc.target/mips/pr51513-2.c   -O3 -g  execution test
FAIL: gcc.target/mips/pr51513-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test

and overall (when executed individually):

                === gcc Summary ===

# of expected passes            21
# of unexpected failures        7

 Also since all the assembly templates in `casesi_internal_mips16_<mode>' 
must have been changed due to the rearrangement of operands, I have 
decided to make their formatting consistent with our templates used 
elsewhere, that is to remove the space between `,' and the following 
operand.  This in turn required a small tweak to 
gcc.target/mips/data-sym-jump.c.  I hope it is fine to have this along 
with the rest as a single change.

 As a further update we could probably emit assembly instructions 
currently produced with `casesi_internal_mips16_<mode>' individually as 
RTL instructions, however the result would likely not be as spectacular; 
certainly not as the branch overflow fix is, though it could possibly 
enable some optimisations in certain cases.  So I'm leaving it as an 
exercise for whoever feels compelled to do it.

	gcc/
	* config/mips/mips.md (MIPS16_T_REGNUM): Remove constant.
	(casesi): Emit bounds checking as RTL.
	(casesi_internal_mips16_<mode>): Remove bounds checking.

	gcc/testsuite/
	* gcc.target/mips/data-sym-jump.c: Adjust for whitespace
	changes.
	* gcc.target/mips/pr51513-1.c: New test.
	* gcc.target/mips/pr51513-2.c: New test.

 No regressions in o32 little-endian MIPS16 multilib testing.  OK to 
apply?

  Maciej

gcc-mips16-casesi-rtl.diff

Comments

Matthew Fortune June 12, 2017, 6:42 p.m. UTC | #1
Maciej Rozycki <Maciej.Rozycki@imgtec.com> writes:
>  Further to my changes made last November here is an update to the MIPS16
> `casesi' pattern making it emit bounds checking as RTL rather than having
> it as hardcoded assembly within the `casesi_internal_mips16_<mode>'
> dispatcher.  See below for how PR tree-optimization/51513 has prevented me
> from proceeding back then.
> 
>  This new arrangement has several advantages:
> 
> 1. There is no hardcoded BTEQZ branch instruction that has a limited span
>    and can overflow causing an assembly failure if the target label is too
>    distant.  GCC is able to relax out of range MIPS16 branches these days,
>    but obviously they have to be expressed in RTL rather than buried in
>    assembly code.  This overflow can be easily reproduced; please enquire
>    for a boring example if interested.
> 
> 2. The `casesi_internal_mips16_<mode>' pattern now has an accurate length
>    (aka instruction count) recorded as all its remaining emitted assembly
>    instructions are known in advance to fit in their regular (16-bit)
>    machine encodings.  Previously there was uncertainty about the SLTU and
>    BTEQZ instructions used for the bounds check, which depending on their
>    exact arguments could have each resulted in their either regular
>    (16-bit) or extended (32-bit) encoding.  Consequently the worst length
>    estimate was recorded instead, possibly causing worse code generation
>    (e.g. premature out-of-range branch relaxation or jump table expansion
>    from half-words to full words).
> 
> 3. GCC now has freedom to schedule code around bounds checking as it sees
>    fit rather than having to adapt to the fixed assembly block.  In fact
>    it tends to make use of it right away swapping BTEQZ for the BTNEZ
>    instruction and rescheduling code such that the out-of-bounds (default)
>    case executes linearly.
> 
> There are probably more benefits, but these are what has immediately come
> to my mind.

Sounds good, I certainly agree that we will see the benefits you list above.

>  As noted above I meant to propose this change along with the rest so as
> to have it in GCC 7, however emitting the bounds check as a separate RTL
> pattern triggered an unrelated bug, then unknown to me, causing a fatal
> code generation regression, and the lack of time did not allow me to
> investigate it further.  This was easily reproduced with a piece of code
> (reduced from actual Linux kernel code) like this:
> 
> $ cat frob.c
> int
> frob (int i)
> {
>   switch (i)
>     {
>     case -5:
>       return -2;
>     case -3:
>       return -1;
>     case 0:
>       return 0;
>     case 3:
>       return 1;
>     case 5:
>       break;
>     default:
>       __builtin_unreachable ();
>     }
>   return i;
> }
> 
> producing truncated assembly like this:
> 
> $ gcc -O2 -mips16 -mcode-readable=yes -dp -S frob.c
> $ cat frob.s
> 	.file	1 "frob.c"
> 	.section .mdebug.abi32
> 	.previous
> 	.nan	legacy
> 	.module	fp=32
> 	.module	nooddspreg
> 	.abicalls
> 	.option	pic0
> 	.text
> 	.align	2
> 	.weak	frob
> 	.set	mips16
> 	.set	nomicromips
> 	.ent	frob
> 	.type	frob, @function
> frob:
> 	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
> 	.mask	0x00000000,0
> 	.fmask	0x00000000,0
> 	addiu	$2,$4,5	 # 11	*addsi3_mips16/7	[length = 2]
> 	.end	frob
> 	.size	frob, .-frob
> 	.ident	"GCC: (GNU) 7.0.0 20161117 (experimental)"
> $
> 
> -- where as you can see the whole switch statement has vanished along with
> any return path from the function, and only the preparatory addition
> emitted from `casesi' with:
> 
>       emit_insn (gen_addsi3 (reg, operands[0], offset));
> 
> remained.
> 
>  The causing bug has turned out to be what was filed as PR
> tree-optimization/51513 and has been kindly fixed by Peter recently
> (thanks, Peter!) with r247844 ("Fix PR51513, switch statement with default
> case containing __builtin_unreachable leads to wild branch"),
> <https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00739.html>, enabling me to
> proceed with this change without having to investigate the cause of code
> breakage -- which for the MIPS16 target has clearly turned out to be
> graver than a mere silly branch never to be executed.
> 
>  Given the previous troubles with this change I have decided to add
> MIPS16 test cases to verify that code truncation has not happened,
> complementing gcc.target/powerpc/pr51513.c, in case further tweaks in this
> area might do something bad.  This would be caught by
> gcc.target/mips/insn-casesi.c added with r242424, but that test case does
> not refer to PR tree-optimization/51513, so let's make it explicit.  With
> the PR tree-optimization/51513 fix removed the two new cases indeed cause:
> 
> FAIL: gcc.target/mips/pr51513-1.c   -O2   scan-assembler \tjrc?\t\\$31\n
> FAIL: gcc.target/mips/pr51513-1.c   -O3 -g   scan-assembler \tjrc?\t\\$31\n
> FAIL: gcc.target/mips/pr51513-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none
> scan-assembler \tjrc?\t\\$31\n
> FAIL: gcc.target/mips/pr51513-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> scan-assembler \tjrc?\t\\$31\n
> FAIL: gcc.target/mips/pr51513-2.c   -O2  execution test
> FAIL: gcc.target/mips/pr51513-2.c   -O3 -g  execution test
> FAIL: gcc.target/mips/pr51513-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none
> execution test
> 
> and overall (when executed individually):
> 
>                 === gcc Summary ===
> 
> # of expected passes            21
> # of unexpected failures        7
> 
>  Also since all the assembly templates in `casesi_internal_mips16_<mode>'
> must have been changed due to the rearrangement of operands, I have
> decided to make their formatting consistent with our templates used
> elsewhere, that is to remove the space between `,' and the following
> operand.  This in turn required a small tweak to
> gcc.target/mips/data-sym-jump.c.  I hope it is fine to have this along
> with the rest as a single change.
> 
>  As a further update we could probably emit assembly instructions
> currently produced with `casesi_internal_mips16_<mode>' individually as
> RTL instructions, however the result would likely not be as spectacular;
> certainly not as the branch overflow fix is, though it could possibly
> enable some optimisations in certain cases.  So I'm leaving it as an
> exercise for whoever feels compelled to do it.

The only benefit would be if something could fill the load latency between
obtaining the offset and adding to the base which seems low probability.

> 	gcc/
> 	* config/mips/mips.md (MIPS16_T_REGNUM): Remove constant.
> 	(casesi): Emit bounds checking as RTL.
> 	(casesi_internal_mips16_<mode>): Remove bounds checking.
> 
> 	gcc/testsuite/
> 	* gcc.target/mips/data-sym-jump.c: Adjust for whitespace
> 	changes.
> 	* gcc.target/mips/pr51513-1.c: New test.
> 	* gcc.target/mips/pr51513-2.c: New test.
> 
>  No regressions in o32 little-endian MIPS16 multilib testing.  OK to
> apply?

OK.

Thanks for the intricate commentary on this patch, makes it very easy to
review.

Thanks,
Matthew
Maciej W. Rozycki June 14, 2017, 11:28 p.m. UTC | #2
On Mon, 12 Jun 2017, Matthew Fortune wrote:

> The only benefit would be if something could fill the load latency between
> obtaining the offset and adding to the base which seems low probability.

 Or if any intermediate results were found to be reusable in other 
calculations nearby (not that I think this is likely though).

> Thanks for the intricate commentary on this patch, makes it very easy to
> review.

 I hoped it would.  Writing a proper description also serves as the last 
point of reflection: "Did I get it right?  Haven't I missed anything?"

 Patch applied now, thank you for you review.

  Maciej

Patch
diff mbox

Index: gcc/gcc/config/mips/mips.md
===================================================================
--- gcc.orig/gcc/config/mips/mips.md	2017-06-06 21:53:32.979416504 +0100
+++ gcc/gcc/config/mips/mips.md	2017-06-06 21:53:48.596362464 +0100
@@ -162,7 +162,6 @@ 
   [(TLS_GET_TP_REGNUM		3)
    (GET_FCSR_REGNUM		2)
    (SET_FCSR_REGNUM		4)
-   (MIPS16_T_REGNUM		24)
    (PIC_FUNCTION_ADDR_REGNUM	25)
    (RETURN_ADDR_REGNUM		31)
    (CPRESTORE_SLOT_REGNUM	76)
@@ -6389,68 +6388,57 @@ 
   if (!arith_operand (operands[0], SImode))
     operands[0] = force_reg (SImode, operands[0]);
 
-  operands[2] = GEN_INT (INTVAL (operands[2]) + 1);
-
+  emit_cmp_and_jump_insns (operands[0], operands[2], GTU,
+			   NULL_RTX, SImode, 1, operands[4]);
   emit_jump_insn (PMODE_INSN (gen_casesi_internal_mips16,
-			      (operands[0], operands[2],
-			       operands[3], operands[4])));
-
+			      (operands[0], operands[3])));
   DONE;
 })
 
 (define_insn "casesi_internal_mips16_<mode>"
   [(set (pc)
-     (if_then_else
-       (ltu (match_operand:SI 0 "register_operand" "d")
-	    (match_operand:SI 1 "arith_operand" "dI"))
-       (unspec:P
-        [(match_dup 0)
-	 (label_ref (match_operand 2 "" ""))]
-	UNSPEC_CASESI_DISPATCH)
-       (label_ref (match_operand 3 "" ""))))
-   (clobber (match_scratch:P 4 "=d"))
-   (clobber (match_scratch:P 5 "=d"))
-   (clobber (reg:SI MIPS16_T_REGNUM))]
+	(unspec:P [(match_operand:SI 0 "register_operand" "d")
+		   (label_ref (match_operand 1 "" ""))]
+	 UNSPEC_CASESI_DISPATCH))
+   (clobber (match_scratch:P 2 "=d"))
+   (clobber (match_scratch:P 3 "=d"))]
   "TARGET_MIPS16_SHORT_JUMP_TABLES"
 {
-  rtx diff_vec = PATTERN (NEXT_INSN (as_a <rtx_insn *> (operands[2])));
+  rtx diff_vec = PATTERN (NEXT_INSN (as_a <rtx_insn *> (operands[1])));
 
   gcc_assert (GET_CODE (diff_vec) == ADDR_DIFF_VEC);
-  
-  output_asm_insn ("sltu\t%0, %1", operands);
-  output_asm_insn ("bteqz\t%3", operands);
-  
+
   switch (GET_MODE (diff_vec))
     {
     case HImode:
-      output_asm_insn ("sll\t%5, %0, 1", operands);
-      output_asm_insn ("<d>la\t%4, %2", operands);
-      output_asm_insn ("<d>addu\t%5, %4, %5", operands);
-      output_asm_insn ("lh\t%5, 0(%5)", operands);
+      output_asm_insn ("sll\t%3,%0,1", operands);
+      output_asm_insn ("<d>la\t%2,%1", operands);
+      output_asm_insn ("<d>addu\t%3,%2,%3", operands);
+      output_asm_insn ("lh\t%3,0(%3)", operands);
       break;
 
     case SImode:
-      output_asm_insn ("sll\t%5, %0, 2", operands);
-      output_asm_insn ("<d>la\t%4, %2", operands);
-      output_asm_insn ("<d>addu\t%5, %4, %5", operands);
-      output_asm_insn ("lw\t%5, 0(%5)", operands);
+      output_asm_insn ("sll\t%3,%0,2", operands);
+      output_asm_insn ("<d>la\t%2,%1", operands);
+      output_asm_insn ("<d>addu\t%3,%2,%3", operands);
+      output_asm_insn ("lw\t%3,0(%3)", operands);
       break;
 
     default:
       gcc_unreachable ();
     }
 
-  output_asm_insn ("<d>addu\t%4, %4, %5", operands);
+  output_asm_insn ("<d>addu\t%2,%2,%3", operands);
 
   if (GENERATE_MIPS16E)
-    return "jrc\t%4";
+    return "jrc\t%2";
   else
-    return "jr\t%4";
+    return "jr\t%2";
 }
   [(set (attr "insn_count")
 	(if_then_else (match_test "GENERATE_MIPS16E")
-		      (const_string "10")
-		      (const_string "11")))])
+		      (const_string "6")
+		      (const_string "7")))])
 
 ;; For TARGET_USE_GOT, we save the gp in the jmp_buf as well.
 ;; While it is possible to either pull it off the stack (in the
Index: gcc/gcc/testsuite/gcc.target/mips/data-sym-jump.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/mips/data-sym-jump.c	2017-06-06 21:53:32.995674428 +0100
+++ gcc/gcc/testsuite/gcc.target/mips/data-sym-jump.c	2017-06-06 21:53:48.642154539 +0100
@@ -25,7 +25,7 @@  frob (int i)
 
 /* Expect assembly like:
 
-	la	$2, $L4
+	la	$2,$L4
 						# Anything goes here.
 	.type	__jump_frob_4, @object		# Symbol # must match label.
 __jump_frob_4:					# The symbol must match.
@@ -47,4 +47,4 @@  __jend_frob_4:					# The symbol must mat
 
    that is `__jump_*'/`__jend_*' symbols inserted around a jump table.  */
 
-/* { dg-final { scan-assembler "\tla\t\\\$\[0-9\]+, (.L(\[0-9\]+))\n.*\t\\.type\t(__jump_frob_\\2), @object\n\\3:\n\\1:\n(?:\t\\.(?:half|word)\t.L\[0-9\]+-\\1\n)\{11\}\t\\.type\t(__jend_frob_\\2), @function\n\\4:\n\t\\.insn\n" } } */
+/* { dg-final { scan-assembler "\tla\t\\\$\[0-9\]+,(.L(\[0-9\]+))\n.*\t\\.type\t(__jump_frob_\\2), @object\n\\3:\n\\1:\n(?:\t\\.(?:half|word)\t.L\[0-9\]+-\\1\n)\{11\}\t\\.type\t(__jend_frob_\\2), @function\n\\4:\n\t\\.insn\n" } } */
Index: gcc/gcc/testsuite/gcc.target/mips/pr51513-1.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ gcc/gcc/testsuite/gcc.target/mips/pr51513-1.c	2017-06-07 05:15:48.170762991 +0100
@@ -0,0 +1,48 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mips16 -mcode-readable=yes" } */
+
+/* PR tree-optimization/51513 verification variant for MIPS16, #1.  */
+
+int __attribute__ ((weak))
+frob (int i)
+{
+  switch (i)
+    {
+    case -5:
+      return -2;
+    case -3:
+      return -1;
+    case 0:
+      return 0;
+    case 3:
+      return 1;
+    case 5:
+      break;
+    default:
+      __builtin_unreachable ();
+    }
+  return i;
+}
+
+/* Without the fix for PR tree-optimization/51513 truncated code
+   would be emitted for `frob', like:
+
+	.text
+	.align	2
+	.weak	frob
+	.set	mips16
+	.set	nomicromips
+	.ent	frob
+	.type	frob, @function
+frob:
+	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
+	.mask	0x00000000,0
+	.fmask	0x00000000,0
+	addiu	$2,$4,5
+	.end	frob
+	.size	frob, .-frob
+
+  meaning `frob' will have no chance to return, let alone produce
+  the result expected.  */
+
+/* { dg-final { scan-assembler "\tjrc?\t\\\$31\n" } } */
Index: gcc/gcc/testsuite/gcc.target/mips/pr51513-2.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ gcc/gcc/testsuite/gcc.target/mips/pr51513-2.c	2017-06-07 05:15:48.187060805 +0100
@@ -0,0 +1,56 @@ 
+/* { dg-do run } */
+/* { dg-options "-mips16 -mcode-readable=yes" } */
+
+/* PR tree-optimization/51513 verification variant for MIPS16, #2.  */
+
+int __attribute__ ((weak))
+frob (int i)
+{
+  switch (i)
+    {
+    case -5:
+      return -2;
+    case -3:
+      return -1;
+    case 0:
+      return 0;
+    case 3:
+      return 1;
+    case 5:
+      break;
+    default:
+      __builtin_unreachable ();
+    }
+  return i;
+}
+
+int
+main (void)
+{
+  return !(frob (-5) == -2
+	   & frob (-3) == -1
+	   & frob (0) == 0
+	   & frob (3) == 1
+	   & frob (5) == 5);
+}
+
+/* Without the fix for PR tree-optimization/51513 truncated code
+   would be emitted for `frob', like:
+
+	.text
+	.align	2
+	.weak	frob
+	.set	mips16
+	.set	nomicromips
+	.ent	frob
+	.type	frob, @function
+frob:
+	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
+	.mask	0x00000000,0
+	.fmask	0x00000000,0
+	addiu	$2,$4,5
+	.end	frob
+	.size	frob, .-frob
+
+  meaning `frob' will have no chance to return, let alone produce
+  the result expected.  */