From patchwork Wed Jun 7 07:37:32 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej W. Rozycki" X-Patchwork-Id: 772230 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3wjL4t1PQhz9sNH for ; Wed, 7 Jun 2017 17:38:05 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="Pc9rozYI"; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; q=dns; s=default; b=xbsXTrY3fg6nvi9yVcFHOjVwjmKFj0WRDflEYxCAztbWEHBIbJ N0EFmjpylV/ng3y0eECu3HFFTWWXFAZGjSUVgUm0tmQdCH/JUdAPStHGsqtwZSa2 ZzApcsjN0mQf9kUSNFHPOfCMF1v3Q5pXabkF97fE2euLPkTM6LHQxGa4o= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; s= default; bh=CcfBMvQwKjVE9zAvX8OoXpuzalY=; b=Pc9rozYI6n+4FWZCrw+c FyzI5QYSygUhh8+7FTykPu4Z7BEldEOtS3a+VlcOc8VjqkUQ0/wDs7kqnbHngX+z 4XqwM7zQyPlN08Pc/tK6jUGQ1CJNeR5rOwDJa5MiLxL1UH08Iq6cbIT+oxQ/VGyC 0bIZZgHHzztqIc+TWTEC/bc= Received: (qmail 87201 invoked by alias); 7 Jun 2017 07:37:47 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 87059 invoked by uid 89); 7 Jun 2017 07:37:45 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-10.4 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_NONE, SPF_PASS, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=UD:L, spectacular, span, whoever X-HELO: mailapp01.imgtec.com Received: from mailapp01.imgtec.com (HELO mailapp01.imgtec.com) (195.59.15.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 07 Jun 2017 07:37:42 +0000 Received: from HHMAIL01.hh.imgtec.org (unknown [10.100.10.19]) by Forcepoint Email with ESMTPS id EA8A7BE877EE2; Wed, 7 Jun 2017 08:37:39 +0100 (IST) Received: from [10.20.78.141] (10.20.78.141) by HHMAIL01.hh.imgtec.org (10.100.10.21) with Microsoft SMTP Server id 14.3.294.0; Wed, 7 Jun 2017 08:37:41 +0100 Date: Wed, 7 Jun 2017 08:37:32 +0100 From: "Maciej W. Rozycki" To: Catherine Moore , Matthew Fortune CC: Peter Bergner , Subject: [PATCH] MIPS16/GCC: Emit bounds checking as RTL in `casesi' Message-ID: User-Agent: Alpine 2.00 (DEB 1167 2008-08-23) MIME-Version: 1.0 Hi, Further to my changes made last November here is an update to the MIPS16 `casesi' pattern making it emit bounds checking as RTL rather than having it as hardcoded assembly within the `casesi_internal_mips16_' dispatcher. See below for how PR tree-optimization/51513 has prevented me from proceeding back then. This new arrangement has several advantages: 1. There is no hardcoded BTEQZ branch instruction that has a limited span and can overflow causing an assembly failure if the target label is too distant. GCC is able to relax out of range MIPS16 branches these days, but obviously they have to be expressed in RTL rather than buried in assembly code. This overflow can be easily reproduced; please enquire for a boring example if interested. 2. The `casesi_internal_mips16_' pattern now has an accurate length (aka instruction count) recorded as all its remaining emitted assembly instructions are known in advance to fit in their regular (16-bit) machine encodings. Previously there was uncertainty about the SLTU and BTEQZ instructions used for the bounds check, which depending on their exact arguments could have each resulted in their either regular (16-bit) or extended (32-bit) encoding. Consequently the worst length estimate was recorded instead, possibly causing worse code generation (e.g. premature out-of-range branch relaxation or jump table expansion from half-words to full words). 3. GCC now has freedom to schedule code around bounds checking as it sees fit rather than having to adapt to the fixed assembly block. In fact it tends to make use of it right away swapping BTEQZ for the BTNEZ instruction and rescheduling code such that the out-of-bounds (default) case executes linearly. There are probably more benefits, but these are what has immediately come to my mind. As noted above I meant to propose this change along with the rest so as to have it in GCC 7, however emitting the bounds check as a separate RTL pattern triggered an unrelated bug, then unknown to me, causing a fatal code generation regression, and the lack of time did not allow me to investigate it further. This was easily reproduced with a piece of code (reduced from actual Linux kernel code) like this: $ cat frob.c int frob (int i) { switch (i) { case -5: return -2; case -3: return -1; case 0: return 0; case 3: return 1; case 5: break; default: __builtin_unreachable (); } return i; } producing truncated assembly like this: $ gcc -O2 -mips16 -mcode-readable=yes -dp -S frob.c $ cat frob.s .file 1 "frob.c" .section .mdebug.abi32 .previous .nan legacy .module fp=32 .module nooddspreg .abicalls .option pic0 .text .align 2 .weak frob .set mips16 .set nomicromips .ent frob .type frob, @function frob: .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0 .mask 0x00000000,0 .fmask 0x00000000,0 addiu $2,$4,5 # 11 *addsi3_mips16/7 [length = 2] .end frob .size frob, .-frob .ident "GCC: (GNU) 7.0.0 20161117 (experimental)" $ -- where as you can see the whole switch statement has vanished along with any return path from the function, and only the preparatory addition emitted from `casesi' with: emit_insn (gen_addsi3 (reg, operands[0], offset)); remained. The causing bug has turned out to be what was filed as PR tree-optimization/51513 and has been kindly fixed by Peter recently (thanks, Peter!) with r247844 ("Fix PR51513, switch statement with default case containing __builtin_unreachable leads to wild branch"), , enabling me to proceed with this change without having to investigate the cause of code breakage -- which for the MIPS16 target has clearly turned out to be graver than a mere silly branch never to be executed. Given the previous troubles with this change I have decided to add MIPS16 test cases to verify that code truncation has not happened, complementing gcc.target/powerpc/pr51513.c, in case further tweaks in this area might do something bad. This would be caught by gcc.target/mips/insn-casesi.c added with r242424, but that test case does not refer to PR tree-optimization/51513, so let's make it explicit. With the PR tree-optimization/51513 fix removed the two new cases indeed cause: FAIL: gcc.target/mips/pr51513-1.c -O2 scan-assembler \tjrc?\t\\$31\n FAIL: gcc.target/mips/pr51513-1.c -O3 -g scan-assembler \tjrc?\t\\$31\n FAIL: gcc.target/mips/pr51513-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler \tjrc?\t\\$31\n FAIL: gcc.target/mips/pr51513-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler \tjrc?\t\\$31\n FAIL: gcc.target/mips/pr51513-2.c -O2 execution test FAIL: gcc.target/mips/pr51513-2.c -O3 -g execution test FAIL: gcc.target/mips/pr51513-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test and overall (when executed individually): === gcc Summary === # of expected passes 21 # of unexpected failures 7 Also since all the assembly templates in `casesi_internal_mips16_' must have been changed due to the rearrangement of operands, I have decided to make their formatting consistent with our templates used elsewhere, that is to remove the space between `,' and the following operand. This in turn required a small tweak to gcc.target/mips/data-sym-jump.c. I hope it is fine to have this along with the rest as a single change. As a further update we could probably emit assembly instructions currently produced with `casesi_internal_mips16_' individually as RTL instructions, however the result would likely not be as spectacular; certainly not as the branch overflow fix is, though it could possibly enable some optimisations in certain cases. So I'm leaving it as an exercise for whoever feels compelled to do it. gcc/ * config/mips/mips.md (MIPS16_T_REGNUM): Remove constant. (casesi): Emit bounds checking as RTL. (casesi_internal_mips16_): Remove bounds checking. gcc/testsuite/ * gcc.target/mips/data-sym-jump.c: Adjust for whitespace changes. * gcc.target/mips/pr51513-1.c: New test. * gcc.target/mips/pr51513-2.c: New test. No regressions in o32 little-endian MIPS16 multilib testing. OK to apply? Maciej gcc-mips16-casesi-rtl.diff Index: gcc/gcc/config/mips/mips.md =================================================================== --- gcc.orig/gcc/config/mips/mips.md 2017-06-06 21:53:32.979416504 +0100 +++ gcc/gcc/config/mips/mips.md 2017-06-06 21:53:48.596362464 +0100 @@ -162,7 +162,6 @@ [(TLS_GET_TP_REGNUM 3) (GET_FCSR_REGNUM 2) (SET_FCSR_REGNUM 4) - (MIPS16_T_REGNUM 24) (PIC_FUNCTION_ADDR_REGNUM 25) (RETURN_ADDR_REGNUM 31) (CPRESTORE_SLOT_REGNUM 76) @@ -6389,68 +6388,57 @@ if (!arith_operand (operands[0], SImode)) operands[0] = force_reg (SImode, operands[0]); - operands[2] = GEN_INT (INTVAL (operands[2]) + 1); - + emit_cmp_and_jump_insns (operands[0], operands[2], GTU, + NULL_RTX, SImode, 1, operands[4]); emit_jump_insn (PMODE_INSN (gen_casesi_internal_mips16, - (operands[0], operands[2], - operands[3], operands[4]))); - + (operands[0], operands[3]))); DONE; }) (define_insn "casesi_internal_mips16_" [(set (pc) - (if_then_else - (ltu (match_operand:SI 0 "register_operand" "d") - (match_operand:SI 1 "arith_operand" "dI")) - (unspec:P - [(match_dup 0) - (label_ref (match_operand 2 "" ""))] - UNSPEC_CASESI_DISPATCH) - (label_ref (match_operand 3 "" "")))) - (clobber (match_scratch:P 4 "=d")) - (clobber (match_scratch:P 5 "=d")) - (clobber (reg:SI MIPS16_T_REGNUM))] + (unspec:P [(match_operand:SI 0 "register_operand" "d") + (label_ref (match_operand 1 "" ""))] + UNSPEC_CASESI_DISPATCH)) + (clobber (match_scratch:P 2 "=d")) + (clobber (match_scratch:P 3 "=d"))] "TARGET_MIPS16_SHORT_JUMP_TABLES" { - rtx diff_vec = PATTERN (NEXT_INSN (as_a (operands[2]))); + rtx diff_vec = PATTERN (NEXT_INSN (as_a (operands[1]))); gcc_assert (GET_CODE (diff_vec) == ADDR_DIFF_VEC); - - output_asm_insn ("sltu\t%0, %1", operands); - output_asm_insn ("bteqz\t%3", operands); - + switch (GET_MODE (diff_vec)) { case HImode: - output_asm_insn ("sll\t%5, %0, 1", operands); - output_asm_insn ("la\t%4, %2", operands); - output_asm_insn ("addu\t%5, %4, %5", operands); - output_asm_insn ("lh\t%5, 0(%5)", operands); + output_asm_insn ("sll\t%3,%0,1", operands); + output_asm_insn ("la\t%2,%1", operands); + output_asm_insn ("addu\t%3,%2,%3", operands); + output_asm_insn ("lh\t%3,0(%3)", operands); break; case SImode: - output_asm_insn ("sll\t%5, %0, 2", operands); - output_asm_insn ("la\t%4, %2", operands); - output_asm_insn ("addu\t%5, %4, %5", operands); - output_asm_insn ("lw\t%5, 0(%5)", operands); + output_asm_insn ("sll\t%3,%0,2", operands); + output_asm_insn ("la\t%2,%1", operands); + output_asm_insn ("addu\t%3,%2,%3", operands); + output_asm_insn ("lw\t%3,0(%3)", operands); break; default: gcc_unreachable (); } - output_asm_insn ("addu\t%4, %4, %5", operands); + output_asm_insn ("addu\t%2,%2,%3", operands); if (GENERATE_MIPS16E) - return "jrc\t%4"; + return "jrc\t%2"; else - return "jr\t%4"; + return "jr\t%2"; } [(set (attr "insn_count") (if_then_else (match_test "GENERATE_MIPS16E") - (const_string "10") - (const_string "11")))]) + (const_string "6") + (const_string "7")))]) ;; For TARGET_USE_GOT, we save the gp in the jmp_buf as well. ;; While it is possible to either pull it off the stack (in the Index: gcc/gcc/testsuite/gcc.target/mips/data-sym-jump.c =================================================================== --- gcc.orig/gcc/testsuite/gcc.target/mips/data-sym-jump.c 2017-06-06 21:53:32.995674428 +0100 +++ gcc/gcc/testsuite/gcc.target/mips/data-sym-jump.c 2017-06-06 21:53:48.642154539 +0100 @@ -25,7 +25,7 @@ frob (int i) /* Expect assembly like: - la $2, $L4 + la $2,$L4 # Anything goes here. .type __jump_frob_4, @object # Symbol # must match label. __jump_frob_4: # The symbol must match. @@ -47,4 +47,4 @@ __jend_frob_4: # The symbol must mat that is `__jump_*'/`__jend_*' symbols inserted around a jump table. */ -/* { dg-final { scan-assembler "\tla\t\\\$\[0-9\]+, (.L(\[0-9\]+))\n.*\t\\.type\t(__jump_frob_\\2), @object\n\\3:\n\\1:\n(?:\t\\.(?:half|word)\t.L\[0-9\]+-\\1\n)\{11\}\t\\.type\t(__jend_frob_\\2), @function\n\\4:\n\t\\.insn\n" } } */ +/* { dg-final { scan-assembler "\tla\t\\\$\[0-9\]+,(.L(\[0-9\]+))\n.*\t\\.type\t(__jump_frob_\\2), @object\n\\3:\n\\1:\n(?:\t\\.(?:half|word)\t.L\[0-9\]+-\\1\n)\{11\}\t\\.type\t(__jend_frob_\\2), @function\n\\4:\n\t\\.insn\n" } } */ Index: gcc/gcc/testsuite/gcc.target/mips/pr51513-1.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ gcc/gcc/testsuite/gcc.target/mips/pr51513-1.c 2017-06-07 05:15:48.170762991 +0100 @@ -0,0 +1,48 @@ +/* { dg-do compile } */ +/* { dg-options "-mips16 -mcode-readable=yes" } */ + +/* PR tree-optimization/51513 verification variant for MIPS16, #1. */ + +int __attribute__ ((weak)) +frob (int i) +{ + switch (i) + { + case -5: + return -2; + case -3: + return -1; + case 0: + return 0; + case 3: + return 1; + case 5: + break; + default: + __builtin_unreachable (); + } + return i; +} + +/* Without the fix for PR tree-optimization/51513 truncated code + would be emitted for `frob', like: + + .text + .align 2 + .weak frob + .set mips16 + .set nomicromips + .ent frob + .type frob, @function +frob: + .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0 + .mask 0x00000000,0 + .fmask 0x00000000,0 + addiu $2,$4,5 + .end frob + .size frob, .-frob + + meaning `frob' will have no chance to return, let alone produce + the result expected. */ + +/* { dg-final { scan-assembler "\tjrc?\t\\\$31\n" } } */ Index: gcc/gcc/testsuite/gcc.target/mips/pr51513-2.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ gcc/gcc/testsuite/gcc.target/mips/pr51513-2.c 2017-06-07 05:15:48.187060805 +0100 @@ -0,0 +1,56 @@ +/* { dg-do run } */ +/* { dg-options "-mips16 -mcode-readable=yes" } */ + +/* PR tree-optimization/51513 verification variant for MIPS16, #2. */ + +int __attribute__ ((weak)) +frob (int i) +{ + switch (i) + { + case -5: + return -2; + case -3: + return -1; + case 0: + return 0; + case 3: + return 1; + case 5: + break; + default: + __builtin_unreachable (); + } + return i; +} + +int +main (void) +{ + return !(frob (-5) == -2 + & frob (-3) == -1 + & frob (0) == 0 + & frob (3) == 1 + & frob (5) == 5); +} + +/* Without the fix for PR tree-optimization/51513 truncated code + would be emitted for `frob', like: + + .text + .align 2 + .weak frob + .set mips16 + .set nomicromips + .ent frob + .type frob, @function +frob: + .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0 + .mask 0x00000000,0 + .fmask 0x00000000,0 + addiu $2,$4,5 + .end frob + .size frob, .-frob + + meaning `frob' will have no chance to return, let alone produce + the result expected. */