From patchwork Fri Dec 16 01:54:43 2016
From: Jeff Law
To: gcc-patches
Subject: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE
Message-ID: <79d271d8-739e-23c6-ed24-e591cb7fb941@redhat.com>
Date: Thu, 15 Dec 2016 18:54:43 -0700

This is the first of a 4-part patchkit to address deficiencies in our
DSE implementation.

This patch addresses PR 33562, a P2 regression that has been open since
gcc-4.3.  To summarize: DSE no longer has the ability to detect an
aggregate store as dead when subsequent stores overwrite it in a
piecemeal fashion.
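As a concrete illustration (this mirrors the new tests added below;
byte offsets assume a 4-byte int):

  _Complex int t;

  void f (void)
  {
    t = 0;            /* Aggregate store; writes all 8 bytes of t.  */
    __real__ t = 1;   /* Overwrites bytes 0-3.  */
    __imag__ t = 2;   /* Overwrites bytes 4-7; the store of 0 is now
			 entirely dead, but DSE fails to prove it.  */
  }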
I originally tackled this by changing how we lower complex objects.
That was sufficient to address 33562, but was reasonably rejected.

This version attacks the problem by improving DSE to track stores to
memory at a byte level.  That allows us to determine if a series of
stores completely covers an earlier store (thus making the earlier
store dead).

A useful side effect is that we can detect when only parts of a store
are dead and potentially rewrite the store.  This patch implements that
for complex object initializations.  While not strictly part of 33562,
it's so closely related that I felt it belongs as part of this patch.

This patch originally limited the size of the tracked memory space to
64 bytes.  I bumped the limit after working through the CONSTRUCTOR and
mem* trimming patches.  The 256-byte limit is still fairly arbitrary,
and I wouldn't lose sleep if we throttled back to 64 or 128 bytes.

Later patches in the kit will build upon this patch, so if pieces look
like skeleton code, that's because they are.

Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?

	PR tree-optimization/33562
	* params.def (PARAM_DSE_MAX_OBJECT_SIZE): New PARAM.
	* tree-ssa-dse.c: Include params.h.
	(initialize_ao_ref_for_dse): New, partially extracted from
	dse_optimize_stmt.
	(valid_ao_ref_for_dse): New.
	(clear_bytes_written_by, trim_complex_store): Likewise.
	(trim_partially_dead_store): Likewise.
	(dse_possible_dead_store_p): Track which bytes were originally
	stored into memory by the statement as well as the subset of
	bytes that are still live.  If we "fail", but have identified
	the store as partially dead, try to rewrite it to store fewer
	bytes of data.  Exit the main loop if we find a full kill as a
	single statement or via a group of statements.
	(dse_optimize_stmt): Use initialize_ao_ref_for_dse.

	* gcc.dg/tree-ssa/complex-4.c: No longer xfailed.
	* gcc.dg/tree-ssa/complex-5.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dse-9.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dse-18.c: New test.
	* gcc.dg/tree-ssa/ssa-dse-19.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dse-20.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dse-21.c: Likewise.
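To make the partially-dead rewrite concrete, here is roughly what the
complex trimming does (GIMPLE shown schematically; the exact dump
syntax may differ):

  /* Before DSE: the imaginary half of the COMPLEX_CST store is dead.  */
  t = __complex__ (0, 0);
  __imag__ t = 2;

  /* After trimming: only the live (real) half is stored.  */
  REALPART_EXPR <t> = 0;
  __imag__ t = 2;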
diff --git a/gcc/params.def b/gcc/params.def
index 50f75a7..ddc3d65 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -532,6 +532,11 @@ DEFPARAM(PARAM_AVG_LOOP_NITER,
 	 "Average number of iterations of a loop.",
 	 10, 1, 0)
 
+DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
+	 "dse-max-object-size",
+	 "Maximum size (in bytes) of objects tracked by dead store elimination.",
+	 256, 0, 0)
+
 DEFPARAM(PARAM_SCEV_MAX_EXPR_SIZE,
 	 "scev-max-expr-size",
 	 "Bound on size of expressions used in the scalar evolutions analyzer.",
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
index 87a2638..3155741 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
@@ -10,4 +10,4 @@ int f(void)
   return g(&t);
 }
 
-/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
index e2cd403..e6d027f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
@@ -8,4 +8,4 @@ int f(void)
   __imag__ t = 2;
 }
 
-/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
new file mode 100644
index 0000000..92b2df8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+int g(_Complex int*);
+int f(void)
+{
+  _Complex int t = 0;
+  int i, j;
+  __imag__ t += 2;
+  return g(&t);
+}
+
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
new file mode 100644
index 0000000..718b746
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+int g(_Complex int*);
+int f(void)
+{
+  _Complex int t = 0;
+  int i, j;
+  __real__ t += 2;
+  return g(&t);
+}
+
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
new file mode 100644
index 0000000..4e14d9b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-dce -fdump-tree-optimized" } */
+_Complex int t = 0;
+int f(void)
+{
+  t = 0;
+  __imag__ t = 2;
+}
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
new file mode 100644
index 0000000..d1e0b85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-dce -fdump-tree-optimized" } */
+_Complex int t = 0;
+int f(void)
+{
+  t = 0;
+  __real__ t = 2;
+}
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
index 594c20c..ae48ddd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
@@ -11,4 +11,4 @@ foo ()
 }
 
 /* We should eliminate the first assignment.  */
-/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" } } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 778b363..eea185c 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "domwalk.h"
 #include "tree-cfgcleanup.h"
+#include "params.h"
 
 /* This file implements dead store elimination.
 
@@ -68,6 +69,158 @@ along with GCC; see the file COPYING3.  If not see
    remove their dead edges eventually.  */
 static bitmap need_eh_cleanup;
 
+/* STMT is a statement that may write into memory.  Analyze it and
+   initialize WRITE to describe how STMT affects memory.
+
+   Return TRUE if the statement was analyzed, FALSE otherwise.
+
+   It is always safe to return FALSE.  But typically better optimization
+   can be achieved by analyzing more statements.  */
+
+static bool
+initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
+{
+  /* It's advantageous to handle certain mem* functions.  */
+  if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+    {
+      switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
+	{
+	  case BUILT_IN_MEMCPY:
+	  case BUILT_IN_MEMMOVE:
+	  case BUILT_IN_MEMSET:
+	    {
+	      tree size = NULL_TREE;
+	      if (gimple_call_num_args (stmt) == 3)
+		size = gimple_call_arg (stmt, 2);
+	      tree ptr = gimple_call_arg (stmt, 0);
+	      ao_ref_init_from_ptr_and_size (write, ptr, size);
+	      return true;
+	    }
+	  default:
+	    break;
+	}
+    }
+  else if (is_gimple_assign (stmt))
+    {
+      ao_ref_init (write, gimple_assign_lhs (stmt));
+      return true;
+    }
+  return false;
+}
+
+/* Given REF from the alias oracle, return TRUE if it is a valid
+   memory reference for dead store elimination, false otherwise.
+
+   In particular, the reference must have a known base, known maximum
+   size, start at a byte offset and have a size that is one or more
+   bytes.  */
+
+static bool
+valid_ao_ref_for_dse (ao_ref *ref)
+{
+  return (ao_ref_base (ref)
+	  && ref->max_size != -1
+	  && (ref->offset % BITS_PER_UNIT) == 0
+	  && (ref->size % BITS_PER_UNIT) == 0);
+}
+
+/* Clear any bytes written by STMT from the bitmap LIVE_BYTES.  The base
+   address written by STMT must match the one found in REF, which must
+   have its base address previously initialized.
+
+   This routine must be conservative.  If we don't know the offset or
+   actual size written, assume nothing was written.  */
+
+static void
+clear_bytes_written_by (bitmap live_bytes, gimple *stmt, ao_ref *ref)
+{
+  ao_ref write;
+  if (!initialize_ao_ref_for_dse (stmt, &write))
+    return;
+
+  /* Verify we have the same base memory address and that the write
+     has a known size.  If so, then clear the appropriate bytes in
+     the LIVE_BYTES bitmap.  */
+  if (valid_ao_ref_for_dse (&write)
+      && write.base == ref->base
+      && write.size == write.max_size)
+    bitmap_clear_range (live_bytes,
+			write.offset / BITS_PER_UNIT,
+			write.size / BITS_PER_UNIT);
+}
+
+/* STMT initializes an object from COMPLEX_CST where one or more of the
+   bytes written are dead stores.  ORIG is the bitmap of bytes stored by
+   STMT.  LIVE is the bitmap of stores that are actually live.
+
+   Attempt to rewrite STMT so that only the real or imaginary part of
+   the object is actually stored.  */
+
+static void
+trim_complex_store (bitmap orig, bitmap live, gimple *stmt)
+{
+  bitmap dead = BITMAP_ALLOC (NULL);
+  bitmap_and_compl (dead, orig, live);
+
+  /* So we have to verify that either the real or imag part as a whole
+     is dead.  They are always the same size.  Thus for one to be dead
+     the number of live bytes would have to be the same as the number of
+     dead bytes.  */
+  if (bitmap_count_bits (live) == bitmap_count_bits (dead))
+    {
+      /* Hopefully we have live bits that look like 0xff00 or 0xff (assuming
+	 8 bytes for the underlying real/imag part).  If so we can optimize
+	 this case.  */
+      if (bitmap_first_set_bit (live) == 0
+	  && bitmap_first_set_bit (dead) == bitmap_count_bits (live))
+	{
+	  /* TREE_REALPART is live.  */
+	  tree x = TREE_REALPART (gimple_assign_rhs1 (stmt));
+	  tree y = gimple_assign_lhs (stmt);
+	  y = build1 (REALPART_EXPR, TREE_TYPE (x), y);
+	  gimple_assign_set_lhs (stmt, y);
+	  gimple_assign_set_rhs1 (stmt, x);
+	}
+      else if (bitmap_first_set_bit (dead) == 0
+	       && bitmap_first_set_bit (live) == bitmap_count_bits (dead))
+	{
+	  /* TREE_IMAGPART is live.  */
+	  tree x = TREE_IMAGPART (gimple_assign_rhs1 (stmt));
+	  tree y = gimple_assign_lhs (stmt);
+	  y = build1 (IMAGPART_EXPR, TREE_TYPE (x), y);
+	  gimple_assign_set_lhs (stmt, y);
+	  gimple_assign_set_rhs1 (stmt, x);
+	}
+      /* Other cases indicate parts of both the real and imag subobjects
+	 are live.  We do not try to optimize those cases.  */
+    }
+  BITMAP_FREE (dead);
+}
+
+/* STMT is a memory write where one or more bytes written are dead
+   stores.  ORIG is the bitmap of bytes stored by STMT.  LIVE is the
+   bitmap of stores that are actually live.
+
+   Attempt to rewrite STMT so that it writes fewer memory locations.  Right
+   now we only support trimming at the start or end of the memory region.
+   It's not clear how much there is to be gained by trimming from the middle
+   of the region.  */
+
+static void
+trim_partially_dead_store (bitmap orig, bitmap live, gimple *stmt)
+{
+  if (is_gimple_assign (stmt))
+    {
+      switch (gimple_assign_rhs_code (stmt))
+	{
+	case COMPLEX_CST:
+	  trim_complex_store (orig, live, stmt);
+	  break;
+	default:
+	  break;
+	}
+    }
+}
 
 /* A helper of dse_optimize_stmt.
    Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate
@@ -79,9 +232,32 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
 {
   gimple *temp;
   unsigned cnt = 0;
+  bitmap live_bytes = NULL;
+  bitmap orig_live_bytes = NULL;
 
   *use_stmt = NULL;
 
+  /* REF is a memory write.  Go ahead and get its base, size, extent
+     information and encode the bytes written into LIVE_BYTES.  We can
+     handle any case where we have a known base and maximum size.
+
+     However, experimentation has shown that bit level tracking is not
+     useful in practice, so we only track at the byte level.
+
+     Furthermore, experimentation has shown that 99% of the cases
+     that require byte tracking are 64 bytes or less.  */
+  if (valid_ao_ref_for_dse (ref)
+      && (ref->max_size / BITS_PER_UNIT
+	  <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
+    {
+      live_bytes = BITMAP_ALLOC (NULL);
+      orig_live_bytes = BITMAP_ALLOC (NULL);
+      bitmap_set_range (live_bytes,
+			ref->offset / BITS_PER_UNIT,
+			ref->max_size / BITS_PER_UNIT);
+      bitmap_copy (orig_live_bytes, live_bytes);
+    }
+
   /* Find the first dominated statement that clobbers (part of) the
      memory stmt stores to with no intermediate statement that may use
      part of the memory stmt stores.  That is, find a store that may
@@ -164,7 +341,15 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
 	}
 
       if (fail)
-	return false;
+	{
+	  /* STMT might be partially dead and we may be able to reduce
+	     how many memory locations it stores into.  */
+	  if (live_bytes
+	      && !bitmap_equal_p (live_bytes, orig_live_bytes)
+	      && !gimple_clobber_p (stmt))
+	    trim_partially_dead_store (orig_live_bytes, live_bytes, stmt);
+	  return false;
+	}
 
       /* If we didn't find any definition this means the store is dead
          if it isn't a store to global reachable memory.  In this case
@@ -177,12 +362,18 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
 	  temp = stmt;
 	  break;
 	}
+
+      if (live_bytes && temp)
+	clear_bytes_written_by (live_bytes, temp, ref);
     }
-  /* Continue walking until we reach a kill.  */
-  while (!stmt_kills_ref_p (temp, ref));
+  /* Continue walking until we reach a full kill as a single statement
+     or there are no more live bytes.  */
+  while (!stmt_kills_ref_p (temp, ref)
+	 && !(live_bytes && bitmap_empty_p (live_bytes)));
 
   *use_stmt = temp;
-
+  BITMAP_FREE (live_bytes);
+  BITMAP_FREE (orig_live_bytes);
   return true;
 }
 
@@ -214,6 +405,10 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF))
     return;
 
+  ao_ref ref;
+  if (!initialize_ao_ref_for_dse (stmt, &ref))
+    return;
+
   /* We know we have virtual definitions.  We can handle assignments and
      some builtin calls.  */
   if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
@@ -225,12 +420,6 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  case BUILT_IN_MEMSET:
 	    {
 	      gimple *use_stmt;
-	      ao_ref ref;
-	      tree size = NULL_TREE;
-	      if (gimple_call_num_args (stmt) == 3)
-		size = gimple_call_arg (stmt, 2);
-	      tree ptr = gimple_call_arg (stmt, 0);
-	      ao_ref_init_from_ptr_and_size (&ref, ptr, size);
 	      if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
 		return;
 
@@ -244,6 +433,7 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	      tree lhs = gimple_call_lhs (stmt);
 	      if (lhs)
 		{
+		  tree ptr = gimple_call_arg (stmt, 0);
 		  gimple *new_stmt = gimple_build_assign (lhs, ptr);
 		  unlink_stmt_vdef (stmt);
 		  if (gsi_replace (gsi, new_stmt, true))
@@ -274,13 +464,8 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
       if (operand_equal_p (gimple_assign_rhs1 (stmt),
			   gimple_assign_lhs (stmt), 0))
	use_stmt = stmt;
-      else
-	{
-	  ao_ref ref;
-	  ao_ref_init (&ref, gimple_assign_lhs (stmt));
-	  if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
-	    return;
-	}
+      else if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
+	return;
 
       /* Now we know that use_stmt kills the LHS of stmt.  */
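As a usage note, the limit can be adjusted from the command line via
the param added above, e.g. (128 here is just an example value; the
default set in params.def is 256):

  gcc -O2 --param dse-max-object-size=128 test.c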