From patchwork Tue Nov 7 17:33:02 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aldy Hernandez X-Patchwork-Id: 835386 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-466170-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="hhmyDwi6"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3yWc374cgvz9s7C for ; Wed, 8 Nov 2017 04:33:22 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=lpoaQu2lugy4X0C4cJTnFLlvAGbvVpZFCv3KaU3DMdBRujxwK5 5X3iHVoZPtlvx08xavx9fvWyIdqIevTK5IXrSBm0Q0+YfK+FE4dsug1x6A8Zl03t FDEipFBxs4t35qJduOwMXqrbEBntjRn6NWPP3wpKwJjGnryBMlfHXjNjI= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:message-id:date:mime-version:content-type; s= default; bh=uL3MOlPrkxyHFiMzqtVwBbcnt34=; b=hhmyDwi6M0qygFSLtJba D3pjtyHo0k6XsNuD654+gSK1V5hz4zOPqXumlvjcm6HgIVsK0dK0E4QEbDGzu/0T U17J64Pcl6TXuUcrWKlkP+/SYQRiPOnL76gfDvGQzDS14ppTcPDxruEk5geFBQ7P aonpwZBj7X7j3oFeiEVIstg= Received: (qmail 16693 invoked by alias); 7 Nov 2017 17:33:10 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 16683 invoked by uid 89); 7 Nov 2017 17:33:09 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2 spammy= X-HELO: mail-wm0-f50.google.com Received: from mail-wm0-f50.google.com (HELO mail-wm0-f50.google.com) (74.125.82.50) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 07 Nov 2017 17:33:07 +0000 Received: by mail-wm0-f50.google.com with SMTP id t139so5596871wmt.1 for ; Tue, 07 Nov 2017 09:33:06 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:cc:from:subject:message-id:date:user-agent :mime-version:content-language; bh=pfUCxRXR0ipGWoneelBUWv2M7fNU94Pvmi5LxPiECNM=; b=ep0aQk47AKE+OOmq2C0zPgf1xXx7AiXuCMFlyQpP1/2WbPg/3MECM7WG47F74nP5l/ X2bC7CQQSY855yuVbJz9pGGNlWLF2g98fKv4rLyGUR9XOhG8TN94/tjbyVr51zK68kfF 6G4ZM75iHGyLMAl6aP17b/lqaDtNrkOkFHX3UqldRY8/1ihFYA3tWlr1HBiSW/xmnXOl HxRPV3tDJLdCPkz7NB4jZ6AYG0jhIm/GTabUGIlofzUhXIoZWzl5eyxx+V/DRL7kGIFv 7ubz9VfOicwEJw/iXpsqqM8Vf2BSvf3CsFNRk+KZylgulm0YQ1MAAoKc99+jLD+4U3LM KIGQ== X-Gm-Message-State: AJaThX6h3+zcWEqao5vYkdOWVNTfuZZfFABK041yQ4gSaIgU2nY96xWY hjoI0pXTqXhn6ruQJXiWlqAfYf4W7+U= X-Google-Smtp-Source: ABhQp+SIv/05vRNznQNj+XzzoFKBeSvjaqdqy2RSTh6azvgLV8p97XH9BkGESvdio55krKpmosmLBA== X-Received: by 10.28.229.212 with SMTP id c203mr2221042wmh.57.1510075984476; Tue, 07 Nov 2017 09:33:04 -0800 (PST) Received: from abulafia.quesejoda.com (78.red-88-13-206.dynamicip.rima-tde.net. [88.13.206.78]) by smtp.gmail.com with ESMTPSA id u5sm1460440wmf.44.2017.11.07.09.33.03 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 07 Nov 2017 09:33:03 -0800 (PST) To: Jeff Law Cc: gcc-patches From: Aldy Hernandez Subject: [patch] jump threading multiple paths that start from the same BB Message-ID: Date: Tue, 7 Nov 2017 12:33:02 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 X-IsSubscribed: yes [One more time, but without rejected HTML mail, because apparently this is my first post to gcc-patches *ever* ;-)]. Howdy! While poking around in the backwards threader I noticed that we bail if we have already seen a starting BB. /* Do not jump-thread twice from the same block. */ if (bitmap_bit_p (threaded_blocks, entry->src->index) This limitation discards paths that are sub-paths of paths that have already been threaded. The following patch scans the remaining to-be-threaded paths to identify if any of them start from the same point, and are thus sub-paths of the just-threaded path. By removing the common prefix of blocks in upcoming threadable paths, and then rewiring first non-common block appropriately, we expose new threading opportunities, since we are no longer starting from the same BB. We also simplify the would-be threaded paths, because we don't duplicate already duplicated paths. This sounds confusing, but the documentation for the entry point to my patch (adjust_paths_after_duplication) shows an actual example: +/* After an FSM path has been jump threaded, adjust the remaining FSM + paths that are subsets of this path, so these paths can be safely + threaded within the context of the new threaded path. + + For example, suppose we have just threaded: + + 5 -> 6 -> 7 -> 8 -> 12 => 5 -> 6' -> 7' -> 8' -> 12' + + And we have an upcoming threading candidate: + 5 -> 6 -> 7 -> 8 -> 15 -> 20 + + This function adjusts the upcoming path into: + 8' -> 15 -> 20 Notice that we will no longer have two paths that start from the same BB. One will start with bb5, while the adjusted path will start with bb8'. With this we kill two birds-- we are able to thread more paths, and these paths will avoid duplicating a whole mess of things that have already been threaded. The attached patch is a subset of some upcoming work that can live on its own. It bootstraps and regtests. Also, by running it on a handful of .ii files, I can see that we are able to thread sub-paths that we previously dropped on the floor. More is better, right? :) To test this, I stole Jeff's method of using cachegrind to benchmark instruction counts and conditional branches (https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02434.html). Basically, I bootstrapped two compilers, with and without improvements, and used each to build a stage1 trunk. Each of these stage1-trunks were used on 246 .ii GCC files I have lying around from a bootstrap some random point last year. I used no special flags on builds apart from --enable-languages=c,c++. Although I would've wished a larger improvement, this works comes for free, as it's just a subset of other work I'm hacking on. Without further ado, here are my monumental, earth shattering improvements: Conditional branches Without patch: 411846839709 With patch: 411831330316 %changed: -0.0037660% Number of instructions Without patch: 2271844650634 With patch: 2271761153578 %changed: -0.0036754% OK for trunk? Aldy p.s. There is a lot of debugging/dumping code in my patch, which I can gladly remove if/when approved. It helps keep my head straight while looking at this spaghetti :). gcc/ * tree-ssa-threadupdate.c (mark_threaded_blocks): Avoid dereferencing path[] beyond its length. (debug_path): New. (debug_all_paths): New. (rewire_first_differing_edge): New. (adjust_paths_after_duplication): New. (duplicate_thread_path): Call adjust_paths_after_duplication. Add new argument. (thread_through_all_blocks): Add new argument to duplicate_thread_path. diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index 1dab0f1fab4..53ac7181b4b 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -1763,7 +1763,8 @@ mark_threaded_blocks (bitmap threaded_blocks) { vec *path = paths[i]; - if ((*path)[1]->type != EDGE_COPY_SRC_JOINER_BLOCK) + if (path->length () > 1 + && (*path)[1]->type != EDGE_COPY_SRC_JOINER_BLOCK) { edge e = (*path)[0]->e; e->aux = (void *)path; @@ -1783,7 +1784,8 @@ mark_threaded_blocks (bitmap threaded_blocks) { vec *path = paths[i]; - if ((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK) + if (path->length () > 1 + && (*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK) { /* Attach the path to the starting edge if none is yet recorded. */ if ((*path)[0]->e->aux == NULL) @@ -1812,7 +1814,8 @@ mark_threaded_blocks (bitmap threaded_blocks) vec *path = paths[i]; edge e = (*path)[0]->e; - if ((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK && e->aux == path) + if (path->length () > 1 + && (*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK && e->aux == path) { unsigned int j; for (j = 1; j < path->length (); j++) @@ -1978,6 +1981,169 @@ bb_in_bbs (basic_block bb, basic_block *bbs, int n) return false; } +DEBUG_FUNCTION void +debug_path (FILE *dump_file, int pathno) +{ + vec *p = paths[pathno]; + fprintf (dump_file, "path: "); + for (unsigned i = 0; i < p->length (); ++i) + fprintf (dump_file, "%d -> %d, ", + (*p)[i]->e->src->index, (*p)[i]->e->dest->index); + fprintf (dump_file, "\n"); +} + +DEBUG_FUNCTION void +debug_all_paths () +{ + for (unsigned i = 0; i < paths.length (); ++i) + debug_path (stderr, i); +} + +/* Rewire a jump_thread_edge so that the source block is now a + threaded source block. + + PATH_NUM is an index into the global path table PATHS. + EDGE_NUM is the jump thread edge number into said path. + + Returns TRUE if we were able to successfully rewire the edge. */ + +static bool +rewire_first_differing_edge (unsigned path_num, unsigned edge_num) +{ + vec *path = paths[path_num]; + edge &e = (*path)[edge_num]->e; + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "rewiring edge candidate: %d -> %d\n", + e->src->index, e->dest->index); + basic_block src_copy = get_bb_copy (e->src); + if (src_copy == NULL) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "ignoring candidate: there is no src COPY\n"); + return false; + } + edge new_edge = find_edge (src_copy, e->dest); + /* If the previously threaded paths created a flow graph where we + can no longer figure out where to go, give up. */ + if (new_edge == NULL) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "ignoring candidate: we lost our way\n"); + return false; + } + e = new_edge; + return true; +} + +/* After an FSM path has been jump threaded, adjust the remaining FSM + paths that are subsets of this path, so these paths can be safely + threaded within the context of the new threaded path. + + For example, suppose we have just threaded: + + 5 -> 6 -> 7 -> 8 -> 12 => 5 -> 6' -> 7' -> 8' -> 12' + + And we have an upcoming threading candidate: + 5 -> 6 -> 7 -> 8 -> 15 -> 20 + + This function adjusts the upcoming path into: + 8' -> 15 -> 20 + + CURR_PATH_NUM is an index into the global paths table. It + specifies the path that was just threaded. */ + +static void +adjust_paths_after_duplication (unsigned curr_path_num) +{ + vec *curr_path = paths[curr_path_num]; + gcc_assert ((*curr_path)[0]->type == EDGE_FSM_THREAD); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "just threaded: "); + debug_path (dump_file, curr_path_num); + } + + /* Iterate through all the other paths and adjust them. */ + for (unsigned cand_path_num = 0; cand_path_num < paths.length (); ) + { + if (cand_path_num == curr_path_num) + { + ++cand_path_num; + continue; + } + /* Make sure the candidate to adjust starts with the same path + as the recently threaded path and is an FSM thread. */ + vec *cand_path = paths[cand_path_num]; + if ((*cand_path)[0]->type != EDGE_FSM_THREAD + || (*cand_path)[0]->e != (*curr_path)[0]->e) + { + ++cand_path_num; + continue; + } + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "adjusting candidate: "); + debug_path (dump_file, cand_path_num); + } + + /* Chop off from the candidate path any prefix it shares with + the recently threaded path. */ + unsigned minlength = MIN (curr_path->length (), cand_path->length ()); + unsigned j; + for (j = 0; j < minlength; ++j) + { + edge cand_edge = (*cand_path)[j]->e; + edge curr_edge = (*curr_path)[j]->e; + + /* Once the prefix no longer matches, adjust the first + non-matching edge to point from an adjusted edge to + wherever it was going. */ + if (cand_edge != curr_edge) + { + gcc_assert (cand_edge->src == curr_edge->src); + if (!rewire_first_differing_edge (cand_path_num, j)) + goto remove_candidate_from_list; + break; + } + } + if (j == minlength) + { + /* If we consumed the max subgraph we could look at, and + still didn't find any different edges, it's the + last edge after MINLENGTH. */ + if (cand_path->length () > minlength) + { + if (!rewire_first_differing_edge (cand_path_num, j)) + goto remove_candidate_from_list; + } + else if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "adjusting first edge after MINLENGTH.\n"); + } + if (j > 0) + { + /* If we are removing everything, delete the entire candidate. */ + if (j == cand_path->length ()) + { + remove_candidate_from_list: + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "adjusted candidate: [EMPTY]\n"); + delete_jump_thread_path (cand_path); + paths.unordered_remove (cand_path_num); + continue; + } + /* Otherwise, just remove the redundant sub-path. */ + cand_path->block_remove (0, j); + } + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "adjusted candidate: "); + debug_path (dump_file, cand_path_num); + } + ++cand_path_num; + } +} + /* Duplicates a jump-thread path of N_REGION basic blocks. The ENTRY edge is redirected to the duplicate of the region. @@ -1985,11 +2151,14 @@ bb_in_bbs (basic_block bb, basic_block *bbs, int n) and create a single fallthru edge pointing to the same destination as the EXIT edge. + CURRENT_PATH_NO is an index into the global paths[] table + specifying the jump-thread path. + Returns false if it is unable to copy the region, true otherwise. */ static bool duplicate_thread_path (edge entry, edge exit, basic_block *region, - unsigned n_region) + unsigned n_region, unsigned current_path_no) { unsigned i; struct loop *loop = entry->dest->loop_father; @@ -2000,6 +2169,12 @@ duplicate_thread_path (edge entry, edge exit, basic_block *region, if (!can_copy_bbs_p (region, n_region)) return false; + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "\nabout to thread: "); + debug_path (dump_file, current_path_no); + } + /* Some sanity checking. Note that we do not check for all possible missuses of the functions. I.e. if you ask to copy something weird, it will work, but the state of structures probably will not be @@ -2128,6 +2303,8 @@ duplicate_thread_path (edge entry, edge exit, basic_block *region, free (region_copy); + adjust_paths_after_duplication (current_path_no); + free_original_copy_tables (); return true; } @@ -2251,7 +2428,7 @@ thread_through_all_blocks (bool may_peel_loop_headers) for (unsigned int j = 0; j < len - 1; j++) region[j] = (*path)[j]->e->dest; - if (duplicate_thread_path (entry, exit, region, len - 1)) + if (duplicate_thread_path (entry, exit, region, len - 1, i)) { /* We do not update dominance info. */ free_dominance_info (CDI_DOMINATORS);