From patchwork Tue Jan 28 00:22:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin Jambor X-Patchwork-Id: 1230093 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-518356-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha1 header.s=default header.b=oomLWJ6p; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4866l313wwz9sNx for ; Tue, 28 Jan 2020 11:22:37 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type; q=dns; s=default; b=DiZwCox+BMNanSLpv985qr6+j1lIk9E8sEUGKVgLy3NqKo1E5F eMiEj94098xYpePMJxpWWXKkiDgX/zYGb0LeTvP9TaoqVLlvyfKG5fwT4BiAYUch 6/1QILbBeb46ROSdeovKxETIsTj6D7rDfA0jG05UBjhGQAy02vpmcJqiI= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type; s= default; bh=oy8dk3FQNYQ+k+h3tJoU3FXiaDc=; b=oomLWJ6pL3mIH6MfL2bM cvyEniTeytxsQ9GGyfLE5tDlXu/cfsFxFIHDZKJXZtrKzklJ3KtWo2SaozuACZ9p Tfrg8vggLbPDd7PTDGm4lf4Cm+5k+vnX+q/EZX2GFNIix0MV2aPDOXSSmhdlnjJ5 dXLJMlWOShZmb3sqg2SanyM= Received: (qmail 125524 invoked by alias); 28 Jan 2020 00:22:29 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 125515 invoked by uid 89); 28 Jan 2020 00:22:29 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-20.8 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mx2.suse.de Received: from mx2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 28 Jan 2020 00:22:27 +0000 Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id E5370AC35 for ; Tue, 28 Jan 2020 00:22:24 +0000 (UTC) From: Martin Jambor To: GCC Patches Cc: Subject: [PATCH 1/4] SRA: Add verification of accesses User-Agent: Notmuch/0.29.3 (https://notmuchmail.org) Emacs/26.3 (x86_64-suse-linux-gnu) Date: Tue, 28 Jan 2020 01:22:24 +0100 Message-ID: MIME-Version: 1.0 X-IsSubscribed: yes Hi, this patch has not changed since the last submission at all, in fact it got approved but without the follow-up fix of the reverse flag, it would introduce regression, so it should not be committed on its own. Because the follow-up patches perform some non-trivial operations on SRA patches, I wrote myself a verifier. 
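(Editorial aside, not part of the submission: the sketch below models, in stripped-down standalone C, the child/sibling/parent walk that the new verify_sra_access_forest in the patch performs over an access forest, and the two invariants it asserts on each node; the struct acc and check_forest names are invented for illustration and are not GCC code. The same non-recursive walk pattern reappears later in this series in can_totally_scalarize_forest_p and in the reworked initialize_constant_pool_replacements.)

/* Illustrative model of the forest walk done by verify_sra_access_forest.
   Children must lie within their parent and siblings must be sorted and
   non-overlapping; everything else in the real verifier is omitted.  */

#include <assert.h>

struct acc
{
  long offset, size;
  struct acc *parent, *first_child, *next_sibling, *next_grp;
};

static void
check_forest (struct acc *root)
{
  if (!root)
    return;
  struct acc *a = root;
  do
    {
      if (a->parent)
	/* A child never reaches outside its parent.  */
	assert (a->offset >= a->parent->offset
		&& a->size <= a->parent->size);
      if (a->next_sibling)
	/* Siblings are ordered by offset and do not overlap.  */
	assert (a->next_sibling->offset >= a->offset + a->size);

      if (a->first_child)
	a = a->first_child;
      else if (a->next_sibling)
	a = a->next_sibling;
      else
	{
	  while (a->parent && !a->next_sibling)
	    a = a->parent;
	  if (a->next_sibling)
	    a = a->next_sibling;
	  else
	    {
	      root = root->next_grp;	/* Done with this tree.  */
	      a = root;
	    }
	}
    }
  while (a);
}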
And sure enough, it has spotted two issues, one of which is fixed in this patch too - we did not correctly set the parent link when creating artificial accesses for propagation across assignments. The second one is the (not) setting of reverse flag when creating accesses for total scalarization but since the following patch removes the offending function, this patch does not fix it. Bootstrapped and tested on x86_64, I consider this a pre-requisite for the followup patches (and the parent link fix really is). Thanks, Martin 2019-12-10 Martin Jambor * tree-sra.c (verify_sra_access_forest): New function. (verify_all_sra_access_forests): Likewise. (create_artificial_child_access): Set parent. (analyze_all_variable_accesses): Call the verifier. --- gcc/tree-sra.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 86 insertions(+) diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 875d5b21763..36106fecaf1 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -2321,6 +2321,88 @@ build_access_trees (struct access *access) return true; } +/* Traverse the access forest where ROOT is the first root and verify that + various important invariants hold true. */ + +DEBUG_FUNCTION void +verify_sra_access_forest (struct access *root) +{ + struct access *access = root; + tree first_base = root->base; + gcc_assert (DECL_P (first_base)); + do + { + gcc_assert (access->base == first_base); + if (access->parent) + gcc_assert (access->offset >= access->parent->offset + && access->size <= access->parent->size); + if (access->next_sibling) + gcc_assert (access->next_sibling->offset + >= access->offset + access->size); + + poly_int64 poffset, psize, pmax_size; + bool reverse; + tree base = get_ref_base_and_extent (access->expr, &poffset, &psize, + &pmax_size, &reverse); + HOST_WIDE_INT offset, size, max_size; + if (!poffset.is_constant (&offset) + || !psize.is_constant (&size) + || !pmax_size.is_constant (&max_size)) + gcc_unreachable (); + gcc_assert (base == first_base); + gcc_assert (offset == access->offset); + gcc_assert (access->grp_unscalarizable_region + || size == max_size); + gcc_assert (max_size == access->size); + gcc_assert (reverse == access->reverse); + + if (access->first_child) + { + gcc_assert (access->first_child->parent == access); + access = access->first_child; + } + else if (access->next_sibling) + { + gcc_assert (access->next_sibling->parent == access->parent); + access = access->next_sibling; + } + else + { + while (access->parent && !access->next_sibling) + access = access->parent; + if (access->next_sibling) + access = access->next_sibling; + else + { + gcc_assert (access == root); + root = root->next_grp; + access = root; + } + } + } + while (access); +} + +/* Verify access forests of all candidates with accesses by calling + verify_access_forest on each on them. */ + +DEBUG_FUNCTION void +verify_all_sra_access_forests (void) +{ + bitmap_iterator bi; + unsigned i; + EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi) + { + tree var = candidate (i); + struct access *access = get_first_repr_for_decl (var); + if (access) + { + gcc_assert (access->base == var); + verify_sra_access_forest (access); + } + } +} + /* Return true if expr contains some ARRAY_REFs into a variable bounded array. 
*/ @@ -2566,6 +2648,7 @@ create_artificial_child_access (struct access *parent, struct access *model, access->offset = new_offset; access->size = model->size; access->type = model->type; + access->parent = parent; access->grp_write = set_grp_write; access->grp_read = false; access->reverse = model->reverse; @@ -2850,6 +2933,9 @@ analyze_all_variable_accesses (void) propagate_all_subaccesses (); + if (flag_checking) + verify_all_sra_access_forests (); + bitmap_copy (tmp, candidate_bitmap); EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi) { From patchwork Tue Jan 28 00:22:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin Jambor X-Patchwork-Id: 1230094 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-518357-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha1 header.s=default header.b=D5UY8OTr; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4866lf6CNBz9sNx for ; Tue, 28 Jan 2020 11:23:10 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type; q=dns; s=default; b=CVGZO+DCZNvy9tmLowlPUGgc28c4gEZ79qi9b9oXh5nmkzktAl AyEj7YaiNQfJmTZuvgnTzyRV34s7KaIDVPfn9siAJlujCurFzFaFmsj6JS7ORJ0v 3ZU0JbJK9M2ZAnpULH9EZUAtdT7HzJOQt2znQ1AWpQPZ5tBE2RxKFxBG0= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type; s= default; bh=U0ahZErF/DydidLiyrKH65mGLoM=; b=D5UY8OTri5OISZiCzXGb nDdjWaHY24EIu2U/SwDpdFmPmxn3qd9RYVh+u/AGekz2jPFiKkyOpSVpDEfEL7R9 6O8TnWIspLRq/9U/56Ll11epOO3d+ws2PCgGnwMQYYWnHK8Udgvr4JlJHNth9Nwm TVBFJc9u5L3KKlpmAUOFnKg= Received: (qmail 127076 invoked by alias); 28 Jan 2020 00:23:00 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 127056 invoked by uid 89); 28 Jan 2020 00:23:00 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-20.8 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mx2.suse.de Received: from mx2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 28 Jan 2020 00:22:56 +0000 Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id D232AAC35; Tue, 28 Jan 2020 00:22:53 +0000 (UTC) From: Martin Jambor To: GCC Patches Cc: Richard Biener , Jan Hubicka , Jan Hubicka Subject: [PATCH 2/4] SRA: Total scalarization after access propagation [PR92706] User-Agent: Notmuch/0.29.3 
(https://notmuchmail.org) Emacs/26.3 (x86_64-suse-linux-gnu) Date: Tue, 28 Jan 2020 01:22:53 +0100 Message-ID: MIME-Version: 1.0 X-IsSubscribed: yes Hi, this patch fixes the second testcase in PR 92706 by performing total scalarization only quite a bit later, when we already have access trees constructed and even done propagation of accesses from RHSs of assignment to LHSs. The new code simultaneously traverses the existing access tree and the declared variable type and adds artificial accesses whenever they can fit in between the existing ones. This prevents us from creating accesses based on the type which then clash with those which have propagated here from another access tree describing an aggregate on a RHS of an assignment, which means that both sides of the assignment will be scalarized differently, leading to bad code and the aggregate most likely not going away. This new version is hopefully slightly easier to read and review and also fixed one potential bug, but otherwise does pretty much the same thing as the first one. Bootstrapped and LTO-bootstrapped and tested on an x86_64-linux, where it causes two new guality XPASSes. I expect that review will lead to requests to change things but provided we want to fix PR 92706 now, I believe this is the way to go. The fix for PR 92486 which I am sending as a follow-up also depends on this patch. Thanks, Martin 2019-12-20 Martin Jambor PR tree-optimization/92706 * tree-sra.c (struct access): Adjust comment of grp_total_scalarization. (find_access_in_subtree): Look for single children spanning an entire access. (scalarizable_type_p): Allow register accesses, adjust callers. (completely_scalarize): Remove function. (scalarize_elem): Likewise. (create_total_scalarization_access): Likewise. (sort_and_splice_var_accesses): Do not track total scalarization flags. (analyze_access_subtree): New parameter totally, adjust to new meaning of grp_total_scalarization. (analyze_access_trees): Pass new parameter to analyze_access_subtree. (can_totally_scalarize_forest_p): New function. (create_total_scalarization_access): Likewise. (create_total_access_and_reshape): Likewise. (total_should_skip_creating_access): Likewise. (totally_scalarize_subtree): Likewise. (analyze_all_variable_accesses): Perform total scalarization after subaccess propagation using the new functions above. (initialize_constant_pool_replacements): Output initializers by traversing the access tree. testsuite/ * gcc.dg/tree-ssa/pr92706-2.c: New test. * gcc.dg/guality/pr59776.c: Xfail tests for s2.g. 
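(Editorial aside, not part of the submission: the simplified C fragment below models the per-field decision described above and implemented by total_should_skip_creating_access in the patch that follows - whether an existing access already represents a field, whether a new artificial access still fits into a gap between existing ones, or whether total scalarization must give up. The names classify_field and struct acc are invented for illustration; the real function additionally handles register-type fields and recurses into matching sub-records.)

enum fld_state { FLD_CREATE, FLD_DONE, FLD_FAILED };

struct acc { long offset, size; struct acc *next_sibling; };

/* Classify the field at [POS, POS + SIZE) against the existing children
   starting at SIB, which are sorted by offset.  */

static enum fld_state
classify_field (struct acc *sib, long pos, long size)
{
  /* Skip siblings that end before the field starts; any that straddle
     POS make total scalarization impossible.  */
  while (sib && sib->offset < pos)
    {
      if (sib->offset + sib->size > pos)
	return FLD_FAILED;
      sib = sib->next_sibling;
    }

  /* An exactly matching access already represents the field.  */
  if (sib && sib->offset == pos && sib->size == size)
    return FLD_DONE;

  /* Anything crossing the end of the field is a partial overlap.  */
  if (sib && sib->offset < pos + size
      && sib->offset + sib->size > pos + size)
    return FLD_FAILED;

  /* The field fits into a gap: create a new artificial access.  */
  return FLD_CREATE;
}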
--- gcc/testsuite/gcc.dg/guality/pr59776.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c | 19 + gcc/tree-sra.c | 666 ++++++++++++++++------ 3 files changed, 505 insertions(+), 184 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c diff --git a/gcc/testsuite/gcc.dg/guality/pr59776.c b/gcc/testsuite/gcc.dg/guality/pr59776.c index 382abb622bb..6c1c8165b70 100644 --- a/gcc/testsuite/gcc.dg/guality/pr59776.c +++ b/gcc/testsuite/gcc.dg/guality/pr59776.c @@ -12,11 +12,11 @@ foo (struct S *p) struct S s1, s2; /* { dg-final { gdb-test pr59776.c:17 "s1.f" "5.0" } } */ s1 = *p; /* { dg-final { gdb-test pr59776.c:17 "s1.g" "6.0" } } */ s2 = s1; /* { dg-final { gdb-test pr59776.c:17 "s2.f" "0.0" } } */ - *(int *) &s2.f = 0; /* { dg-final { gdb-test pr59776.c:17 "s2.g" "6.0" } } */ + *(int *) &s2.f = 0; /* { dg-final { gdb-test pr59776.c:17 "s2.g" "6.0" { xfail *-*-* } } } */ asm volatile (NOP : : : "memory"); /* { dg-final { gdb-test pr59776.c:20 "s1.f" "5.0" } } */ asm volatile (NOP : : : "memory"); /* { dg-final { gdb-test pr59776.c:20 "s1.g" "6.0" } } */ s2 = s1; /* { dg-final { gdb-test pr59776.c:20 "s2.f" "5.0" } } */ - asm volatile (NOP : : : "memory"); /* { dg-final { gdb-test pr59776.c:20 "s2.g" "6.0" } } */ + asm volatile (NOP : : : "memory"); /* { dg-final { gdb-test pr59776.c:20 "s2.g" "6.0" { xfail *-*-* } } } */ asm volatile (NOP : : : "memory"); } diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c new file mode 100644 index 00000000000..37ab9765db0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-esra" } */ + +typedef __UINT64_TYPE__ uint64_t; +typedef __UINT32_TYPE__ uint32_t; +struct S { uint32_t i[2]; } __attribute__((aligned(__alignof__(uint64_t)))); +typedef uint64_t my_int64 __attribute__((may_alias)); +uint64_t load (void *p) +{ + struct S u, v, w; + uint64_t tem; + tem = *(my_int64 *)p; + *(my_int64 *)&v = tem; + u = v; + w = u; + return *(my_int64 *)&w; +} + +/* { dg-final { scan-tree-dump "Created a replacement for v" "esra" } } */ diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 36106fecaf1..2b0849858de 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -211,8 +211,11 @@ struct access is not propagated in the access tree in any direction. */ unsigned grp_scalar_write : 1; - /* Is this access an artificial one created to scalarize some record - entirely? */ + /* In a root of an access tree, true means that the entire tree should be + totally scalarized - that all scalar leafs should be scalarized and + non-root grp_total_scalarization accesses should be honored. Otherwise, + non-root accesses with grp_total_scalarization should never get scalar + replacements. */ unsigned grp_total_scalarization : 1; /* Other passes of the analysis use this bit to make function @@ -485,6 +488,15 @@ find_access_in_subtree (struct access *access, HOST_WIDE_INT offset, access = child; } + /* Total scalarization does not replace single field structures with their + single field but rather creates an access for them underneath. Look for + it. 
*/ + if (access) + while (access->first_child + && access->first_child->offset == offset + && access->first_child->size == size) + access = access->first_child; + return access; } @@ -856,7 +868,8 @@ create_access (tree expr, gimple *stmt, bool write) static bool scalarizable_type_p (tree type, bool const_decl) { - gcc_assert (!is_gimple_reg_type (type)); + if (is_gimple_reg_type (type)) + return true; if (type_contains_placeholder_p (type)) return false; @@ -871,8 +884,7 @@ scalarizable_type_p (tree type, bool const_decl) if (DECL_BIT_FIELD (fld)) return false; - if (!is_gimple_reg_type (ft) - && !scalarizable_type_p (ft, const_decl)) + if (!scalarizable_type_p (ft, const_decl)) return false; } @@ -902,8 +914,7 @@ scalarizable_type_p (tree type, bool const_decl) return false; tree elem = TREE_TYPE (type); - if (!is_gimple_reg_type (elem) - && !scalarizable_type_p (elem, const_decl)) + if (!scalarizable_type_p (elem, const_decl)) return false; return true; } @@ -912,114 +923,6 @@ scalarizable_type_p (tree type, bool const_decl) } } -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree); - -/* Create total_scalarization accesses for all scalar fields of a member - of type DECL_TYPE conforming to scalarizable_type_p. BASE - must be the top-most VAR_DECL representing the variable; within that, - OFFSET locates the member and REF must be the memory reference expression for - the member. */ - -static void -completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) -{ - switch (TREE_CODE (decl_type)) - { - case RECORD_TYPE: - for (tree fld = TYPE_FIELDS (decl_type); fld; fld = DECL_CHAIN (fld)) - if (TREE_CODE (fld) == FIELD_DECL) - { - HOST_WIDE_INT pos = offset + int_bit_position (fld); - tree ft = TREE_TYPE (fld); - tree nref = build3 (COMPONENT_REF, ft, ref, fld, NULL_TREE); - - scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)), - TYPE_REVERSE_STORAGE_ORDER (decl_type), - nref, ft); - } - break; - case ARRAY_TYPE: - { - tree elemtype = TREE_TYPE (decl_type); - tree elem_size = TYPE_SIZE (elemtype); - gcc_assert (elem_size && tree_fits_shwi_p (elem_size)); - HOST_WIDE_INT el_size = tree_to_shwi (elem_size); - gcc_assert (el_size > 0); - - tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type)); - gcc_assert (TREE_CODE (minidx) == INTEGER_CST); - tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type)); - /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1. */ - if (maxidx) - { - gcc_assert (TREE_CODE (maxidx) == INTEGER_CST); - tree domain = TYPE_DOMAIN (decl_type); - /* MINIDX and MAXIDX are inclusive, and must be interpreted in - DOMAIN (e.g. signed int, whereas min/max may be size_int). */ - offset_int idx = wi::to_offset (minidx); - offset_int max = wi::to_offset (maxidx); - if (!TYPE_UNSIGNED (domain)) - { - idx = wi::sext (idx, TYPE_PRECISION (domain)); - max = wi::sext (max, TYPE_PRECISION (domain)); - } - for (int el_off = offset; idx <= max; ++idx) - { - tree nref = build4 (ARRAY_REF, elemtype, - ref, - wide_int_to_tree (domain, idx), - NULL_TREE, NULL_TREE); - scalarize_elem (base, el_off, el_size, - TYPE_REVERSE_STORAGE_ORDER (decl_type), - nref, elemtype); - el_off += el_size; - } - } - } - break; - default: - gcc_unreachable (); - } -} - -/* Create total_scalarization accesses for a member of type TYPE, which must - satisfy either is_gimple_reg_type or scalarizable_type_p. 
BASE must be the - top-most VAR_DECL representing the variable; within that, POS and SIZE locate - the member, REVERSE gives its torage order. and REF must be the reference - expression for it. */ - -static void -scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse, - tree ref, tree type) -{ - if (is_gimple_reg_type (type)) - { - struct access *access = create_access_1 (base, pos, size); - access->expr = ref; - access->type = type; - access->grp_total_scalarization = 1; - access->reverse = reverse; - /* Accesses for intraprocedural SRA can have their stmt NULL. */ - } - else - completely_scalarize (base, type, pos, ref); -} - -/* Create a total_scalarization access for VAR as a whole. VAR must be of a - RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p. */ - -static void -create_total_scalarization_access (tree var) -{ - HOST_WIDE_INT size = tree_to_uhwi (DECL_SIZE (var)); - struct access *access; - - access = create_access_1 (var, 0, size); - access->expr = var; - access->type = TREE_TYPE (var); - access->grp_total_scalarization = 1; -} - /* Return true if REF has an VIEW_CONVERT_EXPR somewhere in it. */ static inline bool @@ -2029,7 +1932,6 @@ sort_and_splice_var_accesses (tree var) bool grp_assignment_read = access->grp_assignment_read; bool grp_assignment_write = access->grp_assignment_write; bool multiple_scalar_reads = false; - bool total_scalarization = access->grp_total_scalarization; bool grp_partial_lhs = access->grp_partial_lhs; bool first_scalar = is_gimple_reg_type (access->type); bool unscalarizable_region = access->grp_unscalarizable_region; @@ -2081,7 +1983,6 @@ sort_and_splice_var_accesses (tree var) grp_assignment_write |= ac2->grp_assignment_write; grp_partial_lhs |= ac2->grp_partial_lhs; unscalarizable_region |= ac2->grp_unscalarizable_region; - total_scalarization |= ac2->grp_total_scalarization; relink_to_new_repr (access, ac2); /* If there are both aggregate-type and scalar-type accesses with @@ -2122,9 +2023,7 @@ sort_and_splice_var_accesses (tree var) access->grp_scalar_write = grp_scalar_write; access->grp_assignment_read = grp_assignment_read; access->grp_assignment_write = grp_assignment_write; - access->grp_hint = total_scalarization - || (multiple_scalar_reads && !constant_decl_p (var)); - access->grp_total_scalarization = total_scalarization; + access->grp_hint = multiple_scalar_reads && !constant_decl_p (var); access->grp_partial_lhs = grp_partial_lhs; access->grp_unscalarizable_region = unscalarizable_region; access->grp_same_access_path = grp_same_access_path; @@ -2420,15 +2319,16 @@ expr_with_var_bounded_array_refs_p (tree expr) } /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when - both seeming beneficial and when ALLOW_REPLACEMENTS allows it. Also set all - sorts of access flags appropriately along the way, notably always set - grp_read and grp_assign_read according to MARK_READ and grp_write when - MARK_WRITE is true. + both seeming beneficial and when ALLOW_REPLACEMENTS allows it. If TOTALLY + is set, we are totally scalarizing the aggregate. Also set all sorts of + access flags appropriately along the way, notably always set grp_read and + grp_assign_read according to MARK_READ and grp_write when MARK_WRITE is + true. 
Creating a replacement for a scalar access is considered beneficial if its - grp_hint is set (this means we are either attempting total scalarization or - there is more than one direct read access) or according to the following - table: + grp_hint ot TOTALLY is set (this means either that there is more than one + direct read access or that we are attempting total scalarization) or + according to the following table: Access written to through a scalar type (once or more times) | @@ -2459,7 +2359,7 @@ expr_with_var_bounded_array_refs_p (tree expr) static bool analyze_access_subtree (struct access *root, struct access *parent, - bool allow_replacements) + bool allow_replacements, bool totally) { struct access *child; HOST_WIDE_INT limit = root->offset + root->size; @@ -2477,8 +2377,6 @@ analyze_access_subtree (struct access *root, struct access *parent, root->grp_write = 1; if (parent->grp_assignment_write) root->grp_assignment_write = 1; - if (parent->grp_total_scalarization) - root->grp_total_scalarization = 1; if (!parent->grp_same_access_path) root->grp_same_access_path = 0; } @@ -2493,10 +2391,10 @@ analyze_access_subtree (struct access *root, struct access *parent, { hole |= covered_to < child->offset; sth_created |= analyze_access_subtree (child, root, - allow_replacements && !scalar); + allow_replacements && !scalar, + totally); root->grp_unscalarized_data |= child->grp_unscalarized_data; - root->grp_total_scalarization &= child->grp_total_scalarization; if (child->grp_covered) covered_to += child->size; else @@ -2504,7 +2402,9 @@ analyze_access_subtree (struct access *root, struct access *parent, } if (allow_replacements && scalar && !root->first_child - && (root->grp_hint + && (totally || !root->grp_total_scalarization) + && (totally + || root->grp_hint || ((root->grp_scalar_read || root->grp_assignment_read) && (root->grp_scalar_write || root->grp_assignment_write)))) { @@ -2546,6 +2446,7 @@ analyze_access_subtree (struct access *root, struct access *parent, { if (allow_replacements && scalar && !root->first_child + && !root->grp_total_scalarization && (root->grp_scalar_write || root->grp_assignment_write) && !bitmap_bit_p (cannot_scalarize_away_bitmap, DECL_UID (root->base))) @@ -2566,7 +2467,7 @@ analyze_access_subtree (struct access *root, struct access *parent, root->grp_total_scalarization = 0; } - if (!hole || root->grp_total_scalarization) + if (!hole || totally) root->grp_covered = 1; else if (root->grp_write || comes_initialized_p (root->base)) root->grp_unscalarized_data = 1; /* not covered and written to */ @@ -2582,7 +2483,8 @@ analyze_access_trees (struct access *access) while (access) { - if (analyze_access_subtree (access, NULL, true)) + if (analyze_access_subtree (access, NULL, true, + access->grp_total_scalarization)) ret = true; access = access->next_grp; } @@ -2855,6 +2757,369 @@ propagate_all_subaccesses (void) } } +/* Return true if the forest beginning with ROOT does not contain + unscalarizable regions or non-byte aligned accesses. 
*/ + +static bool +can_totally_scalarize_forest_p (struct access *root) +{ + struct access *access = root; + do + { + if (access->grp_unscalarizable_region + || (access->offset % BITS_PER_UNIT) != 0 + || (access->size % BITS_PER_UNIT) != 0 + || (is_gimple_reg_type (access->type) + && access->first_child)) + return false; + + if (access->first_child) + access = access->first_child; + else if (access->next_sibling) + access = access->next_sibling; + else + { + while (access->parent && !access->next_sibling) + access = access->parent; + if (access->next_sibling) + access = access->next_sibling; + else + { + gcc_assert (access == root); + root = root->next_grp; + access = root; + } + } + } + while (access); + return true; +} + +/* Create and return an ACCESS in PARENT spanning from POS with SIZE, TYPE and + reference EXPR for total scalarization purposes and mark it as such. Within + the children of PARENT, link it in between PTR and NEXT_SIBLING. */ + +static struct access * +create_total_scalarization_access (struct access *parent, HOST_WIDE_INT pos, + HOST_WIDE_INT size, tree type, tree expr, + struct access **ptr, + struct access *next_sibling) +{ + struct access *access = access_pool.allocate (); + memset (access, 0, sizeof (struct access)); + access->base = parent->base; + access->offset = pos; + access->size = size; + access->expr = expr; + access->type = type; + access->parent = parent; + access->grp_write = parent->grp_write; + access->grp_total_scalarization = 1; + access->grp_hint = 1; + access->grp_same_access_path = path_comparable_for_same_access (expr); + access->reverse = reverse_storage_order_for_component_p (expr); + + access->next_sibling = next_sibling; + *ptr = access; + return access; +} + +/* Create and return an ACCESS in PARENT spanning from POS with SIZE, TYPE and + reference EXPR for total scalarization purposes and mark it as such, link it + at *PTR and reshape the tree so that those elements at *PTR and their + siblings which fall within the part described by POS and SIZE are moved to + be children of the new access. If a partial overlap is detected, return + NULL. */ + +static struct access * +create_total_access_and_reshape (struct access *parent, HOST_WIDE_INT pos, + HOST_WIDE_INT size, tree type, tree expr, + struct access **ptr) +{ + struct access **p = ptr; + + while (*p && (*p)->offset < pos + size) + { + if ((*p)->offset + (*p)->size > pos + size) + return NULL; + p = &(*p)->next_sibling; + } + + struct access *next_child = *ptr; + struct access *new_acc + = create_total_scalarization_access (parent, pos, size, type, expr, + ptr, *p); + if (p != ptr) + { + new_acc->first_child = next_child; + *p = NULL; + for (struct access *a = next_child; a; a = a->next_sibling) + a->parent = new_acc; + } + return new_acc; +} + +static bool totally_scalarize_subtree (struct access *root); + +/* Return true if INNER is either the same type as OUTER or if it is the type + of a record field in OUTER at offset zero, possibly in nested + sub-records. 
*/ + +static bool +access_and_field_type_match_p (tree outer, tree inner) +{ + if (TYPE_MAIN_VARIANT (outer) == TYPE_MAIN_VARIANT (inner)) + return true; + if (TREE_CODE (outer) != RECORD_TYPE) + return false; + tree fld = TYPE_FIELDS (outer); + while (fld) + { + if (TREE_CODE (fld) == FIELD_DECL) + { + if (!zerop (DECL_FIELD_OFFSET (fld))) + return false; + if (TYPE_MAIN_VARIANT (TREE_TYPE (fld)) == inner) + return true; + if (TREE_CODE (TREE_TYPE (fld)) == RECORD_TYPE) + fld = TYPE_FIELDS (TREE_TYPE (fld)); + else + return false; + } + else + fld = DECL_CHAIN (fld); + } + return false; +} + +/* Return type of total_should_skip_creating_access indicating whether a total + scalarization access for a field/element should be created, whether it + already exists or whether the entire total scalarization has to fail. */ + +enum total_sra_field_state {TOTAL_FLD_CREATE, TOTAL_FLD_DONE, TOTAL_FLD_FAILED}; + +/* Do all the necessary steps in total scalarization when the given aggregate + type has a TYPE at POS with the given SIZE should be put into PARENT and + when we have processed all its siblings with smaller offsets up until and + including LAST_SEEN_SIBLING (which can be NULL). + + If some further siblings are to be skipped, set *LAST_SEEN_SIBLING as + appropriate. Return TOTAL_FLD_CREATE id the caller should carry on with + creating a new access, TOTAL_FLD_DONE if access or accesses capable of + representing the described part of the aggregate for the purposes of total + scalarization already exist or TOTAL_FLD_FAILED if there is a problem which + prevents total scalarization from happening at all. */ + +static enum total_sra_field_state +total_should_skip_creating_access (struct access *parent, + struct access **last_seen_sibling, + tree type, HOST_WIDE_INT pos, + HOST_WIDE_INT size) +{ + struct access *next_child; + if (!*last_seen_sibling) + next_child = parent->first_child; + else + next_child = (*last_seen_sibling)->next_sibling; + + /* First, traverse the chain of siblings until it points to an access with + offset at least equal to POS. Check all skipped accesses whether they + span the POS boundary and if so, return with a failure. */ + while (next_child && next_child->offset < pos) + { + if (next_child->offset + next_child->size > pos) + return TOTAL_FLD_FAILED; + *last_seen_sibling = next_child; + next_child = next_child->next_sibling; + } + + /* Now check whether next_child has exactly the right POS and SIZE and if so, + whether it can represent what we need and can be totally scalarized + itself. */ + if (next_child && next_child->offset == pos + && next_child->size == size) + { + if (!is_gimple_reg_type (next_child->type) + && (!access_and_field_type_match_p (type, next_child->type) + || !totally_scalarize_subtree (next_child))) + return TOTAL_FLD_FAILED; + + *last_seen_sibling = next_child; + return TOTAL_FLD_DONE; + } + + /* If the child we're looking at would partially overlap, we just cannot + totally scalarize. */ + if (next_child + && next_child->offset < pos + size + && next_child->offset + next_child->size > pos + size) + return TOTAL_FLD_FAILED; + + if (is_gimple_reg_type (type)) + { + /* We don't scalarize accesses that are children of other scalar type + accesses, so if we go on and create an access for a register type, + there should not be any pre-existing children. There are rare cases + where the requested type is a vector but we already have register + accesses for all its elements which is equally good. Detect that + situation or whether we need to bail out. 
*/ + + HOST_WIDE_INT covered = pos; + bool skipping = false; + while (next_child + && next_child->offset + next_child->size <= pos + size) + { + if (next_child->offset != covered + || !is_gimple_reg_type (next_child->type)) + return TOTAL_FLD_FAILED; + + covered += next_child->size; + *last_seen_sibling = next_child; + next_child = next_child->next_sibling; + skipping = true; + } + + if (skipping) + { + if (covered != pos + size) + return TOTAL_FLD_FAILED; + else + return TOTAL_FLD_DONE; + } + } + + return TOTAL_FLD_CREATE; +} + +/* Go over sub-tree rooted in ROOT and attempt to create scalar accesses + spanning all uncovered areas covered by ROOT, return false if the attempt + failed. All created accesses will have grp_unscalarizable_region set (and + should be ignored if the function returns false). */ + +static bool +totally_scalarize_subtree (struct access *root) +{ + gcc_checking_assert (!root->grp_unscalarizable_region); + gcc_checking_assert (!is_gimple_reg_type (root->type)); + + struct access *last_seen_sibling = NULL; + + switch (TREE_CODE (root->type)) + { + case RECORD_TYPE: + for (tree fld = TYPE_FIELDS (root->type); fld; fld = DECL_CHAIN (fld)) + if (TREE_CODE (fld) == FIELD_DECL) + { + tree ft = TREE_TYPE (fld); + HOST_WIDE_INT fsize = tree_to_uhwi (DECL_SIZE (fld)); + if (!fsize) + continue; + + HOST_WIDE_INT pos = root->offset + int_bit_position (fld); + enum total_sra_field_state + state = total_should_skip_creating_access (root, + &last_seen_sibling, + ft, pos, fsize); + switch (state) + { + case TOTAL_FLD_FAILED: + return false; + case TOTAL_FLD_DONE: + continue; + case TOTAL_FLD_CREATE: + break; + default: + gcc_unreachable (); + } + + struct access **p = (last_seen_sibling + ? &last_seen_sibling->next_sibling + : &root->first_child); + tree nref = build3 (COMPONENT_REF, ft, root->expr, fld, NULL_TREE); + struct access *new_child + = create_total_access_and_reshape (root, pos, fsize, ft, nref, p); + if (!new_child) + return false; + + if (!is_gimple_reg_type (ft) + && !totally_scalarize_subtree (new_child)) + return false; + last_seen_sibling = new_child; + } + break; + case ARRAY_TYPE: + { + tree elemtype = TREE_TYPE (root->type); + tree elem_size = TYPE_SIZE (elemtype); + gcc_assert (elem_size && tree_fits_shwi_p (elem_size)); + HOST_WIDE_INT el_size = tree_to_shwi (elem_size); + gcc_assert (el_size > 0); + + tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (root->type)); + gcc_assert (TREE_CODE (minidx) == INTEGER_CST); + tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (root->type)); + /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1. */ + if (!maxidx) + goto out; + gcc_assert (TREE_CODE (maxidx) == INTEGER_CST); + tree domain = TYPE_DOMAIN (root->type); + /* MINIDX and MAXIDX are inclusive, and must be interpreted in + DOMAIN (e.g. signed int, whereas min/max may be size_int). */ + offset_int idx = wi::to_offset (minidx); + offset_int max = wi::to_offset (maxidx); + if (!TYPE_UNSIGNED (domain)) + { + idx = wi::sext (idx, TYPE_PRECISION (domain)); + max = wi::sext (max, TYPE_PRECISION (domain)); + } + for (HOST_WIDE_INT pos = root->offset; + idx <= max; + pos += el_size, ++idx) + { + enum total_sra_field_state + state = total_should_skip_creating_access (root, + &last_seen_sibling, + elemtype, pos, + el_size); + switch (state) + { + case TOTAL_FLD_FAILED: + return false; + case TOTAL_FLD_DONE: + continue; + case TOTAL_FLD_CREATE: + break; + default: + gcc_unreachable (); + } + + struct access **p = (last_seen_sibling + ? 
&last_seen_sibling->next_sibling + : &root->first_child); + tree nref = build4 (ARRAY_REF, elemtype, root->expr, + wide_int_to_tree (domain, idx), + NULL_TREE, NULL_TREE); + struct access *new_child + = create_total_access_and_reshape (root, pos, el_size, elemtype, + nref, p); + if (!new_child) + return false; + + if (!is_gimple_reg_type (elemtype) + && !totally_scalarize_subtree (new_child)) + return false; + last_seen_sibling = new_child; + } + } + break; + default: + gcc_unreachable (); + } + + out: + return true; +} + /* Go through all accesses collected throughout the (intraprocedural) analysis stage, exclude overlapping ones, identify representatives and build trees out of them, making decisions about scalarization on the way. Return true @@ -2867,8 +3132,22 @@ analyze_all_variable_accesses (void) bitmap tmp = BITMAP_ALLOC (NULL); bitmap_iterator bi; unsigned i; - bool optimize_speed_p = !optimize_function_for_size_p (cfun); + bitmap_copy (tmp, candidate_bitmap); + EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi) + { + tree var = candidate (i); + struct access *access; + + access = sort_and_splice_var_accesses (var); + if (!access || !build_access_trees (access)) + disqualify_candidate (var, + "No or inhibitingly overlapping accesses."); + } + + propagate_all_subaccesses (); + + bool optimize_speed_p = !optimize_function_for_size_p (cfun); /* If the user didn't set PARAM_SRA_MAX_SCALARIZATION_SIZE_<...>, fall back to a target default. */ unsigned HOST_WIDE_INT max_scalarization_size @@ -2884,7 +3163,6 @@ analyze_all_variable_accesses (void) if (global_options_set.x_param_sra_max_scalarization_size_size) max_scalarization_size = param_sra_max_scalarization_size_size; } - max_scalarization_size *= BITS_PER_UNIT; EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi) @@ -2892,46 +3170,56 @@ analyze_all_variable_accesses (void) && !bitmap_bit_p (cannot_scalarize_away_bitmap, i)) { tree var = candidate (i); + if (!VAR_P (var)) + continue; - if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var), - constant_decl_p (var))) + if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > max_scalarization_size) { - if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) - <= max_scalarization_size) - { - create_total_scalarization_access (var); - completely_scalarize (var, TREE_TYPE (var), 0, var); - statistics_counter_event (cfun, - "Totally-scalarized aggregates", 1); - if (dump_file && (dump_flags & TDF_DETAILS)) - { - fprintf (dump_file, "Will attempt to totally scalarize "); - print_generic_expr (dump_file, var); - fprintf (dump_file, " (UID: %u): \n", DECL_UID (var)); - } - } - else if (dump_file && (dump_flags & TDF_DETAILS)) + if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Too big to totally scalarize: "); print_generic_expr (dump_file, var); fprintf (dump_file, " (UID: %u)\n", DECL_UID (var)); } + continue; } - } - bitmap_copy (tmp, candidate_bitmap); - EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi) - { - tree var = candidate (i); - struct access *access; + bool all_types_ok = true; + for (struct access *access = get_first_repr_for_decl (var); + access; + access = access->next_grp) + if (!can_totally_scalarize_forest_p (access) + || !scalarizable_type_p (access->type, constant_decl_p (var))) + { + all_types_ok = false; + break; + } + if (!all_types_ok) + continue; - access = sort_and_splice_var_accesses (var); - if (!access || !build_access_trees (access)) - disqualify_candidate (var, - "No or inhibitingly overlapping accesses."); - } + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf 
(dump_file, "Will attempt to totally scalarize "); + print_generic_expr (dump_file, var); + fprintf (dump_file, " (UID: %u): \n", DECL_UID (var)); + } + bool scalarized = true; + for (struct access *access = get_first_repr_for_decl (var); + access; + access = access->next_grp) + if (!is_gimple_reg_type (access->type) + && !totally_scalarize_subtree (access)) + { + scalarized = false; + break; + } - propagate_all_subaccesses (); + if (scalarized) + for (struct access *access = get_first_repr_for_decl (var); + access; + access = access->next_grp) + access->grp_total_scalarization = true; + } if (flag_checking) verify_all_sra_access_forests (); @@ -3804,25 +4092,39 @@ initialize_constant_pool_replacements (void) tree var = candidate (i); if (!constant_decl_p (var)) continue; - vec *access_vec = get_base_access_vector (var); - if (!access_vec) - continue; - for (unsigned i = 0; i < access_vec->length (); i++) + + struct access *access = get_first_repr_for_decl (var); + + while (access) { - struct access *access = (*access_vec)[i]; - if (!access->replacement_decl) - continue; - gassign *stmt - = gimple_build_assign (get_access_replacement (access), - unshare_expr (access->expr)); - if (dump_file && (dump_flags & TDF_DETAILS)) + if (access->replacement_decl) { - fprintf (dump_file, "Generating constant initializer: "); - print_gimple_stmt (dump_file, stmt, 0); - fprintf (dump_file, "\n"); + gassign *stmt + = gimple_build_assign (get_access_replacement (access), + unshare_expr (access->expr)); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Generating constant initializer: "); + print_gimple_stmt (dump_file, stmt, 0); + fprintf (dump_file, "\n"); + } + gsi_insert_after (&gsi, stmt, GSI_NEW_STMT); + update_stmt (stmt); + } + + if (access->first_child) + access = access->first_child; + else if (access->next_sibling) + access = access->next_sibling; + else + { + while (access->parent && !access->next_sibling) + access = access->parent; + if (access->next_sibling) + access = access->next_sibling; + else + access = access->next_grp; } - gsi_insert_after (&gsi, stmt, GSI_NEW_STMT); - update_stmt (stmt); } } From patchwork Tue Jan 28 00:23:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin Jambor X-Patchwork-Id: 1230095 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-518358-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha1 header.s=default header.b=Ijih7g/O; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4866lx18jBz9sNx for ; Tue, 28 Jan 2020 11:23:24 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type; q=dns; s=default; b=otlUN+dv9p3NWowjvt/gq7M/pctwisNZXO/u9d/domE7jbohzr 
3dbj0CtXkrX7LuKX/jKGDfu8non3j6sT6ONkVuYPr1URfSuszuI3RZVVm54ojr4H Wb351P03ZtU+KCIgR47MTfUFkmBuq9COdxwywihCL7IIq4J1h1woLA3fo= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type; s= default; bh=2zjTHgkHYZfXVMaDsp4ZZ5reO9s=; b=Ijih7g/OSa53avmOJ/Et 2SAv/dqmKmvzMotSoJZO/nKbdrHEy8xGY9P38jp8d7L9m2mjwKdQodaUGGP9FALD 4dvEdk2HjoM/4NBj57YLFG3Ngpioi6LVY62s86OQzjt6tM7gHaNovrh8JtfcwD8E Ibg3f80xhOPMT77Wsc9iHBc= Received: (qmail 127907 invoked by alias); 28 Jan 2020 00:23:10 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 127837 invoked by uid 89); 28 Jan 2020 00:23:09 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-20.9 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mx2.suse.de Received: from mx2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 28 Jan 2020 00:23:04 +0000 Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 3F103B1D7; Tue, 28 Jan 2020 00:23:02 +0000 (UTC) From: Martin Jambor To: GCC Patches Cc: Richard Biener , Jan Hubicka , Jan Hubicka Subject: [PATCH 3/4] SRA: Also propagate accesses from LHS to RHS [PR92706] User-Agent: Notmuch/0.29.3 (https://notmuchmail.org) Emacs/26.3 (x86_64-suse-linux-gnu) Date: Tue, 28 Jan 2020 01:23:01 +0100 Message-ID: MIME-Version: 1.0 X-IsSubscribed: yes Hi, the previous patch unfortunately does not fix the first testcase in PR 92706 and since I am afraid it might be the important one, I also focused on that. The issue here is again total scalarization accesses clashing with those representing accesses in the IL - on another aggregate but here the sides are switched. Whereas in the previous case the total scalarization accesses prevented propagation along assignments, here we have the user accesses on the LHS, so even though we do not create anything there, we totally scalarize the RHS and again end up with assignments with different scalarizations leading to bad code. So we clearly need to propagate information about accesses from RHSs to LHSs too, which the patch below does. Because the intent is only preventing bad total scalarization - which the last patch now performs late enough - and do not care about grp_write flag and so forth, the propagation is a bit simpler and so I did not try to unify all of the code for both directions. More information and some discussion is in the thread from the initial submission, the code has not changed in any (substantial) way. See https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01184.html and https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00698.html. Bootstrapped and tested on x86_64-linux. Thanks, Martin 2019-12-11 Martin Jambor PR tree-optimization/92706 * tree-sra.c (struct access): Fields first_link, last_link, next_queued and grp_queued renamed to first_rhs_link, last_rhs_link, next_rhs_queued and grp_rhs_queued respectively, new fields first_lhs_link, last_lhs_link, next_lhs_queued and grp_lhs_queued. (struct assign_link): Field next renamed to next_rhs, new field next_lhs. Updated comment. 
(work_queue_head): Renamed to rhs_work_queue_head. (lhs_work_queue_head): New variable. (add_link_to_lhs): New function. (relink_to_new_repr): Also relink LHS lists. (add_access_to_work_queue): Renamed to add_access_to_rhs_work_queue. (add_access_to_lhs_work_queue): New function. (pop_access_from_work_queue): Renamed to pop_access_from_rhs_work_queue. (pop_access_from_lhs_work_queue): New function. (build_accesses_from_assign): Also add links to LHS lists and to LHS work_queue. (child_would_conflict_in_lacc): Renamed to child_would_conflict_in_acc. Adjusted parameter names. (create_artificial_child_access): New parameter set_grp_read, use it. (subtree_mark_written_and_enqueue): Renamed to subtree_mark_written_and_rhs_enqueue. (propagate_subaccesses_across_link): Renamed to propagate_subaccesses_from_rhs. (propagate_subaccesses_from_lhs): New function. (propagate_all_subaccesses): Also propagate subaccesses from LHSs to RHSs. testsuite/ * gcc.dg/tree-ssa/pr92706-1.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c | 17 ++ gcc/tree-sra.c | 306 ++++++++++++++++------ 2 files changed, 248 insertions(+), 75 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c new file mode 100644 index 00000000000..c36d103798e --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-esra-details" } */ + +struct S { int i[4]; } __attribute__((aligned(128))); +typedef __int128_t my_int128 __attribute__((may_alias)); +__int128_t load (void *p) +{ + struct S v; + __builtin_memcpy (&v, p, sizeof (struct S)); + struct S u; + u = v; + struct S w; + w = u; + return *(my_int128 *)&w; +} + +/* { dg-final { scan-tree-dump-not "Created a replacement for u offset: \[^0\]" "esra" } } */ diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 2b0849858de..ea8594db193 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -167,11 +167,15 @@ struct access struct access *next_sibling; /* Pointers to the first and last element in the linked list of assign - links. */ - struct assign_link *first_link, *last_link; + links for propagation from LHS to RHS. */ + struct assign_link *first_rhs_link, *last_rhs_link; - /* Pointer to the next access in the work queue. */ - struct access *next_queued; + /* Pointers to the first and last element in the linked list of assign + links for propagation from LHS to RHS. */ + struct assign_link *first_lhs_link, *last_lhs_link; + + /* Pointer to the next access in the work queues. */ + struct access *next_rhs_queued, *next_lhs_queued; /* Replacement variable for this access "region." Never to be accessed directly, always only by the means of get_access_replacement() and only @@ -184,8 +188,11 @@ struct access /* Is this particular access write access? */ unsigned write : 1; - /* Is this access currently in the work queue? */ - unsigned grp_queued : 1; + /* Is this access currently in the rhs work queue? */ + unsigned grp_rhs_queued : 1; + + /* Is this access currently in the lhs work queue? */ + unsigned grp_lhs_queued : 1; /* Does this group contain a write access? This flag is propagated down the access tree. */ @@ -262,12 +269,14 @@ typedef struct access *access_p; static object_allocator access_pool ("SRA accesses"); /* A structure linking lhs and rhs accesses from an aggregate assignment. 
They - are used to propagate subaccesses from rhs to lhs as long as they don't - conflict with what is already there. */ + are used to propagate subaccesses from rhs to lhs and vice versa as long as + they don't conflict with what is already there. In the RHS->LHS direction, + we also propagate grp_write flag to lazily mark that the access contains any + meaningful data. */ struct assign_link { struct access *lacc, *racc; - struct assign_link *next; + struct assign_link *next_rhs, *next_lhs; }; /* Alloc pool for allocating assign link structures. */ @@ -327,7 +336,7 @@ static struct obstack name_obstack; /* Head of a linked list of accesses that need to have its subaccesses propagated to their assignment counterparts. */ -static struct access *work_queue_head; +static struct access *rhs_work_queue_head, *lhs_work_queue_head; /* Dump contents of ACCESS to file F in a human friendly way. If GRP is true, representative fields are dumped, otherwise those which only describe the @@ -534,79 +543,155 @@ get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset, } /* Add LINK to the linked list of assign links of RACC. */ + static void add_link_to_rhs (struct access *racc, struct assign_link *link) { gcc_assert (link->racc == racc); - if (!racc->first_link) + if (!racc->first_rhs_link) { - gcc_assert (!racc->last_link); - racc->first_link = link; + gcc_assert (!racc->last_rhs_link); + racc->first_rhs_link = link; } else - racc->last_link->next = link; + racc->last_rhs_link->next_rhs = link; - racc->last_link = link; - link->next = NULL; + racc->last_rhs_link = link; + link->next_rhs = NULL; } -/* Move all link structures in their linked list in OLD_RACC to the linked list - in NEW_RACC. */ +/* Add LINK to the linked list of lhs assign links of LACC. */ + static void -relink_to_new_repr (struct access *new_racc, struct access *old_racc) +add_link_to_lhs (struct access *lacc, struct assign_link *link) { - if (!old_racc->first_link) + gcc_assert (link->lacc == lacc); + + if (!lacc->first_lhs_link) { - gcc_assert (!old_racc->last_link); - return; + gcc_assert (!lacc->last_lhs_link); + lacc->first_lhs_link = link; } + else + lacc->last_lhs_link->next_lhs = link; + + lacc->last_lhs_link = link; + link->next_lhs = NULL; +} - if (new_racc->first_link) +/* Move all link structures in their linked list in OLD_ACC to the linked list + in NEW_ACC. 
*/ +static void +relink_to_new_repr (struct access *new_acc, struct access *old_acc) +{ + if (old_acc->first_rhs_link) { - gcc_assert (!new_racc->last_link->next); - gcc_assert (!old_racc->last_link || !old_racc->last_link->next); - new_racc->last_link->next = old_racc->first_link; - new_racc->last_link = old_racc->last_link; + if (new_acc->first_rhs_link) + { + gcc_assert (!new_acc->last_rhs_link->next_rhs); + gcc_assert (!old_acc->last_rhs_link + || !old_acc->last_rhs_link->next_rhs); + + new_acc->last_rhs_link->next_rhs = old_acc->first_rhs_link; + new_acc->last_rhs_link = old_acc->last_rhs_link; + } + else + { + gcc_assert (!new_acc->last_rhs_link); + + new_acc->first_rhs_link = old_acc->first_rhs_link; + new_acc->last_rhs_link = old_acc->last_rhs_link; + } + old_acc->first_rhs_link = old_acc->last_rhs_link = NULL; } else + gcc_assert (!old_acc->last_rhs_link); + + if (old_acc->first_lhs_link) { - gcc_assert (!new_racc->last_link); - new_racc->first_link = old_racc->first_link; - new_racc->last_link = old_racc->last_link; + if (new_acc->first_lhs_link) + { + gcc_assert (!new_acc->last_lhs_link->next_lhs); + gcc_assert (!old_acc->last_lhs_link + || !old_acc->last_lhs_link->next_lhs); + + new_acc->last_lhs_link->next_lhs = old_acc->first_lhs_link; + new_acc->last_lhs_link = old_acc->last_lhs_link; + } + else + { + gcc_assert (!new_acc->last_lhs_link); + + new_acc->first_lhs_link = old_acc->first_lhs_link; + new_acc->last_lhs_link = old_acc->last_lhs_link; + } + old_acc->first_lhs_link = old_acc->last_lhs_link = NULL; } - old_racc->first_link = old_racc->last_link = NULL; + else + gcc_assert (!old_acc->last_lhs_link); + } -/* Add ACCESS to the work queue (which is actually a stack). */ +/* Add ACCESS to the work to queue for propagation of subaccesses from RHS to + LHS (which is actually a stack). */ static void -add_access_to_work_queue (struct access *access) +add_access_to_rhs_work_queue (struct access *access) { - if (access->first_link && !access->grp_queued) + if (access->first_rhs_link && !access->grp_rhs_queued) { - gcc_assert (!access->next_queued); - access->next_queued = work_queue_head; - access->grp_queued = 1; - work_queue_head = access; + gcc_assert (!access->next_rhs_queued); + access->next_rhs_queued = rhs_work_queue_head; + access->grp_rhs_queued = 1; + rhs_work_queue_head = access; } } -/* Pop an access from the work queue, and return it, assuming there is one. */ +/* Add ACCESS to the work to queue for propagation of subaccesses from LHS to + RHS (which is actually a stack). */ + +static void +add_access_to_lhs_work_queue (struct access *access) +{ + if (access->first_lhs_link && !access->grp_lhs_queued) + { + gcc_assert (!access->next_lhs_queued); + access->next_lhs_queued = lhs_work_queue_head; + access->grp_lhs_queued = 1; + lhs_work_queue_head = access; + } +} + +/* Pop an access from the work queue for propagating from RHS to LHS, and + return it, assuming there is one. */ static struct access * -pop_access_from_work_queue (void) +pop_access_from_rhs_work_queue (void) { - struct access *access = work_queue_head; + struct access *access = rhs_work_queue_head; - work_queue_head = access->next_queued; - access->next_queued = NULL; - access->grp_queued = 0; + rhs_work_queue_head = access->next_rhs_queued; + access->next_rhs_queued = NULL; + access->grp_rhs_queued = 0; return access; } +/* Pop an access from the work queue for propagating from LHS to RHS, and + return it, assuming there is one. 
*/ + +static struct access * +pop_access_from_lhs_work_queue (void) +{ + struct access *access = lhs_work_queue_head; + + lhs_work_queue_head = access->next_lhs_queued; + access->next_lhs_queued = NULL; + access->grp_lhs_queued = 0; + return access; +} /* Allocate necessary structures. */ @@ -1203,7 +1288,9 @@ build_accesses_from_assign (gimple *stmt) link->lacc = lacc; link->racc = racc; add_link_to_rhs (racc, link); - add_access_to_work_queue (racc); + add_link_to_lhs (lacc, link); + add_access_to_rhs_work_queue (racc); + add_access_to_lhs_work_queue (lacc); /* Let's delay marking the areas as written until propagation of accesses across link, unless the nature of rhs tells us that its data comes @@ -2492,17 +2579,17 @@ analyze_access_trees (struct access *access) return ret; } -/* Return true iff a potential new child of LACC at offset OFFSET and with size +/* Return true iff a potential new child of ACC at offset OFFSET and with size SIZE would conflict with an already existing one. If exactly such a child - already exists in LACC, store a pointer to it in EXACT_MATCH. */ + already exists in ACC, store a pointer to it in EXACT_MATCH. */ static bool -child_would_conflict_in_lacc (struct access *lacc, HOST_WIDE_INT norm_offset, +child_would_conflict_in_acc (struct access *acc, HOST_WIDE_INT norm_offset, HOST_WIDE_INT size, struct access **exact_match) { struct access *child; - for (child = lacc->first_child; child; child = child->next_sibling) + for (child = acc->first_child; child; child = child->next_sibling) { if (child->offset == norm_offset && child->size == size) { @@ -2528,7 +2615,7 @@ child_would_conflict_in_lacc (struct access *lacc, HOST_WIDE_INT norm_offset, static struct access * create_artificial_child_access (struct access *parent, struct access *model, HOST_WIDE_INT new_offset, - bool set_grp_write) + bool set_grp_read, bool set_grp_write) { struct access **child; tree expr = parent->base; @@ -2551,8 +2638,8 @@ create_artificial_child_access (struct access *parent, struct access *model, access->size = model->size; access->type = model->type; access->parent = parent; + access->grp_read = set_grp_read; access->grp_write = set_grp_write; - access->grp_read = false; access->reverse = model->reverse; child = &parent->first_child; @@ -2571,16 +2658,16 @@ create_artificial_child_access (struct access *parent, struct access *model, and has assignment links leading from it, re-enqueue it. */ static void -subtree_mark_written_and_enqueue (struct access *access) +subtree_mark_written_and_rhs_enqueue (struct access *access) { if (access->grp_write) return; access->grp_write = true; - add_access_to_work_queue (access); + add_access_to_rhs_work_queue (access); struct access *child; for (child = access->first_child; child; child = child->next_sibling) - subtree_mark_written_and_enqueue (child); + subtree_mark_written_and_rhs_enqueue (child); } /* Propagate subaccesses and grp_write flags of RACC across an assignment link @@ -2590,7 +2677,7 @@ subtree_mark_written_and_enqueue (struct access *access) possible. 
*/ static bool -propagate_subaccesses_across_link (struct access *lacc, struct access *racc) +propagate_subaccesses_from_rhs (struct access *lacc, struct access *racc) { struct access *rchild; HOST_WIDE_INT norm_delta = lacc->offset - racc->offset; @@ -2603,7 +2690,7 @@ propagate_subaccesses_across_link (struct access *lacc, struct access *racc) gcc_checking_assert (!comes_initialized_p (racc->base)); if (racc->grp_write) { - subtree_mark_written_and_enqueue (lacc); + subtree_mark_written_and_rhs_enqueue (lacc); ret = true; } } @@ -2615,7 +2702,7 @@ propagate_subaccesses_across_link (struct access *lacc, struct access *racc) if (!lacc->grp_write) { ret = true; - subtree_mark_written_and_enqueue (lacc); + subtree_mark_written_and_rhs_enqueue (lacc); } return ret; } @@ -2625,7 +2712,7 @@ propagate_subaccesses_across_link (struct access *lacc, struct access *racc) if (!lacc->grp_write) { ret = true; - subtree_mark_written_and_enqueue (lacc); + subtree_mark_written_and_rhs_enqueue (lacc); } if (!lacc->first_child && !racc->first_child) { @@ -2655,7 +2742,7 @@ propagate_subaccesses_across_link (struct access *lacc, struct access *racc) struct access *new_acc = NULL; HOST_WIDE_INT norm_offset = rchild->offset + norm_delta; - if (child_would_conflict_in_lacc (lacc, norm_offset, rchild->size, + if (child_would_conflict_in_acc (lacc, norm_offset, rchild->size, &new_acc)) { if (new_acc) @@ -2663,17 +2750,17 @@ propagate_subaccesses_across_link (struct access *lacc, struct access *racc) if (!new_acc->grp_write && rchild->grp_write) { gcc_assert (!lacc->grp_write); - subtree_mark_written_and_enqueue (new_acc); + subtree_mark_written_and_rhs_enqueue (new_acc); ret = true; } rchild->grp_hint = 1; new_acc->grp_hint |= new_acc->grp_read; if (rchild->first_child - && propagate_subaccesses_across_link (new_acc, rchild)) + && propagate_subaccesses_from_rhs (new_acc, rchild)) { ret = 1; - add_access_to_work_queue (new_acc); + add_access_to_rhs_work_queue (new_acc); } } else @@ -2681,7 +2768,7 @@ propagate_subaccesses_across_link (struct access *lacc, struct access *racc) if (!lacc->grp_write) { ret = true; - subtree_mark_written_and_enqueue (lacc); + subtree_mark_written_and_rhs_enqueue (lacc); } } continue; @@ -2692,41 +2779,85 @@ propagate_subaccesses_across_link (struct access *lacc, struct access *racc) if (rchild->grp_write && !lacc->grp_write) { ret = true; - subtree_mark_written_and_enqueue (lacc); + subtree_mark_written_and_rhs_enqueue (lacc); } continue; } rchild->grp_hint = 1; new_acc = create_artificial_child_access (lacc, rchild, norm_offset, - lacc->grp_write - || rchild->grp_write); + false, (lacc->grp_write + || rchild->grp_write)); gcc_checking_assert (new_acc); if (racc->first_child) - propagate_subaccesses_across_link (new_acc, rchild); + propagate_subaccesses_from_rhs (new_acc, rchild); - add_access_to_work_queue (lacc); + add_access_to_rhs_work_queue (lacc); ret = true; } return ret; } +/* Propagate subaccesses of LACC across an assignment link to RACC if they + should inhibit total scalarization of the corresponding area. No flags are + being propagated in the process. Return true if anything changed. */ + +static bool +propagate_subaccesses_from_lhs (struct access *lacc, struct access *racc) +{ + if (is_gimple_reg_type (racc->type) + || lacc->grp_unscalarizable_region + || racc->grp_unscalarizable_region) + return false; + + /* TODO: Do we want set some new racc flag to stop potential total + scalarization if lacc is a scalar access (and none fo the two have + children)? 
*/ + + bool ret = false; + HOST_WIDE_INT norm_delta = racc->offset - lacc->offset; + for (struct access *lchild = lacc->first_child; + lchild; + lchild = lchild->next_sibling) + { + struct access *matching_acc = NULL; + HOST_WIDE_INT norm_offset = lchild->offset + norm_delta; + + if (lchild->grp_unscalarizable_region + || child_would_conflict_in_acc (racc, norm_offset, lchild->size, + &matching_acc)) + { + if (matching_acc + && propagate_subaccesses_from_lhs (lchild, matching_acc)) + add_access_to_lhs_work_queue (matching_acc); + continue; + } + + struct access *new_acc + = create_artificial_child_access (racc, lchild, norm_offset, + true, false); + propagate_subaccesses_from_lhs (lchild, new_acc); + ret = true; + } + return ret; +} + /* Propagate all subaccesses across assignment links. */ static void propagate_all_subaccesses (void) { - while (work_queue_head) + while (rhs_work_queue_head) { - struct access *racc = pop_access_from_work_queue (); + struct access *racc = pop_access_from_rhs_work_queue (); struct assign_link *link; if (racc->group_representative) racc= racc->group_representative; - gcc_assert (racc->first_link); + gcc_assert (racc->first_rhs_link); - for (link = racc->first_link; link; link = link->next) + for (link = racc->first_rhs_link; link; link = link->next_rhs) { struct access *lacc = link->lacc; @@ -2739,22 +2870,47 @@ propagate_all_subaccesses (void) { if (!lacc->grp_write) { - subtree_mark_written_and_enqueue (lacc); + subtree_mark_written_and_rhs_enqueue (lacc); reque_parents = true; } } - else if (propagate_subaccesses_across_link (lacc, racc)) + else if (propagate_subaccesses_from_rhs (lacc, racc)) reque_parents = true; if (reque_parents) do { - add_access_to_work_queue (lacc); + add_access_to_rhs_work_queue (lacc); lacc = lacc->parent; } while (lacc); } } + + while (lhs_work_queue_head) + { + struct access *lacc = pop_access_from_lhs_work_queue (); + struct assign_link *link; + + if (lacc->group_representative) + lacc = lacc->group_representative; + gcc_assert (lacc->first_lhs_link); + + if (!bitmap_bit_p (candidate_bitmap, DECL_UID (lacc->base))) + continue; + + for (link = lacc->first_lhs_link; link; link = link->next_lhs) + { + struct access *racc = link->racc; + + if (racc->group_representative) + racc = racc->group_representative; + if (!bitmap_bit_p (candidate_bitmap, DECL_UID (racc->base))) + continue; + if (propagate_subaccesses_from_lhs (lacc, racc)) + add_access_to_lhs_work_queue (racc); + } + } } /* Return true if the forest beginning with ROOT does not contain
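To make the new direction of propagation concrete, here is a minimal illustrative example (not part of the patch; the variable and field names are made up): an aggregate copy links two candidates, a subaccess exists only on the left-hand side, and the LHS-to-RHS propagation added above mirrors it on the right-hand side so that later total-scalarization decisions for the source take it into account.

  /* Illustrative sketch only.  "d = s;" creates an assign link with d as
     the LHS access (lacc) and s as the RHS access (racc).  The read of
     d.in.x creates a child access only under d; the LHS -> RHS work queue
     then adds a matching artificial child access (read, not written)
     under s via create_artificial_child_access.  */
  struct in { char c; int x; };    /* padding after c on typical targets */
  struct out { struct in in; };

  int
  use (struct out *p)
  {
    struct out s = *p;             /* candidate written as a whole */
    struct out d;
    d = s;                         /* assign link: d (LHS) <- s (RHS) */
    return d.in.x;                 /* subaccess only on the LHS side */
  }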
From patchwork Tue Jan 28 00:23:16 2020 X-Patchwork-Submitter: Martin Jambor X-Patchwork-Id: 1230096 From: Martin Jambor To: GCC Patches Cc: Richard Biener, Jan Hubicka Subject: [PATCH 4/4] SRA: Do not ignore padding when totally scalarizing [PR92486] Date: Tue, 28 Jan 2020 01:23:16 +0100 Hi, PR 92486 shows that DSE, when seeing a "normal" gimple aggregate assignment coming from a C struct assignment and one representing a folded memcpy, can kill the latter and keep in place only the former, which does not copy padding - at least when SRA decides to totally scalarize at least one of the aggregates (when not doing total scalarization, SRA cares about padding). SRA would not totally scalarize an aggregate if it saw that it takes part in a gimple assignment which is a folded memcpy (see how type_changing_p is set in contains_vce_or_bfcref_p) but it doesn't, because of the DSE decisions. I was asked to modify SRA to take padding into account - and to copy it around - when totally scalarizing, which is what the patch below does. I am not very happy about this, I am afraid it will lead to performance regressions, but this has all been discussed (see https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01185.html and https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00218.html). I tried to alleviate the problem by not only inserting accesses for padding but also by enlarging existing accesses whenever possible to extend over padding - the extended access would get copied when an aggregate copy in the original IL is replaced with SRA copies, and a BIT_FIELD_REF would be generated to replace a scalar access to a part of the aggregate in the original IL.
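To make the layout in question concrete, the structure from the PR looks as follows (an illustrative sketch assuming the usual x86_64 field alignment; it is not taken from the patch):

  /* struct s { char c; int i; };

     bits  0..7    field c
     bits  8..31   padding   (not moved by a field-by-field struct copy)
     bits 32..63   field i

     Enlarging the scalar replacement of "c" to a 32-bit integer makes it
     cover bits 0..31, so copying that replacement also copies the padding
     bytes; any padding that cannot be covered by such an extension gets
     extra artificial accesses instead.  */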
I have made it work in the sense that the patch passed bootstrap and testing (see the git branch refs/users/jamborm/heads/sra-later_total-bfr-20200127 or look at https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=shortlog;h=refs/users/jamborm/heads/sra-later_total-bfr-20200127 if you are interested), but this approach meant that each such extended replacement which was written to (so all of them) could potentially become only partially assigned to and so had to be marked as addressable and could not become a gimple register, meaning that total scalarization would be creating addressable variables. Detecting such cases is not easy, it would mean introducing yet another type of write flag (written to exactly this access) and propagating that flag across assignment sub-accesses. So I decided that was not the way to go and instead only extended integer accesses, and that is what the patch below does. Like in the previous attempt, whatever padding could not be covered by extending an access would be covered by extra artificial accesses. As you can see, it adds a little complexity to various places of the pass which are already not trivial, but hopefully it is manageable. Bootstrapped and tested on x86_64-linux, I'll be curious about the feedback. Thanks, Martin 2020-01-27 Martin Jambor PR tree-optimization/92486 * tree-sra.c: Include langhooks.h. (struct access): New fields reg_size and reg_acc_type. (dump_access): Print new fields. (acc_size): New function. (find_access_in_subtree): Use it, new parameter reg. (get_var_base_offset_size_access): Pass true to find_access_in_subtree. (create_access_1): Initialize reg_size. (create_artificial_child_access): Likewise. (create_total_scalarization_access): Likewise. (build_ref_for_model): Do not use model expr if reg_acc_type is non-NULL. (get_reg_access_replacement): New function. (verify_sra_access_forest): Adjust verification for presence of extended accesses covering padding. (analyze_access_subtree): Undo extension over padding if total scalarization failed, set grp_partial_lhs if we are going to introduce a partial store to the new replacement, do not ignore holes when totally scalarizing. (sra_type_for_size): New function. (total_scalarization_fill_padding): Likewise. (total_should_skip_creating_access): Use it. (totally_scalarize_subtree): Likewise. (sra_modify_expr): Use get_reg_access_replacement instead of get_access_replacement, adjust for reg_acc_type. (sra_modify_assign): Likewise. (load_assign_lhs_subreplacements): Pass false to find_access_in_subtree. testsuite/ * gcc.dg/tree-ssa/pr92486.c: New test. --- gcc/ChangeLog | 32 +++ gcc/testsuite/ChangeLog | 5 + gcc/testsuite/gcc.dg/tree-ssa/pr92486.c | 38 +++ gcc/tree-sra.c | 368 +++++++++++++++++++++--- 4 files changed, 396 insertions(+), 47 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92486.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 3541c6638f9..34c60e6f2a3 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,35 @@ +2020-01-26 Martin Jambor + + PR tree-optimization/92486 + * tree-sra.c: Include langhooks.h. + (struct access): New fields reg_size and reg_acc_type. + (dump_access): Print new fields. + (acc_size): New function. + (find_access_in_subtree): Use it, new parameter reg. + (get_var_base_offset_size_access): Pass true to + find_access_in_subtree. + (create_access_1): Initialize reg_size. + (create_artificial_child_access): Likewise. + (create_total_scalarization_access): Likewise. + (build_ref_for_model): Do not use model expr if reg_acc_type is + non-NULL.
+ (get_reg_access_replacement): New function. + (verify_sra_access_forest): Adjust verification for presence of + extended accesses covering padding. + (analyze_access_subtree): Undo extension over padding if total + scalarization failed, set grp_partial_lhs if we are going to introduce + a partial store to the new replacement, do not ignore holes when + totally scalarizing. + (sra_type_for_size): New function. + (total_scalarization_fill_padding): Likewise. + (total_should_skip_creating_access): Use it. + (totally_scalarize_subtree): Likewise. + (sra_modify_expr): Use get_reg_access_replacement instead of + get_access_replacement, adjust for reg_acc_type. + (sra_modify_assign): Likewise. + (load_assign_lhs_subreplacements): Pass false to + find_access_in_subtree. + 2020-01-27 Martin Liska PR driver/91220 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 22a37dd1ab2..7ccb5098224 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2020-01-24 Martin Jambor + + PR tree-optimization/92486 + * gcc.dg/tree-ssa/pr92486.c: New test. + 2020-01-27 Martin Liska PR target/93274 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c new file mode 100644 index 00000000000..77e84241eff --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c @@ -0,0 +1,38 @@ +/* { dg-do run } */ +/* { dg-options "-O1" } */ + +struct s { + char c; + int i; +}; + +__attribute__((noipa)) +void f(struct s *p, struct s *q) +{ + struct s w; + + __builtin_memset(&w, 0, sizeof(struct s)); + w = *q; + + __builtin_memset(p, 0, sizeof(struct s)); + *p = w; +} + +int main() +{ + struct s x; + __builtin_memset(&x, 1, sizeof(struct s)); + + struct s y; + __builtin_memset(&y, 2, sizeof(struct s)); + + f(&y, &x); + + for (unsigned char *p = (unsigned char *)&y; + p < (unsigned char *)&y + sizeof(struct s); + p++) + if (*p != 1) + __builtin_abort (); + + return 0; +} diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index ea8594db193..bda342ffdf9 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3. If not see #include "dbgcnt.h" #include "builtins.h" #include "tree-sra.h" +#include "langhooks.h" /* Enumeration of all aggregate reductions we can do. */ @@ -130,11 +131,16 @@ struct assign_link; struct access { - /* Values returned by `get_ref_base_and_extent' for each component reference - If EXPR isn't a component reference just set `BASE = EXPR', `OFFSET = 0', - `SIZE = TREE_SIZE (TREE_TYPE (expr))'. */ + /* Offset, size and base are values returned by `get_ref_base_and_extent' for + each component reference If EXPR isn't a component reference just set + `BASE = EXPR', `OFFSET = 0', `SIZE = TREE_SIZE (TREE_TYPE (expr))'. */ HOST_WIDE_INT offset; HOST_WIDE_INT size; + + /* If reg_acc_type is non-NULL, this is the size of the actual egister type + the access represented before it was extended to cover padding. Otherwise + must be equal to size. */ + HOST_WIDE_INT reg_size; tree base; /* Expression. It is context dependent so do not use it to create new @@ -144,6 +150,12 @@ struct access /* Type. */ tree type; + /* If non-NULL, this is the type that should be actually extracted or + inserted from/to the replacement decl when replacing accesses to the + individual field itself (as opposed to accesses created as part of + replacing aggregate copies which should use TYPE). */ + tree reg_acc_type; + /* The statement this access belongs to. 
*/ gimple *stmt; @@ -391,10 +403,17 @@ dump_access (FILE *f, struct access *access, bool grp) print_generic_expr (f, access->base); fprintf (f, "', offset = " HOST_WIDE_INT_PRINT_DEC, access->offset); fprintf (f, ", size = " HOST_WIDE_INT_PRINT_DEC, access->size); + if (access->reg_size != access->size) + fprintf (f, ", reg_size = " HOST_WIDE_INT_PRINT_DEC, access->reg_size); fprintf (f, ", expr = "); print_generic_expr (f, access->expr); fprintf (f, ", type = "); print_generic_expr (f, access->type); + if (access->reg_acc_type) + { + fprintf (f, ", reg_acc_type = "); + print_generic_expr (f, access->type); + } fprintf (f, ", reverse = %d", access->reverse); if (grp) fprintf (f, ", grp_read = %d, grp_write = %d, grp_assignment_read = %d, " @@ -481,18 +500,27 @@ get_base_access_vector (tree base) return base_access_vec->get (base); } +/* Return ACCESS's size or reg_size depending on REG. */ + +static HOST_WIDE_INT +acc_size (struct access *access, bool reg) +{ + return reg ? access->reg_size : access->size; +} + /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted in ACCESS. Return NULL if it cannot be found. */ static struct access * find_access_in_subtree (struct access *access, HOST_WIDE_INT offset, - HOST_WIDE_INT size) + HOST_WIDE_INT size, bool reg) { - while (access && (access->offset != offset || access->size != size)) + while (access && (access->offset != offset + || acc_size (access, reg) != size)) { struct access *child = access->first_child; - while (child && (child->offset + child->size <= offset)) + while (child && (child->offset + acc_size (child, reg) <= offset)) child = child->next_sibling; access = child; } @@ -503,7 +531,7 @@ find_access_in_subtree (struct access *access, HOST_WIDE_INT offset, if (access) while (access->first_child && access->first_child->offset == offset - && access->first_child->size == size) + && acc_size (access->first_child, reg) == size) access = access->first_child; return access; @@ -539,7 +567,7 @@ get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset, if (!access) return NULL; - return find_access_in_subtree (access, offset, size); + return find_access_in_subtree (access, offset, size, true); } /* Add LINK to the linked list of assign links of RACC. */ @@ -868,6 +896,7 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size) access->base = base; access->offset = offset; access->size = size; + access->reg_size = size; base_access_vec->get_or_insert (base).safe_push (access); @@ -1667,6 +1696,7 @@ build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset, { tree res; if (model->grp_same_access_path + && !model->reg_acc_type && !TREE_THIS_VOLATILE (base) && (TYPE_ADDR_SPACE (TREE_TYPE (base)) == TYPE_ADDR_SPACE (TREE_TYPE (model->expr))) @@ -2256,6 +2286,30 @@ get_access_replacement (struct access *access) return access->replacement_decl; } +/* Like above except in cases when ACCESS has non-NULL reg_acc_type, in which + case also construct BIT_FIELD_REF into the replacement of the corresponding + type (with location LOC). 
*/ + +static tree +get_reg_access_replacement (location_t loc, struct access *access, bool write, + gassign **conversion) +{ + tree repl = get_access_replacement (access); + if (!access->reg_acc_type) + { + *conversion = NULL; + return repl; + } + + tree tmp = make_ssa_name (access->reg_acc_type); + if (write) + *conversion = gimple_build_assign (repl, NOP_EXPR, tmp); + else + *conversion = gimple_build_assign (tmp, NOP_EXPR, repl); + gimple_set_location (*conversion, loc); + return tmp; +} + /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the linked list along the way. Stop when *ACCESS is NULL or the access pointed @@ -2339,7 +2393,12 @@ verify_sra_access_forest (struct access *root) gcc_assert (offset == access->offset); gcc_assert (access->grp_unscalarizable_region || size == max_size); - gcc_assert (max_size == access->size); + if (access->reg_acc_type) + gcc_assert (max_size == access->reg_size + && max_size < access->size); + else + gcc_assert (max_size == access->size + && max_size == access->reg_size); gcc_assert (reverse == access->reverse); if (access->first_child) @@ -2405,6 +2464,39 @@ expr_with_var_bounded_array_refs_p (tree expr) return false; } +static void +upgrade_integral_size_to_prec (struct access *access) +{ + /* Always create access replacements that cover the whole access. + For integral types this means the precision has to match. + Avoid assumptions based on the integral type kind, too. */ + if (INTEGRAL_TYPE_P (access->type) + && (TREE_CODE (access->type) != INTEGER_TYPE + || TYPE_PRECISION (access->type) != access->size) + /* But leave bitfield accesses alone. */ + && (TREE_CODE (access->expr) != COMPONENT_REF + || !DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1)))) + { + tree rt = access->type; + gcc_assert ((access->offset % BITS_PER_UNIT) == 0 + && (access->size % BITS_PER_UNIT) == 0); + access->type = build_nonstandard_integer_type (access->size, + TYPE_UNSIGNED (rt)); + access->expr = build_ref_for_offset (UNKNOWN_LOCATION, access->base, + access->offset, access->reverse, + access->type, NULL, false); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Changing the type of a replacement for "); + print_generic_expr (dump_file, access->base); + fprintf (dump_file, " offset: %u, size: %u ", + (unsigned) access->offset, (unsigned) access->size); + fprintf (dump_file, " to an integer.\n"); + } + } +} + /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when both seeming beneficial and when ALLOW_REPLACEMENTS allows it. If TOTALLY is set, we are totally scalarizing the aggregate. Also set all sorts of @@ -2488,6 +2580,15 @@ analyze_access_subtree (struct access *root, struct access *parent, hole = true; } + if (!totally && root->reg_acc_type) + { + /* If total scalarization did not eventually succeed, let's undo any + futile attempts to cover padding. */ + root->type = root->reg_acc_type; + root->size = root->reg_size; + root->reg_acc_type = NULL_TREE; + } + if (allow_replacements && scalar && !root->first_child && (totally || !root->grp_total_scalarization) && (totally @@ -2495,34 +2596,7 @@ analyze_access_subtree (struct access *root, struct access *parent, || ((root->grp_scalar_read || root->grp_assignment_read) && (root->grp_scalar_write || root->grp_assignment_write)))) { - /* Always create access replacements that cover the whole access. - For integral types this means the precision has to match. - Avoid assumptions based on the integral type kind, too. 
*/ - if (INTEGRAL_TYPE_P (root->type) - && (TREE_CODE (root->type) != INTEGER_TYPE - || TYPE_PRECISION (root->type) != root->size) - /* But leave bitfield accesses alone. */ - && (TREE_CODE (root->expr) != COMPONENT_REF - || !DECL_BIT_FIELD (TREE_OPERAND (root->expr, 1)))) - { - tree rt = root->type; - gcc_assert ((root->offset % BITS_PER_UNIT) == 0 - && (root->size % BITS_PER_UNIT) == 0); - root->type = build_nonstandard_integer_type (root->size, - TYPE_UNSIGNED (rt)); - root->expr = build_ref_for_offset (UNKNOWN_LOCATION, root->base, - root->offset, root->reverse, - root->type, NULL, false); - - if (dump_file && (dump_flags & TDF_DETAILS)) - { - fprintf (dump_file, "Changing the type of a replacement for "); - print_generic_expr (dump_file, root->base); - fprintf (dump_file, " offset: %u, size: %u ", - (unsigned) root->offset, (unsigned) root->size); - fprintf (dump_file, " to an integer.\n"); - } - } + upgrade_integral_size_to_prec (root); root->grp_to_be_replaced = 1; root->replacement_decl = create_access_replacement (root); @@ -2554,7 +2628,7 @@ analyze_access_subtree (struct access *root, struct access *parent, root->grp_total_scalarization = 0; } - if (!hole || totally) + if (!hole) root->grp_covered = 1; else if (root->grp_write || comes_initialized_p (root->base)) root->grp_unscalarized_data = 1; /* not covered and written to */ @@ -2636,6 +2710,8 @@ create_artificial_child_access (struct access *parent, struct access *model, access->expr = expr; access->offset = new_offset; access->size = model->size; + gcc_assert (model->size == model->reg_size); + access->reg_size = model->reg_size; access->type = model->type; access->parent = parent; access->grp_read = set_grp_read; @@ -2966,6 +3042,7 @@ create_total_scalarization_access (struct access *parent, HOST_WIDE_INT pos, access->base = parent->base; access->offset = pos; access->size = size; + access->reg_size = size; access->expr = expr; access->type = type; access->parent = parent; @@ -3015,6 +3092,138 @@ create_total_access_and_reshape (struct access *parent, HOST_WIDE_INT pos, return new_acc; } +/* Given SIZE return the integer type to represent that many bits or NULL if + there is no suitable one. */ + +static tree +sra_type_for_size (HOST_WIDE_INT size, int unsignedp = 1) +{ + tree type = lang_hooks.types.type_for_size (size, unsignedp); + scalar_int_mode mode; + if (type + && is_a <scalar_int_mode> (TYPE_MODE (type), &mode) + && GET_MODE_BITSIZE (mode) == size) + return type; + else + return NULL_TREE; +} + +/* Create a series of accesses to cover padding from FROM to TO and chain them + between LAST_PTR and NEXT_CHILD. Return the pointer to the last created + one.
*/ + +static struct access * +fill_padding_with_accesses (struct access *parent, HOST_WIDE_INT from, + HOST_WIDE_INT to, struct access **last_ptr, + struct access *next_child) +{ + struct access *access = NULL; + + gcc_assert (from < to); + do + { + HOST_WIDE_INT diff = to - from; + gcc_assert (diff >= BITS_PER_UNIT); + HOST_WIDE_INT stsz = 1 << floor_log2 (diff); + tree type; + + while (true) + { + type = sra_type_for_size (stsz); + if (type) + break; + stsz /= 2; + gcc_checking_assert (stsz >= BITS_PER_UNIT); + } + + do { + tree expr = build_ref_for_offset (UNKNOWN_LOCATION, parent->base, + from, parent->reverse, type, NULL, + false); + access = create_total_scalarization_access (parent, from, stsz, type, + expr, last_ptr, next_child); + access->grp_no_warning = 1; + from += stsz; + } + while (to - from >= stsz); + gcc_assert (from <= to); + } + while (from < to); + return access; +} + +/* Check whether any padding between FROM and TO must be filled during + total scalarization and if so, extend *LAST_SIBLING, create additional + artificial accesses under PARENT or both to fill it. Set *LAST_SIBLING to + the last created access and set its next_sibling to NEXT_CHILD. */ + +static void +total_scalarization_fill_padding (struct access *parent, + struct access **last_sibling, + HOST_WIDE_INT from, HOST_WIDE_INT to, + struct access *next_child) +{ + if (constant_decl_p (parent->base)) + return; + + gcc_assert (from <= to); + if (from == to) + return; + + if (struct access *ls = *last_sibling) + { + /* First, perform any upgrades to full integers we would have done + anyway. */ + upgrade_integral_size_to_prec (ls); + + HOST_WIDE_INT lastsize = ls->size; + if (INTEGRAL_TYPE_P (ls->type) + && pow2_or_zerop (lastsize)) + { + /* The last field was a reaonably sized integer, let's try to + enlarge as much as we can so that it is still naturally + aligned. */ + HOST_WIDE_INT lastpos = ls->offset; + HOST_WIDE_INT blocksize = lastsize; + tree type = NULL_TREE; + int unsignedp = TYPE_UNSIGNED (ls->type); + + while (true) + { + HOST_WIDE_INT b2 = blocksize * 2; + if (lastpos + b2 > to + || (lastpos % b2) != 0) + break; + tree t2 = sra_type_for_size (b2, unsignedp); + if (!t2) + break; + + blocksize = b2; + type = t2; + } + + if (blocksize != lastsize) + { + ls->reg_acc_type = ls->type; + ls->type = type; + ls->size = blocksize; + from = lastpos + blocksize; + } + + if (from == to) + return; + } + } + + struct access **last_ptr; + if (*last_sibling) + last_ptr = &(*last_sibling)->next_sibling; + else + last_ptr = &parent->first_child; + *last_sibling = fill_padding_with_accesses (parent, from, to, last_ptr, + next_child); +} + static bool totally_scalarize_subtree (struct access *root); /* Return true if INNER is either the same type as OUTER or if it is the type @@ -3073,10 +3282,17 @@ total_should_skip_creating_access (struct access *parent, HOST_WIDE_INT size) { struct access *next_child; + HOST_WIDE_INT covered; if (!*last_seen_sibling) - next_child = parent->first_child; + { + next_child = parent->first_child; + covered = parent->offset; + } else - next_child = (*last_seen_sibling)->next_sibling; + { + next_child = (*last_seen_sibling)->next_sibling; + covered = (*last_seen_sibling)->offset + (*last_seen_sibling)->size; + } /* First, traverse the chain of siblings until it points to an access with offset at least equal to POS. 
Check all skipped accesses whether they @@ -3085,10 +3301,16 @@ total_should_skip_creating_access (struct access *parent, { if (next_child->offset + next_child->size > pos) return TOTAL_FLD_FAILED; + total_scalarization_fill_padding (parent, last_seen_sibling, covered, + next_child->offset, next_child); + covered = next_child->offset + next_child->size; *last_seen_sibling = next_child; next_child = next_child->next_sibling; } + total_scalarization_fill_padding (parent, last_seen_sibling, covered, pos, + next_child); + /* Now check whether next_child has exactly the right POS and SIZE and if so, whether it can represent what we need and can be totally scalarized itself. */ @@ -3273,6 +3495,34 @@ totally_scalarize_subtree (struct access *root) } out: + /* Even though there is nothing in the type, there can still be accesses here + in the IL. */ + + HOST_WIDE_INT covered; + struct access *next_child; + + if (last_seen_sibling) + { + covered = last_seen_sibling->offset + last_seen_sibling->size; + next_child = last_seen_sibling->next_sibling; + } + else + { + covered = root->offset; + next_child = root->first_child; + } + + while (next_child) + { + total_scalarization_fill_padding (root, &last_seen_sibling, covered, + next_child->offset, next_child); + covered = next_child->offset + next_child->size; + last_seen_sibling = next_child; + next_child = next_child->next_sibling; + } + + total_scalarization_fill_padding (root, &last_seen_sibling, covered, + root->offset + root->size, NULL); return true; } @@ -3655,7 +3905,6 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write) if (access->grp_to_be_replaced) { - tree repl = get_access_replacement (access); /* If we replace a non-register typed access simply use the original access expression to extract the scalar component afterwards. This happens if scalarizing a function return value or parameter @@ -3666,10 +3915,14 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write) be accessed as a different type too, potentially creating a need for type conversion (see PR42196) and when scalarized unions are involved in assembler statements (see PR42398). */ - if (!useless_type_conversion_p (type, access->type)) + tree actual_type + = access->reg_acc_type ? 
access->reg_acc_type : access->type; + + if (!useless_type_conversion_p (type, actual_type)) { tree ref; + tree repl = get_access_replacement (access); ref = build_ref_for_model (loc, orig_expr, 0, access, gsi, false); if (write) @@ -3696,7 +3949,21 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write) } } else - *expr = repl; + { + gassign *conversion; + tree repl = get_reg_access_replacement (loc, access, write, + &conversion); + *expr = repl; + + if (conversion) + { + if (write) + gsi_insert_after (gsi, conversion, GSI_NEW_STMT); + else + gsi_insert_before (gsi, conversion, GSI_SAME_STMT); + } + } + sra_stats.exprs++; } else if (write && access->grp_to_be_debug_replaced) @@ -3806,7 +4073,8 @@ load_assign_lhs_subreplacements (struct access *lacc, gassign *stmt; tree rhs; - racc = find_access_in_subtree (sad->top_racc, offset, lacc->size); + racc = find_access_in_subtree (sad->top_racc, offset, lacc->size, + false); if (racc && racc->grp_to_be_replaced) { rhs = get_access_replacement (racc); @@ -3857,7 +4125,7 @@ load_assign_lhs_subreplacements (struct access *lacc, tree drhs; struct access *racc = find_access_in_subtree (sad->top_racc, offset, - lacc->size); + lacc->size, false); if (racc && racc->grp_to_be_replaced) { @@ -4010,8 +4278,11 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi) loc = gimple_location (stmt); if (lacc && lacc->grp_to_be_replaced) { - lhs = get_access_replacement (lacc); + gassign *lhs_conversion = NULL; + lhs = get_reg_access_replacement (loc, lacc, true, &lhs_conversion); gimple_assign_set_lhs (stmt, lhs); + if (lhs_conversion) + gsi_insert_after (gsi, lhs_conversion, GSI_NEW_STMT); modify_this_stmt = true; if (lacc->grp_partial_lhs) force_gimple_rhs = true; @@ -4020,7 +4291,10 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi) if (racc && racc->grp_to_be_replaced) { - rhs = get_access_replacement (racc); + gassign *rhs_conversion = NULL; + rhs = get_reg_access_replacement (loc, racc, false, &rhs_conversion); + if (rhs_conversion) + gsi_insert_before (&orig_gsi, rhs_conversion, GSI_SAME_STMT); modify_this_stmt = true; if (racc->grp_partial_lhs) force_gimple_rhs = true;
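The following sketch (an editor's illustration of the intended shape, not taken from the patch; the names are made up) shows what the reg_acc_type handling in sra_modify_expr and sra_modify_assign amounts to when a replacement was widened over padding:

  /* Assume SR_c is the 32-bit replacement created for a char field c that
     was extended over following padding, so the access's reg_acc_type is
     char.  get_reg_access_replacement returns a temporary of the original
     type plus a NOP_EXPR conversion statement:

       read of c:    tmp_1 = (char) SR_c;      tmp_1 used instead of c
       write to c:   tmp_2 = <value>;          SR_c = (unsigned int) tmp_2;

     The conversion is inserted before the modified statement for reads and
     after it for writes.  */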