From patchwork Mon Nov 11 18:51:58 2019
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [8/8] Optimise WAR and WAW alias checks
Date: Mon, 11 Nov 2019 18:51:58 +0000

For:

  void
  f1 (int *x, int *y)
  {
    for (int i = 0; i < 32; ++i)
      x[i] += y[i];
  }

we checked at runtime whether one vector at x would overlap one vector
at y.  But in cases like this, the vector code would handle x <= y
just fine, since any write to address A still happens after any read
from address A.  The only problem is if x is ahead of y by less than
a vector.
The same is true for two writes:

  void
  f2 (int *x, int *y)
  {
    for (int i = 0; i < 32; ++i)
      {
	x[i] = i;
	y[i] = 2;
      }
  }

if y <= x then a vector write at y after a vector write at x would
have the same net effect as the original scalar writes.

This patch optimises the alias checks for these two cases.  E.g.,
before the patch, f1 used:

	add	x2, x0, 15
	sub	x2, x2, x1
	cmp	x2, 30
	bls	.L2

whereas after the patch it uses:

	add	x2, x1, 4
	sub	x2, x0, x2
	cmp	x2, 8
	bls	.L2

Read-after-write cases like:

  int
  f3 (int *x, int *y)
  {
    int res = 0;
    for (int i = 0; i < 32; ++i)
      {
	x[i] = i;
	res += y[i];
      }
    return res;
  }

can cope with x == y, but otherwise don't allow overlap in either
direction.  Since checking for x == y at runtime would require extra
code, we're probably better off sticking with the current overlap test.

An overlap test is also needed if the scalar or vector accesses covered
by the alias check are mixed together, rather than all statements for
the second access following all statements for the first access.

The new code for gcc.target/aarch64/sve/var_stride_[135].c is slightly
better than before.

2019-11-11  Richard Sandiford

gcc/
	* tree-data-ref.c (create_intersect_range_checks_index): If the
	alias pair describes simple WAW and WAR dependencies, just check
	whether the first B access overlaps later A accesses.
	(create_waw_or_war_checks): New function that performs the same
	optimization on addresses.
	(create_intersect_range_checks): Call it.

gcc/testsuite/
	* gcc.dg/vect/vect-alias-check-8.c: Expect WAR/WAW checks to be used.
	* gcc.dg/vect/vect-alias-check-14.c: Likewise.
	* gcc.dg/vect/vect-alias-check-15.c: Likewise.
	* gcc.dg/vect/vect-alias-check-18.c: Likewise.
	* gcc.dg/vect/vect-alias-check-19.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_1.c: Update expected sequence.
	* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_3.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_5.c: Likewise.
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/tree-data-ref.c	2019-11-11 18:32:13.186616541 +0000
@@ -1806,6 +1806,8 @@ create_intersect_range_checks_index (cla
 				    abs_step, &niter_access2))
     return false;
 
+  bool waw_or_war_p = (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW)) == 0;
+
   unsigned int i;
   for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
     {
@@ -1907,16 +1909,57 @@ create_intersect_range_checks_index (cla
 
      Combining the tests requires limit to be computable in an unsigned
      form of the index type; if it isn't, we fall back to the usual
-     pointer-based checks.  */
-  poly_offset_int limit = (idx_len1 + idx_access1 - 1
-			   + idx_len2 + idx_access2 - 1);
+     pointer-based checks.
+
+     We can do better if DR_B is a write and if DR_A and DR_B are
+     well-ordered in both the original and the new code (see the
+     comment above the DR_ALIAS_* flags for details).  In this case
+     we know that for each i in [0, n-1], the write performed by
+     access i of DR_B occurs after access numbers j<=i of DR_A in
+     both the original and the new code.  Any write or anti
+     dependencies wrt those DR_A accesses are therefore maintained.
+
+     We just need to make sure that each individual write in DR_B does not
+     overlap any higher-indexed access in DR_A; such DR_A accesses happen
+     after the DR_B access in the original code but happen before it in
+     the new code.
+
+     We know the steps for both accesses are equal, so by induction, we
+     just need to test whether the first write of DR_B overlaps a later
+     access of DR_A.  In other words, we need to move min1 along by
+     one iteration:
+
+       min1' = min1 + idx_step
+
+     and use the ranges:
+
+       [min1' + low_offset1', min1' + high_offset1' + idx_access1 - 1]
+
+     and:
+
+       [min2, min2 + idx_access2 - 1]
+
+     where:
+
+       low_offset1' = +ve step ? 0 : -(idx_len1 - |idx_step|)
+       high_offset1' = +ve_step ? idx_len1 - |idx_step| : 0.  */
+  if (waw_or_war_p)
+    idx_len1 -= abs_idx_step;
+
+  poly_offset_int limit = idx_len1 + idx_access1 - 1 + idx_access2 - 1;
+  if (!waw_or_war_p)
+    limit += idx_len2;
+
   tree utype = unsigned_type_for (TREE_TYPE (min1));
   if (!wi::fits_to_tree_p (limit, utype))
     return false;
 
   poly_offset_int low_offset1 = neg_step ? -idx_len1 : 0;
-  poly_offset_int high_offset2 = neg_step ? 0 : idx_len2;
+  poly_offset_int high_offset2 = neg_step || waw_or_war_p ? 0 : idx_len2;
   poly_offset_int bias = high_offset2 + idx_access2 - 1 - low_offset1;
+  /* Equivalent to adding IDX_STEP to MIN1.  */
+  if (waw_or_war_p)
+    bias -= wi::to_offset (idx_step);
 
   tree subject = fold_build2 (MINUS_EXPR, utype,
			       fold_convert (utype, min2),
@@ -1932,7 +1975,169 @@ create_intersect_range_checks_index (cla
 	*cond_expr = part_cond_expr;
     }
   if (dump_enabled_p ())
-    dump_printf (MSG_NOTE, "using an index-based overlap test\n");
+    {
+      if (waw_or_war_p)
+	dump_printf (MSG_NOTE, "using an index-based WAR/WAW test\n");
+      else
+	dump_printf (MSG_NOTE, "using an index-based overlap test\n");
+    }
   return true;
 }
 
+/* A subroutine of create_intersect_range_checks, with a subset of the
+   same arguments.  Try to optimize cases in which the second access
+   is a write and in which some overlap is valid.  */
+
+static bool
+create_waw_or_war_checks (tree *cond_expr,
+			  const dr_with_seg_len_pair_t &alias_pair)
+{
+  const dr_with_seg_len& dr_a = alias_pair.first;
+  const dr_with_seg_len& dr_b = alias_pair.second;
+
+  /* Check for cases in which:
+
+     (a) DR_B is always a write;
+     (b) the accesses are well-ordered in both the original and new code
+	 (see the comment above the DR_ALIAS_* flags for details); and
+     (c) the DR_STEPs describe all access pairs covered by ALIAS_PAIR.  */
+  if (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW))
+    return false;
+
+  /* Check for equal (but possibly variable) steps.  */
+  tree step = DR_STEP (dr_a.dr);
+  if (!operand_equal_p (step, DR_STEP (dr_b.dr)))
+    return false;
+
+  /* Make sure that we can operate on sizetype without loss of precision.  */
+  tree addr_type = TREE_TYPE (DR_BASE_ADDRESS (dr_a.dr));
+  if (TYPE_PRECISION (addr_type) != TYPE_PRECISION (sizetype))
+    return false;
+
+  /* All addresses involved are known to have a common alignment ALIGN.
+     We can therefore subtract ALIGN from an exclusive endpoint to get
+     an inclusive endpoint.  In the best (and common) case, ALIGN is the
+     same as the access sizes of both DRs, and so subtracting ALIGN
+     cancels out the addition of an access size.  */
+  unsigned int align = MIN (dr_a.align, dr_b.align);
+  poly_uint64 last_chunk_a = dr_a.access_size - align;
+  poly_uint64 last_chunk_b = dr_b.access_size - align;
+
+  /* Get a boolean expression that is true when the step is negative.  */
+  tree indicator = dr_direction_indicator (dr_a.dr);
+  tree neg_step = fold_build2 (LT_EXPR, boolean_type_node,
+			       fold_convert (ssizetype, indicator),
+			       ssize_int (0));
+
+  /* Get lengths in sizetype.  */
+  tree seg_len_a
+    = fold_convert (sizetype, rewrite_to_non_trapping_overflow (dr_a.seg_len));
+  step = fold_convert (sizetype, rewrite_to_non_trapping_overflow (step));
+
+  /* Each access has the following pattern:
+
+	  <- |seg_len| ->
+	  <--- A: -ve step --->
+	  +-----+-------+-----+-------+-----+
+	  | n-1 | ..... |  0  | ..... | n-1 |
+	  +-----+-------+-----+-------+-----+
+			<--- B: +ve step --->
+			<- |seg_len| ->
+			|
+		   base address
+
+     where "n" is the number of scalar iterations covered by the segment.
+
+     A is the range of bytes accessed when the step is negative,
+     B is the range when the step is positive.
+
+     We know that DR_B is a write.  We also know (from checking that
+     DR_A and DR_B are well-ordered) that for each i in [0, n-1],
+     the write performed by access i of DR_B occurs after access numbers
+     j<=i of DR_A in both the original and the new code.  Any write or
+     anti dependencies wrt those DR_A accesses are therefore maintained.
+
+     We just need to make sure that each individual write in DR_B does not
+     overlap any higher-indexed access in DR_A; such DR_A accesses happen
+     after the DR_B access in the original code but happen before it in
+     the new code.
+
+     We know the steps for both accesses are equal, so by induction, we
+     just need to test whether the first write of DR_B overlaps a later
+     access of DR_A.  In other words, we need to move addr_a along by
+     one iteration:
+
+       addr_a' = addr_a + step
+
+     and check whether:
+
+       [addr_b, addr_b + last_chunk_b]
+
+     overlaps:
+
+       [addr_a' + low_offset_a, addr_a' + high_offset_a + last_chunk_a]
+
+     where [low_offset_a, high_offset_a] spans accesses [1, n-1].  I.e.:
+
+       low_offset_a = +ve step ? 0 : seg_len_a - step
+       high_offset_a = +ve step ? seg_len_a - step : 0
+
+     This is equivalent to testing whether:
+
+       addr_a' + low_offset_a <= addr_b + last_chunk_b
+       && addr_b <= addr_a' + high_offset_a + last_chunk_a
+
+     Converting this into a single test, there is an overlap if:
+
+       0 <= addr_b + last_chunk_b - addr_a' - low_offset_a <= limit
+
+     where limit = high_offset_a - low_offset_a + last_chunk_a + last_chunk_b
+
+     If DR_A is performed, limit + |step| - last_chunk_b is known to be
+     less than the size of the object underlying DR_A.  We also know
+     that last_chunk_b <= |step|; this is checked elsewhere if it isn't
+     guaranteed at compile time.  There can therefore be no overflow if
+     "limit" is calculated in an unsigned type with pointer precision.  */
+  tree addr_a = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a.dr),
+					 DR_OFFSET (dr_a.dr));
+  addr_a = fold_build_pointer_plus (addr_a, DR_INIT (dr_a.dr));
+
+  tree addr_b = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b.dr),
+					 DR_OFFSET (dr_b.dr));
+  addr_b = fold_build_pointer_plus (addr_b, DR_INIT (dr_b.dr));
+
+  /* Advance ADDR_A by one iteration and adjust the length to compensate.  */
+  addr_a = fold_build_pointer_plus (addr_a, step);
+  tree seg_len_a_minus_step = fold_build2 (MINUS_EXPR, sizetype,
+					   seg_len_a, step);
+  if (!CONSTANT_CLASS_P (seg_len_a_minus_step))
+    seg_len_a_minus_step = build1 (SAVE_EXPR, sizetype, seg_len_a_minus_step);
+
+  tree low_offset_a = fold_build3 (COND_EXPR, sizetype, neg_step,
+				   seg_len_a_minus_step, size_zero_node);
+  if (!CONSTANT_CLASS_P (low_offset_a))
+    low_offset_a = build1 (SAVE_EXPR, sizetype, low_offset_a);
+
+  /* We could use COND_EXPR <neg_step, size_zero_node, seg_len_a_minus_step>,
+     but it's usually more efficient to reuse the LOW_OFFSET_A result.  */
+  tree high_offset_a = fold_build2 (MINUS_EXPR, sizetype, seg_len_a_minus_step,
+				    low_offset_a);
+
+  /* The amount added to addr_b - addr_a'.  */
+  tree bias = fold_build2 (MINUS_EXPR, sizetype,
+			   size_int (last_chunk_b), low_offset_a);
+
+  tree limit = fold_build2 (MINUS_EXPR, sizetype, high_offset_a, low_offset_a);
+  limit = fold_build2 (PLUS_EXPR, sizetype, limit,
+		       size_int (last_chunk_a + last_chunk_b));
+
+  tree subject = fold_build2 (POINTER_DIFF_EXPR, ssizetype, addr_b, addr_a);
+  subject = fold_build2 (PLUS_EXPR, sizetype,
+			 fold_convert (sizetype, subject), bias);
+
+  *cond_expr = fold_build2 (GT_EXPR, boolean_type_node, subject, limit);
+  if (dump_enabled_p ())
+    dump_printf (MSG_NOTE, "using an address-based WAR/WAW test\n");
+  return true;
+}
+
@@ -2036,6 +2241,9 @@ create_intersect_range_checks (class loo
   if (create_intersect_range_checks_index (loop, cond_expr, alias_pair))
     return;
 
+  if (create_waw_or_war_checks (cond_expr, alias_pair))
+    return;
+
   unsigned HOST_WIDE_INT min_align;
   tree_code cmp_code;
   /* We don't have to check DR_ALIAS_MIXED_STEPS here, since both versions
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c	2019-11-11 18:32:13.186616541 +0000
@@ -60,5 +60,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
-/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } */
 /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c	2019-11-11 18:32:13.186616541 +0000
@@ -60,5 +60,5 @@ main (void)
 
 /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
 /* { dg-final { scan-tree-dump-not {flags: [^\n]*ARBITRARY\n} "vect" } } */
-/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an address-based WAR/WAW test" "vect" } } */
 /* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c	2019-11-11 18:32:13.186616541 +0000
@@ -57,5 +57,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
-/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an address-based WAR/WAW test" "vect" } } */
 /* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c	2019-11-11 18:32:13.186616541 +0000
@@ -60,5 +60,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
-/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } */
 /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c	2019-11-11 18:32:13.186616541 +0000
@@ -58,5 +58,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
-/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } */
 /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_1.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/var_stride_1.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_1.c	2019-11-11 18:32:13.186616541 +0000
@@ -15,13 +15,9 @@ f (TYPE *x, TYPE *y, unsigned short n, l
 /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
 /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
 /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
-/* Should multiply by (VF-1)*4 rather than (257-1)*4.  */
-/* { dg-final { scan-assembler-not {, 1024} } } */
-/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
-/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
-/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
-/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
-/* { dg-final { scan-assembler-not {\tcsel\tx[0-9]+} } } */
-/* Two range checks and a check for n being zero.  */
-/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
-/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
+/* Should use a WAR check that multiplies by (VF-2)*4 rather than
+   an overlap check that multiplies by (257-1)*4.  */
+/* { dg-final { scan-assembler {\tcntb\t(x[0-9]+)\n.*\tsub\tx[0-9]+, \1, #8\n.*\tmul\tx[0-9]+,[^\n]*\1} } } */
+/* One range check and a check for n being zero.  */
+/* { dg-final { scan-assembler-times {\t(?:cmp|tst)\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c	2019-11-11 18:32:13.186616541 +0000
@@ -15,7 +15,7 @@ f (TYPE *x, TYPE *y, unsigned short n, u
 /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
 /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
 /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
-/* Should multiply by (257-1)*4 rather than (VF-1)*4.  */
+/* Should multiply by (257-1)*4 rather than (VF-1)*4 or (VF-2)*4.  */
 /* { dg-final { scan-assembler-times {\tubfiz\tx[0-9]+, x2, 10, 16\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tubfiz\tx[0-9]+, x3, 10, 16\n} 1 } } */
 /* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_3.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/var_stride_3.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_3.c	2019-11-11 18:32:13.186616541 +0000
@@ -15,13 +15,10 @@ f (TYPE *x, TYPE *y, int n, long m __att
 /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
 /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
 /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
-/* Should multiply by (VF-1)*4 rather than (257-1)*4.  */
-/* { dg-final { scan-assembler-not {, 1024} } } */
-/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
-/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
-/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
-/* { dg-final { scan-assembler {\tcmp\tw2, 0} } } */
-/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
-/* Two range checks and a check for n being zero.  */
-/* { dg-final { scan-assembler {\tcmp\t} } } */
-/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
+/* Should use a WAR check that multiplies by (VF-2)*4 rather than
+   an overlap check that multiplies by (257-1)*4.  */
+/* { dg-final { scan-assembler {\tcntb\t(x[0-9]+)\n.*\tsub\tx[0-9]+, \1, #8\n.*\tmul\tx[0-9]+,[^\n]*\1} } } */
+/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+[^\n]*xzr} 1 } } */
+/* One range check and a check for n being zero.  */
+/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_5.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/var_stride_5.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_5.c	2019-11-11 18:32:13.186616541 +0000
@@ -15,13 +15,10 @@ f (TYPE *x, TYPE *y, long n, long m __at
 /* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
 /* { dg-final { scan-assembler {\tldr\td[0-9]+} } } */
 /* { dg-final { scan-assembler {\tstr\td[0-9]+} } } */
-/* Should multiply by (VF-1)*8 rather than (257-1)*8.  */
-/* { dg-final { scan-assembler-not {, 2048} } } */
-/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
-/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]11} } } */
-/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
-/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
-/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
-/* Two range checks and a check for n being zero.  */
-/* { dg-final { scan-assembler {\tcmp\t} } } */
-/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
+/* Should use a WAR check that multiplies by (VF-2)*8 rather than
+   an overlap check that multiplies by (257-1)*8.  */
+/* { dg-final { scan-assembler {\tcntb\t(x[0-9]+)\n.*\tsub\tx[0-9]+, \1, #16\n.*\tmul\tx[0-9]+,[^\n]*\1} } } */
+/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+[^\n]*xzr} 1 } } */
+/* One range check and a check for n being zero.  */
+/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */