From patchwork Thu May 26 07:52:22 2011
X-Patchwork-Submitter: Ira Rosen
X-Patchwork-Id: 97504
Date: Thu, 26 May 2011 10:52:22 +0300
Subject: [patch] Fix PR tree-optimization/49038
From: Ira Rosen
To: gcc-patches@gcc.gnu.org
Cc: Patch Tracking

Hi,

The vectorizer supports strided loads with gaps: e.g., when only a[4i]
and a[4i+2] are accessed, it generates a vector load of a[4i:4i+3],
i.e., it creates an access to a[4i+3] that does not exist in the scalar
code.  This access may be invalid, as described in the PR.  This patch
creates an epilogue loop (with at least one iteration) for such cases;
a sketch of the problematic access pattern follows the ChangeLog below.

Bootstrapped and tested on powerpc64-suse-linux.
Applied to trunk.  I'll prepare patches for 4.5 and 4.6 next week.

Ira

ChangeLog:

	PR tree-optimization/49038
	* tree-vect-loop-manip.c (vect_generate_tmps_on_preheader):
	Ensure at least one epilogue iteration if required by data
	accesses with gaps.
	* tree-vectorizer.h (struct _loop_vec_info): Add new field to
	mark loops that require peeling for gaps.
	* tree-vect-loop.c (new_loop_vec_info): Initialize new field.
	(vect_get_known_peeling_cost): Take peeling for gaps into
	account.
	(vect_transform_loop): Generate epilogue if required by data
	accesses with gaps.
	* tree-vect-data-refs.c (vect_analyze_group_access): Mark the
	loop as requiring an epilogue if there are gaps at the end of
	the strided group.

testsuite/ChangeLog:

	PR tree-optimization/49038
	* gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c: New test.
	* gcc.dg/vect/pr49038.c: New test.
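To make the failure mode concrete, here is a minimal sketch (not part of
the patch; the function name, element type, and loop shape are invented
for illustration):

/* Only a[4*i] and a[4*i+2] are read; a[4*i+1] and a[4*i+3] are gaps.
   A vectorized version using strided loads reads the whole group
   a[4*i .. 4*i+3], so its last iteration can touch a[4*n-1], which the
   scalar loop never accesses.  If the accessed data ends just past
   a[4*n-2] (e.g., at a page boundary, as in the PR's mmap test case),
   that extra load can fault.  */
void
sum_gapped (int *__restrict out, int *__restrict a, int n)
{
  int i;

  for (i = 0; i < n; i++)
    out[i] = a[4 * i] + a[4 * i + 2];
}

With the patch, such loops always get a scalar epilogue of at least one
iteration, so the vector body never issues the final over-wide load.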
Index: tree-vect-loop-manip.c
===================================================================
--- tree-vect-loop-manip.c	(revision 174264)
+++ tree-vect-loop-manip.c	(working copy)
@@ -1551,7 +1551,7 @@ vect_generate_tmps_on_preheader (loop_vec_info loo
   edge pe;
   basic_block new_bb;
   gimple_seq stmts;
-  tree ni_name;
+  tree ni_name, ni_minus_gap_name;
   tree var;
   tree ratio_name;
   tree ratio_mult_vf_name;
@@ -1568,9 +1568,39 @@ vect_generate_tmps_on_preheader (loop_vec_info loo
   ni_name = vect_build_loop_niters (loop_vinfo, cond_expr_stmt_list);
   log_vf = build_int_cst (TREE_TYPE (ni), exact_log2 (vf));
 
+  /* If an epilogue loop is required because of data accesses with gaps, we
+     subtract one iteration from the total number of iterations here for
+     correct calculation of RATIO.  */
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+    {
+      ni_minus_gap_name = fold_build2 (MINUS_EXPR, TREE_TYPE (ni_name),
+                                       ni_name,
+                                       build_one_cst (TREE_TYPE (ni_name)));
+      if (!is_gimple_val (ni_minus_gap_name))
+        {
+          var = create_tmp_var (TREE_TYPE (ni), "ni_gap");
+          add_referenced_var (var);
+
+          stmts = NULL;
+          ni_minus_gap_name = force_gimple_operand (ni_minus_gap_name, &stmts,
+                                                    true, var);
+          if (cond_expr_stmt_list)
+            gimple_seq_add_seq (&cond_expr_stmt_list, stmts);
+          else
+            {
+              pe = loop_preheader_edge (loop);
+              new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
+              gcc_assert (!new_bb);
+            }
+        }
+    }
+  else
+    ni_minus_gap_name = ni_name;
+
   /* Create: ratio = ni >> log2(vf) */
 
-  ratio_name = fold_build2 (RSHIFT_EXPR, TREE_TYPE (ni_name), ni_name, log_vf);
+  ratio_name = fold_build2 (RSHIFT_EXPR, TREE_TYPE (ni_minus_gap_name),
+                            ni_minus_gap_name, log_vf);
   if (!is_gimple_val (ratio_name))
     {
       var = create_tmp_var (TREE_TYPE (ni), "bnd");
Index: testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c
===================================================================
--- testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c	(revision 0)
+++ testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c	(revision 0)
@@ -0,0 +1,103 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 160
+
+typedef struct {
+   unsigned char a;
+   unsigned char b;
+   unsigned char c;
+   unsigned char d;
+   unsigned char e;
+   unsigned char f;
+   unsigned char g;
+   unsigned char h;
+} s;
+
+__attribute__ ((noinline)) int
+main1 (s *arr, int n)
+{
+  int i;
+  s *ptr = arr;
+  s res[N];
+  unsigned char x;
+
+  /* Check peeling for gaps for unknown loop bound.  */
+  for (i = 0; i < n; i++)
+    {
+      res[i].c = ptr->b + ptr->c;
+      x = ptr->c + ptr->f;
+      res[i].a = x + ptr->b;
+      res[i].d = ptr->b + ptr->c;
+      res[i].b = ptr->c;
+      res[i].f = ptr->f + ptr->e;
+      res[i].e = ptr->b + ptr->e;
+      res[i].h = ptr->c;
+      res[i].g = ptr->b + ptr->c;
+      ptr++;
+    }
+
+  /* check results:  */
+  for (i = 0; i < n; i++)
+    {
+      if (res[i].c != arr[i].b + arr[i].c
+          || res[i].a != arr[i].c + arr[i].f + arr[i].b
+          || res[i].d != arr[i].b + arr[i].c
+          || res[i].b != arr[i].c
+          || res[i].f != arr[i].f + arr[i].e
+          || res[i].e != arr[i].b + arr[i].e
+          || res[i].h != arr[i].c
+          || res[i].g != arr[i].b + arr[i].c)
+        abort ();
+    }
+
+  /* Check also that we don't do more iterations than needed.  */
+  for (i = n; i < N; i++)
+    {
+      if (res[i].c == arr[i].b + arr[i].c
+          || res[i].a == arr[i].c + arr[i].f + arr[i].b
+          || res[i].d == arr[i].b + arr[i].c
+          || res[i].b == arr[i].c
+          || res[i].f == arr[i].f + arr[i].e
+          || res[i].e == arr[i].b + arr[i].e
+          || res[i].h == arr[i].c
+          || res[i].g == arr[i].b + arr[i].c)
+        abort ();
+    }
+
+  return 0;
+}
+
+
+int main (void)
+{
+  int i;
+  s arr[N];
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      arr[i].a = 5;
+      arr[i].b = 6;
+      arr[i].c = 17;
+      arr[i].d = 3;
+      arr[i].e = 16;
+      arr[i].f = 16;
+      arr[i].g = 3;
+      arr[i].h = 56;
+      if (arr[i].a == 178)
+        abort ();
+    }
+
+  main1 (arr, N-2);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/pr49038.c
===================================================================
--- testsuite/gcc.dg/vect/pr49038.c	(revision 0)
+++ testsuite/gcc.dg/vect/pr49038.c	(revision 0)
@@ -0,0 +1,38 @@
+#include <sys/mman.h>
+#include <stdio.h>
+
+#define COUNT 320
+#define MMAP_SIZE 0x10000
+#define ADDRESS 0x1122000000
+#define TYPE unsigned short
+
+void __attribute__((noinline))
+foo (TYPE *__restrict a, TYPE *__restrict b)
+{
+  int n;
+
+  for (n = 0; n < COUNT; n++)
+    a[n] = b[n * 2];
+}
+
+int
+main (void)
+{
+  void *x;
+  size_t b_offset;
+
+  x = mmap ((void *) ADDRESS, MMAP_SIZE, PROT_READ | PROT_WRITE,
+            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (x == MAP_FAILED)
+    {
+      perror ("mmap");
+      return 1;
+    }
+
+  b_offset = MMAP_SIZE - (2 * COUNT - 1) * sizeof (TYPE);
+  foo ((unsigned short *) x,
+       (unsigned short *) ((char *) x + b_offset));
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: tree-vectorizer.h
===================================================================
--- tree-vectorizer.h	(revision 174264)
+++ tree-vectorizer.h	(working copy)
@@ -255,6 +255,11 @@ typedef struct _loop_vec_info {
   /* Hash table used to choose the best peeling option. */
   htab_t peeling_htab;
 
+  /* When we have strided data accesses with gaps, we may introduce invalid
+     memory accesses.  We peel the last iteration of the loop to prevent
+     this.  */
+  bool peeling_for_gaps;
+
 } *loop_vec_info;
 
 /* Access Functions. */
@@ -283,6 +288,7 @@ typedef struct _loop_vec_info {
 #define LOOP_VINFO_REDUCTIONS(L) (L)->reductions
 #define LOOP_VINFO_REDUCTION_CHAINS(L) (L)->reduction_chains
 #define LOOP_VINFO_PEELING_HTAB(L) (L)->peeling_htab
+#define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps
 
 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
 VEC_length (gimple, (L)->may_misalign_stmts) > 0
Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c	(revision 174264)
+++ tree-vect-loop.c	(working copy)
@@ -761,6 +761,7 @@ new_loop_vec_info (struct loop *loop)
   LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
   LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
   LOOP_VINFO_PEELING_HTAB (res) = NULL;
+  LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
 
   return res;
 }
@@ -2333,6 +2334,10 @@ vect_get_known_peeling_cost (loop_vec_info loop_vi
       peel_iters_prologue = niters < peel_iters_prologue ?
                             niters : peel_iters_prologue;
       *peel_iters_epilogue = (niters - peel_iters_prologue) % vf;
+      /* If we need to peel for gaps, but no peeling is required, we have to
+         peel VF iterations.  */
+      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && !*peel_iters_epilogue)
+        *peel_iters_epilogue = vf;
     }
 
   return (peel_iters_prologue * scalar_single_iter_cost)
@@ -4987,7 +4992,8 @@ vect_transform_loop (loop_vec_info loop_vinfo)
 
   do_peeling_for_loop_bound
     = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       || (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-          && LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0));
+          && LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0)
+      || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));
 
   if (LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo)
       || LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
Index: tree-vect-data-refs.c
===================================================================
--- tree-vect-data-refs.c	(revision 174264)
+++ tree-vect-data-refs.c	(working copy)
@@ -2043,7 +2043,7 @@ vect_analyze_group_access (struct data_reference *
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
-  HOST_WIDE_INT stride;
+  HOST_WIDE_INT stride, last_accessed_element = 1;
   bool slp_impossible = false;
 
   /* For interleaving, STRIDE is STEP counted in elements, i.e., the size of the
@@ -2072,6 +2072,16 @@ vect_analyze_group_access (struct data_reference *
           fprintf (vect_dump, " step ");
           print_generic_expr (vect_dump, step, TDF_SLIM);
         }
+
+      if (loop_vinfo)
+        {
+          LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = true;
+
+          if (vect_print_dump_info (REPORT_DETAILS))
+            fprintf (vect_dump, "Data access with gaps requires scalar "
+                                "epilogue loop");
+        }
+
       return true;
     }
@@ -2137,6 +2147,7 @@ vect_analyze_group_access (struct data_reference *
           next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next));
           continue;
         }
+
       prev = next;
 
       /* Check that all the accesses have the same STEP.  */
@@ -2167,6 +2178,8 @@ vect_analyze_group_access (struct data_reference *
               gaps += diff - 1;
             }
 
+          last_accessed_element += diff;
+
           /* Store the gap from the previous member of the group. If there is no
              gap in the access, GROUP_GAP is always 1.  */
           GROUP_GAP (vinfo_for_stmt (next)) = diff;
@@ -2245,6 +2258,15 @@ vect_analyze_group_access (struct data_reference *
               VEC_safe_push (gimple, heap, BB_VINFO_STRIDED_STORES (bb_vinfo),
                              stmt);
             }
+
+          /* There is a gap at the end of the group.  */
+          if (stride - last_accessed_element > 0 && loop_vinfo)
+            {
+              LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = true;
+              if (vect_print_dump_info (REPORT_DETAILS))
+                fprintf (vect_dump, "Data access with gaps requires scalar "
+                                    "epilogue loop");
+            }
         }
 
   return true;
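For reference, the effect of the new RATIO computation can be checked in
isolation.  The sketch below is not GCC code; vf, log_vf, ni, and the
variable names are invented, and it merely mirrors the arithmetic of
vect_generate_tmps_on_preheader and vect_get_known_peeling_cost:

#include <stdio.h>

int
main (void)
{
  int vf = 4;        /* vectorization factor (a power of two)         */
  int log_vf = 2;    /* exact_log2 (vf)                               */
  int ni = 8;        /* scalar iteration count, here a multiple of vf */

  /* Old computation: ratio = ni >> log2(vf).  All iterations run in
     the vector loop, so the last strided load may read past the final
     element the scalar code accesses.  */
  int old_ratio = ni >> log_vf;          /* 2 vector iterations, no epilogue */

  /* New computation when peeling for gaps: ratio = (ni - 1) >> log2(vf).
     This always leaves at least one scalar iteration; when ni is a
     multiple of vf, a full vf iterations remain, which is why
     vect_get_known_peeling_cost sets *peel_iters_epilogue to vf in
     that case.  */
  int new_ratio = (ni - 1) >> log_vf;    /* 1 vector iteration  */
  int epilogue = ni - new_ratio * vf;    /* 4 scalar iterations */

  printf ("old ratio %d, new ratio %d, epilogue iterations %d\n",
          old_ratio, new_ratio, epilogue);
  return 0;
}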