From patchwork Tue Oct 29 08:42:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiufu Guo X-Patchwork-Id: 1185913 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-511942-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="fHi2A1mj"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 472Q7h6szPz9sP3 for ; Tue, 29 Oct 2019 19:42:24 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id; q=dns; s=default; b=Uk83fY8RqK9t tdXrZMgd6hUF5lekgMmflQHX5T1xyhK2TLv6+necP3jcLYjwbQ7nfACOctmOEHKs k3oBii13h9EJX0AkUbS64AHEBCqv222+KmiEWd2zd/bVnZpuc3q4H3lVQUaqgzCp aTcubG4SwOkj9A3r6jYz1rj8BmQulQ8= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id; s=default; bh=xWdLyiNJY6RIv+fT9B PKm8fYC4k=; b=fHi2A1mjBhhT1YaXHISYaittmXukHcYCx+KthOTkk2pPJEBjkQ AEuo5nTwZB0LTarHx8xb9dsH0ksHxebPtNM0pDM3rSJRZwEmq/GdpFmYfDwXRdlB X7kB62LuE962HgMr6pgY5AyxTWpUKzYYDNfanKuDt7PvBegR1rQEg8t4g= Received: (qmail 84672 invoked by alias); 29 Oct 2019 08:42:17 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: 
List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 84404 invoked by uid 89); 29 Oct 2019 08:42:16 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-23.0 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.158.5) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 29 Oct 2019 08:42:14 +0000 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x9T8U1Zq004046 for ; Tue, 29 Oct 2019 04:42:12 -0400 Received: from e06smtp05.uk.ibm.com (e06smtp05.uk.ibm.com [195.75.94.101]) by mx0b-001b2d01.pphosted.com with ESMTP id 2vxhy98an6-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Tue, 29 Oct 2019 04:42:12 -0400 Received: from localhost by e06smtp05.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 29 Oct 2019 08:42:11 -0000 Received: from b06cxnps3074.portsmouth.uk.ibm.com (9.149.109.194) by e06smtp05.uk.ibm.com (192.168.101.135) with IBM ESMTP SMTP Gateway: Authorized Use Only! 
	Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Tue, 29 Oct 2019 08:42:08 -0000
Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x9T8g6Ya29949980 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 29 Oct 2019 08:42:06 GMT
Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 214BB52075; Tue, 29 Oct 2019 08:42:06 +0000 (GMT)
Received: from genoa.aus.stglabs.ibm.com (unknown [9.40.192.157]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id 41DB05205F; Tue, 29 Oct 2019 08:42:05 +0000 (GMT)
From: Jiufu Guo
To: gcc-patches@gcc.gnu.org
Cc: guojiufu@linux.ibm.com, wschmidt@linux.ibm.com, segher@kernel.crashing.org, rguenther@suse.de
Subject: [PATCH] rs6000: Refine implicit funroll-loops with unroll_adjust hook.
Date: Tue, 29 Oct 2019 16:42:04 +0800
x-cbid: 19102908-0020-0000-0000-00000380878D
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 19102908-0021-0000-0000-000021D68F24
Message-Id: <1572338524-29442-1-git-send-email-guojiufu@linux.ibm.com>
X-IsSubscribed: yes

Hi,

The previous patch, r277501, changes the PARAM_MAX_UNROLL_TIMES and
PARAM_MAX_UNROLLED_INSNS values during option overriding.  It would be
better to use the target loop unroll adjust hook instead.  The hook can
also support further target-specific heuristic adjustments.

This patch adds the target loop unroll adjust hook for rs6000 to
implement the previous behavior.

Bootstrapped and regtested on powerpc64le.  Is this OK for trunk?

A combined patch covering this change and r277501 is listed at the end
of this mail, in case you want to review the two as a whole.

Jiufu Guo
BR

gcc/
2019-10-29  Jiufu Guo

	PR tree-optimization/88760
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
	changes to PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
	(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
	(rs6000_loop_unroll_adjust): New hook for loop unroll adjust.
	Unroll small loops 2 times if -funroll-loops is enabled implicitly.

gcc/testsuite/
2019-10-29  Jiufu Guo

	PR tree-optimization/88760
	* gcc.dg/pr59643.c: Update back to r277550.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 277550)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut
 #undef TARGET_VECTORIZE_DESTROY_COST_DATA
 #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
 
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
 #undef TARGET_BUILTIN_DECL
@@ -4540,20 +4543,11 @@ rs6000_option_override_internal (bool global_init_
                              global_options.x_param_values,
                              global_options_set.x_param_values);
 
-  /* unroll very small loops 2 time if no -funroll-loops.  */
+  /* If funroll-loops is implicitly enabled, do not turn fweb or
+     frename-registers on implicitly.  */
   if (!global_options_set.x_flag_unroll_loops
       && !global_options_set.x_flag_unroll_all_loops)
     {
-      maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
-                             global_options.x_param_values,
-                             global_options_set.x_param_values);
-
-      maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
-                             global_options.x_param_values,
-                             global_options_set.x_param_values);
-
-      /* If fweb or frename-registers are not specificed in command-line,
-         do not turn them on implicitly.  */
       if (!global_options_set.x_flag_web)
         global_options.x_flag_web = 0;
       if (!global_options_set.x_flag_rename_registers)
@@ -5101,6 +5095,29 @@ rs6000_destroy_cost_data (void *data)
   free (data);
 }
 
+/* This target hook implementation for TARGET_LOOP_UNROLL_ADJUST calculates
+   a new heuristic number of times struct loop *loop should be unrolled.  */
+
+static unsigned
+rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
+{
+  /* If funroll-loops was enabled implicitly, like at -O2, we only unroll
+     small loops, and only with a small unroll factor.  */
+  if (!global_options_set.x_flag_unroll_loops
+      && !global_options_set.x_flag_unroll_all_loops)
+    {
+      /* If the loop contains few insns, treat it as a small loop.
+         TODO: Hard-code 10 for now.  Continue to refine; for example,
+         if the loop contains only 1-2 insns, we may unroll more times (4).
+         We may also use a PARAM to control the loop size threshold.  */
+      if (loop->ninsns <= 10)
+        return 2;
+      else
+        return 0;
+    }
+  return nunroll;
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
    library with vectorized intrinsics.  */
 
Index: gcc/testsuite/gcc.dg/pr59643.c
===================================================================
--- gcc/testsuite/gcc.dg/pr59643.c	(revision 277550)
+++ gcc/testsuite/gcc.dg/pr59643.c	(working copy)
@@ -1,9 +1,6 @@
 /* PR tree-optimization/59643 */
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-pcom-details" } */
-/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */
-/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the
-   loop of this case.  */
 
 void
 foo (double *a, double *b, double *c, double d, double e, int n)

-------- combined patch---------

gcc/ChangeLog
2019-10-25  Jiufu Guo

	PR tree-optimization/88760
	* common/config/rs6000/rs6000-common.c
	(rs6000_option_optimization_table): Enable -funroll-loops for -O2
	and above.
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Avoid
	turning on -fweb and -frename-registers implicitly.
	(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
	(rs6000_loop_unroll_adjust): New hook for loop unroll adjust.
	Unroll small loops 2 times if -funroll-loops is enabled implicitly.

gcc/testsuite/ChangeLog
2019-10-25  Jiufu Guo

	PR tree-optimization/88760
	* gcc.target/powerpc/small-loop-unroll.c: New test.
	* c-c++-common/tsan/thread_leak2.c: Update test.
	* gcc.target/powerpc/loop_align.c: Update test.
	* gcc.target/powerpc/ppc-fma-1.c: Update test.
	* gcc.target/powerpc/ppc-fma-2.c: Update test.
	* gcc.target/powerpc/ppc-fma-3.c: Update test.
	* gcc.target/powerpc/ppc-fma-4.c: Update test.
	* gcc.target/powerpc/pr78604.c: Update test.

diff --git a/gcc/common/config/rs6000/rs6000-common.c b/gcc/common/config/rs6000/rs6000-common.c
index 4b0c205..b947196 100644
--- a/gcc/common/config/rs6000/rs6000-common.c
+++ b/gcc/common/config/rs6000/rs6000-common.c
@@ -35,6 +35,7 @@ static const struct default_options rs6000_option_optimization_table[] =
     { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
     /* Enable -fsched-pressure for first pass instruction scheduling.  */
     { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
     { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1399221..28ffa15 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_VECTORIZE_DESTROY_COST_DATA
 #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
 
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
 #undef TARGET_BUILTIN_DECL
@@ -4540,6 +4543,17 @@ rs6000_option_override_internal (bool global_init_p)
                              global_options.x_param_values,
                              global_options_set.x_param_values);
 
+  /* If funroll-loops is implicitly enabled, do not turn fweb or
+     frename-registers on implicitly.  */
+  if (!global_options_set.x_flag_unroll_loops
+      && !global_options_set.x_flag_unroll_all_loops)
+    {
+      if (!global_options_set.x_flag_web)
+        global_options.x_flag_web = 0;
+      if (!global_options_set.x_flag_rename_registers)
+        global_options.x_flag_rename_registers = 0;
+    }
+
   /* If using typedef char *va_list, signal that
      __builtin_va_start (&ap, 0) can be optimized to
      ap = __builtin_next_arg (0).  */
@@ -5081,6 +5095,29 @@ rs6000_destroy_cost_data (void *data)
   free (data);
 }
 
+/* This target hook implementation for TARGET_LOOP_UNROLL_ADJUST calculates
+   a new heuristic number of times struct loop *loop should be unrolled.  */
+
+static unsigned
+rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
+{
+  /* If funroll-loops was enabled implicitly, like at -O2, we only unroll
+     small loops, and only with a small unroll factor.  */
+  if (!global_options_set.x_flag_unroll_loops
+      && !global_options_set.x_flag_unroll_all_loops)
+    {
+      /* If the loop contains few insns, treat it as a small loop.
+         TODO: Hard-code 10 for now.  Continue to refine; for example,
+         if the loop contains only 1-2 insns, we may unroll more times (4).
+         We may also use a PARAM to control the loop size threshold.  */
+      if (loop->ninsns <= 10)
+        return 2;
+      else
+        return 0;
+    }
+  return nunroll;
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
    library with vectorized intrinsics.  */
 
diff --git a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
index c9b8046..082f2aa 100644
--- a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
+++ b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
@@ -1,5 +1,9 @@
 /* { dg-shouldfail "tsan" } */
 
+/* { dg-additional-options "-fno-unroll-loops" { target { powerpc*-*-* } } } */
+/* -fno-unroll-loops helps to avoid ThreadSanitizer reporting the message
+   multiple times for pthread_create at different calling addresses.  */
+
 #include
 #include
 
diff --git a/gcc/testsuite/gcc.target/powerpc/loop_align.c b/gcc/testsuite/gcc.target/powerpc/loop_align.c
index ebe3782..ef67f77 100644
--- a/gcc/testsuite/gcc.target/powerpc/loop_align.c
+++ b/gcc/testsuite/gcc.target/powerpc/loop_align.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */
-/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16" } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16 -fno-unroll-loops" } */
 /* { dg-final { scan-assembler ".p2align 5" } } */
 
 void f(double *a, double *b, double *c, unsigned long n) {
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c b/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
index b4945e6..2a5b92c 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math" } */
+/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math -fno-unroll-loops" } */
 /* { dg-final { scan-assembler-times "xvmadd" 4 } } */
 /* { dg-final { scan-assembler-times "xsmadd\|fmadd\ " 2 } } */
 /* { dg-final { scan-assembler-times "fmadds" 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c b/gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c
index 5ed630a..bf2c67f 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math -ffp-contract=off" } */
+/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math -ffp-contract=off -fno-unroll-loops" } */
 /* { dg-final { scan-assembler-times "xvmadd" 2 } } */
 /* { dg-final { scan-assembler-times "xsmadd\|fmadd\ " 1 } } */
 /* { dg-final { scan-assembler-times "fmadds" 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c b/gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c
index ef252b3..8608116 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c
@@ -2,7 +2,7 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_altivec_ok } */
 /* { dg-require-effective-target powerpc_fprs } */
-/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power6 -maltivec -ffast-math" } */
+/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power6 -maltivec -ffast-math -fno-unroll-loops" } */
 /* { dg-final { scan-assembler-times "vmaddfp" 2 } } */
 /* { dg-final { scan-assembler-times "fmadd " 2 } } */
 /* { dg-final { scan-assembler-times "fmadds" 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c b/gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c
index c2eaf1a..291c2ee 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c
@@ -2,7 +2,7 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_altivec_ok } */
 /* { dg-require-effective-target powerpc_fprs } */
-/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power6 -maltivec -ffast-math -ffp-contract=off" } */
+/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power6 -maltivec -ffast-math -ffp-contract=off -fno-unroll-loops" } */
 /* { dg-final { scan-assembler-times "vmaddfp" 1 } } */
 /* { dg-final { scan-assembler-times "fmadd " 1 } } */
 /* { dg-final { scan-assembler-times "fmadds" 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr78604.c b/gcc/testsuite/gcc.target/powerpc/pr78604.c
index 76d8945..35bfdb3 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr78604.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr78604.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fdump-tree-vect-details" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fdump-tree-vect-details -fno-unroll-loops" } */
 
 #ifndef SIZE
 #define SIZE 1024
diff --git a/gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c b/gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c
new file mode 100644
index 0000000..fec5ae9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll" } */
+
+void __attribute__ ((noinline)) foo(int n, int *arr)
+{
+  int i;
+  for (i = 0; i < n; i++)
+    arr[i] = arr[i] - 10;
+}
+/* { dg-final { scan-rtl-dump-times "Unrolled loop 1 times" 1 "loop2_unroll" } } */
+/* { dg-final { scan-assembler-times {\mlwz\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mstw\M} 3 } } */
+
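
For reviewers who want to reason about the hook without building GCC, the
decision it makes can be modelled in standalone C.  This is only a sketch
of the logic in rs6000_loop_unroll_adjust above, not code from the patch:
struct model_loop, struct model_flags, the two macros and
model_loop_unroll_adjust are hypothetical stand-ins for struct loop,
global_options_set and the real hook.

```c
#include <stdbool.h>

/* Hypothetical stand-in for GCC's struct loop; it carries only the one
   field the hook reads.  */
struct model_loop
{
  unsigned ninsns;
};

/* Hypothetical stand-in for global_options_set: true means the user gave
   the option explicitly on the command line.  */
struct model_flags
{
  bool unroll_loops_set;      /* -funroll-loops or -fno-unroll-loops.  */
  bool unroll_all_loops_set;  /* -funroll-all-loops or its negation.  */
};

/* Both values are hard-coded in the patch (10 insns, factor 2).  */
#define MODEL_SMALL_LOOP_INSNS 10u
#define MODEL_SMALL_LOOP_UNROLL 2u

/* Model of the hook.  If unrolling was only enabled implicitly, unroll
   small loops twice and nothing else; otherwise keep the generic
   unroller's choice NUNROLL.  */
static unsigned
model_loop_unroll_adjust (unsigned nunroll, const struct model_loop *loop,
                          const struct model_flags *set)
{
  if (!set->unroll_loops_set && !set->unroll_all_loops_set)
    return (loop->ninsns <= MODEL_SMALL_LOOP_INSNS
            ? MODEL_SMALL_LOOP_UNROLL : 0);
  return nunroll;
}
```

In this model a loop of 10 insns or fewer gets a factor of 2 at plain -O2,
a larger loop gets 0 (no unrolling), and an explicit -funroll-loops leaves
the incoming factor untouched.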