From patchwork Wed Sep 28 12:57:04 2016
X-Patchwork-Submitter: Denys Vlasenko
X-Patchwork-Id: 676217
From: Denys Vlasenko
To: gcc-patches@gcc.gnu.org
Cc: Denys Vlasenko, Andrew Pinski, Uros Bizjak, Bernd Schmidt
Subject: [PATCH 2/2] Extend -falign-FOO=N to N[,M[,N2[,M2]]]
Date: Wed, 28 Sep 2016 14:57:04 +0200
Message-Id: <20160928125704.29624-3-dvlasenk@redhat.com>
In-Reply-To: <20160928125704.29624-1-dvlasenk@redhat.com>
References: <20160928125704.29624-1-dvlasenk@redhat.com>

-falign-functions=N is too simplistic.  Ingo Molnar ran some tests and
it seems that on the latest x86 CPUs, 64-byte alignment of functions
runs fastest (he tried many other possibilities): this way, after a
call, the CPU can fetch a lot of insns in the first cacheline fill.

However, developers are less than thrilled by the idea of blindly
64-byte-aligning everything.  Too much waste:

On 05/20/2015 02:47 AM, Linus Torvalds wrote:
> At the same time, I have to admit that I abhor a 64-byte function
> alignment, when we have a fair number of functions that are (much)
> smaller than that.
>
> Is there some way to get gcc to take the size of the function into
> account?
> Because aligning a 16-byte or 32-byte function on a 64-byte
> alignment is just criminally nasty and wasteful.

This change makes it possible to align functions to 64-byte boundaries
*if* this does not introduce a huge amount of padding.

Example syntax is -falign-functions=64,9: "align to 64 by skipping up
to 9 bytes (not inclusive)".  IOW: "after a call insn, the CPU will
always be able to fetch at least 9 bytes of insns".

x86 had a tweak: -falign-functions=N with N > 8 was adding a secondary
alignment.  For example, -falign-functions=10 was emitting this before
every function:

	.p2align 4,,9
	.p2align 3

This tweak was removed by the previous patch.  Now it is reinstated by
the logic that if falign-functions=N[,M] is specified and N > 8, then
the default value of N2 is 8, not 1.  But now this can be suppressed
with falign-functions=N,M,1 - which wasn't possible before.  In
general, the optional N2,M2 pair can be used to generate any secondary
alignment the user wants.

Testing:

Tested that with -falign-functions=N (tried 8, 15, 16, 17...) the
alignment directives are the same before and after the patch.

Tested that -falign-functions=N,N (two equal parameters) works exactly
like -falign-functions=N.

No change from past behavior:

Tested that "-falign-functions" uses an arch-dependent alignment.
Tested that "-O2" uses an arch-dependent alignment.
Tested that "-O2 -falign-functions=N" uses the explicitly given
alignment.

2016-09-27  Denys Vlasenko

	* common.opt (-falign-functions=): Accept a string instead of
	an integer.
	(-falign-jumps=): Likewise.
	(-falign-labels=): Likewise.
	(-falign-loops=): Likewise.
	* flags.h (struct target_flag_state): Revamp how alignment data
	is stored: for each of the four alignment types, store two
	pairs of log/maxskip values.
	* toplev.c (read_uint): New function.
	(read_log_maxskip): New function.
	(parse_N_M): New function.
	(init_alignments): Set align_FOO[0/1].log/maxskip from the
	specified falign-FOO=N[,M[,N[,M]]] options.
	* varasm.c (assemble_start_function): Call two
	ASM_OUTPUT_MAX_SKIP_ALIGN macros, first for N,M and a second
	time for N2,M2 from falign-functions=N,M,N2,M2.
	This generates 0, 1, or 2 align directives.
	* config/aarch64/aarch64-protos.h (struct tune_params): Change
	foo_align members from integers to strings.
	* config/i386/i386.c (struct ptt): Likewise.
	* config/aarch64/aarch64.c (_tunings): Change foo_align field
	values from integers to strings.
	* config/i386/i386.c (processor_target_table): Likewise.
	* config/arm/arm.c (arm_override_options_after_change_1): Fix
	the if() condition to detect that -falign-functions is specified.
	* config/i386/i386.c (ix86_default_align): Likewise.
	* common/config/i386/i386-common.c (ix86_handle_option): Remove
	conversion of malign_foo's to falign_foo's.  The warning is
	still emitted.
	* doc/invoke.texi: Update option documentation.
	* testsuite/gcc.target/i386/falign-functions.c: New file.

Index: gcc/common/config/i386/i386-common.c
===================================================================
--- gcc/common/config/i386/i386-common.c	(revision 239860)
+++ gcc/common/config/i386/i386-common.c	(working copy)
@@ -998,36 +998,17 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
-      /* Comes from final.c -- no real reason to change it.  */
-#define MAX_CODE_ALIGN 16
-
     case OPT_malign_loops_:
       warning_at (loc, 0, "-malign-loops is obsolete, use -falign-loops");
-      if (value > MAX_CODE_ALIGN)
-	error_at (loc, "-malign-loops=%d is not between 0 and %d",
-		  value, MAX_CODE_ALIGN);
-      else
-	opts->x_align_loops = 1 << value;
       return true;
 
     case OPT_malign_jumps_:
       warning_at (loc, 0, "-malign-jumps is obsolete, use -falign-jumps");
-      if (value > MAX_CODE_ALIGN)
-	error_at (loc, "-malign-jumps=%d is not between 0 and %d",
-		  value, MAX_CODE_ALIGN);
-      else
-	opts->x_align_jumps = 1 << value;
       return true;
 
     case OPT_malign_functions_:
      warning_at (loc, 0, "-malign-functions is obsolete, use -falign-functions");
-      if (value > MAX_CODE_ALIGN)
-	error_at (loc, "-malign-functions=%d is not between 0 and %d",
-		  value, MAX_CODE_ALIGN);
-      else
-	opts->x_align_functions = 1 << value;
       return true;
 
     case OPT_mbranch_cost_:

Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 239860)
+++ gcc/common.opt	(working copy)
@@ -896,32 +896,32 @@ Common Report Var(flag_aggressive_loop_optimizatio
 Aggressively optimize loops using language constraints.
 
 falign-functions
-Common Report Var(align_functions,0) Optimization UInteger
+Common Report Var(flag_align_functions) Optimization
 Align the start of functions.
 
 falign-functions=
-Common RejectNegative Joined UInteger Var(align_functions)
+Common RejectNegative Joined Var(str_align_functions)
 
 falign-jumps
-Common Report Var(align_jumps,0) Optimization UInteger
+Common Report Var(flag_align_jumps) Optimization
 Align labels which are only reached by jumping.
 
 falign-jumps=
-Common RejectNegative Joined UInteger Var(align_jumps)
+Common RejectNegative Joined Var(str_align_jumps)
 
 falign-labels
-Common Report Var(align_labels,0) Optimization UInteger
+Common Report Var(flag_align_labels) Optimization
 Align all labels.
 
 falign-labels=
-Common RejectNegative Joined UInteger Var(align_labels)
+Common RejectNegative Joined Var(str_align_labels)
 
 falign-loops
-Common Report Var(align_loops,0) Optimization UInteger
+Common Report Var(flag_align_loops) Optimization
 Align the start of loops.
 
 falign-loops=
-Common RejectNegative Joined UInteger Var(align_loops)
+Common RejectNegative Joined Var(str_align_loops)
 
 fargument-alias
 Common Ignore

Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h	(revision 239860)
+++ gcc/config/aarch64/aarch64-protos.h	(working copy)
@@ -208,9 +208,9 @@ struct tune_params
   int memmov_cost;
   int issue_rate;
   unsigned int fusible_ops;
-  int function_align;
-  int jump_align;
-  int loop_align;
+  const char *function_align;
+  const char *jump_align;
+  const char *loop_align;
   int int_reassoc_width;
   int fp_reassoc_width;
   int vec_reassoc_width;

Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	(revision 239860)
+++ gcc/config/aarch64/aarch64.c	(working copy)
@@ -521,9 +521,9 @@ static const struct tune_params generic_tunings =
   4, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  4,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "4",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -547,9 +547,9 @@ static const struct tune_params cortexa35_tunings
   1, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -573,9 +573,9 @@ static const struct tune_params cortexa53_tunings
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -599,9 +599,9 @@ static const struct tune_params cortexa57_tunings
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -625,9 +625,9 @@ static const struct tune_params cortexa72_tunings
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -651,9 +651,9 @@ static const struct tune_params cortexa73_tunings
   2, /* issue_rate.  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -676,9 +676,9 @@ static const struct tune_params exynosm1_tunings =
   4, /* memmov_cost  */
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  4,	/* function_align.  */
-  4,	/* jump_align.  */
-  4,	/* loop_align.  */
+  "4",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "4",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -701,9 +701,9 @@ static const struct tune_params thunderx_tunings =
   6, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -726,9 +726,9 @@ static const struct tune_params xgene1_tunings =
   6, /* memmov_cost  */
   4, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -752,9 +752,9 @@ static const struct tune_params qdf24xx_tunings =
   4, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -777,9 +777,9 @@ static const struct tune_params vulcan_tunings =
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
   AARCH64_FUSE_NOTHING, /* fuseable_ops.  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   3,	/* int_reassoc_width.  */
   2,	/* fp_reassoc_width.  */
   2,	/* vec_reassoc_width.  */

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 239860)
+++ gcc/config/arm/arm.c	(working copy)
@@ -2899,9 +2899,10 @@ static GTY(()) tree init_optimize;
 static void
 arm_override_options_after_change_1 (struct gcc_options *opts)
 {
-  if (opts->x_align_functions <= 0)
-    opts->x_align_functions = TARGET_THUMB_P (opts->x_target_flags)
-      && opts->x_optimize_size ? 2 : 4;
+  /* -falign-functions without argument: supply one.  */
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    opts->x_str_align_functions = TARGET_THUMB_P (opts->x_target_flags)
+      && opts->x_optimize_size ? "2" : "4";
 }
 
 /* Implement targetm.override_options_after_change.  */

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 239860)
+++ gcc/config/i386/i386.c	(working copy)
@@ -2626,45 +2626,43 @@ struct ptt
 {
   const char *const name;			/* processor name  */
   const struct processor_costs *cost;		/* Processor costs */
-  const int align_loop;				/* Default alignments.  */
-  const int align_loop_max_skip;
-  const int align_jump;
-  const int align_jump_max_skip;
-  const int align_func;
+  const char *const align_loop;			/* Default alignments.  */
+  const char *const align_jump;
+  const char *const align_func;
 };
 
 /* This table must be in sync with enum processor_type in i386.h.  */
 static const struct ptt processor_target_table[PROCESSOR_max] =
 {
-  {"generic", &generic_cost, 16, 10, 16, 10, 16},
-  {"i386", &i386_cost, 4, 3, 4, 3, 4},
-  {"i486", &i486_cost, 16, 15, 16, 15, 16},
-  {"pentium", &pentium_cost, 16, 7, 16, 7, 16},
-  {"lakemont", &lakemont_cost, 16, 7, 16, 7, 16},
-  {"pentiumpro", &pentiumpro_cost, 16, 15, 16, 10, 16},
-  {"pentium4", &pentium4_cost, 0, 0, 0, 0, 0},
-  {"nocona", &nocona_cost, 0, 0, 0, 0, 0},
-  {"core2", &core_cost, 16, 10, 16, 10, 16},
-  {"nehalem", &core_cost, 16, 10, 16, 10, 16},
-  {"sandybridge", &core_cost, 16, 10, 16, 10, 16},
-  {"haswell", &core_cost, 16, 10, 16, 10, 16},
-  {"bonnell", &atom_cost, 16, 15, 16, 7, 16},
-  {"silvermont", &slm_cost, 16, 15, 16, 7, 16},
-  {"knl", &slm_cost, 16, 15, 16, 7, 16},
-  {"skylake-avx512", &core_cost, 16, 10, 16, 10, 16},
-  {"intel", &intel_cost, 16, 15, 16, 7, 16},
-  {"geode", &geode_cost, 0, 0, 0, 0, 0},
-  {"k6", &k6_cost, 32, 7, 32, 7, 32},
-  {"athlon", &athlon_cost, 16, 7, 16, 7, 16},
-  {"k8", &k8_cost, 16, 7, 16, 7, 16},
-  {"amdfam10", &amdfam10_cost, 32, 24, 32, 7, 32},
-  {"bdver1", &bdver1_cost, 16, 10, 16, 7, 11},
-  {"bdver2", &bdver2_cost, 16, 10, 16, 7, 11},
-  {"bdver3", &bdver3_cost, 16, 10, 16, 7, 11},
-  {"bdver4", &bdver4_cost, 16, 10, 16, 7, 11},
-  {"btver1", &btver1_cost, 16, 10, 16, 7, 11},
-  {"btver2", &btver2_cost, 16, 10, 16, 7, 11},
-  {"znver1", &znver1_cost, 16, 10, 16, 7, 11}
+  {"generic", &generic_cost, "16,11", "16,11", "16"},
+  {"i386", &i386_cost, "4", "4", "4"},
+  {"i486", &i486_cost, "16", "16", "16"},
+  {"pentium", &pentium_cost, "16,8", "16,8", "16"},
+  {"lakemont", &lakemont_cost, "16,8", "16,8", "16"},
+  {"pentiumpro", &pentiumpro_cost, "16", "16,11", "16"},
+  {"pentium4", &pentium4_cost, NULL, NULL, NULL},
+  {"nocona", &nocona_cost, NULL, NULL, NULL},
+  {"core2", &core_cost, "16,11", "16,11", "16"},
+  {"nehalem", &core_cost, "16,11", "16,11", "16"},
+  {"sandybridge", &core_cost, "16,11", "16,11", "16"},
+  {"haswell", &core_cost, "16,11", "16,11", "16"},
+  {"bonnell", &atom_cost, "16", "16,8", "16"},
+  {"silvermont", &slm_cost, "16", "16,8", "16"},
+  {"knl", &slm_cost, "16", "16,8", "16"},
+  {"skylake-avx512", &core_cost, "16,11", "16,11", "16"},
+  {"intel", &intel_cost, "16", "16,8", "16"},
+  {"geode", &geode_cost, NULL, NULL, NULL},
+  {"k6", &k6_cost, "32,8", "32,8", "32"},
+  {"athlon", &athlon_cost, "16,8", "16,8", "16"},
+  {"k8", &k8_cost, "16,8", "16,8", "16"},
+  {"amdfam10", &amdfam10_cost, "32,25", "32,8", "32"},
+  {"bdver1", &bdver1_cost, "16,11", "16,8", "11"},
+  {"bdver2", &bdver2_cost, "16,11", "16,8", "11"},
+  {"bdver3", &bdver3_cost, "16,11", "16,8", "11"},
+  {"bdver4", &bdver4_cost, "16,11", "16,8", "11"},
+  {"btver1", &btver1_cost, "16,11", "16,8", "11"},
+  {"btver2", &btver2_cost, "16,11", "16,8", "11"},
+  {"znver1", &znver1_cost, "16,11", "16,8", "11"}
 };
 
 static unsigned int
@@ -4695,19 +4693,18 @@ set_ix86_tune_features (enum processor_type ix86_t
 static void
 ix86_default_align (struct gcc_options *opts)
 {
-  if (opts->x_align_loops == 0)
+  /* -falign-foo without argument: supply one.  */
+  if (opts->x_flag_align_loops && !opts->x_str_align_loops)
     {
-      opts->x_align_loops = processor_target_table[ix86_tune].align_loop;
-      align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip;
+      opts->x_str_align_loops = processor_target_table[ix86_tune].align_loop;
     }
-  if (opts->x_align_jumps == 0)
+  if (opts->x_flag_align_jumps && !opts->x_str_align_jumps)
     {
-      opts->x_align_jumps = processor_target_table[ix86_tune].align_jump;
-      align_jumps_max_skip = processor_target_table[ix86_tune].align_jump_max_skip;
+      opts->x_str_align_jumps = processor_target_table[ix86_tune].align_jump;
     }
-  if (opts->x_align_functions == 0)
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
     {
-      opts->x_align_functions = processor_target_table[ix86_tune].align_func;
+      opts->x_str_align_functions = processor_target_table[ix86_tune].align_func;
     }
 }

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 240570)
+++ gcc/doc/invoke.texi	(working copy)
@@ -339,9 +339,11 @@ Objective-C and Objective-C++ Dialects}.
 @item Optimization Options
 @xref{Optimize Options,,Options that Control Optimization}.
 
-@gccoptlist{-faggressive-loop-optimizations -falign-functions[=@var{n}] @gol
--falign-jumps[=@var{n}] @gol
--falign-labels[=@var{n}] -falign-loops[=@var{n}] @gol
+@gccoptlist{-faggressive-loop-optimizations @gol
+-falign-functions[=@var{n}[,@var{m}[,@var{n}[,@var{m}]]]] @gol
+-falign-jumps[=@var{n}[,@var{m}[,@var{n}[,@var{m}]]]] @gol
+-falign-labels[=@var{n}[,@var{m}[,@var{n}[,@var{m}]]]] @gol
+-falign-loops[=@var{n}[,@var{m}[,@var{n}[,@var{m}]]]] @gol
 -fassociative-math -fauto-profile -fauto-profile[=@var{path}] @gol
 -fauto-inc-dec -fbranch-probabilities @gol
 -fbranch-target-load-optimize -fbranch-target-load-optimize2 @gol
@@ -8230,13 +8232,24 @@ The @option{-fstrict-overflow} option is enabled a
 
 @item -falign-functions
 @itemx -falign-functions=@var{n}
+@itemx -falign-functions=@var{n},@var{m}
+@itemx -falign-functions=@var{n},@var{m},@var{n},@var{m}
 @opindex falign-functions
+If @var{m} is not specified, it defaults to @var{n}.
 Align the start of functions to the next power-of-two greater than
-@var{n}, skipping up to @var{n} bytes.  For instance,
+@var{n}, skipping up to @var{m}-1 bytes.  For instance,
 @option{-falign-functions=32} aligns functions to the next 32-byte
-boundary, but @option{-falign-functions=24} aligns to the next
-32-byte boundary only if this can be done by skipping 23 bytes or less.
+boundary, @option{-falign-functions=24} aligns to the next
+32-byte boundary only if this can be done by skipping 23 bytes or less,
+@option{-falign-functions=32,7} aligns to the next
+32-byte boundary only if this can be done by skipping 6 bytes or less.
+
+The second pair of @var{n},@var{m} values makes it possible to specify
+a secondary alignment: @option{-falign-functions=64,7,32,3} aligns to
+the next 64-byte boundary if this can be done by skipping 6 bytes or
+less, otherwise aligns to the next 32-byte boundary if this can be
+done by skipping 2 bytes or less.
+
 @option{-fno-align-functions} and @option{-falign-functions=1} are
 equivalent and mean that functions are not aligned.
 
@@ -8249,9 +8262,12 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-labels
 @itemx -falign-labels=@var{n}
+@itemx -falign-labels=@var{n},@var{m}
+@itemx -falign-labels=@var{n},@var{m},@var{n},@var{m}
 @opindex falign-labels
+If @var{m} is not specified, it defaults to @var{n}.
 Align all branch targets to a power-of-two boundary, skipping up to
-@var{n} bytes like @option{-falign-functions}.  This option can easily
+@var{m}-1 bytes like @option{-falign-functions}.  This option can easily
 make code slower, because it must insert dummy operations for when the
 branch target is reached in the usual flow of the code.
 
@@ -8268,8 +8284,11 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-loops
 @itemx -falign-loops=@var{n}
+@itemx -falign-loops=@var{n},@var{m}
+@itemx -falign-loops=@var{n},@var{m},@var{n},@var{m}
 @opindex falign-loops
-Align loops to a power-of-two boundary, skipping up to @var{n} bytes
+If @var{m} is not specified, it defaults to @var{n}.
+Align loops to a power-of-two boundary, skipping up to @var{m}-1 bytes
 like @option{-falign-functions}.  If the loops are executed many
 times, this makes up for any execution of the dummy operations.
 
@@ -8283,9 +8302,12 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-jumps
 @itemx -falign-jumps=@var{n}
+@itemx -falign-jumps=@var{n},@var{m}
+@itemx -falign-jumps=@var{n},@var{m},@var{n},@var{m}
 @opindex falign-jumps
+If @var{m} is not specified, it defaults to @var{n}.
 Align branch targets to a power-of-two boundary, for branch targets
-where the targets can only be reached by jumping, skipping up to @var{n}
+where the targets can only be reached by jumping, skipping up to @var{m}-1
 bytes like @option{-falign-functions}.  In this case, no dummy operations
 need be executed.

Index: gcc/flags.h
===================================================================
--- gcc/flags.h	(revision 239860)
+++ gcc/flags.h	(working copy)
@@ -43,19 +43,22 @@ extern bool final_insns_dump_p;
 
 /* Other basic status info about current function.  */
 
 /* Target-dependent global state.  */
-struct target_flag_state {
+struct align_flags {
   /* Values of the -falign-* flags: how much to align labels in code.
-     0 means `use default', 1 means `don't align'.
-     For each variable, there is an _log variant which is the power
-     of two not less than the variable, for .align output.  */
-  int x_align_loops_log;
-  int x_align_loops_max_skip;
-  int x_align_jumps_log;
-  int x_align_jumps_max_skip;
-  int x_align_labels_log;
-  int x_align_labels_max_skip;
-  int x_align_functions_log;
+     log is "align to 2^log" (so 0 means no alignment).
+     maxskip is the maximum allowed amount of padding to insert.  */
+  int log;
+  int maxskip;
+};
+
+struct target_flag_state {
+  /* Each falign-foo can generate up to two levels of alignment:
+     -falign-foo=N,M[,N2,M2]  */
+  struct align_flags x_align_loops[2];
+  struct align_flags x_align_jumps[2];
+  struct align_flags x_align_labels[2];
+  struct align_flags x_align_functions[2];
+
   /* The excess precision currently in effect.  */
   enum excess_precision x_flag_excess_precision;
 };
@@ -67,20 +70,21 @@ extern struct target_flag_state *this_target_flag_
 #define this_target_flag_state (&default_target_flag_state)
 #endif
 
-#define align_loops_log \
-  (this_target_flag_state->x_align_loops_log)
-#define align_loops_max_skip \
-  (this_target_flag_state->x_align_loops_max_skip)
-#define align_jumps_log \
-  (this_target_flag_state->x_align_jumps_log)
-#define align_jumps_max_skip \
-  (this_target_flag_state->x_align_jumps_max_skip)
-#define align_labels_log \
-  (this_target_flag_state->x_align_labels_log)
-#define align_labels_max_skip \
-  (this_target_flag_state->x_align_labels_max_skip)
-#define align_functions_log \
-  (this_target_flag_state->x_align_functions_log)
+#define align_loops (this_target_flag_state->x_align_loops)
+#define align_jumps (this_target_flag_state->x_align_jumps)
+#define align_labels (this_target_flag_state->x_align_labels)
+#define align_functions (this_target_flag_state->x_align_functions)
+#define align_loops_log (align_loops[0].log)
+#define align_jumps_log (align_jumps[0].log)
+#define align_labels_log (align_labels[0].log)
+#define align_functions_log (align_functions[0].log)
+#define align_loops_max_skip (align_loops[0].maxskip)
+#define align_jumps_max_skip (align_jumps[0].maxskip)
+#define align_labels_max_skip (align_labels[0].maxskip)
+#define align_functions_max_skip (align_functions[0].maxskip)
+
+/* String representations of the above options are available in
+   const char *str_align_foo.  NULL if not set.  */
+
 #define flag_excess_precision \
   (this_target_flag_state->x_flag_excess_precision)

Index: gcc/testsuite/gcc.target/i386/falign-functions.c
===================================================================
--- gcc/testsuite/gcc.target/i386/falign-functions.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/falign-functions.c	(working copy)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -falign-functions=64,8" } */
+/* { dg-final { scan-assembler ".p2align 6,,7" } } */
+
+void
+test_func (void)
+{
+}

Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c	(revision 239860)
+++ gcc/toplev.c	(working copy)
@@ -1177,29 +1177,78 @@ target_supports_section_anchors_p (void)
   return true;
 }
 
-/* Default the align_* variables to 1 if they're still unset, and
-   set up the align_*_log variables.  */
+static const char *
+read_uint (const char *flag, const char *name, int *np)
+{
+  const char *flag_start = flag;
+  int n = 0;
+  char c;
+
+  while ((c = *flag++) >= '0' && c <= '9')
+    n = n*10 + (c-'0');
+  *np = n & 0x3fffffff; /* avoid accidentally negative numbers */
+  if (c == '\0')
+    return NULL;
+  if (c == ',')
+    return flag;
+
+  error_at (UNKNOWN_LOCATION, "-falign-%s parameter is bad at '%s'",
+	    name, flag_start);
+  return NULL;
+}
+
+static const char *
+read_log_maxskip (const char *flag, const char *name, struct align_flags *a)
+{
+  int n, m;
+  flag = read_uint (flag, name, &a->log);
+  n = a->log;
+  if (n != 0)
+    a->log = floor_log2 (n * 2 - 1);
+  if (!flag)
+    {
+      a->maxskip = n ? n - 1 : 0;
+      return flag;
+    }
+  flag = read_uint (flag, name, &a->maxskip);
+  m = a->maxskip;
+  if (m > n) m = n;
+  if (m > 0) m--; /* -falign-foo=N,M means M-1 max bytes of padding, not M */
+  a->maxskip = m;
+  return flag;
+}
+
 static void
+parse_N_M (const char *flag, const char *name, struct align_flags a[2])
+{
+  if (flag)
+    {
+      flag = read_log_maxskip (flag, name, &a[0]);
+      if (flag)
+	flag = read_log_maxskip (flag, name, &a[1]);
+#ifdef SUBALIGN_LOG
+      /* Before -falign-foo=N,M,N2,M2 was introduced, x86 had a tweak.
+	 -falign-functions=N with N > 8 was adding secondary alignment.
+	 -falign-functions=10 was emitting this before every function:
+		.p2align 4,,9
+		.p2align 3
+	 Now this behavior (and more) can be explicitly requested:
+	 -falign-functions=16,10,8
+	 Retain old behavior if N2 is missing:  */
+      else if (a[0].log > SUBALIGN_LOG)
+	a[1].log = SUBALIGN_LOG;
+#endif
+    }
+}
+
+/* Process -falign-foo=N[,M[,N2[,M2]]] options.  */
+static void
 init_alignments (void)
 {
-  if (align_loops <= 0)
-    align_loops = 1;
-  if (align_loops_max_skip > align_loops)
-    align_loops_max_skip = align_loops - 1;
-  align_loops_log = floor_log2 (align_loops * 2 - 1);
-  if (align_jumps <= 0)
-    align_jumps = 1;
-  if (align_jumps_max_skip > align_jumps)
-    align_jumps_max_skip = align_jumps - 1;
-  align_jumps_log = floor_log2 (align_jumps * 2 - 1);
-  if (align_labels <= 0)
-    align_labels = 1;
-  align_labels_log = floor_log2 (align_labels * 2 - 1);
-  if (align_labels_max_skip > align_labels)
-    align_labels_max_skip = align_labels - 1;
-  if (align_functions <= 0)
-    align_functions = 1;
-  align_functions_log = floor_log2 (align_functions * 2 - 1);
+  parse_N_M (str_align_loops, "loops", align_loops);
+  parse_N_M (str_align_jumps, "jumps", align_jumps);
+  parse_N_M (str_align_labels, "labels", align_labels);
+  parse_N_M (str_align_functions, "functions", align_functions);
 }
 
 /* Process the options that have been parsed.  */

Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 239860)
+++ gcc/varasm.c	(working copy)
@@ -1791,7 +1791,9 @@ assemble_start_function (tree decl, const char *fn
     {
#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
       ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file,
-				 align_functions_log, align_functions - 1);
+				 align_functions[0].log, align_functions[0].maxskip);
+      ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file,
+				 align_functions[1].log, align_functions[1].maxskip);
 #else
       ASM_OUTPUT_ALIGN (asm_out_file, align_functions_log);
 #endif