diff mbox

New parameters to control stringop expansion libcall strategy

Message ID CAAkRFZJNrW0WRJQwOUUz1NhHLXKiu44pwEkNSyJj70RNgCOY9w@mail.gmail.com
State New
Headers show

Commit Message

Xinliang David Li Aug. 5, 2013, 4:44 p.m. UTC
thanks. Updated patch attached.

David

On Mon, Aug 5, 2013 at 3:57 AM, Michael V. Zolotukhin
<michael.v.zolotukhin@gmail.com> wrote:
> Hi,
> This is a really convenient option, thanks for working on it.
> I can't approve it as I'm not a maintainer, but it looks ok to me,
> except fot a small nitpicking: afair, comments should end with
> dot-space-space.
>
> Michael
>
> On 04 Aug 20:01, Xinliang David Li wrote:
>> The attached is a new patch implementing the stringop inline strategy
>> control using two new -m options:
>>
>> -mmemcpy-strategy=
>> -mmemset-strategy=
>>
>> See changes in doc/invoke.texi for description of the new options. Example:
>>   -mmemcpy-strategy=rep_8byte:64:unaligned,unrolled_loop:2048:unaligned,libcall:-1:unaligned
>>
>> tells compiler to inline memcpy using rep_8byte when the size is no
>> larger than 64 byte, using unrolled_loop when size is no larger than
>> 2048, and for size > 2048, using library call. In all cases,
>> destination alignment adjustment is not done.
>>
>> Tested on x86-64/linux. Ok for trunk?
>>
>> thanks,
>>
>> David
>>
>> 2013-08-02  Xinliang David Li  <davidxl@google.com>
>>
>>         * config/i386/stringop.def: New file.
>>         * config/i386/stringop.opt: New file.
>>         * config/i386/i386-opts.h: Include stringopt.def.
>>         * config/i386/i386.opt: Include stringopt.opt.
>>         * config/i386/i386.c (ix86_option_override_internal):
>>         Override default size based stringop inline strategies
>>         with options.
>>         * config/i386/i386.c (ix86_parse_stringop_strategy_string):
>>         New function.
>>
>> 2013-08-04  Xinliang David Li  <davidxl@google.com>
>>
>>         * testsuite/gcc.target/i386/memcpy-strategy-1.c: New test.
>>         * testsuite/gcc.target/i386/memcpy-strategy-2.c: Ditto.
>>         * testsuite/gcc.target/i386/memset-strategy-1.c: Ditto.
>>         * testsuite/gcc.target/i386/memcpy-strategy-3.c: Ditto.
>>
>>
>>
>>
>> On Fri, Aug 2, 2013 at 9:21 PM, Xinliang David Li <davidxl@google.com> wrote:
>> > On x86_64, when the expected size of memcpy/memset is known (e.g, with
>> > FDO), libcall strategy is used with the size is > 8192. This value is
>> > hard coded, which makes it hard to do performance tuning. This patch
>> > adds two new parameters to do that. Potential usage includes
>> > per-application libcall strategy min-size tuning based on summary data
>> > with FDO (e.g, instruction workset size).
>> >
>> > Bootstrap and tested on x86_64/linux. Ok for trunk?
>> >
>> > thanks,
>> >
>> > David
>> >
>> >
>> > 2013-08-02  Xinliang David Li  <davidxl@google.com>
>> >
>> >         * params.def: New parameters.
>> >         * config/i386/i386.c (ix86_option_override_internal):
>> >         Override default libcall size limit with parameters.
>
>> Index: config/i386/stringop.def
>> ===================================================================
>> --- config/i386/stringop.def  (revision 0)
>> +++ config/i386/stringop.def  (revision 0)
>> @@ -0,0 +1,42 @@
>> +/* Definitions for option handling for IA-32.
>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify
>> +it under the terms of the GNU General Public License as published by
>> +the Free Software Foundation; either version 3, or (at your option)
>> +any later version.
>> +
>> +GCC is distributed in the hope that it will be useful,
>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +GNU General Public License for more details.
>> +
>> +Under Section 7 of GPL version 3, you are granted additional
>> +permissions described in the GCC Runtime Library Exception, version
>> +3.1, as published by the Free Software Foundation.
>> +
>> +You should have received a copy of the GNU General Public License and
>> +a copy of the GCC Runtime Library Exception along with this program;
>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>> +<http://www.gnu.org/licenses/>.  */
>> +
>> +DEF_ENUM
>> +DEF_ALG (no_stringop, no_stringop)
>> +DEF_ENUM
>> +DEF_ALG (libcall, libcall)
>> +DEF_ENUM
>> +DEF_ALG (rep_prefix_1_byte, rep_byte)
>> +DEF_ENUM
>> +DEF_ALG (rep_prefix_4_byte, rep_4byte)
>> +DEF_ENUM
>> +DEF_ALG (rep_prefix_8_byte, rep_8byte)
>> +DEF_ENUM
>> +DEF_ALG (loop_1_byte, byte_loop)
>> +DEF_ENUM
>> +DEF_ALG (loop, loop)
>> +DEF_ENUM
>> +DEF_ALG (unrolled_loop, unrolled_loop)
>> +DEF_ENUM
>> +DEF_ALG (vector_loop, vector_loop)
>> Index: config/i386/i386.opt
>> ===================================================================
>> --- config/i386/i386.opt      (revision 201458)
>> +++ config/i386/i386.opt      (working copy)
>> @@ -316,6 +316,14 @@ mstack-arg-probe
>>  Target Report Mask(STACK_PROBE) Save
>>  Enable stack probing
>>
>> +mmemcpy-strategy=
>> +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
>> +Specify memcpy expansion strategy when expected size is known
>> +
>> +mmemset-strategy=
>> +Target RejectNegative Joined Var(ix86_tune_memset_strategy)
>> +Specify memset expansion strategy when expected size is known
>> +
>>  mstringop-strategy=
>>  Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
>>  Chose strategy to generate stringop using
>> Index: config/i386/stringop.opt
>> ===================================================================
>> --- config/i386/stringop.opt  (revision 0)
>> +++ config/i386/stringop.opt  (revision 0)
>> @@ -0,0 +1,36 @@
>> +/* Definitions for option handling for IA-32.
>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify
>> +it under the terms of the GNU General Public License as published by
>> +the Free Software Foundation; either version 3, or (at your option)
>> +any later version.
>> +
>> +GCC is distributed in the hope that it will be useful,
>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +GNU General Public License for more details.
>> +
>> +Under Section 7 of GPL version 3, you are granted additional
>> +permissions described in the GCC Runtime Library Exception, version
>> +3.1, as published by the Free Software Foundation.
>> +
>> +You should have received a copy of the GNU General Public License and
>> +a copy of the GCC Runtime Library Exception along with this program;
>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>> +<http://www.gnu.org/licenses/>.  */
>> +
>> +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
>> +
>> +#undef DEF_ENUM
>> +#define DEF_ENUM EnumValue
>> +
>> +#undef DEF_ALG
>> +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
>> +
>> +#include "stringop.def"
>> +
>> +#undef DEF_ENUM
>> +#undef DEF_ALG
>> Index: config/i386/i386.c
>> ===================================================================
>> --- config/i386/i386.c        (revision 201458)
>> +++ config/i386/i386.c        (working copy)
>> @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
>>  };
>>
>>  /* Processor costs (relative to an add) */
>> -static const
>> +static
>>  struct processor_costs i386_cost = { /* 386 specific costs */
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>> @@ -226,7 +226,7 @@ struct processor_costs i386_cost = {      /*
>>    1,                                 /* cond_not_taken_branch_cost.  */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs i486_cost = { /* 486 specific costs */
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>> @@ -298,7 +298,7 @@ struct processor_costs i486_cost = {      /*
>>    1,                                 /* cond_not_taken_branch_cost.  */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs pentium_cost = {
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>> @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
>>    1,                                 /* cond_not_taken_branch_cost.  */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs pentiumpro_cost = {
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>> @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
>>    1,                                 /* cond_not_taken_branch_cost.  */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs geode_cost = {
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>> @@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
>>    1,                                 /* cond_not_taken_branch_cost.  */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs k6_cost = {
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>> @@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
>>    1,                                 /* cond_not_taken_branch_cost.  */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs athlon_cost = {
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>> @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
>>    1,                                 /* cond_not_taken_branch_cost.  */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs k8_cost = {
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>> @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
>>    1,                                 /* cond_not_taken_branch_cost.  */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs pentium4_cost = {
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    COSTS_N_INSNS (3),                 /* cost of a lea instruction */
>> @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
>>    1,                                 /* cond_not_taken_branch_cost.  */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs nocona_cost = {
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>> @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
>>    1,                                 /* cond_not_taken_branch_cost.  */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs atom_cost = {
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
>> @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
>>  };
>>
>>  /* Generic64 should produce code tuned for Nocona and K8.  */
>> -static const
>> +static
>>  struct processor_costs generic64_cost = {
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
>> @@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost =
>>  };
>>
>>  /* core_cost should produce code tuned for Core familly of CPUs.  */
>> -static const
>> +static
>>  struct processor_costs core_cost = {
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
>> @@ -1717,7 +1717,7 @@ struct processor_costs core_cost = {
>>
>>  /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
>>     Athlon and K8.  */
>> -static const
>> +static
>>  struct processor_costs generic32_cost = {
>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
>> @@ -2900,6 +2900,150 @@ ix86_debug_options (void)
>>
>>    return;
>>  }
>> +
>> +static const char *stringop_alg_names[] = {
>> +#define DEF_ENUM
>> +#define DEF_ALG(alg, name) #name,
>> +#include "stringop.def"
>> +#undef DEF_ENUM
>> +#undef DEF_ALG
>> +};
>> +
>> +/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
>> +   The string is of the following form (or comma separated list of it):
>> +
>> +     strategy_alg:max_size:[align|noalign]
>> +
>> +   where the full size range for the strategy is either [0, max_size] or
>> +   [min_size, max_size], in which min_size is the max_size + 1 of the
>> +   preceding range.  The last size range must have max_size == -1.
>> +
>> +   Examples:
>> +
>> +    1.
>> +       -mmemcpy-strategy=libcall:-1:noalign
>> +
>> +      this is equivalent to (for known size memcpy) -mstringop-strategy=libcall
>> +
>> +
>> +   2.
>> +      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
>> +
>> +      This is to tell the compiler to use the following strategy for memset
>> +      1) when the expected size is between [1, 16], use rep_8byte strategy;
>> +      2) when the size is between [17, 2048], use vector_loop;
>> +      3) when the size is > 2048, use libcall.
>> +
>> +*/
>> +
>> +struct stringop_size_range
>> +{
>> +  int min;
>> +  int max;
>> +  stringop_alg alg;
>> +  bool noalign;
>> +};
>> +
>> +static void
>> +ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
>> +{
>> +  const struct stringop_algs *default_algs;
>> +  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
>> +  char *curr_range_str, *next_range_str;
>> +  int i = 0, n = 0;
>> +
>> +  if (is_memset)
>> +    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
>> +  else
>> +    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
>> +
>> +  curr_range_str = strategy_str;
>> +
>> +  do {
>> +
>> +    int mins, maxs;
>> +    stringop_alg alg;
>> +    char alg_name[128];
>> +    char align[16];
>> +
>> +    next_range_str = strchr (curr_range_str, ',');
>> +    if (next_range_str)
>> +      *next_range_str++ = '\0';
>> +
>> +    if (3 != sscanf (curr_range_str, "%[^:]:%d:%s", alg_name, &maxs, align))
>> +      {
>> +        warning (0, "Wrong arg %s to option %s", curr_range_str,
>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> +        return;
>> +      }
>> +
>> +    if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
>> +      {
>> +        warning (0, "Size ranges of option %s should be increasing",
>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> +        return;
>> +      }
>> +
>> +    for (i = 0; i < last_alg; i++)
>> +      {
>> +        if (!strcmp (alg_name, stringop_alg_names[i]))
>> +       {
>> +         alg = (stringop_alg) i;
>> +         break;
>> +          }
>> +      }
>> +
>> +    if (i == last_alg)
>> +      {
>> +        warning (0, "Wrong stringop strategy name %s specified for option %s",
>> +              alg_name,
>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> +     return;
>> +      }
>> +
>> +    input_ranges[n].min = mins;
>> +    input_ranges[n].max = maxs;
>> +    input_ranges[n].alg = alg;
>> +    if (!strcmp (align, "align"))
>> +      input_ranges[n].noalign = false;
>> +    else if (!strcmp (align, "noalign"))
>> +      input_ranges[n].noalign = true;
>> +    else
>> +      {
>> +        warning (0, "Unknown alignment %s specified for option %s",
>> +                 align, is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> +        return;
>> +      }
>> +    n++;
>> +    curr_range_str = next_range_str;
>> +  } while (curr_range_str);
>> +
>> +  if (input_ranges[n - 1].max != -1)
>> +    {
>> +      warning (0, "The max value for the last size range should be -1"
>> +               " for option %s",
>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> +      return;
>> +    }
>> +
>> +  if (n > MAX_STRINGOP_ALGS)
>> +    {
>> +      warning (0, "Too many size ranges specified in option %s",
>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> +      return;
>> +    }
>> +
>> +  /* Now override the default algs array  */
>> +  for (i = 0; i < n; i++)
>> +    {
>> +      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
>> +      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
>> +          = input_ranges[i].alg;
>> +      *const_cast<int *>(&default_algs->size[i].noalign)
>> +          = input_ranges[i].noalign;
>> +    }
>> +}
>> +
>>
>>  /* Override various settings based on options.  If MAIN_ARGS_P, the
>>     options are from the command line, otherwise they are from
>> @@ -4021,6 +4165,21 @@ ix86_option_override_internal (bool main
>>    /* Handle stack protector */
>>    if (!global_options_set.x_ix86_stack_protector_guard)
>>      ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
>> +
>> +  /* Handle -mmemcpy-strategy= and -mmemset-strategy=  */
>> +  if (ix86_tune_memcpy_strategy)
>> +    {
>> +      char *str = xstrdup (ix86_tune_memcpy_strategy);
>> +      ix86_parse_stringop_strategy_string (str, false);
>> +      free (str);
>> +    }
>> +
>> +  if (ix86_tune_memset_strategy)
>> +    {
>> +      char *str = xstrdup (ix86_tune_memset_strategy);
>> +      ix86_parse_stringop_strategy_string (str, true);
>> +      free (str);
>> +    }
>>  }
>>
>>  /* Implement the TARGET_OPTION_OVERRIDE hook.  */
>> @@ -22903,6 +23062,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>      {
>>      case libcall:
>>      case no_stringop:
>> +    case last_alg:
>>        gcc_unreachable ();
>>      case loop_1_byte:
>>        need_zero_guard = true;
>> @@ -23093,6 +23253,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>      {
>>      case libcall:
>>      case no_stringop:
>> +    case last_alg:
>>        gcc_unreachable ();
>>      case loop_1_byte:
>>      case loop:
>> @@ -23304,6 +23465,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>      {
>>      case libcall:
>>      case no_stringop:
>> +    case last_alg:
>>        gcc_unreachable ();
>>      case loop:
>>        need_zero_guard = true;
>> @@ -23481,6 +23643,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>      {
>>      case libcall:
>>      case no_stringop:
>> +    case last_alg:
>>        gcc_unreachable ();
>>      case loop_1_byte:
>>      case loop:
>> Index: config/i386/i386-opts.h
>> ===================================================================
>> --- config/i386/i386-opts.h   (revision 201458)
>> +++ config/i386/i386-opts.h   (working copy)
>> @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI
>>  /* Algorithm to expand string function with.  */
>>  enum stringop_alg
>>  {
>> -   no_stringop,
>> -   libcall,
>> -   rep_prefix_1_byte,
>> -   rep_prefix_4_byte,
>> -   rep_prefix_8_byte,
>> -   loop_1_byte,
>> -   loop,
>> -   unrolled_loop,
>> -   vector_loop
>> +#undef DEF_ENUM
>> +#define DEF_ENUM
>> +
>> +#undef DEF_ALG
>> +#define DEF_ALG(alg, name) alg,
>> +
>> +#include "stringop.def"
>> +last_alg
>> +
>> +#undef DEF_ENUM
>> +#undef DEF_ALG
>>  };
>>
>>  /* Available call abi.  */
>> Index: doc/invoke.texi
>> ===================================================================
>> --- doc/invoke.texi   (revision 201458)
>> +++ doc/invoke.texi   (working copy)
>> @@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}.
>>  -mbmi2 -mrtm -mlwp -mthreads @gol
>>  -mno-align-stringops  -minline-all-stringops @gol
>>  -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
>> +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy}
>>  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
>>  -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
>>  -mregparm=@var{num}  -msseregparm @gol
>> @@ -14598,6 +14599,24 @@ Expand into an inline loop.
>>  Always use a library call.
>>  @end table
>>
>> +@item -mmemcpy-strategy=@var{strategy}
>> +@opindex mmemcpy-strategy=@var{strategy}
>> +Override the internal decision heuristic to decide if @code{__builtin_memcpy}
>> +should be inlined and what inline algorithm to use when the expected size
>> +of the copy operation is known. @var{strategy}
>> +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
>> +@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
>> +the max byte size with which inline algorithm @var{alg} is allowed. For the last
>> +triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets
>> +in the list must be specified in increasing order. The minimal byte size for
>> +@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the
>> +preceding range.
>> +
>> +@item -mmemset-strategy=@var{strategy}
>> +@opindex mmemset-strategy=@var{strategy}
>> +The option is similar to @option{-mmemcpy-strategy=} except that it is to control
>> +@code{__builtin_memset} expansion.
>> +
>>  @item -momit-leaf-frame-pointer
>>  @opindex momit-leaf-frame-pointer
>>  Don't keep the frame pointer in a register for leaf functions.  This
>> Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
>> ===================================================================
>> --- testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
>> +++ testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>> +
>> +char a[2048];
>> +char b[2048];
>> +void t (void)
>> +{
>> +  __builtin_memcpy (a, b, 2048);
>> +}
>> +
>> Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
>> ===================================================================
>> --- testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
>> +++ testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>> +
>> +char a[2048];
>> +char b[2048];
>> +void t (void)
>> +{
>> +  __builtin_memcpy (a, b, 2048);
>> +}
>> +
>> Index: testsuite/gcc.target/i386/memset-strategy-1.c
>> ===================================================================
>> --- testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
>> +++ testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
>> +/* { dg-final { scan-assembler-times "memset" 2  } } */
>> +
>> +char a[2048];
>> +void t (void)
>> +{
>> +  __builtin_memset (a, 1, 2048);
>> +}
>> +
>> Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
>> ===================================================================
>> --- testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
>> +++ testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
>> +/* { dg-final { scan-assembler-times "memcpy" 2  } } */
>> +
>> +char a[2048];
>> +char b[2048];
>> +void t (void)
>> +{
>> +  __builtin_memcpy (a, b, 2048);
>> +}
>> +
>

Comments

Michael Zolotukhin Aug. 6, 2013, 8:46 a.m. UTC | #1
There are still some formatting issues (like 8 spaces instead of a
tab, wrong indentation of do-loop and some other places) - to reveal
some of them you could use contrib/check_GNU_style.sh script.
But that was a nitpicking again:) Actually I wanted to ask whether
you're going to use this option for some performance experiments
involving memmov/memset - if so, probably you could tune existing
cost-models as well? Is it possible?

Michael

On 5 August 2013 20:44, Xinliang David Li <davidxl@google.com> wrote:
> thanks. Updated patch attached.
>
> David
>
> On Mon, Aug 5, 2013 at 3:57 AM, Michael V. Zolotukhin
> <michael.v.zolotukhin@gmail.com> wrote:
>> Hi,
>> This is a really convenient option, thanks for working on it.
>> I can't approve it as I'm not a maintainer, but it looks ok to me,
>> except fot a small nitpicking: afair, comments should end with
>> dot-space-space.
>>
>> Michael
>>
>> On 04 Aug 20:01, Xinliang David Li wrote:
>>> The attached is a new patch implementing the stringop inline strategy
>>> control using two new -m options:
>>>
>>> -mmemcpy-strategy=
>>> -mmemset-strategy=
>>>
>>> See changes in doc/invoke.texi for description of the new options. Example:
>>>   -mmemcpy-strategy=rep_8byte:64:unaligned,unrolled_loop:2048:unaligned,libcall:-1:unaligned
>>>
>>> tells compiler to inline memcpy using rep_8byte when the size is no
>>> larger than 64 byte, using unrolled_loop when size is no larger than
>>> 2048, and for size > 2048, using library call. In all cases,
>>> destination alignment adjustment is not done.
>>>
>>> Tested on x86-64/linux. Ok for trunk?
>>>
>>> thanks,
>>>
>>> David
>>>
>>> 2013-08-02  Xinliang David Li  <davidxl@google.com>
>>>
>>>         * config/i386/stringop.def: New file.
>>>         * config/i386/stringop.opt: New file.
>>>         * config/i386/i386-opts.h: Include stringopt.def.
>>>         * config/i386/i386.opt: Include stringopt.opt.
>>>         * config/i386/i386.c (ix86_option_override_internal):
>>>         Override default size based stringop inline strategies
>>>         with options.
>>>         * config/i386/i386.c (ix86_parse_stringop_strategy_string):
>>>         New function.
>>>
>>> 2013-08-04  Xinliang David Li  <davidxl@google.com>
>>>
>>>         * testsuite/gcc.target/i386/memcpy-strategy-1.c: New test.
>>>         * testsuite/gcc.target/i386/memcpy-strategy-2.c: Ditto.
>>>         * testsuite/gcc.target/i386/memset-strategy-1.c: Ditto.
>>>         * testsuite/gcc.target/i386/memcpy-strategy-3.c: Ditto.
>>>
>>>
>>>
>>>
>>> On Fri, Aug 2, 2013 at 9:21 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> > On x86_64, when the expected size of memcpy/memset is known (e.g, with
>>> > FDO), libcall strategy is used with the size is > 8192. This value is
>>> > hard coded, which makes it hard to do performance tuning. This patch
>>> > adds two new parameters to do that. Potential usage includes
>>> > per-application libcall strategy min-size tuning based on summary data
>>> > with FDO (e.g, instruction workset size).
>>> >
>>> > Bootstrap and tested on x86_64/linux. Ok for trunk?
>>> >
>>> > thanks,
>>> >
>>> > David
>>> >
>>> >
>>> > 2013-08-02  Xinliang David Li  <davidxl@google.com>
>>> >
>>> >         * params.def: New parameters.
>>> >         * config/i386/i386.c (ix86_option_override_internal):
>>> >         Override default libcall size limit with parameters.
>>
>>> Index: config/i386/stringop.def
>>> ===================================================================
>>> --- config/i386/stringop.def  (revision 0)
>>> +++ config/i386/stringop.def  (revision 0)
>>> @@ -0,0 +1,42 @@
>>> +/* Definitions for option handling for IA-32.
>>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify
>>> +it under the terms of the GNU General Public License as published by
>>> +the Free Software Foundation; either version 3, or (at your option)
>>> +any later version.
>>> +
>>> +GCC is distributed in the hope that it will be useful,
>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> +GNU General Public License for more details.
>>> +
>>> +Under Section 7 of GPL version 3, you are granted additional
>>> +permissions described in the GCC Runtime Library Exception, version
>>> +3.1, as published by the Free Software Foundation.
>>> +
>>> +You should have received a copy of the GNU General Public License and
>>> +a copy of the GCC Runtime Library Exception along with this program;
>>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>> +<http://www.gnu.org/licenses/>.  */
>>> +
>>> +DEF_ENUM
>>> +DEF_ALG (no_stringop, no_stringop)
>>> +DEF_ENUM
>>> +DEF_ALG (libcall, libcall)
>>> +DEF_ENUM
>>> +DEF_ALG (rep_prefix_1_byte, rep_byte)
>>> +DEF_ENUM
>>> +DEF_ALG (rep_prefix_4_byte, rep_4byte)
>>> +DEF_ENUM
>>> +DEF_ALG (rep_prefix_8_byte, rep_8byte)
>>> +DEF_ENUM
>>> +DEF_ALG (loop_1_byte, byte_loop)
>>> +DEF_ENUM
>>> +DEF_ALG (loop, loop)
>>> +DEF_ENUM
>>> +DEF_ALG (unrolled_loop, unrolled_loop)
>>> +DEF_ENUM
>>> +DEF_ALG (vector_loop, vector_loop)
>>> Index: config/i386/i386.opt
>>> ===================================================================
>>> --- config/i386/i386.opt      (revision 201458)
>>> +++ config/i386/i386.opt      (working copy)
>>> @@ -316,6 +316,14 @@ mstack-arg-probe
>>>  Target Report Mask(STACK_PROBE) Save
>>>  Enable stack probing
>>>
>>> +mmemcpy-strategy=
>>> +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
>>> +Specify memcpy expansion strategy when expected size is known
>>> +
>>> +mmemset-strategy=
>>> +Target RejectNegative Joined Var(ix86_tune_memset_strategy)
>>> +Specify memset expansion strategy when expected size is known
>>> +
>>>  mstringop-strategy=
>>>  Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
>>>  Chose strategy to generate stringop using
>>> Index: config/i386/stringop.opt
>>> ===================================================================
>>> --- config/i386/stringop.opt  (revision 0)
>>> +++ config/i386/stringop.opt  (revision 0)
>>> @@ -0,0 +1,36 @@
>>> +/* Definitions for option handling for IA-32.
>>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify
>>> +it under the terms of the GNU General Public License as published by
>>> +the Free Software Foundation; either version 3, or (at your option)
>>> +any later version.
>>> +
>>> +GCC is distributed in the hope that it will be useful,
>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> +GNU General Public License for more details.
>>> +
>>> +Under Section 7 of GPL version 3, you are granted additional
>>> +permissions described in the GCC Runtime Library Exception, version
>>> +3.1, as published by the Free Software Foundation.
>>> +
>>> +You should have received a copy of the GNU General Public License and
>>> +a copy of the GCC Runtime Library Exception along with this program;
>>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>> +<http://www.gnu.org/licenses/>.  */
>>> +
>>> +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
>>> +
>>> +#undef DEF_ENUM
>>> +#define DEF_ENUM EnumValue
>>> +
>>> +#undef DEF_ALG
>>> +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
>>> +
>>> +#include "stringop.def"
>>> +
>>> +#undef DEF_ENUM
>>> +#undef DEF_ALG
>>> Index: config/i386/i386.c
>>> ===================================================================
>>> --- config/i386/i386.c        (revision 201458)
>>> +++ config/i386/i386.c        (working copy)
>>> @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
>>>  };
>>>
>>>  /* Processor costs (relative to an add) */
>>> -static const
>>> +static
>>>  struct processor_costs i386_cost = { /* 386 specific costs */
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>> @@ -226,7 +226,7 @@ struct processor_costs i386_cost = {      /*
>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>  };
>>>
>>> -static const
>>> +static
>>>  struct processor_costs i486_cost = { /* 486 specific costs */
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>> @@ -298,7 +298,7 @@ struct processor_costs i486_cost = {      /*
>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>  };
>>>
>>> -static const
>>> +static
>>>  struct processor_costs pentium_cost = {
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>> @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>  };
>>>
>>> -static const
>>> +static
>>>  struct processor_costs pentiumpro_cost = {
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>> @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>  };
>>>
>>> -static const
>>> +static
>>>  struct processor_costs geode_cost = {
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>> @@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>  };
>>>
>>> -static const
>>> +static
>>>  struct processor_costs k6_cost = {
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>>> @@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>  };
>>>
>>> -static const
>>> +static
>>>  struct processor_costs athlon_cost = {
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>>> @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>  };
>>>
>>> -static const
>>> +static
>>>  struct processor_costs k8_cost = {
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>>> @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>  };
>>>
>>> -static const
>>> +static
>>>  struct processor_costs pentium4_cost = {
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    COSTS_N_INSNS (3),                 /* cost of a lea instruction */
>>> @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>  };
>>>
>>> -static const
>>> +static
>>>  struct processor_costs nocona_cost = {
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>> @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>  };
>>>
>>> -static const
>>> +static
>>>  struct processor_costs atom_cost = {
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
>>> @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
>>>  };
>>>
>>>  /* Generic64 should produce code tuned for Nocona and K8.  */
>>> -static const
>>> +static
>>>  struct processor_costs generic64_cost = {
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
>>> @@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost =
>>>  };
>>>
>>>  /* core_cost should produce code tuned for Core familly of CPUs.  */
>>> -static const
>>> +static
>>>  struct processor_costs core_cost = {
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
>>> @@ -1717,7 +1717,7 @@ struct processor_costs core_cost = {
>>>
>>>  /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
>>>     Athlon and K8.  */
>>> -static const
>>> +static
>>>  struct processor_costs generic32_cost = {
>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
>>> @@ -2900,6 +2900,150 @@ ix86_debug_options (void)
>>>
>>>    return;
>>>  }
>>> +
>>> +static const char *stringop_alg_names[] = {
>>> +#define DEF_ENUM
>>> +#define DEF_ALG(alg, name) #name,
>>> +#include "stringop.def"
>>> +#undef DEF_ENUM
>>> +#undef DEF_ALG
>>> +};
>>> +
>>> +/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
>>> +   The string is of the following form (or comma separated list of it):
>>> +
>>> +     strategy_alg:max_size:[align|noalign]
>>> +
>>> +   where the full size range for the strategy is either [0, max_size] or
>>> +   [min_size, max_size], in which min_size is the max_size + 1 of the
>>> +   preceding range.  The last size range must have max_size == -1.
>>> +
>>> +   Examples:
>>> +
>>> +    1.
>>> +       -mmemcpy-strategy=libcall:-1:noalign
>>> +
>>> +      this is equivalent to (for known size memcpy) -mstringop-strategy=libcall
>>> +
>>> +
>>> +   2.
>>> +      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
>>> +
>>> +      This is to tell the compiler to use the following strategy for memset
>>> +      1) when the expected size is between [1, 16], use rep_8byte strategy;
>>> +      2) when the size is between [17, 2048], use vector_loop;
>>> +      3) when the size is > 2048, use libcall.
>>> +
>>> +*/
>>> +
>>> +struct stringop_size_range
>>> +{
>>> +  int min;
>>> +  int max;
>>> +  stringop_alg alg;
>>> +  bool noalign;
>>> +};
>>> +
>>> +static void
>>> +ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
>>> +{
>>> +  const struct stringop_algs *default_algs;
>>> +  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
>>> +  char *curr_range_str, *next_range_str;
>>> +  int i = 0, n = 0;
>>> +
>>> +  if (is_memset)
>>> +    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
>>> +  else
>>> +    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
>>> +
>>> +  curr_range_str = strategy_str;
>>> +
>>> +  do {
>>> +
>>> +    int mins, maxs;
>>> +    stringop_alg alg;
>>> +    char alg_name[128];
>>> +    char align[16];
>>> +
>>> +    next_range_str = strchr (curr_range_str, ',');
>>> +    if (next_range_str)
>>> +      *next_range_str++ = '\0';
>>> +
>>> +    if (3 != sscanf (curr_range_str, "%[^:]:%d:%s", alg_name, &maxs, align))
>>> +      {
>>> +        warning (0, "Wrong arg %s to option %s", curr_range_str,
>>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>> +        return;
>>> +      }
>>> +
>>> +    if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
>>> +      {
>>> +        warning (0, "Size ranges of option %s should be increasing",
>>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>> +        return;
>>> +      }
>>> +
>>> +    for (i = 0; i < last_alg; i++)
>>> +      {
>>> +        if (!strcmp (alg_name, stringop_alg_names[i]))
>>> +       {
>>> +         alg = (stringop_alg) i;
>>> +         break;
>>> +          }
>>> +      }
>>> +
>>> +    if (i == last_alg)
>>> +      {
>>> +        warning (0, "Wrong stringop strategy name %s specified for option %s",
>>> +              alg_name,
>>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>> +     return;
>>> +      }
>>> +
>>> +    input_ranges[n].min = mins;
>>> +    input_ranges[n].max = maxs;
>>> +    input_ranges[n].alg = alg;
>>> +    if (!strcmp (align, "align"))
>>> +      input_ranges[n].noalign = false;
>>> +    else if (!strcmp (align, "noalign"))
>>> +      input_ranges[n].noalign = true;
>>> +    else
>>> +      {
>>> +        warning (0, "Unknown alignment %s specified for option %s",
>>> +                 align, is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>> +        return;
>>> +      }
>>> +    n++;
>>> +    curr_range_str = next_range_str;
>>> +  } while (curr_range_str);
>>> +
>>> +  if (input_ranges[n - 1].max != -1)
>>> +    {
>>> +      warning (0, "The max value for the last size range should be -1"
>>> +               " for option %s",
>>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>> +      return;
>>> +    }
>>> +
>>> +  if (n > MAX_STRINGOP_ALGS)
>>> +    {
>>> +      warning (0, "Too many size ranges specified in option %s",
>>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>> +      return;
>>> +    }
>>> +
>>> +  /* Now override the default algs array  */
>>> +  for (i = 0; i < n; i++)
>>> +    {
>>> +      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
>>> +      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
>>> +          = input_ranges[i].alg;
>>> +      *const_cast<int *>(&default_algs->size[i].noalign)
>>> +          = input_ranges[i].noalign;
>>> +    }
>>> +}
>>> +
>>>
>>>  /* Override various settings based on options.  If MAIN_ARGS_P, the
>>>     options are from the command line, otherwise they are from
>>> @@ -4021,6 +4165,21 @@ ix86_option_override_internal (bool main
>>>    /* Handle stack protector */
>>>    if (!global_options_set.x_ix86_stack_protector_guard)
>>>      ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
>>> +
>>> +  /* Handle -mmemcpy-strategy= and -mmemset-strategy=  */
>>> +  if (ix86_tune_memcpy_strategy)
>>> +    {
>>> +      char *str = xstrdup (ix86_tune_memcpy_strategy);
>>> +      ix86_parse_stringop_strategy_string (str, false);
>>> +      free (str);
>>> +    }
>>> +
>>> +  if (ix86_tune_memset_strategy)
>>> +    {
>>> +      char *str = xstrdup (ix86_tune_memset_strategy);
>>> +      ix86_parse_stringop_strategy_string (str, true);
>>> +      free (str);
>>> +    }
>>>  }
>>>
>>>  /* Implement the TARGET_OPTION_OVERRIDE hook.  */
>>> @@ -22903,6 +23062,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>>      {
>>>      case libcall:
>>>      case no_stringop:
>>> +    case last_alg:
>>>        gcc_unreachable ();
>>>      case loop_1_byte:
>>>        need_zero_guard = true;
>>> @@ -23093,6 +23253,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>>      {
>>>      case libcall:
>>>      case no_stringop:
>>> +    case last_alg:
>>>        gcc_unreachable ();
>>>      case loop_1_byte:
>>>      case loop:
>>> @@ -23304,6 +23465,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>>      {
>>>      case libcall:
>>>      case no_stringop:
>>> +    case last_alg:
>>>        gcc_unreachable ();
>>>      case loop:
>>>        need_zero_guard = true;
>>> @@ -23481,6 +23643,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>>      {
>>>      case libcall:
>>>      case no_stringop:
>>> +    case last_alg:
>>>        gcc_unreachable ();
>>>      case loop_1_byte:
>>>      case loop:
>>> Index: config/i386/i386-opts.h
>>> ===================================================================
>>> --- config/i386/i386-opts.h   (revision 201458)
>>> +++ config/i386/i386-opts.h   (working copy)
>>> @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI
>>>  /* Algorithm to expand string function with.  */
>>>  enum stringop_alg
>>>  {
>>> -   no_stringop,
>>> -   libcall,
>>> -   rep_prefix_1_byte,
>>> -   rep_prefix_4_byte,
>>> -   rep_prefix_8_byte,
>>> -   loop_1_byte,
>>> -   loop,
>>> -   unrolled_loop,
>>> -   vector_loop
>>> +#undef DEF_ENUM
>>> +#define DEF_ENUM
>>> +
>>> +#undef DEF_ALG
>>> +#define DEF_ALG(alg, name) alg,
>>> +
>>> +#include "stringop.def"
>>> +last_alg
>>> +
>>> +#undef DEF_ENUM
>>> +#undef DEF_ALG
>>>  };
>>>
>>>  /* Available call abi.  */
>>> Index: doc/invoke.texi
>>> ===================================================================
>>> --- doc/invoke.texi   (revision 201458)
>>> +++ doc/invoke.texi   (working copy)
>>> @@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}.
>>>  -mbmi2 -mrtm -mlwp -mthreads @gol
>>>  -mno-align-stringops  -minline-all-stringops @gol
>>>  -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
>>> +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy}
>>>  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
>>>  -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
>>>  -mregparm=@var{num}  -msseregparm @gol
>>> @@ -14598,6 +14599,24 @@ Expand into an inline loop.
>>>  Always use a library call.
>>>  @end table
>>>
>>> +@item -mmemcpy-strategy=@var{strategy}
>>> +@opindex mmemcpy-strategy=@var{strategy}
>>> +Override the internal decision heuristic to decide if @code{__builtin_memcpy}
>>> +should be inlined and what inline algorithm to use when the expected size
>>> +of the copy operation is known. @var{strategy}
>>> +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
>>> +@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
>>> +the max byte size with which inline algorithm @var{alg} is allowed. For the last
>>> +triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets
>>> +in the list must be specified in increasing order. The minimal byte size for
>>> +@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the
>>> +preceding range.
>>> +
>>> +@item -mmemset-strategy=@var{strategy}
>>> +@opindex mmemset-strategy=@var{strategy}
>>> +The option is similar to @option{-mmemcpy-strategy=} except that it is to control
>>> +@code{__builtin_memset} expansion.
>>> +
>>>  @item -momit-leaf-frame-pointer
>>>  @opindex momit-leaf-frame-pointer
>>>  Don't keep the frame pointer in a register for leaf functions.  This
>>> Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
>>> ===================================================================
>>> --- testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
>>> +++ testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
>>> @@ -0,0 +1,12 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
>>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>>> +
>>> +char a[2048];
>>> +char b[2048];
>>> +void t (void)
>>> +{
>>> +  __builtin_memcpy (a, b, 2048);
>>> +}
>>> +
>>> Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
>>> ===================================================================
>>> --- testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
>>> +++ testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
>>> @@ -0,0 +1,12 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
>>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>>> +
>>> +char a[2048];
>>> +char b[2048];
>>> +void t (void)
>>> +{
>>> +  __builtin_memcpy (a, b, 2048);
>>> +}
>>> +
>>> Index: testsuite/gcc.target/i386/memset-strategy-1.c
>>> ===================================================================
>>> --- testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
>>> +++ testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
>>> @@ -0,0 +1,10 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
>>> +/* { dg-final { scan-assembler-times "memset" 2  } } */
>>> +
>>> +char a[2048];
>>> +void t (void)
>>> +{
>>> +  __builtin_memset (a, 1, 2048);
>>> +}
>>> +
>>> Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
>>> ===================================================================
>>> --- testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
>>> +++ testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
>>> @@ -0,0 +1,11 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
>>> +/* { dg-final { scan-assembler-times "memcpy" 2  } } */
>>> +
>>> +char a[2048];
>>> +char b[2048];
>>> +void t (void)
>>> +{
>>> +  __builtin_memcpy (a, b, 2048);
>>> +}
>>> +
>>
Jan Hubicka Aug. 6, 2013, 9:42 a.m. UTC | #2
> >>> 2013-08-02  Xinliang David Li  <davidxl@google.com>
> >>>
> >>>         * config/i386/stringop.def: New file.
> >>>         * config/i386/stringop.opt: New file.
> >>>         * config/i386/i386-opts.h: Include stringopt.def.
> >>>         * config/i386/i386.opt: Include stringopt.opt.
> >>>         * config/i386/i386.c (ix86_option_override_internal):
> >>>         Override default size based stringop inline strategies
> >>>         with options.
> >>>         * config/i386/i386.c (ix86_parse_stringop_strategy_string):
> >>>         New function.
> >>>
> >>> 2013-08-04  Xinliang David Li  <davidxl@google.com>
> >>>
> >>>         * testsuite/gcc.target/i386/memcpy-strategy-1.c: New test.
> >>>         * testsuite/gcc.target/i386/memcpy-strategy-2.c: Ditto.
> >>>         * testsuite/gcc.target/i386/memset-strategy-1.c: Ditto.
> >>>         * testsuite/gcc.target/i386/memcpy-strategy-3.c: Ditto.

The patch looks resonable to me in general.  I wonder why we need to bring
all the cost tables non-const instead of just having writable storage for
the "current strategy" like we do with other flags anyway.

Your strings are definitely more readable than the in-memory representation
I came up with. Perhaps we can even turn the cost tables into strings
for easier maintenance?  I guess they are bit confusing for people
not familiar with a code.

Honza
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, Aug 2, 2013 at 9:21 PM, Xinliang David Li <davidxl@google.com> wrote:
> >>> > On x86_64, when the expected size of memcpy/memset is known (e.g, with
> >>> > FDO), libcall strategy is used with the size is > 8192. This value is
> >>> > hard coded, which makes it hard to do performance tuning. This patch
> >>> > adds two new parameters to do that. Potential usage includes
> >>> > per-application libcall strategy min-size tuning based on summary data
> >>> > with FDO (e.g, instruction workset size).
> >>> >
> >>> > Bootstrap and tested on x86_64/linux. Ok for trunk?
> >>> >
> >>> > thanks,
> >>> >
> >>> > David
> >>> >
> >>> >
> >>> > 2013-08-02  Xinliang David Li  <davidxl@google.com>
> >>> >
> >>> >         * params.def: New parameters.
> >>> >         * config/i386/i386.c (ix86_option_override_internal):
> >>> >         Override default libcall size limit with parameters.
> >>
> >>> Index: config/i386/stringop.def
> >>> ===================================================================
> >>> --- config/i386/stringop.def  (revision 0)
> >>> +++ config/i386/stringop.def  (revision 0)
> >>> @@ -0,0 +1,42 @@
> >>> +/* Definitions for option handling for IA-32.
> >>> +   Copyright (C) 2013 Free Software Foundation, Inc.
> >>> +
> >>> +This file is part of GCC.
> >>> +
> >>> +GCC is free software; you can redistribute it and/or modify
> >>> +it under the terms of the GNU General Public License as published by
> >>> +the Free Software Foundation; either version 3, or (at your option)
> >>> +any later version.
> >>> +
> >>> +GCC is distributed in the hope that it will be useful,
> >>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >>> +GNU General Public License for more details.
> >>> +
> >>> +Under Section 7 of GPL version 3, you are granted additional
> >>> +permissions described in the GCC Runtime Library Exception, version
> >>> +3.1, as published by the Free Software Foundation.
> >>> +
> >>> +You should have received a copy of the GNU General Public License and
> >>> +a copy of the GCC Runtime Library Exception along with this program;
> >>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> >>> +<http://www.gnu.org/licenses/>.  */
> >>> +
> >>> +DEF_ENUM
> >>> +DEF_ALG (no_stringop, no_stringop)
> >>> +DEF_ENUM
> >>> +DEF_ALG (libcall, libcall)
> >>> +DEF_ENUM
> >>> +DEF_ALG (rep_prefix_1_byte, rep_byte)
> >>> +DEF_ENUM
> >>> +DEF_ALG (rep_prefix_4_byte, rep_4byte)
> >>> +DEF_ENUM
> >>> +DEF_ALG (rep_prefix_8_byte, rep_8byte)
> >>> +DEF_ENUM
> >>> +DEF_ALG (loop_1_byte, byte_loop)
> >>> +DEF_ENUM
> >>> +DEF_ALG (loop, loop)
> >>> +DEF_ENUM
> >>> +DEF_ALG (unrolled_loop, unrolled_loop)
> >>> +DEF_ENUM
> >>> +DEF_ALG (vector_loop, vector_loop)
> >>> Index: config/i386/i386.opt
> >>> ===================================================================
> >>> --- config/i386/i386.opt      (revision 201458)
> >>> +++ config/i386/i386.opt      (working copy)
> >>> @@ -316,6 +316,14 @@ mstack-arg-probe
> >>>  Target Report Mask(STACK_PROBE) Save
> >>>  Enable stack probing
> >>>
> >>> +mmemcpy-strategy=
> >>> +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
> >>> +Specify memcpy expansion strategy when expected size is known
> >>> +
> >>> +mmemset-strategy=
> >>> +Target RejectNegative Joined Var(ix86_tune_memset_strategy)
> >>> +Specify memset expansion strategy when expected size is known
> >>> +
> >>>  mstringop-strategy=
> >>>  Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
> >>>  Chose strategy to generate stringop using
> >>> Index: config/i386/stringop.opt
> >>> ===================================================================
> >>> --- config/i386/stringop.opt  (revision 0)
> >>> +++ config/i386/stringop.opt  (revision 0)
> >>> @@ -0,0 +1,36 @@
> >>> +/* Definitions for option handling for IA-32.
> >>> +   Copyright (C) 2013 Free Software Foundation, Inc.
> >>> +
> >>> +This file is part of GCC.
> >>> +
> >>> +GCC is free software; you can redistribute it and/or modify
> >>> +it under the terms of the GNU General Public License as published by
> >>> +the Free Software Foundation; either version 3, or (at your option)
> >>> +any later version.
> >>> +
> >>> +GCC is distributed in the hope that it will be useful,
> >>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >>> +GNU General Public License for more details.
> >>> +
> >>> +Under Section 7 of GPL version 3, you are granted additional
> >>> +permissions described in the GCC Runtime Library Exception, version
> >>> +3.1, as published by the Free Software Foundation.
> >>> +
> >>> +You should have received a copy of the GNU General Public License and
> >>> +a copy of the GCC Runtime Library Exception along with this program;
> >>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> >>> +<http://www.gnu.org/licenses/>.  */
> >>> +
> >>> +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
> >>> +
> >>> +#undef DEF_ENUM
> >>> +#define DEF_ENUM EnumValue
> >>> +
> >>> +#undef DEF_ALG
> >>> +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
> >>> +
> >>> +#include "stringop.def"
> >>> +
> >>> +#undef DEF_ENUM
> >>> +#undef DEF_ALG
> >>> Index: config/i386/i386.c
> >>> ===================================================================
> >>> --- config/i386/i386.c        (revision 201458)
> >>> +++ config/i386/i386.c        (working copy)
> >>> @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
> >>>  };
> >>>
> >>>  /* Processor costs (relative to an add) */
> >>> -static const
> >>> +static
> >>>  struct processor_costs i386_cost = { /* 386 specific costs */
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
> >>> @@ -226,7 +226,7 @@ struct processor_costs i386_cost = {      /*
> >>>    1,                                 /* cond_not_taken_branch_cost.  */
> >>>  };
> >>>
> >>> -static const
> >>> +static
> >>>  struct processor_costs i486_cost = { /* 486 specific costs */
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
> >>> @@ -298,7 +298,7 @@ struct processor_costs i486_cost = {      /*
> >>>    1,                                 /* cond_not_taken_branch_cost.  */
> >>>  };
> >>>
> >>> -static const
> >>> +static
> >>>  struct processor_costs pentium_cost = {
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
> >>> @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
> >>>    1,                                 /* cond_not_taken_branch_cost.  */
> >>>  };
> >>>
> >>> -static const
> >>> +static
> >>>  struct processor_costs pentiumpro_cost = {
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
> >>> @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
> >>>    1,                                 /* cond_not_taken_branch_cost.  */
> >>>  };
> >>>
> >>> -static const
> >>> +static
> >>>  struct processor_costs geode_cost = {
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
> >>> @@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
> >>>    1,                                 /* cond_not_taken_branch_cost.  */
> >>>  };
> >>>
> >>> -static const
> >>> +static
> >>>  struct processor_costs k6_cost = {
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
> >>> @@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
> >>>    1,                                 /* cond_not_taken_branch_cost.  */
> >>>  };
> >>>
> >>> -static const
> >>> +static
> >>>  struct processor_costs athlon_cost = {
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
> >>> @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
> >>>    1,                                 /* cond_not_taken_branch_cost.  */
> >>>  };
> >>>
> >>> -static const
> >>> +static
> >>>  struct processor_costs k8_cost = {
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
> >>> @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
> >>>    1,                                 /* cond_not_taken_branch_cost.  */
> >>>  };
> >>>
> >>> -static const
> >>> +static
> >>>  struct processor_costs pentium4_cost = {
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    COSTS_N_INSNS (3),                 /* cost of a lea instruction */
> >>> @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
> >>>    1,                                 /* cond_not_taken_branch_cost.  */
> >>>  };
> >>>
> >>> -static const
> >>> +static
> >>>  struct processor_costs nocona_cost = {
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
> >>> @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
> >>>    1,                                 /* cond_not_taken_branch_cost.  */
> >>>  };
> >>>
> >>> -static const
> >>> +static
> >>>  struct processor_costs atom_cost = {
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
> >>> @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
> >>>  };
> >>>
> >>>  /* Generic64 should produce code tuned for Nocona and K8.  */
> >>> -static const
> >>> +static
> >>>  struct processor_costs generic64_cost = {
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
> >>> @@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost =
> >>>  };
> >>>
> >>>  /* core_cost should produce code tuned for Core familly of CPUs.  */
> >>> -static const
> >>> +static
> >>>  struct processor_costs core_cost = {
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
> >>> @@ -1717,7 +1717,7 @@ struct processor_costs core_cost = {
> >>>
> >>>  /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
> >>>     Athlon and K8.  */
> >>> -static const
> >>> +static
> >>>  struct processor_costs generic32_cost = {
> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
> >>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
> >>> @@ -2900,6 +2900,150 @@ ix86_debug_options (void)
> >>>
> >>>    return;
> >>>  }
> >>> +
> >>> +static const char *stringop_alg_names[] = {
> >>> +#define DEF_ENUM
> >>> +#define DEF_ALG(alg, name) #name,
> >>> +#include "stringop.def"
> >>> +#undef DEF_ENUM
> >>> +#undef DEF_ALG
> >>> +};
> >>> +
> >>> +/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
> >>> +   The string is of the following form (or comma separated list of it):
> >>> +
> >>> +     strategy_alg:max_size:[align|noalign]
> >>> +
> >>> +   where the full size range for the strategy is either [0, max_size] or
> >>> +   [min_size, max_size], in which min_size is the max_size + 1 of the
> >>> +   preceding range.  The last size range must have max_size == -1.
> >>> +
> >>> +   Examples:
> >>> +
> >>> +    1.
> >>> +       -mmemcpy-strategy=libcall:-1:noalign
> >>> +
> >>> +      this is equivalent to (for known size memcpy) -mstringop-strategy=libcall
> >>> +
> >>> +
> >>> +   2.
> >>> +      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
> >>> +
> >>> +      This is to tell the compiler to use the following strategy for memset
> >>> +      1) when the expected size is between [1, 16], use rep_8byte strategy;
> >>> +      2) when the size is between [17, 2048], use vector_loop;
> >>> +      3) when the size is > 2048, use libcall.
> >>> +
> >>> +*/
> >>> +
> >>> +struct stringop_size_range
> >>> +{
> >>> +  int min;
> >>> +  int max;
> >>> +  stringop_alg alg;
> >>> +  bool noalign;
> >>> +};
> >>> +
> >>> +static void
> >>> +ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
> >>> +{
> >>> +  const struct stringop_algs *default_algs;
> >>> +  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
> >>> +  char *curr_range_str, *next_range_str;
> >>> +  int i = 0, n = 0;
> >>> +
> >>> +  if (is_memset)
> >>> +    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
> >>> +  else
> >>> +    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
> >>> +
> >>> +  curr_range_str = strategy_str;
> >>> +
> >>> +  do {
> >>> +
> >>> +    int mins, maxs;
> >>> +    stringop_alg alg;
> >>> +    char alg_name[128];
> >>> +    char align[16];
> >>> +
> >>> +    next_range_str = strchr (curr_range_str, ',');
> >>> +    if (next_range_str)
> >>> +      *next_range_str++ = '\0';
> >>> +
> >>> +    if (3 != sscanf (curr_range_str, "%[^:]:%d:%s", alg_name, &maxs, align))
> >>> +      {
> >>> +        warning (0, "Wrong arg %s to option %s", curr_range_str,
> >>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
> >>> +        return;
> >>> +      }
> >>> +
> >>> +    if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
> >>> +      {
> >>> +        warning (0, "Size ranges of option %s should be increasing",
> >>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
> >>> +        return;
> >>> +      }
> >>> +
> >>> +    for (i = 0; i < last_alg; i++)
> >>> +      {
> >>> +        if (!strcmp (alg_name, stringop_alg_names[i]))
> >>> +       {
> >>> +         alg = (stringop_alg) i;
> >>> +         break;
> >>> +          }
> >>> +      }
> >>> +
> >>> +    if (i == last_alg)
> >>> +      {
> >>> +        warning (0, "Wrong stringop strategy name %s specified for option %s",
> >>> +              alg_name,
> >>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
> >>> +     return;
> >>> +      }
> >>> +
> >>> +    input_ranges[n].min = mins;
> >>> +    input_ranges[n].max = maxs;
> >>> +    input_ranges[n].alg = alg;
> >>> +    if (!strcmp (align, "align"))
> >>> +      input_ranges[n].noalign = false;
> >>> +    else if (!strcmp (align, "noalign"))
> >>> +      input_ranges[n].noalign = true;
> >>> +    else
> >>> +      {
> >>> +        warning (0, "Unknown alignment %s specified for option %s",
> >>> +                 align, is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
> >>> +        return;
> >>> +      }
> >>> +    n++;
> >>> +    curr_range_str = next_range_str;
> >>> +  } while (curr_range_str);
> >>> +
> >>> +  if (input_ranges[n - 1].max != -1)
> >>> +    {
> >>> +      warning (0, "The max value for the last size range should be -1"
> >>> +               " for option %s",
> >>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
> >>> +      return;
> >>> +    }
> >>> +
> >>> +  if (n > MAX_STRINGOP_ALGS)
> >>> +    {
> >>> +      warning (0, "Too many size ranges specified in option %s",
> >>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
> >>> +      return;
> >>> +    }
> >>> +
> >>> +  /* Now override the default algs array  */
> >>> +  for (i = 0; i < n; i++)
> >>> +    {
> >>> +      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
> >>> +      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
> >>> +          = input_ranges[i].alg;
> >>> +      *const_cast<int *>(&default_algs->size[i].noalign)
> >>> +          = input_ranges[i].noalign;
> >>> +    }
> >>> +}
> >>> +
> >>>
> >>>  /* Override various settings based on options.  If MAIN_ARGS_P, the
> >>>     options are from the command line, otherwise they are from
> >>> @@ -4021,6 +4165,21 @@ ix86_option_override_internal (bool main
> >>>    /* Handle stack protector */
> >>>    if (!global_options_set.x_ix86_stack_protector_guard)
> >>>      ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
> >>> +
> >>> +  /* Handle -mmemcpy-strategy= and -mmemset-strategy=  */
> >>> +  if (ix86_tune_memcpy_strategy)
> >>> +    {
> >>> +      char *str = xstrdup (ix86_tune_memcpy_strategy);
> >>> +      ix86_parse_stringop_strategy_string (str, false);
> >>> +      free (str);
> >>> +    }
> >>> +
> >>> +  if (ix86_tune_memset_strategy)
> >>> +    {
> >>> +      char *str = xstrdup (ix86_tune_memset_strategy);
> >>> +      ix86_parse_stringop_strategy_string (str, true);
> >>> +      free (str);
> >>> +    }
> >>>  }
> >>>
> >>>  /* Implement the TARGET_OPTION_OVERRIDE hook.  */
> >>> @@ -22903,6 +23062,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
> >>>      {
> >>>      case libcall:
> >>>      case no_stringop:
> >>> +    case last_alg:
> >>>        gcc_unreachable ();
> >>>      case loop_1_byte:
> >>>        need_zero_guard = true;
> >>> @@ -23093,6 +23253,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
> >>>      {
> >>>      case libcall:
> >>>      case no_stringop:
> >>> +    case last_alg:
> >>>        gcc_unreachable ();
> >>>      case loop_1_byte:
> >>>      case loop:
> >>> @@ -23304,6 +23465,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
> >>>      {
> >>>      case libcall:
> >>>      case no_stringop:
> >>> +    case last_alg:
> >>>        gcc_unreachable ();
> >>>      case loop:
> >>>        need_zero_guard = true;
> >>> @@ -23481,6 +23643,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
> >>>      {
> >>>      case libcall:
> >>>      case no_stringop:
> >>> +    case last_alg:
> >>>        gcc_unreachable ();
> >>>      case loop_1_byte:
> >>>      case loop:
> >>> Index: config/i386/i386-opts.h
> >>> ===================================================================
> >>> --- config/i386/i386-opts.h   (revision 201458)
> >>> +++ config/i386/i386-opts.h   (working copy)
> >>> @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI
> >>>  /* Algorithm to expand string function with.  */
> >>>  enum stringop_alg
> >>>  {
> >>> -   no_stringop,
> >>> -   libcall,
> >>> -   rep_prefix_1_byte,
> >>> -   rep_prefix_4_byte,
> >>> -   rep_prefix_8_byte,
> >>> -   loop_1_byte,
> >>> -   loop,
> >>> -   unrolled_loop,
> >>> -   vector_loop
> >>> +#undef DEF_ENUM
> >>> +#define DEF_ENUM
> >>> +
> >>> +#undef DEF_ALG
> >>> +#define DEF_ALG(alg, name) alg,
> >>> +
> >>> +#include "stringop.def"
> >>> +last_alg
> >>> +
> >>> +#undef DEF_ENUM
> >>> +#undef DEF_ALG
> >>>  };
> >>>
> >>>  /* Available call abi.  */
> >>> Index: doc/invoke.texi
> >>> ===================================================================
> >>> --- doc/invoke.texi   (revision 201458)
> >>> +++ doc/invoke.texi   (working copy)
> >>> @@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}.
> >>>  -mbmi2 -mrtm -mlwp -mthreads @gol
> >>>  -mno-align-stringops  -minline-all-stringops @gol
> >>>  -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
> >>> +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy}
> >>>  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
> >>>  -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
> >>>  -mregparm=@var{num}  -msseregparm @gol
> >>> @@ -14598,6 +14599,24 @@ Expand into an inline loop.
> >>>  Always use a library call.
> >>>  @end table
> >>>
> >>> +@item -mmemcpy-strategy=@var{strategy}
> >>> +@opindex mmemcpy-strategy=@var{strategy}
> >>> +Override the internal decision heuristic to decide if @code{__builtin_memcpy}
> >>> +should be inlined and what inline algorithm to use when the expected size
> >>> +of the copy operation is known. @var{strategy}
> >>> +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
> >>> +@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
> >>> +the max byte size with which inline algorithm @var{alg} is allowed. For the last
> >>> +triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets
> >>> +in the list must be specified in increasing order. The minimal byte size for
> >>> +@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the
> >>> +preceding range.
> >>> +
> >>> +@item -mmemset-strategy=@var{strategy}
> >>> +@opindex mmemset-strategy=@var{strategy}
> >>> +The option is similar to @option{-mmemcpy-strategy=} except that it is to control
> >>> +@code{__builtin_memset} expansion.
> >>> +
> >>>  @item -momit-leaf-frame-pointer
> >>>  @opindex momit-leaf-frame-pointer
> >>>  Don't keep the frame pointer in a register for leaf functions.  This
> >>> Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
> >>> ===================================================================
> >>> --- testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
> >>> +++ testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
> >>> @@ -0,0 +1,12 @@
> >>> +/* { dg-do compile } */
> >>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
> >>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
> >>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
> >>> +
> >>> +char a[2048];
> >>> +char b[2048];
> >>> +void t (void)
> >>> +{
> >>> +  __builtin_memcpy (a, b, 2048);
> >>> +}
> >>> +
> >>> Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
> >>> ===================================================================
> >>> --- testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
> >>> +++ testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
> >>> @@ -0,0 +1,12 @@
> >>> +/* { dg-do compile } */
> >>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
> >>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
> >>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
> >>> +
> >>> +char a[2048];
> >>> +char b[2048];
> >>> +void t (void)
> >>> +{
> >>> +  __builtin_memcpy (a, b, 2048);
> >>> +}
> >>> +
> >>> Index: testsuite/gcc.target/i386/memset-strategy-1.c
> >>> ===================================================================
> >>> --- testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
> >>> +++ testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
> >>> @@ -0,0 +1,10 @@
> >>> +/* { dg-do compile } */
> >>> +/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
> >>> +/* { dg-final { scan-assembler-times "memset" 2  } } */
> >>> +
> >>> +char a[2048];
> >>> +void t (void)
> >>> +{
> >>> +  __builtin_memset (a, 1, 2048);
> >>> +}
> >>> +
> >>> Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
> >>> ===================================================================
> >>> --- testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
> >>> +++ testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
> >>> @@ -0,0 +1,11 @@
> >>> +/* { dg-do compile } */
> >>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
> >>> +/* { dg-final { scan-assembler-times "memcpy" 2  } } */
> >>> +
> >>> +char a[2048];
> >>> +char b[2048];
> >>> +void t (void)
> >>> +{
> >>> +  __builtin_memcpy (a, b, 2048);
> >>> +}
> >>> +
> >>
> 
> 
> 
> -- 
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.
Xinliang David Li Aug. 6, 2013, 4:07 p.m. UTC | #3
On Tue, Aug 6, 2013 at 2:42 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >>> 2013-08-02  Xinliang David Li  <davidxl@google.com>
>> >>>
>> >>>         * config/i386/stringop.def: New file.
>> >>>         * config/i386/stringop.opt: New file.
>> >>>         * config/i386/i386-opts.h: Include stringopt.def.
>> >>>         * config/i386/i386.opt: Include stringopt.opt.
>> >>>         * config/i386/i386.c (ix86_option_override_internal):
>> >>>         Override default size based stringop inline strategies
>> >>>         with options.
>> >>>         * config/i386/i386.c (ix86_parse_stringop_strategy_string):
>> >>>         New function.
>> >>>
>> >>> 2013-08-04  Xinliang David Li  <davidxl@google.com>
>> >>>
>> >>>         * testsuite/gcc.target/i386/memcpy-strategy-1.c: New test.
>> >>>         * testsuite/gcc.target/i386/memcpy-strategy-2.c: Ditto.
>> >>>         * testsuite/gcc.target/i386/memset-strategy-1.c: Ditto.
>> >>>         * testsuite/gcc.target/i386/memcpy-strategy-3.c: Ditto.
>
> The patch looks resonable to me in general.  I wonder why we need to bring
> all the cost tables non-const instead of just having writable storage for
> the "current strategy" like we do with other flags anyway.

Having const on those arrays do not bring us anything -- those tables
will be accessed indirectly so const-prop won't happen anyways.
current_strategy is an embedded struct in the cost array so it ends up
in RO data when top level array is const.

>
> Your strings are definitely more readable than the in-memory representation
> I came up with. Perhaps we can even turn the cost tables into strings
> for easier maintenance?  I guess they are bit confusing for people
> not familiar with a code.

I think the in memory representation is fine -- if there is a need for
internal representation cleanup, it should done as another patch.
WDTY?

thanks,

David

>
> Honza
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Aug 2, 2013 at 9:21 PM, Xinliang David Li <davidxl@google.com> wrote:
>> >>> > On x86_64, when the expected size of memcpy/memset is known (e.g, with
>> >>> > FDO), libcall strategy is used with the size is > 8192. This value is
>> >>> > hard coded, which makes it hard to do performance tuning. This patch
>> >>> > adds two new parameters to do that. Potential usage includes
>> >>> > per-application libcall strategy min-size tuning based on summary data
>> >>> > with FDO (e.g, instruction workset size).
>> >>> >
>> >>> > Bootstrap and tested on x86_64/linux. Ok for trunk?
>> >>> >
>> >>> > thanks,
>> >>> >
>> >>> > David
>> >>> >
>> >>> >
>> >>> > 2013-08-02  Xinliang David Li  <davidxl@google.com>
>> >>> >
>> >>> >         * params.def: New parameters.
>> >>> >         * config/i386/i386.c (ix86_option_override_internal):
>> >>> >         Override default libcall size limit with parameters.
>> >>
>> >>> Index: config/i386/stringop.def
>> >>> ===================================================================
>> >>> --- config/i386/stringop.def  (revision 0)
>> >>> +++ config/i386/stringop.def  (revision 0)
>> >>> @@ -0,0 +1,42 @@
>> >>> +/* Definitions for option handling for IA-32.
>> >>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>> >>> +
>> >>> +This file is part of GCC.
>> >>> +
>> >>> +GCC is free software; you can redistribute it and/or modify
>> >>> +it under the terms of the GNU General Public License as published by
>> >>> +the Free Software Foundation; either version 3, or (at your option)
>> >>> +any later version.
>> >>> +
>> >>> +GCC is distributed in the hope that it will be useful,
>> >>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>> >>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> >>> +GNU General Public License for more details.
>> >>> +
>> >>> +Under Section 7 of GPL version 3, you are granted additional
>> >>> +permissions described in the GCC Runtime Library Exception, version
>> >>> +3.1, as published by the Free Software Foundation.
>> >>> +
>> >>> +You should have received a copy of the GNU General Public License and
>> >>> +a copy of the GCC Runtime Library Exception along with this program;
>> >>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>> >>> +<http://www.gnu.org/licenses/>.  */
>> >>> +
>> >>> +DEF_ENUM
>> >>> +DEF_ALG (no_stringop, no_stringop)
>> >>> +DEF_ENUM
>> >>> +DEF_ALG (libcall, libcall)
>> >>> +DEF_ENUM
>> >>> +DEF_ALG (rep_prefix_1_byte, rep_byte)
>> >>> +DEF_ENUM
>> >>> +DEF_ALG (rep_prefix_4_byte, rep_4byte)
>> >>> +DEF_ENUM
>> >>> +DEF_ALG (rep_prefix_8_byte, rep_8byte)
>> >>> +DEF_ENUM
>> >>> +DEF_ALG (loop_1_byte, byte_loop)
>> >>> +DEF_ENUM
>> >>> +DEF_ALG (loop, loop)
>> >>> +DEF_ENUM
>> >>> +DEF_ALG (unrolled_loop, unrolled_loop)
>> >>> +DEF_ENUM
>> >>> +DEF_ALG (vector_loop, vector_loop)
>> >>> Index: config/i386/i386.opt
>> >>> ===================================================================
>> >>> --- config/i386/i386.opt      (revision 201458)
>> >>> +++ config/i386/i386.opt      (working copy)
>> >>> @@ -316,6 +316,14 @@ mstack-arg-probe
>> >>>  Target Report Mask(STACK_PROBE) Save
>> >>>  Enable stack probing
>> >>>
>> >>> +mmemcpy-strategy=
>> >>> +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
>> >>> +Specify memcpy expansion strategy when expected size is known
>> >>> +
>> >>> +mmemset-strategy=
>> >>> +Target RejectNegative Joined Var(ix86_tune_memset_strategy)
>> >>> +Specify memset expansion strategy when expected size is known
>> >>> +
>> >>>  mstringop-strategy=
>> >>>  Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
>> >>>  Chose strategy to generate stringop using
>> >>> Index: config/i386/stringop.opt
>> >>> ===================================================================
>> >>> --- config/i386/stringop.opt  (revision 0)
>> >>> +++ config/i386/stringop.opt  (revision 0)
>> >>> @@ -0,0 +1,36 @@
>> >>> +/* Definitions for option handling for IA-32.
>> >>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>> >>> +
>> >>> +This file is part of GCC.
>> >>> +
>> >>> +GCC is free software; you can redistribute it and/or modify
>> >>> +it under the terms of the GNU General Public License as published by
>> >>> +the Free Software Foundation; either version 3, or (at your option)
>> >>> +any later version.
>> >>> +
>> >>> +GCC is distributed in the hope that it will be useful,
>> >>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>> >>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> >>> +GNU General Public License for more details.
>> >>> +
>> >>> +Under Section 7 of GPL version 3, you are granted additional
>> >>> +permissions described in the GCC Runtime Library Exception, version
>> >>> +3.1, as published by the Free Software Foundation.
>> >>> +
>> >>> +You should have received a copy of the GNU General Public License and
>> >>> +a copy of the GCC Runtime Library Exception along with this program;
>> >>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>> >>> +<http://www.gnu.org/licenses/>.  */
>> >>> +
>> >>> +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
>> >>> +
>> >>> +#undef DEF_ENUM
>> >>> +#define DEF_ENUM EnumValue
>> >>> +
>> >>> +#undef DEF_ALG
>> >>> +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
>> >>> +
>> >>> +#include "stringop.def"
>> >>> +
>> >>> +#undef DEF_ENUM
>> >>> +#undef DEF_ALG
>> >>> Index: config/i386/i386.c
>> >>> ===================================================================
>> >>> --- config/i386/i386.c        (revision 201458)
>> >>> +++ config/i386/i386.c        (working copy)
>> >>> @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
>> >>>  };
>> >>>
>> >>>  /* Processor costs (relative to an add) */
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs i386_cost = { /* 386 specific costs */
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>> >>> @@ -226,7 +226,7 @@ struct processor_costs i386_cost = {      /*
>> >>>    1,                                 /* cond_not_taken_branch_cost.  */
>> >>>  };
>> >>>
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs i486_cost = { /* 486 specific costs */
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>> >>> @@ -298,7 +298,7 @@ struct processor_costs i486_cost = {      /*
>> >>>    1,                                 /* cond_not_taken_branch_cost.  */
>> >>>  };
>> >>>
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs pentium_cost = {
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>> >>> @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
>> >>>    1,                                 /* cond_not_taken_branch_cost.  */
>> >>>  };
>> >>>
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs pentiumpro_cost = {
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>> >>> @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
>> >>>    1,                                 /* cond_not_taken_branch_cost.  */
>> >>>  };
>> >>>
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs geode_cost = {
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>> >>> @@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
>> >>>    1,                                 /* cond_not_taken_branch_cost.  */
>> >>>  };
>> >>>
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs k6_cost = {
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>> >>> @@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
>> >>>    1,                                 /* cond_not_taken_branch_cost.  */
>> >>>  };
>> >>>
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs athlon_cost = {
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>> >>> @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
>> >>>    1,                                 /* cond_not_taken_branch_cost.  */
>> >>>  };
>> >>>
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs k8_cost = {
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>> >>> @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
>> >>>    1,                                 /* cond_not_taken_branch_cost.  */
>> >>>  };
>> >>>
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs pentium4_cost = {
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    COSTS_N_INSNS (3),                 /* cost of a lea instruction */
>> >>> @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
>> >>>    1,                                 /* cond_not_taken_branch_cost.  */
>> >>>  };
>> >>>
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs nocona_cost = {
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>> >>> @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
>> >>>    1,                                 /* cond_not_taken_branch_cost.  */
>> >>>  };
>> >>>
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs atom_cost = {
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
>> >>> @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
>> >>>  };
>> >>>
>> >>>  /* Generic64 should produce code tuned for Nocona and K8.  */
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs generic64_cost = {
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
>> >>> @@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost =
>> >>>  };
>> >>>
>> >>>  /* core_cost should produce code tuned for Core familly of CPUs.  */
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs core_cost = {
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
>> >>> @@ -1717,7 +1717,7 @@ struct processor_costs core_cost = {
>> >>>
>> >>>  /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
>> >>>     Athlon and K8.  */
>> >>> -static const
>> >>> +static
>> >>>  struct processor_costs generic32_cost = {
>> >>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>> >>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
>> >>> @@ -2900,6 +2900,150 @@ ix86_debug_options (void)
>> >>>
>> >>>    return;
>> >>>  }
>> >>> +
>> >>> +static const char *stringop_alg_names[] = {
>> >>> +#define DEF_ENUM
>> >>> +#define DEF_ALG(alg, name) #name,
>> >>> +#include "stringop.def"
>> >>> +#undef DEF_ENUM
>> >>> +#undef DEF_ALG
>> >>> +};
>> >>> +
>> >>> +/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
>> >>> +   The string is of the following form (or comma separated list of it):
>> >>> +
>> >>> +     strategy_alg:max_size:[align|noalign]
>> >>> +
>> >>> +   where the full size range for the strategy is either [0, max_size] or
>> >>> +   [min_size, max_size], in which min_size is the max_size + 1 of the
>> >>> +   preceding range.  The last size range must have max_size == -1.
>> >>> +
>> >>> +   Examples:
>> >>> +
>> >>> +    1.
>> >>> +       -mmemcpy-strategy=libcall:-1:noalign
>> >>> +
>> >>> +      this is equivalent to (for known size memcpy) -mstringop-strategy=libcall
>> >>> +
>> >>> +
>> >>> +   2.
>> >>> +      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
>> >>> +
>> >>> +      This is to tell the compiler to use the following strategy for memset
>> >>> +      1) when the expected size is between [1, 16], use rep_8byte strategy;
>> >>> +      2) when the size is between [17, 2048], use vector_loop;
>> >>> +      3) when the size is > 2048, use libcall.
>> >>> +
>> >>> +*/
>> >>> +
>> >>> +struct stringop_size_range
>> >>> +{
>> >>> +  int min;
>> >>> +  int max;
>> >>> +  stringop_alg alg;
>> >>> +  bool noalign;
>> >>> +};
>> >>> +
>> >>> +static void
>> >>> +ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
>> >>> +{
>> >>> +  const struct stringop_algs *default_algs;
>> >>> +  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
>> >>> +  char *curr_range_str, *next_range_str;
>> >>> +  int i = 0, n = 0;
>> >>> +
>> >>> +  if (is_memset)
>> >>> +    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
>> >>> +  else
>> >>> +    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
>> >>> +
>> >>> +  curr_range_str = strategy_str;
>> >>> +
>> >>> +  do {
>> >>> +
>> >>> +    int mins, maxs;
>> >>> +    stringop_alg alg;
>> >>> +    char alg_name[128];
>> >>> +    char align[16];
>> >>> +
>> >>> +    next_range_str = strchr (curr_range_str, ',');
>> >>> +    if (next_range_str)
>> >>> +      *next_range_str++ = '\0';
>> >>> +
>> >>> +    if (3 != sscanf (curr_range_str, "%[^:]:%d:%s", alg_name, &maxs, align))
>> >>> +      {
>> >>> +        warning (0, "Wrong arg %s to option %s", curr_range_str,
>> >>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> >>> +        return;
>> >>> +      }
>> >>> +
>> >>> +    if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
>> >>> +      {
>> >>> +        warning (0, "Size ranges of option %s should be increasing",
>> >>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> >>> +        return;
>> >>> +      }
>> >>> +
>> >>> +    for (i = 0; i < last_alg; i++)
>> >>> +      {
>> >>> +        if (!strcmp (alg_name, stringop_alg_names[i]))
>> >>> +       {
>> >>> +         alg = (stringop_alg) i;
>> >>> +         break;
>> >>> +          }
>> >>> +      }
>> >>> +
>> >>> +    if (i == last_alg)
>> >>> +      {
>> >>> +        warning (0, "Wrong stringop strategy name %s specified for option %s",
>> >>> +              alg_name,
>> >>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> >>> +     return;
>> >>> +      }
>> >>> +
>> >>> +    input_ranges[n].min = mins;
>> >>> +    input_ranges[n].max = maxs;
>> >>> +    input_ranges[n].alg = alg;
>> >>> +    if (!strcmp (align, "align"))
>> >>> +      input_ranges[n].noalign = false;
>> >>> +    else if (!strcmp (align, "noalign"))
>> >>> +      input_ranges[n].noalign = true;
>> >>> +    else
>> >>> +      {
>> >>> +        warning (0, "Unknown alignment %s specified for option %s",
>> >>> +                 align, is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> >>> +        return;
>> >>> +      }
>> >>> +    n++;
>> >>> +    curr_range_str = next_range_str;
>> >>> +  } while (curr_range_str);
>> >>> +
>> >>> +  if (input_ranges[n - 1].max != -1)
>> >>> +    {
>> >>> +      warning (0, "The max value for the last size range should be -1"
>> >>> +               " for option %s",
>> >>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> >>> +      return;
>> >>> +    }
>> >>> +
>> >>> +  if (n > MAX_STRINGOP_ALGS)
>> >>> +    {
>> >>> +      warning (0, "Too many size ranges specified in option %s",
>> >>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> >>> +      return;
>> >>> +    }
>> >>> +
>> >>> +  /* Now override the default algs array  */
>> >>> +  for (i = 0; i < n; i++)
>> >>> +    {
>> >>> +      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
>> >>> +      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
>> >>> +          = input_ranges[i].alg;
>> >>> +      *const_cast<int *>(&default_algs->size[i].noalign)
>> >>> +          = input_ranges[i].noalign;
>> >>> +    }
>> >>> +}
>> >>> +
>> >>>
>> >>>  /* Override various settings based on options.  If MAIN_ARGS_P, the
>> >>>     options are from the command line, otherwise they are from
>> >>> @@ -4021,6 +4165,21 @@ ix86_option_override_internal (bool main
>> >>>    /* Handle stack protector */
>> >>>    if (!global_options_set.x_ix86_stack_protector_guard)
>> >>>      ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
>> >>> +
>> >>> +  /* Handle -mmemcpy-strategy= and -mmemset-strategy=  */
>> >>> +  if (ix86_tune_memcpy_strategy)
>> >>> +    {
>> >>> +      char *str = xstrdup (ix86_tune_memcpy_strategy);
>> >>> +      ix86_parse_stringop_strategy_string (str, false);
>> >>> +      free (str);
>> >>> +    }
>> >>> +
>> >>> +  if (ix86_tune_memset_strategy)
>> >>> +    {
>> >>> +      char *str = xstrdup (ix86_tune_memset_strategy);
>> >>> +      ix86_parse_stringop_strategy_string (str, true);
>> >>> +      free (str);
>> >>> +    }
>> >>>  }
>> >>>
>> >>>  /* Implement the TARGET_OPTION_OVERRIDE hook.  */
>> >>> @@ -22903,6 +23062,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>> >>>      {
>> >>>      case libcall:
>> >>>      case no_stringop:
>> >>> +    case last_alg:
>> >>>        gcc_unreachable ();
>> >>>      case loop_1_byte:
>> >>>        need_zero_guard = true;
>> >>> @@ -23093,6 +23253,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>> >>>      {
>> >>>      case libcall:
>> >>>      case no_stringop:
>> >>> +    case last_alg:
>> >>>        gcc_unreachable ();
>> >>>      case loop_1_byte:
>> >>>      case loop:
>> >>> @@ -23304,6 +23465,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>> >>>      {
>> >>>      case libcall:
>> >>>      case no_stringop:
>> >>> +    case last_alg:
>> >>>        gcc_unreachable ();
>> >>>      case loop:
>> >>>        need_zero_guard = true;
>> >>> @@ -23481,6 +23643,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>> >>>      {
>> >>>      case libcall:
>> >>>      case no_stringop:
>> >>> +    case last_alg:
>> >>>        gcc_unreachable ();
>> >>>      case loop_1_byte:
>> >>>      case loop:
>> >>> Index: config/i386/i386-opts.h
>> >>> ===================================================================
>> >>> --- config/i386/i386-opts.h   (revision 201458)
>> >>> +++ config/i386/i386-opts.h   (working copy)
>> >>> @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI
>> >>>  /* Algorithm to expand string function with.  */
>> >>>  enum stringop_alg
>> >>>  {
>> >>> -   no_stringop,
>> >>> -   libcall,
>> >>> -   rep_prefix_1_byte,
>> >>> -   rep_prefix_4_byte,
>> >>> -   rep_prefix_8_byte,
>> >>> -   loop_1_byte,
>> >>> -   loop,
>> >>> -   unrolled_loop,
>> >>> -   vector_loop
>> >>> +#undef DEF_ENUM
>> >>> +#define DEF_ENUM
>> >>> +
>> >>> +#undef DEF_ALG
>> >>> +#define DEF_ALG(alg, name) alg,
>> >>> +
>> >>> +#include "stringop.def"
>> >>> +last_alg
>> >>> +
>> >>> +#undef DEF_ENUM
>> >>> +#undef DEF_ALG
>> >>>  };
>> >>>
>> >>>  /* Available call abi.  */
>> >>> Index: doc/invoke.texi
>> >>> ===================================================================
>> >>> --- doc/invoke.texi   (revision 201458)
>> >>> +++ doc/invoke.texi   (working copy)
>> >>> @@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}.
>> >>>  -mbmi2 -mrtm -mlwp -mthreads @gol
>> >>>  -mno-align-stringops  -minline-all-stringops @gol
>> >>>  -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
>> >>> +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy}
>> >>>  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
>> >>>  -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
>> >>>  -mregparm=@var{num}  -msseregparm @gol
>> >>> @@ -14598,6 +14599,24 @@ Expand into an inline loop.
>> >>>  Always use a library call.
>> >>>  @end table
>> >>>
>> >>> +@item -mmemcpy-strategy=@var{strategy}
>> >>> +@opindex mmemcpy-strategy=@var{strategy}
>> >>> +Override the internal decision heuristic to decide if @code{__builtin_memcpy}
>> >>> +should be inlined and what inline algorithm to use when the expected size
>> >>> +of the copy operation is known. @var{strategy}
>> >>> +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
>> >>> +@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
>> >>> +the max byte size with which inline algorithm @var{alg} is allowed. For the last
>> >>> +triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets
>> >>> +in the list must be specified in increasing order. The minimal byte size for
>> >>> +@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the
>> >>> +preceding range.
>> >>> +
>> >>> +@item -mmemset-strategy=@var{strategy}
>> >>> +@opindex mmemset-strategy=@var{strategy}
>> >>> +The option is similar to @option{-mmemcpy-strategy=} except that it is to control
>> >>> +@code{__builtin_memset} expansion.
>> >>> +
>> >>>  @item -momit-leaf-frame-pointer
>> >>>  @opindex momit-leaf-frame-pointer
>> >>>  Don't keep the frame pointer in a register for leaf functions.  This
>> >>> Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
>> >>> ===================================================================
>> >>> --- testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
>> >>> +++ testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
>> >>> @@ -0,0 +1,12 @@
>> >>> +/* { dg-do compile } */
>> >>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
>> >>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>> >>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>> >>> +
>> >>> +char a[2048];
>> >>> +char b[2048];
>> >>> +void t (void)
>> >>> +{
>> >>> +  __builtin_memcpy (a, b, 2048);
>> >>> +}
>> >>> +
>> >>> Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
>> >>> ===================================================================
>> >>> --- testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
>> >>> +++ testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
>> >>> @@ -0,0 +1,12 @@
>> >>> +/* { dg-do compile } */
>> >>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
>> >>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>> >>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>> >>> +
>> >>> +char a[2048];
>> >>> +char b[2048];
>> >>> +void t (void)
>> >>> +{
>> >>> +  __builtin_memcpy (a, b, 2048);
>> >>> +}
>> >>> +
>> >>> Index: testsuite/gcc.target/i386/memset-strategy-1.c
>> >>> ===================================================================
>> >>> --- testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
>> >>> +++ testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
>> >>> @@ -0,0 +1,10 @@
>> >>> +/* { dg-do compile } */
>> >>> +/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
>> >>> +/* { dg-final { scan-assembler-times "memset" 2  } } */
>> >>> +
>> >>> +char a[2048];
>> >>> +void t (void)
>> >>> +{
>> >>> +  __builtin_memset (a, 1, 2048);
>> >>> +}
>> >>> +
>> >>> Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
>> >>> ===================================================================
>> >>> --- testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
>> >>> +++ testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
>> >>> @@ -0,0 +1,11 @@
>> >>> +/* { dg-do compile } */
>> >>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
>> >>> +/* { dg-final { scan-assembler-times "memcpy" 2  } } */
>> >>> +
>> >>> +char a[2048];
>> >>> +char b[2048];
>> >>> +void t (void)
>> >>> +{
>> >>> +  __builtin_memcpy (a, b, 2048);
>> >>> +}
>> >>> +
>> >>
>>
>>
>>
>> --
>> ---
>> Best regards,
>> Michael V. Zolotukhin,
>> Software Engineer
>> Intel Corporation.
Xinliang David Li Aug. 6, 2013, 4:42 p.m. UTC | #4
Corrected two small problems reported by the style checker (The
warnings about the EnumValue for options  in stringopt.opt are not
valid).

On Tue, Aug 6, 2013 at 1:46 AM, Michael Zolotukhin
<michael.v.zolotukhin@gmail.com> wrote:
> There are still some formatting issues (like 8 spaces instead of a
> tab, wrong indentation of do-loop and some other places) - to reveal
> some of them you could use contrib/check_GNU_style.sh script.
> But that was a nitpicking again:) Actually I wanted to ask whether
> you're going to use this option for some performance experiments
> involving memmov/memset - if so, probably you could tune existing
> cost-models as well? Is it possible?

the option is designed for purpose like this.

thanks,

David

>
> Michael
>
> On 5 August 2013 20:44, Xinliang David Li <davidxl@google.com> wrote:
>> thanks. Updated patch attached.
>>
>> David
>>
>> On Mon, Aug 5, 2013 at 3:57 AM, Michael V. Zolotukhin
>> <michael.v.zolotukhin@gmail.com> wrote:
>>> Hi,
>>> This is a really convenient option, thanks for working on it.
>>> I can't approve it as I'm not a maintainer, but it looks ok to me,
>>> except fot a small nitpicking: afair, comments should end with
>>> dot-space-space.
>>>
>>> Michael
>>>
>>> On 04 Aug 20:01, Xinliang David Li wrote:
>>>> The attached is a new patch implementing the stringop inline strategy
>>>> control using two new -m options:
>>>>
>>>> -mmemcpy-strategy=
>>>> -mmemset-strategy=
>>>>
>>>> See changes in doc/invoke.texi for description of the new options. Example:
>>>>   -mmemcpy-strategy=rep_8byte:64:unaligned,unrolled_loop:2048:unaligned,libcall:-1:unaligned
>>>>
>>>> tells compiler to inline memcpy using rep_8byte when the size is no
>>>> larger than 64 byte, using unrolled_loop when size is no larger than
>>>> 2048, and for size > 2048, using library call. In all cases,
>>>> destination alignment adjustment is not done.
>>>>
>>>> Tested on x86-64/linux. Ok for trunk?
>>>>
>>>> thanks,
>>>>
>>>> David
>>>>
>>>> 2013-08-02  Xinliang David Li  <davidxl@google.com>
>>>>
>>>>         * config/i386/stringop.def: New file.
>>>>         * config/i386/stringop.opt: New file.
>>>>         * config/i386/i386-opts.h: Include stringopt.def.
>>>>         * config/i386/i386.opt: Include stringopt.opt.
>>>>         * config/i386/i386.c (ix86_option_override_internal):
>>>>         Override default size based stringop inline strategies
>>>>         with options.
>>>>         * config/i386/i386.c (ix86_parse_stringop_strategy_string):
>>>>         New function.
>>>>
>>>> 2013-08-04  Xinliang David Li  <davidxl@google.com>
>>>>
>>>>         * testsuite/gcc.target/i386/memcpy-strategy-1.c: New test.
>>>>         * testsuite/gcc.target/i386/memcpy-strategy-2.c: Ditto.
>>>>         * testsuite/gcc.target/i386/memset-strategy-1.c: Ditto.
>>>>         * testsuite/gcc.target/i386/memcpy-strategy-3.c: Ditto.
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Aug 2, 2013 at 9:21 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> > On x86_64, when the expected size of memcpy/memset is known (e.g, with
>>>> > FDO), libcall strategy is used with the size is > 8192. This value is
>>>> > hard coded, which makes it hard to do performance tuning. This patch
>>>> > adds two new parameters to do that. Potential usage includes
>>>> > per-application libcall strategy min-size tuning based on summary data
>>>> > with FDO (e.g, instruction workset size).
>>>> >
>>>> > Bootstrap and tested on x86_64/linux. Ok for trunk?
>>>> >
>>>> > thanks,
>>>> >
>>>> > David
>>>> >
>>>> >
>>>> > 2013-08-02  Xinliang David Li  <davidxl@google.com>
>>>> >
>>>> >         * params.def: New parameters.
>>>> >         * config/i386/i386.c (ix86_option_override_internal):
>>>> >         Override default libcall size limit with parameters.
>>>
>>>> Index: config/i386/stringop.def
>>>> ===================================================================
>>>> --- config/i386/stringop.def  (revision 0)
>>>> +++ config/i386/stringop.def  (revision 0)
>>>> @@ -0,0 +1,42 @@
>>>> +/* Definitions for option handling for IA-32.
>>>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>>>> +
>>>> +This file is part of GCC.
>>>> +
>>>> +GCC is free software; you can redistribute it and/or modify
>>>> +it under the terms of the GNU General Public License as published by
>>>> +the Free Software Foundation; either version 3, or (at your option)
>>>> +any later version.
>>>> +
>>>> +GCC is distributed in the hope that it will be useful,
>>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> +GNU General Public License for more details.
>>>> +
>>>> +Under Section 7 of GPL version 3, you are granted additional
>>>> +permissions described in the GCC Runtime Library Exception, version
>>>> +3.1, as published by the Free Software Foundation.
>>>> +
>>>> +You should have received a copy of the GNU General Public License and
>>>> +a copy of the GCC Runtime Library Exception along with this program;
>>>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>>> +<http://www.gnu.org/licenses/>.  */
>>>> +
>>>> +DEF_ENUM
>>>> +DEF_ALG (no_stringop, no_stringop)
>>>> +DEF_ENUM
>>>> +DEF_ALG (libcall, libcall)
>>>> +DEF_ENUM
>>>> +DEF_ALG (rep_prefix_1_byte, rep_byte)
>>>> +DEF_ENUM
>>>> +DEF_ALG (rep_prefix_4_byte, rep_4byte)
>>>> +DEF_ENUM
>>>> +DEF_ALG (rep_prefix_8_byte, rep_8byte)
>>>> +DEF_ENUM
>>>> +DEF_ALG (loop_1_byte, byte_loop)
>>>> +DEF_ENUM
>>>> +DEF_ALG (loop, loop)
>>>> +DEF_ENUM
>>>> +DEF_ALG (unrolled_loop, unrolled_loop)
>>>> +DEF_ENUM
>>>> +DEF_ALG (vector_loop, vector_loop)
>>>> Index: config/i386/i386.opt
>>>> ===================================================================
>>>> --- config/i386/i386.opt      (revision 201458)
>>>> +++ config/i386/i386.opt      (working copy)
>>>> @@ -316,6 +316,14 @@ mstack-arg-probe
>>>>  Target Report Mask(STACK_PROBE) Save
>>>>  Enable stack probing
>>>>
>>>> +mmemcpy-strategy=
>>>> +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
>>>> +Specify memcpy expansion strategy when expected size is known
>>>> +
>>>> +mmemset-strategy=
>>>> +Target RejectNegative Joined Var(ix86_tune_memset_strategy)
>>>> +Specify memset expansion strategy when expected size is known
>>>> +
>>>>  mstringop-strategy=
>>>>  Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
>>>>  Chose strategy to generate stringop using
>>>> Index: config/i386/stringop.opt
>>>> ===================================================================
>>>> --- config/i386/stringop.opt  (revision 0)
>>>> +++ config/i386/stringop.opt  (revision 0)
>>>> @@ -0,0 +1,36 @@
>>>> +/* Definitions for option handling for IA-32.
>>>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>>>> +
>>>> +This file is part of GCC.
>>>> +
>>>> +GCC is free software; you can redistribute it and/or modify
>>>> +it under the terms of the GNU General Public License as published by
>>>> +the Free Software Foundation; either version 3, or (at your option)
>>>> +any later version.
>>>> +
>>>> +GCC is distributed in the hope that it will be useful,
>>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> +GNU General Public License for more details.
>>>> +
>>>> +Under Section 7 of GPL version 3, you are granted additional
>>>> +permissions described in the GCC Runtime Library Exception, version
>>>> +3.1, as published by the Free Software Foundation.
>>>> +
>>>> +You should have received a copy of the GNU General Public License and
>>>> +a copy of the GCC Runtime Library Exception along with this program;
>>>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>>> +<http://www.gnu.org/licenses/>.  */
>>>> +
>>>> +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
>>>> +
>>>> +#undef DEF_ENUM
>>>> +#define DEF_ENUM EnumValue
>>>> +
>>>> +#undef DEF_ALG
>>>> +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
>>>> +
>>>> +#include "stringop.def"
>>>> +
>>>> +#undef DEF_ENUM
>>>> +#undef DEF_ALG
>>>> Index: config/i386/i386.c
>>>> ===================================================================
>>>> --- config/i386/i386.c        (revision 201458)
>>>> +++ config/i386/i386.c        (working copy)
>>>> @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
>>>>  };
>>>>
>>>>  /* Processor costs (relative to an add) */
>>>> -static const
>>>> +static
>>>>  struct processor_costs i386_cost = { /* 386 specific costs */
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>> @@ -226,7 +226,7 @@ struct processor_costs i386_cost = {      /*
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs i486_cost = { /* 486 specific costs */
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>> @@ -298,7 +298,7 @@ struct processor_costs i486_cost = {      /*
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs pentium_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>> @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs pentiumpro_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>> @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs geode_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>> @@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs k6_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>>>> @@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs athlon_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>>>> @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs k8_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>>>> @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs pentium4_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (3),                 /* cost of a lea instruction */
>>>> @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs nocona_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>> @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs atom_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
>>>> @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
>>>>  };
>>>>
>>>>  /* Generic64 should produce code tuned for Nocona and K8.  */
>>>> -static const
>>>> +static
>>>>  struct processor_costs generic64_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
>>>> @@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost =
>>>>  };
>>>>
>>>>  /* core_cost should produce code tuned for Core familly of CPUs.  */
>>>> -static const
>>>> +static
>>>>  struct processor_costs core_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
>>>> @@ -1717,7 +1717,7 @@ struct processor_costs core_cost = {
>>>>
>>>>  /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
>>>>     Athlon and K8.  */
>>>> -static const
>>>> +static
>>>>  struct processor_costs generic32_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
>>>> @@ -2900,6 +2900,150 @@ ix86_debug_options (void)
>>>>
>>>>    return;
>>>>  }
>>>> +
>>>> +static const char *stringop_alg_names[] = {
>>>> +#define DEF_ENUM
>>>> +#define DEF_ALG(alg, name) #name,
>>>> +#include "stringop.def"
>>>> +#undef DEF_ENUM
>>>> +#undef DEF_ALG
>>>> +};
>>>> +
>>>> +/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
>>>> +   The string is of the following form (or comma separated list of it):
>>>> +
>>>> +     strategy_alg:max_size:[align|noalign]
>>>> +
>>>> +   where the full size range for the strategy is either [0, max_size] or
>>>> +   [min_size, max_size], in which min_size is the max_size + 1 of the
>>>> +   preceding range.  The last size range must have max_size == -1.
>>>> +
>>>> +   Examples:
>>>> +
>>>> +    1.
>>>> +       -mmemcpy-strategy=libcall:-1:noalign
>>>> +
>>>> +      this is equivalent to (for known size memcpy) -mstringop-strategy=libcall
>>>> +
>>>> +
>>>> +   2.
>>>> +      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
>>>> +
>>>> +      This is to tell the compiler to use the following strategy for memset
>>>> +      1) when the expected size is between [1, 16], use rep_8byte strategy;
>>>> +      2) when the size is between [17, 2048], use vector_loop;
>>>> +      3) when the size is > 2048, use libcall.
>>>> +
>>>> +*/
>>>> +
>>>> +struct stringop_size_range
>>>> +{
>>>> +  int min;
>>>> +  int max;
>>>> +  stringop_alg alg;
>>>> +  bool noalign;
>>>> +};
>>>> +
>>>> +static void
>>>> +ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
>>>> +{
>>>> +  const struct stringop_algs *default_algs;
>>>> +  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
>>>> +  char *curr_range_str, *next_range_str;
>>>> +  int i = 0, n = 0;
>>>> +
>>>> +  if (is_memset)
>>>> +    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
>>>> +  else
>>>> +    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
>>>> +
>>>> +  curr_range_str = strategy_str;
>>>> +
>>>> +  do {
>>>> +
>>>> +    int mins, maxs;
>>>> +    stringop_alg alg;
>>>> +    char alg_name[128];
>>>> +    char align[16];
>>>> +
>>>> +    next_range_str = strchr (curr_range_str, ',');
>>>> +    if (next_range_str)
>>>> +      *next_range_str++ = '\0';
>>>> +
>>>> +    if (3 != sscanf (curr_range_str, "%[^:]:%d:%s", alg_name, &maxs, align))
>>>> +      {
>>>> +        warning (0, "Wrong arg %s to option %s", curr_range_str,
>>>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>> +        return;
>>>> +      }
>>>> +
>>>> +    if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
>>>> +      {
>>>> +        warning (0, "Size ranges of option %s should be increasing",
>>>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>> +        return;
>>>> +      }
>>>> +
>>>> +    for (i = 0; i < last_alg; i++)
>>>> +      {
>>>> +        if (!strcmp (alg_name, stringop_alg_names[i]))
>>>> +       {
>>>> +         alg = (stringop_alg) i;
>>>> +         break;
>>>> +          }
>>>> +      }
>>>> +
>>>> +    if (i == last_alg)
>>>> +      {
>>>> +        warning (0, "Wrong stringop strategy name %s specified for option %s",
>>>> +              alg_name,
>>>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>> +     return;
>>>> +      }
>>>> +
>>>> +    input_ranges[n].min = mins;
>>>> +    input_ranges[n].max = maxs;
>>>> +    input_ranges[n].alg = alg;
>>>> +    if (!strcmp (align, "align"))
>>>> +      input_ranges[n].noalign = false;
>>>> +    else if (!strcmp (align, "noalign"))
>>>> +      input_ranges[n].noalign = true;
>>>> +    else
>>>> +      {
>>>> +        warning (0, "Unknown alignment %s specified for option %s",
>>>> +                 align, is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>> +        return;
>>>> +      }
>>>> +    n++;
>>>> +    curr_range_str = next_range_str;
>>>> +  } while (curr_range_str);
>>>> +
>>>> +  if (input_ranges[n - 1].max != -1)
>>>> +    {
>>>> +      warning (0, "The max value for the last size range should be -1"
>>>> +               " for option %s",
>>>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>> +      return;
>>>> +    }
>>>> +
>>>> +  if (n > MAX_STRINGOP_ALGS)
>>>> +    {
>>>> +      warning (0, "Too many size ranges specified in option %s",
>>>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>> +      return;
>>>> +    }
>>>> +
>>>> +  /* Now override the default algs array  */
>>>> +  for (i = 0; i < n; i++)
>>>> +    {
>>>> +      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
>>>> +      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
>>>> +          = input_ranges[i].alg;
>>>> +      *const_cast<int *>(&default_algs->size[i].noalign)
>>>> +          = input_ranges[i].noalign;
>>>> +    }
>>>> +}
>>>> +
>>>>
>>>>  /* Override various settings based on options.  If MAIN_ARGS_P, the
>>>>     options are from the command line, otherwise they are from
>>>> @@ -4021,6 +4165,21 @@ ix86_option_override_internal (bool main
>>>>    /* Handle stack protector */
>>>>    if (!global_options_set.x_ix86_stack_protector_guard)
>>>>      ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
>>>> +
>>>> +  /* Handle -mmemcpy-strategy= and -mmemset-strategy=  */
>>>> +  if (ix86_tune_memcpy_strategy)
>>>> +    {
>>>> +      char *str = xstrdup (ix86_tune_memcpy_strategy);
>>>> +      ix86_parse_stringop_strategy_string (str, false);
>>>> +      free (str);
>>>> +    }
>>>> +
>>>> +  if (ix86_tune_memset_strategy)
>>>> +    {
>>>> +      char *str = xstrdup (ix86_tune_memset_strategy);
>>>> +      ix86_parse_stringop_strategy_string (str, true);
>>>> +      free (str);
>>>> +    }
>>>>  }
>>>>
>>>>  /* Implement the TARGET_OPTION_OVERRIDE hook.  */
>>>> @@ -22903,6 +23062,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>>>      {
>>>>      case libcall:
>>>>      case no_stringop:
>>>> +    case last_alg:
>>>>        gcc_unreachable ();
>>>>      case loop_1_byte:
>>>>        need_zero_guard = true;
>>>> @@ -23093,6 +23253,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>>>      {
>>>>      case libcall:
>>>>      case no_stringop:
>>>> +    case last_alg:
>>>>        gcc_unreachable ();
>>>>      case loop_1_byte:
>>>>      case loop:
>>>> @@ -23304,6 +23465,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>>>      {
>>>>      case libcall:
>>>>      case no_stringop:
>>>> +    case last_alg:
>>>>        gcc_unreachable ();
>>>>      case loop:
>>>>        need_zero_guard = true;
>>>> @@ -23481,6 +23643,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>>>      {
>>>>      case libcall:
>>>>      case no_stringop:
>>>> +    case last_alg:
>>>>        gcc_unreachable ();
>>>>      case loop_1_byte:
>>>>      case loop:
>>>> Index: config/i386/i386-opts.h
>>>> ===================================================================
>>>> --- config/i386/i386-opts.h   (revision 201458)
>>>> +++ config/i386/i386-opts.h   (working copy)
>>>> @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI
>>>>  /* Algorithm to expand string function with.  */
>>>>  enum stringop_alg
>>>>  {
>>>> -   no_stringop,
>>>> -   libcall,
>>>> -   rep_prefix_1_byte,
>>>> -   rep_prefix_4_byte,
>>>> -   rep_prefix_8_byte,
>>>> -   loop_1_byte,
>>>> -   loop,
>>>> -   unrolled_loop,
>>>> -   vector_loop
>>>> +#undef DEF_ENUM
>>>> +#define DEF_ENUM
>>>> +
>>>> +#undef DEF_ALG
>>>> +#define DEF_ALG(alg, name) alg,
>>>> +
>>>> +#include "stringop.def"
>>>> +last_alg
>>>> +
>>>> +#undef DEF_ENUM
>>>> +#undef DEF_ALG
>>>>  };
>>>>
>>>>  /* Available call abi.  */
>>>> Index: doc/invoke.texi
>>>> ===================================================================
>>>> --- doc/invoke.texi   (revision 201458)
>>>> +++ doc/invoke.texi   (working copy)
>>>> @@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}.
>>>>  -mbmi2 -mrtm -mlwp -mthreads @gol
>>>>  -mno-align-stringops  -minline-all-stringops @gol
>>>>  -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
>>>> +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy}
>>>>  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
>>>>  -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
>>>>  -mregparm=@var{num}  -msseregparm @gol
>>>> @@ -14598,6 +14599,24 @@ Expand into an inline loop.
>>>>  Always use a library call.
>>>>  @end table
>>>>
>>>> +@item -mmemcpy-strategy=@var{strategy}
>>>> +@opindex mmemcpy-strategy=@var{strategy}
>>>> +Override the internal decision heuristic to decide if @code{__builtin_memcpy}
>>>> +should be inlined and what inline algorithm to use when the expected size
>>>> +of the copy operation is known. @var{strategy}
>>>> +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
>>>> +@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
>>>> +the max byte size with which inline algorithm @var{alg} is allowed. For the last
>>>> +triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets
>>>> +in the list must be specified in increasing order. The minimal byte size for
>>>> +@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the
>>>> +preceding range.
>>>> +
>>>> +@item -mmemset-strategy=@var{strategy}
>>>> +@opindex mmemset-strategy=@var{strategy}
>>>> +The option is similar to @option{-mmemcpy-strategy=} except that it is to control
>>>> +@code{__builtin_memset} expansion.
>>>> +
>>>>  @item -momit-leaf-frame-pointer
>>>>  @opindex momit-leaf-frame-pointer
>>>>  Don't keep the frame pointer in a register for leaf functions.  This
>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
>>>> ===================================================================
>>>> --- testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
>>>> @@ -0,0 +1,12 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
>>>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>>>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>>>> +
>>>> +char a[2048];
>>>> +char b[2048];
>>>> +void t (void)
>>>> +{
>>>> +  __builtin_memcpy (a, b, 2048);
>>>> +}
>>>> +
>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
>>>> ===================================================================
>>>> --- testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
>>>> @@ -0,0 +1,12 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
>>>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>>>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>>>> +
>>>> +char a[2048];
>>>> +char b[2048];
>>>> +void t (void)
>>>> +{
>>>> +  __builtin_memcpy (a, b, 2048);
>>>> +}
>>>> +
>>>> Index: testsuite/gcc.target/i386/memset-strategy-1.c
>>>> ===================================================================
>>>> --- testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
>>>> +++ testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
>>>> @@ -0,0 +1,10 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
>>>> +/* { dg-final { scan-assembler-times "memset" 2  } } */
>>>> +
>>>> +char a[2048];
>>>> +void t (void)
>>>> +{
>>>> +  __builtin_memset (a, 1, 2048);
>>>> +}
>>>> +
>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
>>>> ===================================================================
>>>> --- testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
>>>> @@ -0,0 +1,11 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
>>>> +/* { dg-final { scan-assembler-times "memcpy" 2  } } */
>>>> +
>>>> +char a[2048];
>>>> +char b[2048];
>>>> +void t (void)
>>>> +{
>>>> +  __builtin_memcpy (a, b, 2048);
>>>> +}
>>>> +
>>>
>
>
>
> --
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.
Michael Zolotukhin Aug. 7, 2013, 7:14 a.m. UTC | #5
> the option is designed for purpose like this.
That's great, thanks!

Michael
> David

On 6 August 2013 20:42, Xinliang David Li <davidxl@google.com> wrote:
> Corrected two small problems reported by the style checker (The
> warnings about the EnumValue for options  in stringopt.opt are not
> valid).
>
> On Tue, Aug 6, 2013 at 1:46 AM, Michael Zolotukhin
> <michael.v.zolotukhin@gmail.com> wrote:
>> There are still some formatting issues (like 8 spaces instead of a
>> tab, wrong indentation of do-loop and some other places) - to reveal
>> some of them you could use contrib/check_GNU_style.sh script.
>> But that was a nitpicking again:) Actually I wanted to ask whether
>> you're going to use this option for some performance experiments
>> involving memmov/memset - if so, probably you could tune existing
>> cost-models as well? Is it possible?
>
> the option is designed for purpose like this.
>
> thanks,
>
> David
>
>>
>> Michael
>>
>> On 5 August 2013 20:44, Xinliang David Li <davidxl@google.com> wrote:
>>> thanks. Updated patch attached.
>>>
>>> David
>>>
>>> On Mon, Aug 5, 2013 at 3:57 AM, Michael V. Zolotukhin
>>> <michael.v.zolotukhin@gmail.com> wrote:
>>>> Hi,
>>>> This is a really convenient option, thanks for working on it.
>>>> I can't approve it as I'm not a maintainer, but it looks ok to me,
>>>> except fot a small nitpicking: afair, comments should end with
>>>> dot-space-space.
>>>>
>>>> Michael
>>>>
>>>> On 04 Aug 20:01, Xinliang David Li wrote:
>>>>> The attached is a new patch implementing the stringop inline strategy
>>>>> control using two new -m options:
>>>>>
>>>>> -mmemcpy-strategy=
>>>>> -mmemset-strategy=
>>>>>
>>>>> See changes in doc/invoke.texi for description of the new options. Example:
>>>>>   -mmemcpy-strategy=rep_8byte:64:unaligned,unrolled_loop:2048:unaligned,libcall:-1:unaligned
>>>>>
>>>>> tells compiler to inline memcpy using rep_8byte when the size is no
>>>>> larger than 64 byte, using unrolled_loop when size is no larger than
>>>>> 2048, and for size > 2048, using library call. In all cases,
>>>>> destination alignment adjustment is not done.
>>>>>
>>>>> Tested on x86-64/linux. Ok for trunk?
>>>>>
>>>>> thanks,
>>>>>
>>>>> David
>>>>>
>>>>> 2013-08-02  Xinliang David Li  <davidxl@google.com>
>>>>>
>>>>>         * config/i386/stringop.def: New file.
>>>>>         * config/i386/stringop.opt: New file.
>>>>>         * config/i386/i386-opts.h: Include stringopt.def.
>>>>>         * config/i386/i386.opt: Include stringopt.opt.
>>>>>         * config/i386/i386.c (ix86_option_override_internal):
>>>>>         Override default size based stringop inline strategies
>>>>>         with options.
>>>>>         * config/i386/i386.c (ix86_parse_stringop_strategy_string):
>>>>>         New function.
>>>>>
>>>>> 2013-08-04  Xinliang David Li  <davidxl@google.com>
>>>>>
>>>>>         * testsuite/gcc.target/i386/memcpy-strategy-1.c: New test.
>>>>>         * testsuite/gcc.target/i386/memcpy-strategy-2.c: Ditto.
>>>>>         * testsuite/gcc.target/i386/memset-strategy-1.c: Ditto.
>>>>>         * testsuite/gcc.target/i386/memcpy-strategy-3.c: Ditto.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Aug 2, 2013 at 9:21 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> > On x86_64, when the expected size of memcpy/memset is known (e.g, with
>>>>> > FDO), libcall strategy is used with the size is > 8192. This value is
>>>>> > hard coded, which makes it hard to do performance tuning. This patch
>>>>> > adds two new parameters to do that. Potential usage includes
>>>>> > per-application libcall strategy min-size tuning based on summary data
>>>>> > with FDO (e.g, instruction workset size).
>>>>> >
>>>>> > Bootstrap and tested on x86_64/linux. Ok for trunk?
>>>>> >
>>>>> > thanks,
>>>>> >
>>>>> > David
>>>>> >
>>>>> >
>>>>> > 2013-08-02  Xinliang David Li  <davidxl@google.com>
>>>>> >
>>>>> >         * params.def: New parameters.
>>>>> >         * config/i386/i386.c (ix86_option_override_internal):
>>>>> >         Override default libcall size limit with parameters.
>>>>
>>>>> Index: config/i386/stringop.def
>>>>> ===================================================================
>>>>> --- config/i386/stringop.def  (revision 0)
>>>>> +++ config/i386/stringop.def  (revision 0)
>>>>> @@ -0,0 +1,42 @@
>>>>> +/* Definitions for option handling for IA-32.
>>>>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>>>>> +
>>>>> +This file is part of GCC.
>>>>> +
>>>>> +GCC is free software; you can redistribute it and/or modify
>>>>> +it under the terms of the GNU General Public License as published by
>>>>> +the Free Software Foundation; either version 3, or (at your option)
>>>>> +any later version.
>>>>> +
>>>>> +GCC is distributed in the hope that it will be useful,
>>>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>> +GNU General Public License for more details.
>>>>> +
>>>>> +Under Section 7 of GPL version 3, you are granted additional
>>>>> +permissions described in the GCC Runtime Library Exception, version
>>>>> +3.1, as published by the Free Software Foundation.
>>>>> +
>>>>> +You should have received a copy of the GNU General Public License and
>>>>> +a copy of the GCC Runtime Library Exception along with this program;
>>>>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>>>> +<http://www.gnu.org/licenses/>.  */
>>>>> +
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (no_stringop, no_stringop)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (libcall, libcall)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (rep_prefix_1_byte, rep_byte)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (rep_prefix_4_byte, rep_4byte)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (rep_prefix_8_byte, rep_8byte)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (loop_1_byte, byte_loop)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (loop, loop)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (unrolled_loop, unrolled_loop)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (vector_loop, vector_loop)
>>>>> Index: config/i386/i386.opt
>>>>> ===================================================================
>>>>> --- config/i386/i386.opt      (revision 201458)
>>>>> +++ config/i386/i386.opt      (working copy)
>>>>> @@ -316,6 +316,14 @@ mstack-arg-probe
>>>>>  Target Report Mask(STACK_PROBE) Save
>>>>>  Enable stack probing
>>>>>
>>>>> +mmemcpy-strategy=
>>>>> +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
>>>>> +Specify memcpy expansion strategy when expected size is known
>>>>> +
>>>>> +mmemset-strategy=
>>>>> +Target RejectNegative Joined Var(ix86_tune_memset_strategy)
>>>>> +Specify memset expansion strategy when expected size is known
>>>>> +
>>>>>  mstringop-strategy=
>>>>>  Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
>>>>>  Chose strategy to generate stringop using
>>>>> Index: config/i386/stringop.opt
>>>>> ===================================================================
>>>>> --- config/i386/stringop.opt  (revision 0)
>>>>> +++ config/i386/stringop.opt  (revision 0)
>>>>> @@ -0,0 +1,36 @@
>>>>> +/* Definitions for option handling for IA-32.
>>>>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>>>>> +
>>>>> +This file is part of GCC.
>>>>> +
>>>>> +GCC is free software; you can redistribute it and/or modify
>>>>> +it under the terms of the GNU General Public License as published by
>>>>> +the Free Software Foundation; either version 3, or (at your option)
>>>>> +any later version.
>>>>> +
>>>>> +GCC is distributed in the hope that it will be useful,
>>>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>> +GNU General Public License for more details.
>>>>> +
>>>>> +Under Section 7 of GPL version 3, you are granted additional
>>>>> +permissions described in the GCC Runtime Library Exception, version
>>>>> +3.1, as published by the Free Software Foundation.
>>>>> +
>>>>> +You should have received a copy of the GNU General Public License and
>>>>> +a copy of the GCC Runtime Library Exception along with this program;
>>>>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>>>> +<http://www.gnu.org/licenses/>.  */
>>>>> +
>>>>> +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
>>>>> +
>>>>> +#undef DEF_ENUM
>>>>> +#define DEF_ENUM EnumValue
>>>>> +
>>>>> +#undef DEF_ALG
>>>>> +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
>>>>> +
>>>>> +#include "stringop.def"
>>>>> +
>>>>> +#undef DEF_ENUM
>>>>> +#undef DEF_ALG
>>>>> Index: config/i386/i386.c
>>>>> ===================================================================
>>>>> --- config/i386/i386.c        (revision 201458)
>>>>> +++ config/i386/i386.c        (working copy)
>>>>> @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
>>>>>  };
>>>>>
>>>>>  /* Processor costs (relative to an add) */
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs i386_cost = { /* 386 specific costs */
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>>> @@ -226,7 +226,7 @@ struct processor_costs i386_cost = {      /*
>>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>>  };
>>>>>
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs i486_cost = { /* 486 specific costs */
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>>> @@ -298,7 +298,7 @@ struct processor_costs i486_cost = {      /*
>>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>>  };
>>>>>
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs pentium_cost = {
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>>> @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
>>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>>  };
>>>>>
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs pentiumpro_cost = {
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>>> @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
>>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>>  };
>>>>>
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs geode_cost = {
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>>> @@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
>>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>>  };
>>>>>
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs k6_cost = {
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>>>>> @@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
>>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>>  };
>>>>>
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs athlon_cost = {
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>>>>> @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
>>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>>  };
>>>>>
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs k8_cost = {
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>>>>> @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
>>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>>  };
>>>>>
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs pentium4_cost = {
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    COSTS_N_INSNS (3),                 /* cost of a lea instruction */
>>>>> @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
>>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>>  };
>>>>>
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs nocona_cost = {
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>>> @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
>>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>>  };
>>>>>
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs atom_cost = {
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
>>>>> @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
>>>>>  };
>>>>>
>>>>>  /* Generic64 should produce code tuned for Nocona and K8.  */
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs generic64_cost = {
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
>>>>> @@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost =
>>>>>  };
>>>>>
>>>>>  /* core_cost should produce code tuned for Core familly of CPUs.  */
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs core_cost = {
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
>>>>> @@ -1717,7 +1717,7 @@ struct processor_costs core_cost = {
>>>>>
>>>>>  /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
>>>>>     Athlon and K8.  */
>>>>> -static const
>>>>> +static
>>>>>  struct processor_costs generic32_cost = {
>>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
>>>>> @@ -2900,6 +2900,150 @@ ix86_debug_options (void)
>>>>>
>>>>>    return;
>>>>>  }
>>>>> +
>>>>> +static const char *stringop_alg_names[] = {
>>>>> +#define DEF_ENUM
>>>>> +#define DEF_ALG(alg, name) #name,
>>>>> +#include "stringop.def"
>>>>> +#undef DEF_ENUM
>>>>> +#undef DEF_ALG
>>>>> +};
>>>>> +
>>>>> +/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
>>>>> +   The string is of the following form (or comma separated list of it):
>>>>> +
>>>>> +     strategy_alg:max_size:[align|noalign]
>>>>> +
>>>>> +   where the full size range for the strategy is either [0, max_size] or
>>>>> +   [min_size, max_size], in which min_size is the max_size + 1 of the
>>>>> +   preceding range.  The last size range must have max_size == -1.
>>>>> +
>>>>> +   Examples:
>>>>> +
>>>>> +    1.
>>>>> +       -mmemcpy-strategy=libcall:-1:noalign
>>>>> +
>>>>> +      this is equivalent to (for known size memcpy) -mstringop-strategy=libcall
>>>>> +
>>>>> +
>>>>> +   2.
>>>>> +      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
>>>>> +
>>>>> +      This is to tell the compiler to use the following strategy for memset
>>>>> +      1) when the expected size is between [1, 16], use rep_8byte strategy;
>>>>> +      2) when the size is between [17, 2048], use vector_loop;
>>>>> +      3) when the size is > 2048, use libcall.
>>>>> +
>>>>> +*/
>>>>> +
>>>>> +struct stringop_size_range
>>>>> +{
>>>>> +  int min;
>>>>> +  int max;
>>>>> +  stringop_alg alg;
>>>>> +  bool noalign;
>>>>> +};
>>>>> +
>>>>> +static void
>>>>> +ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
>>>>> +{
>>>>> +  const struct stringop_algs *default_algs;
>>>>> +  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
>>>>> +  char *curr_range_str, *next_range_str;
>>>>> +  int i = 0, n = 0;
>>>>> +
>>>>> +  if (is_memset)
>>>>> +    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
>>>>> +  else
>>>>> +    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
>>>>> +
>>>>> +  curr_range_str = strategy_str;
>>>>> +
>>>>> +  do {
>>>>> +
>>>>> +    int mins, maxs;
>>>>> +    stringop_alg alg;
>>>>> +    char alg_name[128];
>>>>> +    char align[16];
>>>>> +
>>>>> +    next_range_str = strchr (curr_range_str, ',');
>>>>> +    if (next_range_str)
>>>>> +      *next_range_str++ = '\0';
>>>>> +
>>>>> +    if (3 != sscanf (curr_range_str, "%[^:]:%d:%s", alg_name, &maxs, align))
>>>>> +      {
>>>>> +        warning (0, "Wrong arg %s to option %s", curr_range_str,
>>>>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>>> +        return;
>>>>> +      }
>>>>> +
>>>>> +    if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
>>>>> +      {
>>>>> +        warning (0, "Size ranges of option %s should be increasing",
>>>>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>>> +        return;
>>>>> +      }
>>>>> +
>>>>> +    for (i = 0; i < last_alg; i++)
>>>>> +      {
>>>>> +        if (!strcmp (alg_name, stringop_alg_names[i]))
>>>>> +       {
>>>>> +         alg = (stringop_alg) i;
>>>>> +         break;
>>>>> +          }
>>>>> +      }
>>>>> +
>>>>> +    if (i == last_alg)
>>>>> +      {
>>>>> +        warning (0, "Wrong stringop strategy name %s specified for option %s",
>>>>> +              alg_name,
>>>>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>>> +     return;
>>>>> +      }
>>>>> +
>>>>> +    input_ranges[n].min = mins;
>>>>> +    input_ranges[n].max = maxs;
>>>>> +    input_ranges[n].alg = alg;
>>>>> +    if (!strcmp (align, "align"))
>>>>> +      input_ranges[n].noalign = false;
>>>>> +    else if (!strcmp (align, "noalign"))
>>>>> +      input_ranges[n].noalign = true;
>>>>> +    else
>>>>> +      {
>>>>> +        warning (0, "Unknown alignment %s specified for option %s",
>>>>> +                 align, is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>>> +        return;
>>>>> +      }
>>>>> +    n++;
>>>>> +    curr_range_str = next_range_str;
>>>>> +  } while (curr_range_str);
>>>>> +
>>>>> +  if (input_ranges[n - 1].max != -1)
>>>>> +    {
>>>>> +      warning (0, "The max value for the last size range should be -1"
>>>>> +               " for option %s",
>>>>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>>> +      return;
>>>>> +    }
>>>>> +
>>>>> +  if (n > MAX_STRINGOP_ALGS)
>>>>> +    {
>>>>> +      warning (0, "Too many size ranges specified in option %s",
>>>>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>>> +      return;
>>>>> +    }
>>>>> +
>>>>> +  /* Now override the default algs array  */
>>>>> +  for (i = 0; i < n; i++)
>>>>> +    {
>>>>> +      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
>>>>> +      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
>>>>> +          = input_ranges[i].alg;
>>>>> +      *const_cast<int *>(&default_algs->size[i].noalign)
>>>>> +          = input_ranges[i].noalign;
>>>>> +    }
>>>>> +}
>>>>> +
>>>>>
>>>>>  /* Override various settings based on options.  If MAIN_ARGS_P, the
>>>>>     options are from the command line, otherwise they are from
>>>>> @@ -4021,6 +4165,21 @@ ix86_option_override_internal (bool main
>>>>>    /* Handle stack protector */
>>>>>    if (!global_options_set.x_ix86_stack_protector_guard)
>>>>>      ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
>>>>> +
>>>>> +  /* Handle -mmemcpy-strategy= and -mmemset-strategy=  */
>>>>> +  if (ix86_tune_memcpy_strategy)
>>>>> +    {
>>>>> +      char *str = xstrdup (ix86_tune_memcpy_strategy);
>>>>> +      ix86_parse_stringop_strategy_string (str, false);
>>>>> +      free (str);
>>>>> +    }
>>>>> +
>>>>> +  if (ix86_tune_memset_strategy)
>>>>> +    {
>>>>> +      char *str = xstrdup (ix86_tune_memset_strategy);
>>>>> +      ix86_parse_stringop_strategy_string (str, true);
>>>>> +      free (str);
>>>>> +    }
>>>>>  }
>>>>>
>>>>>  /* Implement the TARGET_OPTION_OVERRIDE hook.  */
>>>>> @@ -22903,6 +23062,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>>>>      {
>>>>>      case libcall:
>>>>>      case no_stringop:
>>>>> +    case last_alg:
>>>>>        gcc_unreachable ();
>>>>>      case loop_1_byte:
>>>>>        need_zero_guard = true;
>>>>> @@ -23093,6 +23253,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>>>>      {
>>>>>      case libcall:
>>>>>      case no_stringop:
>>>>> +    case last_alg:
>>>>>        gcc_unreachable ();
>>>>>      case loop_1_byte:
>>>>>      case loop:
>>>>> @@ -23304,6 +23465,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>>>>      {
>>>>>      case libcall:
>>>>>      case no_stringop:
>>>>> +    case last_alg:
>>>>>        gcc_unreachable ();
>>>>>      case loop:
>>>>>        need_zero_guard = true;
>>>>> @@ -23481,6 +23643,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>>>>      {
>>>>>      case libcall:
>>>>>      case no_stringop:
>>>>> +    case last_alg:
>>>>>        gcc_unreachable ();
>>>>>      case loop_1_byte:
>>>>>      case loop:
>>>>> Index: config/i386/i386-opts.h
>>>>> ===================================================================
>>>>> --- config/i386/i386-opts.h   (revision 201458)
>>>>> +++ config/i386/i386-opts.h   (working copy)
>>>>> @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI
>>>>>  /* Algorithm to expand string function with.  */
>>>>>  enum stringop_alg
>>>>>  {
>>>>> -   no_stringop,
>>>>> -   libcall,
>>>>> -   rep_prefix_1_byte,
>>>>> -   rep_prefix_4_byte,
>>>>> -   rep_prefix_8_byte,
>>>>> -   loop_1_byte,
>>>>> -   loop,
>>>>> -   unrolled_loop,
>>>>> -   vector_loop
>>>>> +#undef DEF_ENUM
>>>>> +#define DEF_ENUM
>>>>> +
>>>>> +#undef DEF_ALG
>>>>> +#define DEF_ALG(alg, name) alg,
>>>>> +
>>>>> +#include "stringop.def"
>>>>> +last_alg
>>>>> +
>>>>> +#undef DEF_ENUM
>>>>> +#undef DEF_ALG
>>>>>  };
>>>>>
>>>>>  /* Available call abi.  */
>>>>> Index: doc/invoke.texi
>>>>> ===================================================================
>>>>> --- doc/invoke.texi   (revision 201458)
>>>>> +++ doc/invoke.texi   (working copy)
>>>>> @@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}.
>>>>>  -mbmi2 -mrtm -mlwp -mthreads @gol
>>>>>  -mno-align-stringops  -minline-all-stringops @gol
>>>>>  -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
>>>>> +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy}
>>>>>  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
>>>>>  -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
>>>>>  -mregparm=@var{num}  -msseregparm @gol
>>>>> @@ -14598,6 +14599,24 @@ Expand into an inline loop.
>>>>>  Always use a library call.
>>>>>  @end table
>>>>>
>>>>> +@item -mmemcpy-strategy=@var{strategy}
>>>>> +@opindex mmemcpy-strategy=@var{strategy}
>>>>> +Override the internal decision heuristic to decide if @code{__builtin_memcpy}
>>>>> +should be inlined and what inline algorithm to use when the expected size
>>>>> +of the copy operation is known. @var{strategy}
>>>>> +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
>>>>> +@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
>>>>> +the max byte size with which inline algorithm @var{alg} is allowed. For the last
>>>>> +triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets
>>>>> +in the list must be specified in increasing order. The minimal byte size for
>>>>> +@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the
>>>>> +preceding range.
>>>>> +
>>>>> +@item -mmemset-strategy=@var{strategy}
>>>>> +@opindex mmemset-strategy=@var{strategy}
>>>>> +The option is similar to @option{-mmemcpy-strategy=} except that it is to control
>>>>> +@code{__builtin_memset} expansion.
>>>>> +
>>>>>  @item -momit-leaf-frame-pointer
>>>>>  @opindex momit-leaf-frame-pointer
>>>>>  Don't keep the frame pointer in a register for leaf functions.  This
>>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
>>>>> ===================================================================
>>>>> --- testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
>>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
>>>>> @@ -0,0 +1,12 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
>>>>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>>>>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>>>>> +
>>>>> +char a[2048];
>>>>> +char b[2048];
>>>>> +void t (void)
>>>>> +{
>>>>> +  __builtin_memcpy (a, b, 2048);
>>>>> +}
>>>>> +
>>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
>>>>> ===================================================================
>>>>> --- testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
>>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
>>>>> @@ -0,0 +1,12 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
>>>>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>>>>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>>>>> +
>>>>> +char a[2048];
>>>>> +char b[2048];
>>>>> +void t (void)
>>>>> +{
>>>>> +  __builtin_memcpy (a, b, 2048);
>>>>> +}
>>>>> +
>>>>> Index: testsuite/gcc.target/i386/memset-strategy-1.c
>>>>> ===================================================================
>>>>> --- testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
>>>>> +++ testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
>>>>> @@ -0,0 +1,10 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
>>>>> +/* { dg-final { scan-assembler-times "memset" 2  } } */
>>>>> +
>>>>> +char a[2048];
>>>>> +void t (void)
>>>>> +{
>>>>> +  __builtin_memset (a, 1, 2048);
>>>>> +}
>>>>> +
>>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
>>>>> ===================================================================
>>>>> --- testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
>>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
>>>>> @@ -0,0 +1,11 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
>>>>> +/* { dg-final { scan-assembler-times "memcpy" 2  } } */
>>>>> +
>>>>> +char a[2048];
>>>>> +char b[2048];
>>>>> +void t (void)
>>>>> +{
>>>>> +  __builtin_memcpy (a, b, 2048);
>>>>> +}
>>>>> +
>>>>
>>
>>
>>
>> --
>> ---
>> Best regards,
>> Michael V. Zolotukhin,
>> Software Engineer
>> Intel Corporation.
diff mbox

Patch

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 201458)
+++ doc/invoke.texi	(working copy)
@@ -649,6 +649,7 @@  Objective-C and Objective-C++ Dialects}.
 -mbmi2 -mrtm -mlwp -mthreads @gol
 -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
+-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} 
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
 -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
 -mregparm=@var{num}  -msseregparm @gol
@@ -14598,6 +14599,24 @@  Expand into an inline loop.
 Always use a library call.
 @end table
 
+@item -mmemcpy-strategy=@var{strategy}
+@opindex mmemcpy-strategy=@var{strategy}
+Override the internal decision heuristic to decide if @code{__builtin_memcpy}
+should be inlined and what inline algorithm to use when the expected size
+of the copy operation is known. @var{strategy} 
+is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets. 
+@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
+the max byte size with which inline algorithm @var{alg} is allowed. For the last
+triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets
+in the list must be specified in increasing order. The minimal byte size for 
+@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the 
+preceding range.
+
+@item -mmemset-strategy=@var{strategy}
+@opindex mmemset-strategy=@var{strategy}
+The option is similar to @option{-mmemcpy-strategy=} except that it is to control
+@code{__builtin_memset} expansion.
+
 @item -momit-leaf-frame-pointer
 @opindex momit-leaf-frame-pointer
 Don't keep the frame pointer in a register for leaf functions.  This
Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
===================================================================
--- testsuite/gcc.target/i386/memcpy-strategy-2.c	(revision 0)
+++ testsuite/gcc.target/i386/memcpy-strategy-2.c	(revision 0)
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
+/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
+/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
+
+char a[2048];
+char b[2048];
+void t (void)
+{
+  __builtin_memcpy (a, b, 2048);
+}
+
Index: testsuite/gcc.target/i386/memset-strategy-1.c
===================================================================
--- testsuite/gcc.target/i386/memset-strategy-1.c	(revision 0)
+++ testsuite/gcc.target/i386/memset-strategy-1.c	(revision 0)
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
+/* { dg-final { scan-assembler-times "memset" 2  } } */
+
+char a[2048];
+void t (void)
+{
+  __builtin_memset (a, 1, 2048);
+}
+
Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
===================================================================
--- testsuite/gcc.target/i386/memcpy-strategy-3.c	(revision 0)
+++ testsuite/gcc.target/i386/memcpy-strategy-3.c	(revision 0)
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
+/* { dg-final { scan-assembler-times "memcpy" 2  } } */
+
+char a[2048];
+char b[2048];
+void t (void)
+{
+  __builtin_memcpy (a, b, 2048);
+}
+
Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
===================================================================
--- testsuite/gcc.target/i386/memcpy-strategy-1.c	(revision 0)
+++ testsuite/gcc.target/i386/memcpy-strategy-1.c	(revision 0)
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
+/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
+/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
+
+char a[2048];
+char b[2048];
+void t (void)
+{
+  __builtin_memcpy (a, b, 2048);
+}
+
Index: config/i386/stringop.def
===================================================================
--- config/i386/stringop.def	(revision 0)
+++ config/i386/stringop.def	(revision 0)
@@ -0,0 +1,42 @@ 
+/* Definitions for option handling for IA-32.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+DEF_ENUM
+DEF_ALG (no_stringop, no_stringop)
+DEF_ENUM
+DEF_ALG (libcall, libcall)
+DEF_ENUM
+DEF_ALG (rep_prefix_1_byte, rep_byte)
+DEF_ENUM
+DEF_ALG (rep_prefix_4_byte, rep_4byte)
+DEF_ENUM
+DEF_ALG (rep_prefix_8_byte, rep_8byte)
+DEF_ENUM
+DEF_ALG (loop_1_byte, byte_loop)
+DEF_ENUM
+DEF_ALG (loop, loop)
+DEF_ENUM
+DEF_ALG (unrolled_loop, unrolled_loop)
+DEF_ENUM
+DEF_ALG (vector_loop, vector_loop)
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 201458)
+++ config/i386/i386.c	(working copy)
@@ -156,7 +156,7 @@  struct processor_costs ix86_size_cost =
 };
 
 /* Processor costs (relative to an add) */
-static const
+static
 struct processor_costs i386_cost = {	/* 386 specific costs */
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1),			/* cost of a lea instruction */
@@ -226,7 +226,7 @@  struct processor_costs i386_cost = {	/*
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs i486_cost = {	/* 486 specific costs */
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1),			/* cost of a lea instruction */
@@ -298,7 +298,7 @@  struct processor_costs i486_cost = {	/*
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs pentium_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1),			/* cost of a lea instruction */
@@ -368,7 +368,7 @@  struct processor_costs pentium_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs pentiumpro_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1),			/* cost of a lea instruction */
@@ -447,7 +447,7 @@  struct processor_costs pentiumpro_cost =
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs geode_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1),			/* cost of a lea instruction */
@@ -518,7 +518,7 @@  struct processor_costs geode_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs k6_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (2),			/* cost of a lea instruction */
@@ -591,7 +591,7 @@  struct processor_costs k6_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs athlon_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (2),			/* cost of a lea instruction */
@@ -664,7 +664,7 @@  struct processor_costs athlon_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs k8_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (2),			/* cost of a lea instruction */
@@ -1265,7 +1265,7 @@  struct processor_costs btver2_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs pentium4_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (3),			/* cost of a lea instruction */
@@ -1336,7 +1336,7 @@  struct processor_costs pentium4_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs nocona_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1),			/* cost of a lea instruction */
@@ -1409,7 +1409,7 @@  struct processor_costs nocona_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs atom_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1) + 1,		/* cost of a lea instruction */
@@ -1556,7 +1556,7 @@  struct processor_costs slm_cost = {
 };
 
 /* Generic64 should produce code tuned for Nocona and K8.  */
-static const
+static
 struct processor_costs generic64_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   /* On all chips taken into consideration lea is 2 cycles and more.  With
@@ -1635,7 +1635,7 @@  struct processor_costs generic64_cost =
 };
 
 /* core_cost should produce code tuned for Core familly of CPUs.  */
-static const
+static
 struct processor_costs core_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   /* On all chips taken into consideration lea is 2 cycles and more.  With
@@ -1717,7 +1717,7 @@  struct processor_costs core_cost = {
 
 /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
    Athlon and K8.  */
-static const
+static
 struct processor_costs generic32_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1) + 1,		/* cost of a lea instruction */
@@ -2900,6 +2900,148 @@  ix86_debug_options (void)
 
   return;
 }
+
+static const char *stringop_alg_names[] = {
+#define DEF_ENUM
+#define DEF_ALG(alg, name) #name,
+#include "stringop.def"
+#undef DEF_ENUM
+#undef DEF_ALG
+};
+
+/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
+   The string is of the following form (or comma separated list of it):
+
+     strategy_alg:max_size:[align|noalign]
+
+   where the full size range for the strategy is either [0, max_size] or
+   [min_size, max_size], in which min_size is the max_size + 1 of the
+   preceding range.  The last size range must have max_size == -1.
+
+   Examples:
+
+    1.
+       -mmemcpy-strategy=libcall:-1:noalign
+
+      this is equivalent to (for known size memcpy) -mstringop-strategy=libcall
+
+
+   2.
+      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
+
+      This is to tell the compiler to use the following strategy for memset
+      1) when the expected size is between [1, 16], use rep_8byte strategy;
+      2) when the size is between [17, 2048], use vector_loop;
+      3) when the size is > 2048, use libcall.  */
+
+struct stringop_size_range
+{
+  int min;
+  int max;
+  stringop_alg alg;
+  bool noalign;
+};
+
+static void
+ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
+{
+  const struct stringop_algs *default_algs;
+  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
+  char *curr_range_str, *next_range_str;
+  int i = 0, n = 0;
+
+  if (is_memset)
+    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
+  else
+    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
+
+  curr_range_str = strategy_str;
+
+  do {
+
+    int mins, maxs;
+    stringop_alg alg;
+    char alg_name[128];
+    char align[16];
+
+    next_range_str = strchr (curr_range_str, ',');
+    if (next_range_str)
+      *next_range_str++ = '\0';
+
+    if (3 != sscanf (curr_range_str, "%[^:]:%d:%s", alg_name, &maxs, align))
+      {
+        warning (0, "Wrong arg %s to option %s", curr_range_str,
+                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
+        return;
+      }
+
+    if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
+      {
+        warning (0, "Size ranges of option %s should be increasing",
+                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
+        return;
+      }
+
+    for (i = 0; i < last_alg; i++)
+      {
+        if (!strcmp (alg_name, stringop_alg_names[i]))
+	  {
+	    alg = (stringop_alg) i;
+	    break;
+          }
+      }
+
+    if (i == last_alg)
+      {
+        warning (0, "Wrong stringop strategy name %s specified for option %s",
+	         alg_name,
+                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
+	return;
+      }
+
+    input_ranges[n].min = mins;
+    input_ranges[n].max = maxs;
+    input_ranges[n].alg = alg;
+    if (!strcmp (align, "align"))
+      input_ranges[n].noalign = false;
+    else if (!strcmp (align, "noalign"))
+      input_ranges[n].noalign = true;
+    else
+      {
+        warning (0, "Unknown alignment %s specified for option %s",
+                 align, is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
+        return;
+      }
+    n++;
+    curr_range_str = next_range_str;
+  } while (curr_range_str);
+
+  if (input_ranges[n - 1].max != -1)
+    {
+      warning (0, "The max value for the last size range should be -1"
+               " for option %s",
+               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
+      return;
+    }
+
+  if (n > MAX_STRINGOP_ALGS)
+    {
+      warning (0, "Too many size ranges specified in option %s",
+               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
+      return;
+    }
+
+  /* Now override the default algs array  */
+  for (i = 0; i < n; i++)
+    {
+      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
+      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
+          = input_ranges[i].alg;
+      *const_cast<int *>(&default_algs->size[i].noalign)
+          = input_ranges[i].noalign;
+    }
+}
+
 
 /* Override various settings based on options.  If MAIN_ARGS_P, the
    options are from the command line, otherwise they are from
@@ -4021,6 +4163,21 @@  ix86_option_override_internal (bool main
   /* Handle stack protector */
   if (!global_options_set.x_ix86_stack_protector_guard)
     ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
+
+  /* Handle -mmemcpy-strategy= and -mmemset-strategy=  */
+  if (ix86_tune_memcpy_strategy)
+    {
+      char *str = xstrdup (ix86_tune_memcpy_strategy);
+      ix86_parse_stringop_strategy_string (str, false);
+      free (str);
+    }
+
+  if (ix86_tune_memset_strategy)
+    {
+      char *str = xstrdup (ix86_tune_memset_strategy);
+      ix86_parse_stringop_strategy_string (str, true);
+      free (str);
+    }
 }
 
 /* Implement the TARGET_OPTION_OVERRIDE hook.  */
@@ -22903,6 +23060,7 @@  ix86_expand_movmem (rtx dst, rtx src, rt
     {
     case libcall:
     case no_stringop:
+    case last_alg:
       gcc_unreachable ();
     case loop_1_byte:
       need_zero_guard = true;
@@ -23093,6 +23251,7 @@  ix86_expand_movmem (rtx dst, rtx src, rt
     {
     case libcall:
     case no_stringop:
+    case last_alg:
       gcc_unreachable ();
     case loop_1_byte:
     case loop:
@@ -23304,6 +23463,7 @@  ix86_expand_setmem (rtx dst, rtx count_e
     {
     case libcall:
     case no_stringop:
+    case last_alg:
       gcc_unreachable ();
     case loop:
       need_zero_guard = true;
@@ -23481,6 +23641,7 @@  ix86_expand_setmem (rtx dst, rtx count_e
     {
     case libcall:
     case no_stringop:
+    case last_alg:
       gcc_unreachable ();
     case loop_1_byte:
     case loop:
Index: config/i386/stringop.opt
===================================================================
--- config/i386/stringop.opt	(revision 0)
+++ config/i386/stringop.opt	(revision 0)
@@ -0,0 +1,36 @@ 
+/* Definitions for option handling for IA-32.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
+
+#undef DEF_ENUM
+#define DEF_ENUM EnumValue
+
+#undef DEF_ALG
+#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
+
+#include "stringop.def"
+
+#undef DEF_ENUM
+#undef DEF_ALG
Index: config/i386/i386-opts.h
===================================================================
--- config/i386/i386-opts.h	(revision 201458)
+++ config/i386/i386-opts.h	(working copy)
@@ -28,15 +28,17 @@  see the files COPYING3 and COPYING.RUNTI
 /* Algorithm to expand string function with.  */
 enum stringop_alg
 {
-   no_stringop,
-   libcall,
-   rep_prefix_1_byte,
-   rep_prefix_4_byte,
-   rep_prefix_8_byte,
-   loop_1_byte,
-   loop,
-   unrolled_loop,
-   vector_loop
+#undef DEF_ENUM
+#define DEF_ENUM
+
+#undef DEF_ALG
+#define DEF_ALG(alg, name) alg, 
+
+#include "stringop.def"
+last_alg
+
+#undef DEF_ENUM
+#undef DEF_ALG
 };
 
 /* Available call abi.  */
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 201458)
+++ config/i386/i386.opt	(working copy)
@@ -316,6 +316,14 @@  mstack-arg-probe
 Target Report Mask(STACK_PROBE) Save
 Enable stack probing
 
+mmemcpy-strategy=
+Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
+Specify memcpy expansion strategy when expected size is known
+
+mmemset-strategy=
+Target RejectNegative Joined Var(ix86_tune_memset_strategy)
+Specify memset expansion strategy when expected size is known
+
 mstringop-strategy=
 Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
 Chose strategy to generate stringop using