Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

Message ID 20230810011149.23432-1-hongtao.liu@intel.com
State New

Commit Message

Liu, Hongtao Aug. 10, 2023, 1:11 a.m. UTC
Currently there are three independent tunes for gather,
"use_gather,use_gather_2parts,use_gather_4parts",
and similarly three for scatter,
"use_scatter,use_scatter_2parts,use_scatter_4parts".

The patch adds two standardized options to enable/disable
vectorization for all gather/scatter instructions. The driver
translates each option into the three corresponding tunes.

bootstrapped and regtested on x86_64-pc-linux-gnu.
Ok for trunk?

gcc/ChangeLog:

	* config/i386/i386.h (DRIVER_SELF_SPECS): Add
	GATHER_SCATTER_DRIVER_SELF_SPECS.
	(GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
	* config/i386/i386.opt (mgather): New option.
	(mscatter): Ditto.
---
 gcc/config/i386/i386.h   | 12 +++++++++++-
 gcc/config/i386/i386.opt |  8 ++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)
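
For illustration only, the translation described in the commit message can be sketched as a small lookup. This is not GCC code: the helper name expand_gather_scatter_option is hypothetical, and the real patch does the mapping with driver spec strings (GATHER_SCATTER_DRIVER_SELF_SPECS), not with a C function. The option names and -mtune-ctrl strings are the ones from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: return the -mtune-ctrl text that
   the proposed driver spec would add when OPT is one of the four new
   options, or NULL for any other option.  */
const char *
expand_gather_scatter_option (const char *opt)
{
  static const struct
  {
    const char *opt;
    const char *expansion;
  } map[] = {
    { "-mno-gather",
      "-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather" },
    { "-mgather",
      "-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather" },
    { "-mno-scatter",
      "-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter" },
    { "-mscatter",
      "-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter" },
  };

  for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
    if (strcmp (opt, map[i].opt) == 0)
      return map[i].expansion;

  return NULL;
}
```

A leading '^' in a -mtune-ctrl entry clears the named tune, so each -mno-* option clears all three tunes and each positive option sets all three.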

Comments

Xi Ruoyao Aug. 10, 2023, 1:47 a.m. UTC | #1
On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote:
> Currently we have 3 different independent tunes for gather
> "use_gather,use_gather_2parts,use_gather_4parts",
> similar for scatter, there're
> "use_scatter,use_scatter_2parts,use_scatter_4parts"
> 
> The patch support 2 standardizing options to enable/disable
> vectorization for all gather/scatter instructions. The options is
> interpreted by driver to 3 tunes.
> 
> bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?

Also, should we make -mno-gather the default for GDS-affected
processors?  They will likely get the microcode update, after which
gather instructions will be much slower.

> gcc/ChangeLog:
> 
>         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
>         GATHER_SCATTER_DRIVER_SELF_SPECS.
>         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
>         * config/i386/i386.opt (mgather): New option.
>         (mscatter): Ditto.
> ---
>  gcc/config/i386/i386.h   | 12 +++++++++++-
>  gcc/config/i386/i386.opt |  8 ++++++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..d9ac2c29bde 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
>  # define SUBTARGET_DRIVER_SELF_SPECS ""
>  #endif
>  
> -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> +#endif
> +
> +#define DRIVER_SELF_SPECS \
> +  SUBTARGET_DRIVER_SELF_SPECS " " \
> +  GATHER_SCATTER_DRIVER_SELF_SPECS
>  
>  /* -march=native handling only makes sense with compiler running on
>     an x86 or x86_64 chip.  If changing this condition, also change
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index ddb7f110aa2..99948644a8d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -424,6 +424,14 @@ mdaz-ftz
>  Target
>  Set the FTZ and DAZ Flags.
>  
> +mgather
> +Target
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target
> +Enable vectorization for scatter instruction.
> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.
Li, Pan2 via Gcc-patches Aug. 10, 2023, 1:52 a.m. UTC | #2
> -----Original Message-----
> From: Xi Ruoyao <xry111@xry111.site>
> Sent: Thursday, August 10, 2023 9:48 AM
> To: Liu, Hongtao <hongtao.liu@intel.com>; gcc-patches@gcc.gnu.org
> Cc: richard.guenther@gmail.com; ubizjak@gmail.com; hubicka@ucw.cz
> Subject: Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable
> vectorization for all gather/scatter instructions.
> 
> On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote:
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > similar for scatter, there're
> > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> >
> > The patch support 2 standardizing options to enable/disable
> > vectorization for all gather/scatter instructions. The options is
> > interpreted by driver to 3 tunes.
> >
> > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > Ok for trunk?
> 
> And should we set -mno-gather as the default for GDS affected processors?
> We'll likely apply the ucode update for them, and then the gathering
> instructions will be much slower.
I assume you're talking about https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html
Yes, there will be a separate patch for microarchitecture tuning.
> 
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
> >         GATHER_SCATTER_DRIVER_SELF_SPECS.
> >         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
> >         * config/i386/i386.opt (mgather): New option.
> >         (mscatter): Ditto.
> > ---
> >  gcc/config/i386/i386.h   | 12 +++++++++++-
> >  gcc/config/i386/i386.opt |  8 ++++++++
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index ef342fcee9b..d9ac2c29bde 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
> >  # define SUBTARGET_DRIVER_SELF_SPECS ""
> >  #endif
> >
> > -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> > +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> > +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> > +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> > +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> > +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> > +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> > +#endif
> > +
> > +#define DRIVER_SELF_SPECS \
> > +  SUBTARGET_DRIVER_SELF_SPECS " " \
> > +  GATHER_SCATTER_DRIVER_SELF_SPECS
> >
> >  /* -march=native handling only makes sense with compiler running on
> >     an x86 or x86_64 chip.  If changing this condition, also change
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > index ddb7f110aa2..99948644a8d 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -424,6 +424,14 @@ mdaz-ftz
> >  Target
> >  Set the FTZ and DAZ Flags.
> >
> > +mgather
> > +Target
> > +Enable vectorization for gather instruction.
> > +
> > +mscatter
> > +Target
> > +Enable vectorization for scatter instruction.
> > +
> >  mpreferred-stack-boundary=
> >  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
> >  Attempt to keep stack aligned to this power of 2.
> 
> --
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University
Uros Bizjak Aug. 10, 2023, 6:04 a.m. UTC | #3
On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> Currently we have 3 different independent tunes for gather
> "use_gather,use_gather_2parts,use_gather_4parts",
> similar for scatter, there're
> "use_scatter,use_scatter_2parts,use_scatter_4parts"
>
> The patch support 2 standardizing options to enable/disable
> vectorization for all gather/scatter instructions. The options is
> interpreted by driver to 3 tunes.
>
> bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
>         GATHER_SCATTER_DRIVER_SELF_SPECS.
>         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
>         * config/i386/i386.opt (mgather): New option.
>         (mscatter): Ditto.
> ---
>  gcc/config/i386/i386.h   | 12 +++++++++++-
>  gcc/config/i386/i386.opt |  8 ++++++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..d9ac2c29bde 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
>  # define SUBTARGET_DRIVER_SELF_SPECS ""
>  #endif
>
> -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> +#endif
> +
> +#define DRIVER_SELF_SPECS \
> +  SUBTARGET_DRIVER_SELF_SPECS " " \
> +  GATHER_SCATTER_DRIVER_SELF_SPECS
>
>  /* -march=native handling only makes sense with compiler running on
>     an x86 or x86_64 chip.  If changing this condition, also change
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index ddb7f110aa2..99948644a8d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -424,6 +424,14 @@ mdaz-ftz
>  Target
>  Set the FTZ and DAZ Flags.
>
> +mgather
> +Target
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target
> +Enable vectorization for scatter instruction.

Are gather and scatter instructions affected in a separate way, or
should we use one -mgather-scatter option to cover all gather/scatter
tunings?

Uros.

> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.
> --
> 2.31.1
>
Hongtao Liu Aug. 10, 2023, 6:12 a.m. UTC | #4
On Thu, Aug 10, 2023 at 2:04 PM Uros Bizjak via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > similar for scatter, there're
> > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> >
> > The patch support 2 standardizing options to enable/disable
> > vectorization for all gather/scatter instructions. The options is
> > interpreted by driver to 3 tunes.
> >
> > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
> >         GATHER_SCATTER_DRIVER_SELF_SPECS.
> >         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
> >         * config/i386/i386.opt (mgather): New option.
> >         (mscatter): Ditto.
> > ---
> >  gcc/config/i386/i386.h   | 12 +++++++++++-
> >  gcc/config/i386/i386.opt |  8 ++++++++
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index ef342fcee9b..d9ac2c29bde 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
> >  # define SUBTARGET_DRIVER_SELF_SPECS ""
> >  #endif
> >
> > -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> > +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> > +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> > +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> > +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> > +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> > +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> > +#endif
> > +
> > +#define DRIVER_SELF_SPECS \
> > +  SUBTARGET_DRIVER_SELF_SPECS " " \
> > +  GATHER_SCATTER_DRIVER_SELF_SPECS
> >
> >  /* -march=native handling only makes sense with compiler running on
> >     an x86 or x86_64 chip.  If changing this condition, also change
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > index ddb7f110aa2..99948644a8d 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -424,6 +424,14 @@ mdaz-ftz
> >  Target
> >  Set the FTZ and DAZ Flags.
> >
> > +mgather
> > +Target
> > +Enable vectorization for gather instruction.
> > +
> > +mscatter
> > +Target
> > +Enable vectorization for scatter instruction.
>
> Are gather and scatter instructions affected in a separate way, or
> should we use one -mgather-scatter option to cover all gather/scatter
> tunings?
Separately.
Gather Data Sampling affects only gather instructions.
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html
>
> Uros.
>
> > +
> >  mpreferred-stack-boundary=
> >  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
> >  Attempt to keep stack aligned to this power of 2.
> > --
> > 2.31.1
> >
Richard Biener Aug. 10, 2023, 7:39 a.m. UTC | #5
On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> Currently we have 3 different independent tunes for gather
> "use_gather,use_gather_2parts,use_gather_4parts",
> similar for scatter, there're
> "use_scatter,use_scatter_2parts,use_scatter_4parts"
>
> The patch support 2 standardizing options to enable/disable
> vectorization for all gather/scatter instructions. The options is
> interpreted by driver to 3 tunes.
>
> bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?

I think -mgather/-mscatter are too close to -mfma suggesting they
enable part of an ISA but they won't disable the use of intrinsics
or enable gather/scatter on CPUs where the ISA doesn't have them.

May I suggest to invent a more generic "short-cut" to
-mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
tunables add ^use_gather_any to cover all cases?  (or
change what use_gather controls - it seems we changed its
meaning before, and instead add use_gather_8parts and
use_gather_16parts)

That is, what's the point of this?

Richard.

> gcc/ChangeLog:
>
>         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
>         GATHER_SCATTER_DRIVER_SELF_SPECS.
>         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
>         * config/i386/i386.opt (mgather): New option.
>         (mscatter): Ditto.
> ---
>  gcc/config/i386/i386.h   | 12 +++++++++++-
>  gcc/config/i386/i386.opt |  8 ++++++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..d9ac2c29bde 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
>  # define SUBTARGET_DRIVER_SELF_SPECS ""
>  #endif
>
> -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> +#endif
> +
> +#define DRIVER_SELF_SPECS \
> +  SUBTARGET_DRIVER_SELF_SPECS " " \
> +  GATHER_SCATTER_DRIVER_SELF_SPECS
>
>  /* -march=native handling only makes sense with compiler running on
>     an x86 or x86_64 chip.  If changing this condition, also change
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index ddb7f110aa2..99948644a8d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -424,6 +424,14 @@ mdaz-ftz
>  Target
>  Set the FTZ and DAZ Flags.
>
> +mgather
> +Target
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target
> +Enable vectorization for scatter instruction.
> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.
> --
> 2.31.1
>
Uros Bizjak Aug. 10, 2023, 7:42 a.m. UTC | #6
On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > similar for scatter, there're
> > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> >
> > The patch support 2 standardizing options to enable/disable
> > vectorization for all gather/scatter instructions. The options is
> > interpreted by driver to 3 tunes.
> >
> > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > Ok for trunk?
>
> I think -mgather/-mscatter are too close to -mfma suggesting they
> enable part of an ISA but they won't disable the use of intrinsics
> or enable gather/scatter on CPUs where the ISA doesn't have them.
>
> May I suggest to invent a more generic "short-cut" to
> -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> tunables add ^use_gather_any to cover all cases?  (or
> change what use_gather controls - it seems we changed its
> meaning before, and instead add use_gather_8parts and
> use_gather_16parts)
>
> That is, what's the point of this?

https://www.phoronix.com/review/downfall

that caused:

https://www.phoronix.com/review/intel-downfall-benchmarks

Uros.
Richard Biener Aug. 10, 2023, 7:47 a.m. UTC | #7
On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > >
> > > Currently we have 3 different independent tunes for gather
> > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > similar for scatter, there're
> > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > >
> > > The patch support 2 standardizing options to enable/disable
> > > vectorization for all gather/scatter instructions. The options is
> > > interpreted by driver to 3 tunes.
> > >
> > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > Ok for trunk?
> >
> > I think -mgather/-mscatter are too close to -mfma suggesting they
> > enable part of an ISA but they won't disable the use of intrinsics
> > or enable gather/scatter on CPUs where the ISA doesn't have them.
> >
> > May I suggest to invent a more generic "short-cut" to
> > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > tunables add ^use_gather_any to cover all cases?  (or
> > change what use_gather controls - it seems we changed its
> > meaning before, and instead add use_gather_8parts and
> > use_gather_16parts)
> >
> > That is, what's the point of this?
>
> https://www.phoronix.com/review/downfall
>
> that caused:
>
> https://www.phoronix.com/review/intel-downfall-benchmarks

Yes, I know.  But there's -mtune-ctrl=<very long line> doing the trick.
GCC 11 had only 'use_gather', covering all numbers of lanes.  I suggest
to resurrect that behavior and add use_gather_8+parts (or two, IIRC
gather works only on SI/SFmode or larger).

Then -mtune-ctrl=^use_gather works which I think is nice enough?

Richard.

> Uros.
Hongtao Liu Aug. 10, 2023, 7:55 a.m. UTC | #8
On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > >
> > > > Currently we have 3 different independent tunes for gather
> > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > similar for scatter, there're
> > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > >
> > > > The patch support 2 standardizing options to enable/disable
> > > > vectorization for all gather/scatter instructions. The options is
> > > > interpreted by driver to 3 tunes.
> > > >
> > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > Ok for trunk?
> > >
> > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > enable part of an ISA but they won't disable the use of intrinsics
> > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > >
> > > May I suggest to invent a more generic "short-cut" to
> > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > tunables add ^use_gather_any to cover all cases?  (or
> > > change what use_gather controls - it seems we changed its
> > > meaning before, and instead add use_gather_8parts and
> > > use_gather_16parts)
> > >
> > > That is, what's the point of this?
> >
> > https://www.phoronix.com/review/downfall
> >
> > that caused:
> >
> > https://www.phoronix.com/review/intel-downfall-benchmarks
>
> Yes, I know.  But there's -mtune-ctl=<very long line> doing the trick.
> GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> gather works only on SI/SFmode or larger).
>
> Then -mtune-ctl=^use_gather works which I think is nice enough?
So basically, -mtune-ctrl=^use_gather is used to turn off all gather
vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
We don't have an extra explicit flag for target tunes, just a single bit:
ix86_tune_features[X86_TUNE_USE_GATHER]
>
> Richard.
>
> > Uros.
Hongtao Liu Aug. 10, 2023, 8:07 a.m. UTC | #9
On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > >
> > > > > Currently we have 3 different independent tunes for gather
> > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > similar for scatter, there're
> > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > >
> > > > > The patch support 2 standardizing options to enable/disable
> > > > > vectorization for all gather/scatter instructions. The options is
> > > > > interpreted by driver to 3 tunes.
> > > > >
> > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > Ok for trunk?
> > > >
> > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > >
> > > > May I suggest to invent a more generic "short-cut" to
> > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > change what use_gather controls - it seems we changed its
> > > > meaning before, and instead add use_gather_8parts and
> > > > use_gather_16parts)
> > > >
> > > > That is, what's the point of this?
> > >
> > > https://www.phoronix.com/review/downfall
> > >
> > > that caused:
> > >
> > > https://www.phoronix.com/review/intel-downfall-benchmarks
> >
> > Yes, I know.  But there's -mtune-ctl=<very long line> doing the trick.
> > GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > gather works only on SI/SFmode or larger).
> >
> > Then -mtune-ctl=^use_gather works which I think is nice enough?
> So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> We don't have an extrat explicit flag for target tune, just single bit
> - ix86_tune_features[X86_TUNE_USE_GATHER]
Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> >
> > Richard.
> >
> > > Uros.
>
>
>
> --
> BR,
> Hongtao
Hongtao Liu Aug. 10, 2023, 9:16 a.m. UTC | #10
On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > >
> > > > > > Currently we have 3 different independent tunes for gather
> > > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > > similar for scatter, there're
> > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > > >
> > > > > > The patch support 2 standardizing options to enable/disable
> > > > > > vectorization for all gather/scatter instructions. The options is
> > > > > > interpreted by driver to 3 tunes.
> > > > > >
> > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > > Ok for trunk?
> > > > >
> > > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > > >
> > > > > May I suggest to invent a more generic "short-cut" to
> > > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > > change what use_gather controls - it seems we changed its
> > > > > meaning before, and instead add use_gather_8parts and
> > > > > use_gather_16parts)
> > > > >
> > > > > That is, what's the point of this?
The point is to stay consistent across GCC, LLVM, and
ICX (Intel® oneAPI DPC++/C++ Compiler).
LLVM and ICX will support that option.
> > > >
> > > > https://www.phoronix.com/review/downfall
> > > >
> > > > that caused:
> > > >
> > > > https://www.phoronix.com/review/intel-downfall-benchmarks
> > >
> > > Yes, I know.  But there's -mtune-ctl=<very long line> doing the trick.
> > > GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > > gather works only on SI/SFmode or larger).
> > >
> > > Then -mtune-ctl=^use_gather works which I think is nice enough?
> > So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> > We don't have an extrat explicit flag for target tune, just single bit
> > - ix86_tune_features[X86_TUNE_USE_GATHER]
> Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> > >
> > > Richard.
> > >
> > > > Uros.
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao
Richard Biener Aug. 10, 2023, 11:11 a.m. UTC | #11
On Thu, Aug 10, 2023 at 9:55 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > >
> > > > > Currently we have 3 different independent tunes for gather
> > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > similar for scatter, there're
> > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > >
> > > > > The patch support 2 standardizing options to enable/disable
> > > > > vectorization for all gather/scatter instructions. The options is
> > > > > interpreted by driver to 3 tunes.
> > > > >
> > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > Ok for trunk?
> > > >
> > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > >
> > > > May I suggest to invent a more generic "short-cut" to
> > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > change what use_gather controls - it seems we changed its
> > > > meaning before, and instead add use_gather_8parts and
> > > > use_gather_16parts)
> > > >
> > > > That is, what's the point of this?
> > >
> > > https://www.phoronix.com/review/downfall
> > >
> > > that caused:
> > >
> > > https://www.phoronix.com/review/intel-downfall-benchmarks
> >
> > Yes, I know.  But there's -mtune-ctl=<very long line> doing the trick.
> > GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > gather works only on SI/SFmode or larger).
> >
> > Then -mtune-ctl=^use_gather works which I think is nice enough?
> So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?

No, -mtune-ctrl=use_gather should turn them all on as well.

> We don't have an extrat explicit flag for target tune, just single bit
> - ix86_tune_features[X86_TUNE_USE_GATHER]

GCC 11 just had that single bit for all.  I'm not sure how awkward it is
to have use_gather alias use_gather_2parts, use_gather_4parts ...

> >
> > Richard.
> >
> > > Uros.
>
>
>
> --
> BR,
> Hongtao
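
To make the alias idea from this exchange concrete, here is a minimal sketch under stated assumptions. It is not the GCC implementation: the real parser is parse_mtune_ctrl_str and the real table is ix86_tune_features, while the enum, the name table, and apply_tune_ctrl below are hypothetical stand-ins. The sketch treats the name use_gather as covering every gather tune, in both the set and the clear ('^') direction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real tune indices and names.  */
enum tune_index
{
  TUNE_USE_GATHER_2PARTS,
  TUNE_USE_GATHER_4PARTS,
  TUNE_USE_GATHER,
  TUNE_COUNT
};

static const char *const tune_names[TUNE_COUNT] = {
  "use_gather_2parts", "use_gather_4parts", "use_gather"
};

/* True if the LEN characters at NAME spell exactly LITERAL.  */
static bool
name_is (const char *name, size_t len, const char *literal)
{
  return strlen (literal) == len && strncmp (name, literal, len) == 0;
}

/* Apply a comma-separated control string such as "^use_gather" to
   FEATURES.  A leading '^' clears the tune, otherwise it is set.
   Returns false on an unknown or empty name.  */
bool
apply_tune_ctrl (const char *str, bool features[TUNE_COUNT])
{
  while (*str != '\0')
    {
      size_t len = strcspn (str, ",");
      const char *name = str;
      size_t name_len = len;
      bool value = true;

      if (name_len > 0 && name[0] == '^')
        {
          value = false;
          name++;
          name_len--;
        }

      if (name_is (name, name_len, "use_gather"))
        {
          /* Alias: plain use_gather applies to every gather tune.  */
          for (int i = 0; i < TUNE_COUNT; i++)
            features[i] = value;
        }
      else
        {
          bool found = false;
          for (int i = 0; i < TUNE_COUNT; i++)
            if (name_is (name, name_len, tune_names[i]))
              {
                features[i] = value;
                found = true;
              }
          if (!found)
            return false;
        }

      str += len;
      if (*str == ',')
        str++;
    }
  return true;
}
```

With this shape, "^use_gather" clears all three bits and "use_gather" sets all three, while "use_gather_2parts" and "use_gather_4parts" still toggle a single bit, which matches the symmetric behavior asked for above.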
Richard Biener Aug. 10, 2023, 11:12 a.m. UTC | #12
On Thu, Aug 10, 2023 at 11:16 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > > > <richard.guenther@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > > >
> > > > > > > Currently we have 3 different independent tunes for gather
> > > > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > > > similar for scatter, there're
> > > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > > > >
> > > > > > > The patch support 2 standardizing options to enable/disable
> > > > > > > vectorization for all gather/scatter instructions. The options is
> > > > > > > interpreted by driver to 3 tunes.
> > > > > > >
> > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > > > Ok for trunk?
> > > > > >
> > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > > > >
> > > > > > May I suggest to invent a more generic "short-cut" to
> > > > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > > > change what use_gather controls - it seems we changed its
> > > > > > meaning before, and instead add use_gather_8parts and
> > > > > > use_gather_16parts)
> > > > > >
> > > > > > That is, what's the point of this?
> The point of this is to keep the option consistent across GCC, LLVM, and
> ICX (Intel® oneAPI DPC++/C++ Compiler).
> LLVM and ICX will support that option.

GCC has very many options that differ from LLVM's or ICX's, and
I don't see a good reason to special-case this one.  As I said, it's
a very bad name IMHO.

Richard.

> > > > >
> > > > > https://www.phoronix.com/review/downfall
> > > > >
> > > > > that caused:
> > > > >
> > > > > https://www.phoronix.com/review/intel-downfall-benchmarks
> > > >
> > > > Yes, I know.  But there's -mtune-ctrl=<very long line> doing the trick.
> > > > GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> > > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > > > gather works only on SI/SFmode or larger).
> > > >
> > > > Then -mtune-ctrl=^use_gather works which I think is nice enough?
> > > So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> > > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> > > We don't have an extra explicit flag for target tune, just a single bit
> > > - ix86_tune_features[X86_TUNE_USE_GATHER]
> > Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> > > >
> > > > Richard.
> > > >
> > > > > Uros.
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao
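The question quoted above, whether `^use_gather` should switch off every gather tune and whether plain `use_gather` should switch them all on, can be made concrete with a small model. This is only a sketch and is not GCC's `parse_mtune_ctrl_str`: the enum, the `apply_tune_ctrl` function and the choice to make `use_gather` an alias in both directions are assumptions made for illustration. Only the three tune names and the `^` negation prefix come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the three gather tune bits named in the thread.  */
enum { TUNE_GATHER_2PARTS, TUNE_GATHER_4PARTS, TUNE_GATHER, TUNE_COUNT };

static const char *const tune_names[TUNE_COUNT] = {
  "use_gather_2parts", "use_gather_4parts", "use_gather"
};

/* Apply a comma-separated control string such as "^use_gather" to FEATURES.
   A leading '^' clears a feature, otherwise the feature is set.
   Assumption of this sketch: "use_gather" acts as an alias for all three
   bits, both when setting and when clearing.
   Returns false if a name is unknown or empty.  */
bool
apply_tune_ctrl (const char *str, bool features[TUNE_COUNT])
{
  assert (str != NULL && features != NULL);

  while (*str != '\0')
    {
      size_t len = strcspn (str, ",");   /* length of the current token */
      const char *name = str;
      size_t name_len = len;
      bool value = true;

      if (name_len > 0 && name[0] == '^')
        {
          value = false;
          name++;
          name_len--;
        }

      bool found = false;
      for (int i = 0; i < TUNE_COUNT; i++)
        if (strlen (tune_names[i]) == name_len
            && strncmp (tune_names[i], name, name_len) == 0)
          {
            found = true;
            if (i == TUNE_GATHER)
              /* The alias case: touch every gather bit.  */
              for (int j = 0; j < TUNE_COUNT; j++)
                features[j] = value;
            else
              features[i] = value;
          }

      if (!found)
        return false;

      str += len;
      if (*str == ',')
        str++;
    }
  return true;
}
```

Tokens are applied left to right, so in this model `^use_gather,use_gather_2parts` would disable everything and then re-enable only the two-part case. Whether the real option parser should behave that way is exactly what the thread leaves open.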
Jan Hubicka Aug. 10, 2023, 12:05 p.m. UTC | #13
> On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > >
> > > > Currently we have 3 different independent tunes for gather
> > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > similar for scatter, there're
> > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > >
> > > > The patch support 2 standardizing options to enable/disable
> > > > vectorization for all gather/scatter instructions. The options is
> > > > interpreted by driver to 3 tunes.
> > > >
> > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > Ok for trunk?
> > >
> > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > enable part of an ISA but they won't disable the use of intrinsics
> > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > >
> > > May I suggest to invent a more generic "short-cut" to
> > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > tunables add ^use_gather_any to cover all cases?  (or
> > > change what use_gather controls - it seems we changed its
> > > meaning before, and instead add use_gather_8parts and
> > > use_gather_16parts)
> > >
> > > That is, what's the point of this?
> >
> > https://www.phoronix.com/review/downfall
> >
> > that caused:
> >
> > https://www.phoronix.com/review/intel-downfall-benchmarks
> 
> Yes, I know.  But there's -mtune-ctrl=<very long line> doing the trick.
> GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> gather works only on SI/SFmode or larger).
> 
> Then -mtune-ctrl=^use_gather works which I think is nice enough?

-mtune-ctrl is really intended for GCC developers.  It is neither backward
compatible nor fully documented, and bad sets of values may trigger ICEs.
If gathers become very slow, I think normal users may want to disable
them, and in such a situation a specialized command-line option makes sense
to me.

Honza
> 
> Richard.
> 
> > Uros.
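For readers following the thread, the kind of loop at stake is an indexed load. The following is a minimal sketch and is not taken from the patch: the function name `gather_copy` and the file name `gather.c` are made up for illustration. The `-mtune-ctrl` spelling in the comment is copied from the spec strings in the posted patch. Whether a particular GCC actually emits gather instructions for this loop depends on the compiler version, the target and the vectorizer's cost model.

```c
/* Possible builds to compare (inspect the assembly, e.g. with -S):
     gcc -O3 -mavx2 -c gather.c
     gcc -O3 -mavx2 \
       -mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather \
       -c gather.c
   The second form is the long spelling that the proposed -mno-gather
   would expand to.  */
#include <assert.h>
#include <stddef.h>

/* Indexed load: out[i] = table[idx[i]].  The non-contiguous read of
   table[] is the access pattern a vectorizer may turn into a gather.  */
void
gather_copy (float *restrict out, const float *restrict table,
             const int *restrict idx, size_t n)
{
  /* Precondition check, kept outside the loop.  */
  assert (n == 0 || (out != NULL && table != NULL && idx != NULL));

  for (size_t i = 0; i < n; i++)
    out[i] = table[idx[i]];
}
```

The result is the same with or without gather instructions. The tunes only influence which code the auto-vectorizer prefers, which is why the thread treats this as a tuning switch and not an ISA switch.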
Hongtao Liu Aug. 10, 2023, 1:23 p.m. UTC | #14
On Thu, Aug 10, 2023 at 7:13 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 11:16 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > > > > > <richard.guenther@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > > > >
> > > > > > > > Currently we have 3 different independent tunes for gather
> > > > > > > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > > > > > > similar for scatter, there're
> > > > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > > > > > > >
> > > > > > > > The patch support 2 standardizing options to enable/disable
> > > > > > > > vectorization for all gather/scatter instructions. The options is
> > > > > > > > interpreted by driver to 3 tunes.
> > > > > > > >
> > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > > > > > > Ok for trunk?
> > > > > > >
> > > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they
> > > > > > > enable part of an ISA but they won't disable the use of intrinsics
> > > > > > > or enable gather/scatter on CPUs where the ISA doesn't have them.
> > > > > > >
> > > > > > > May I suggest to invent a more generic "short-cut" to
> > > > > > > -mtune-ctrl=^X, maybe -mdisable=X?  And for gather/scatter
> > > > > > > tunables add ^use_gather_any to cover all cases?  (or
> > > > > > > change what use_gather controls - it seems we changed its
> > > > > > > meaning before, and instead add use_gather_8parts and
> > > > > > > use_gather_16parts)
> > > > > > >
> > > > > > > That is, what's the point of this?
> > The point of this is to keep the option consistent across GCC, LLVM, and
> > ICX (Intel® oneAPI DPC++/C++ Compiler).
> > LLVM and ICX will support that option.
>
> GCC has very many options that differ from LLVM's or ICX's, and
> I don't see a good reason to special-case this one.  As I said, it's
> a very bad name IMHO.
In general terms, yes.
But this is a new option, so wouldn't it be better to be consistent?
The problem with -mfma is mainly that the cpuid bit is itself called fma,
whereas there is no cpuid bit called gather/scatter.  With clear documentation
that the option only affects auto-vectorization,
-m{no-,}{gather,scatter} looks fine to me.
As Honza mentioned, users need an option to turn gather/scatter
auto-vectorization on and off, and I don't think they will expect the option
to also apply to intrinsics.
If -mtune-ctrl= is not suitable for direct exposure to users,
then the original proposal should be OK?
Developers will maintain the mapping between -m[no-]gather/-m[no-]scatter and
-mtune-ctrl=XXX to keep it consistent between GCC versions.
>
> Richard.
>
> > > > > >
> > > > > > https://www.phoronix.com/review/downfall
> > > > > >
> > > > > > that caused:
> > > > > >
> > > > > > https://www.phoronix.com/review/intel-downfall-benchmarks
> > > > >
> > > > > Yes, I know.  But there's -mtune-ctrl=<very long line> doing the trick.
> > > > > GCC 11 had only 'use_gather', covering all number of lanes.  I suggest
> > > > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC
> > > > > gather works only on SI/SFmode or larger).
> > > > >
> > > > > Then -mtune-ctrl=^use_gather works which I think is nice enough?
> > > > So basically, -mtune-ctrl=^use_gather is used to turn off all gather
> > > > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?
> > > > We don't have an extra explicit flag for target tune, just a single bit
> > > > - ix86_tune_features[X86_TUNE_USE_GATHER]
> > > Looks like I can handle it specially in parse_mtune_ctrl_str, let me try.
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Uros.
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao

Patch

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ef342fcee9b..d9ac2c29bde 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -565,7 +565,17 @@  extern GTY(()) tree x86_mfence;
 # define SUBTARGET_DRIVER_SELF_SPECS ""
 #endif
 
-#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
+#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
+# define GATHER_SCATTER_DRIVER_SELF_SPECS \
+  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
+   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
+   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
+   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
+#endif
+
+#define DRIVER_SELF_SPECS \
+  SUBTARGET_DRIVER_SELF_SPECS " " \
+  GATHER_SCATTER_DRIVER_SELF_SPECS
 
 /* -march=native handling only makes sense with compiler running on
    an x86 or x86_64 chip.  If changing this condition, also change
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index ddb7f110aa2..99948644a8d 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -424,6 +424,14 @@  mdaz-ftz
 Target
 Set the FTZ and DAZ Flags.
 
+mgather
+Target
+Enable vectorization for gather instruction.
+
+mscatter
+Target
+Enable vectorization for scatter instruction.
+
 mpreferred-stack-boundary=
 Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
 Attempt to keep stack aligned to this power of 2.
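
The `%{S:X}` entries in GATHER_SCATTER_DRIVER_SELF_SPECS each substitute `X` when the switch `-S` was given. As a rough model of that mapping only, and not of the driver's real spec machinery, the four rules can be written as a lookup table. The struct, array and function names below are hypothetical; the option names and expansion strings are copied from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One "%{option:expansion}" rule from the self-spec string.  */
struct spec_rule
{
  const char *option;
  const char *expansion;
};

/* The four rules of GATHER_SCATTER_DRIVER_SELF_SPECS, written as a table.  */
static const struct spec_rule gather_scatter_rules[] = {
  { "-mno-gather",
    "-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather" },
  { "-mgather",
    "-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather" },
  { "-mno-scatter",
    "-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter" },
  { "-mscatter",
    "-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter" },
};

/* Return the tune-control text that OPTION maps to, or NULL if OPTION is
   not one of the four switches handled here.  */
const char *
expand_gather_scatter (const char *option)
{
  assert (option != NULL);

  size_t count = sizeof gather_scatter_rules / sizeof gather_scatter_rules[0];
  for (size_t i = 0; i < count; i++)
    if (strcmp (option, gather_scatter_rules[i].option) == 0)
      return gather_scatter_rules[i].expansion;
  return NULL;
}
```

Seen this way, the user-facing switches add no new tuning state. They are a stable spelling for a set of `-mtune-ctrl` entries that, as discussed in the thread, may change between GCC versions.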