[Scalar,masks,2/x] Use bool masks in if-conversion

Message ID	20150817162544.GB12565@msticlxl57.ims.intel.com
State	New
Headers	show Return-Path: <gcc-patches-return-405374-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:subject:message-id:mime-version:content-type; q=dns; s= default; b=aQ9Bwmgqbas9C1p+UF85rWyJqUmbwqfpUs9iDUpAlFCW1XOgD/lWf 2fJ5Eh5wTBwC4B+qFRnnTGDXRFNAwZHBf5rSB79GaWE462ZVzVpXH8RSlHsH4AGU xvSwXLzhYaHdQucbSOG9BCib61WiKsTSKxG4Qq/jw+FEqIVmr1IbeA= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org Date: Mon, 17 Aug 2015 19:25:44 +0300 From: Ilya Enkovich <enkovich.gnu@gmail.com> To: gcc-patches@gcc.gnu.org Subject: [Scalar masks 2/x] Use bool masks in if-conversion Message-ID: <20150817162544.GB12565@msticlxl57.ims.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12)

Ilya Enkovich Aug. 17, 2015, 4:25 p.m. UTC

Hi,

This patch intoriduces a new vectorizer hook use_scalar_mask_p which affects code generated by if-conversion pass (and affects patterns in later patches).

Thanks,
Ilya
--
2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
	* doc/tm.texi.in: Regenerated.
	* target.def (use_scalar_mask_p): New.
	* tree-if-conv.c: Include target.h.
	(predicate_mem_writes): Don't convert boolean predicates into
	integer when scalar masks are used.

Jeff Law Aug. 20, 2015, 6:46 p.m. UTC | #1

On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
> Hi,
>
> This patch intoriduces a new vectorizer hook use_scalar_mask_p which affects code generated by if-conversion pass (and affects patterns in later patches).
>
> Thanks,
> Ilya
> --
> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
> 	* doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
> 	* doc/tm.texi.in: Regenerated.
> 	* target.def (use_scalar_mask_p): New.
> 	* tree-if-conv.c: Include target.h.
> 	(predicate_mem_writes): Don't convert boolean predicates into
> 	integer when scalar masks are used.
Presumably this is how you prevent the generation of scalar masks rather 
than boolean masks on targets which don't have the former?

I hate to ask, but how painful would it be to go from a boolean to 
integer masks later such as during expansion?  Or vice-versa.

WIthout a deep knowledge of the entire patchkit, it feels like we're 
introducing target stuff in a place where we don't want it and that we'd 
be better served with a canonical representation through gimple, then 
dropping into something more target specific during gimple->rtl expansion.

Jeff

Richard Biener Aug. 21, 2015, 8:15 a.m. UTC | #2

On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law <law@redhat.com> wrote:
> On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch intoriduces a new vectorizer hook use_scalar_mask_p which
>> affects code generated by if-conversion pass (and affects patterns in later
>> patches).
>>
>> Thanks,
>> Ilya
>> --
>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>
>>         * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
>>         * doc/tm.texi.in: Regenerated.
>>         * target.def (use_scalar_mask_p): New.
>>         * tree-if-conv.c: Include target.h.
>>         (predicate_mem_writes): Don't convert boolean predicates into
>>         integer when scalar masks are used.
>
> Presumably this is how you prevent the generation of scalar masks rather
> than boolean masks on targets which don't have the former?
>
> I hate to ask, but how painful would it be to go from a boolean to integer
> masks later such as during expansion?  Or vice-versa.
>
> WIthout a deep knowledge of the entire patchkit, it feels like we're
> introducing target stuff in a place where we don't want it and that we'd be
> better served with a canonical representation through gimple, then dropping
> into something more target specific during gimple->rtl expansion.

Indeed.  I don't remember my exact comments during the talk at the Cauldron
but the scheme used there was sth like

  mask = GEN_MASK <vec1 < vec2>;
  b = a + 1;
  x = VEC_COND <mask, a, b>

to model conditional execution already at the if-conversion stage (for
all scalar
stmts made executed unconditionally rather than just the PHI results).  I was
asking for the condition to be removed from GEN_MASK (patch 1 has this
fixed now AFAICS).  And I also asked why it was necessary to do this "lowering"
here and not simply do

mask = vec1 < vec2;  // regular vector mask!
b = a + 1;
x = VEC_COND <mask, a, b>

and have the lowering to an integer mask done later.  You'd still
change if-conversion
to predicate _all_ statements, not just those with side-effects.  So I
think there
still needs to be a new target hook to trigger this, similar to how
the target capabilities
trigger the masked load/store path in if-conversion.

But I don't like changing our IL so much as to allow 'integer' masks everywhere.

Richard.

>
> Jeff

Ilya Enkovich Aug. 21, 2015, 10:49 a.m. UTC | #3

2015-08-21 11:15 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law <law@redhat.com> wrote:
>> On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
>>>
>>> Hi,
>>>
>>> This patch intoriduces a new vectorizer hook use_scalar_mask_p which
>>> affects code generated by if-conversion pass (and affects patterns in later
>>> patches).
>>>
>>> Thanks,
>>> Ilya
>>> --
>>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>>
>>>         * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
>>>         * doc/tm.texi.in: Regenerated.
>>>         * target.def (use_scalar_mask_p): New.
>>>         * tree-if-conv.c: Include target.h.
>>>         (predicate_mem_writes): Don't convert boolean predicates into
>>>         integer when scalar masks are used.
>>
>> Presumably this is how you prevent the generation of scalar masks rather
>> than boolean masks on targets which don't have the former?
>>
>> I hate to ask, but how painful would it be to go from a boolean to integer
>> masks later such as during expansion?  Or vice-versa.
>>
>> WIthout a deep knowledge of the entire patchkit, it feels like we're
>> introducing target stuff in a place where we don't want it and that we'd be
>> better served with a canonical representation through gimple, then dropping
>> into something more target specific during gimple->rtl expansion.

I want a work with bitmasks to be expressed in a natural way using
regular integer operations. Currently all masks manipulations are
emulated via vector statements (mostly using a bunch of vec_cond). For
complex predicates it may be nontrivial to transform it back to scalar
masks and get an efficient code. Also the same vector may be used as
both a mask and an integer vector. Things become more complex if you
additionally have broadcasts and vector pack/unpack code. It also
should be transformed into a scalar masks manipulations somehow.

Also according to vector ABI integer mask should be used for mask
operand in case of masked vector call.

Current implementation of masked loads, masked stores and bool
patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
really call it a canonical representation for all targets? Using
scalar masks everywhere should probably cause the same conversion
problem for SSE I listed above though.

Talking about a canonical representation, shouldn't we use some
special masks representation and not mixing it with integer and vector
of integers then? Only in this case target would be able to
efficiently expand it into a corresponding rtl.

>
> Indeed.  I don't remember my exact comments during the talk at the Cauldron
> but the scheme used there was sth like
>
>   mask = GEN_MASK <vec1 < vec2>;
>   b = a + 1;
>   x = VEC_COND <mask, a, b>
>
> to model conditional execution already at the if-conversion stage (for
> all scalar
> stmts made executed unconditionally rather than just the PHI results).  I was
> asking for the condition to be removed from GEN_MASK (patch 1 has this
> fixed now AFAICS).  And I also asked why it was necessary to do this "lowering"
> here and not simply do
>
> mask = vec1 < vec2;  // regular vector mask!
> b = a + 1;
> x = VEC_COND <mask, a, b>
>
> and have the lowering to an integer mask done later.  You'd still
> change if-conversion
> to predicate _all_ statements, not just those with side-effects.  So I
> think there
> still needs to be a new target hook to trigger this, similar to how
> the target capabilities
> trigger the masked load/store path in if-conversion.

I think you mix scalar masks with a loop reminders optimization. I'm
not going to do other changes in if-conversion other then in this
posted patch to support scalar masks. Statements predication will be
used to vectorize loop reminders. And not all of them, only reduction
definitions. This will be independent from scalar masks and will work
for vector masks also. And these changes are not going to be in
if-conversion.

Thanks,
Ilya

>
> But I don't like changing our IL so much as to allow 'integer' masks everywhere.
>
> Richard.
>
>
>>
>> Jeff

Richard Biener Aug. 21, 2015, 11 a.m. UTC | #4

On Fri, Aug 21, 2015 at 12:49 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-08-21 11:15 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law <law@redhat.com> wrote:
>>> On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
>>>>
>>>> Hi,
>>>>
>>>> This patch intoriduces a new vectorizer hook use_scalar_mask_p which
>>>> affects code generated by if-conversion pass (and affects patterns in later
>>>> patches).
>>>>
>>>> Thanks,
>>>> Ilya
>>>> --
>>>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>>>
>>>>         * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
>>>>         * doc/tm.texi.in: Regenerated.
>>>>         * target.def (use_scalar_mask_p): New.
>>>>         * tree-if-conv.c: Include target.h.
>>>>         (predicate_mem_writes): Don't convert boolean predicates into
>>>>         integer when scalar masks are used.
>>>
>>> Presumably this is how you prevent the generation of scalar masks rather
>>> than boolean masks on targets which don't have the former?
>>>
>>> I hate to ask, but how painful would it be to go from a boolean to integer
>>> masks later such as during expansion?  Or vice-versa.
>>>
>>> WIthout a deep knowledge of the entire patchkit, it feels like we're
>>> introducing target stuff in a place where we don't want it and that we'd be
>>> better served with a canonical representation through gimple, then dropping
>>> into something more target specific during gimple->rtl expansion.
>
> I want a work with bitmasks to be expressed in a natural way using
> regular integer operations. Currently all masks manipulations are
> emulated via vector statements (mostly using a bunch of vec_cond). For
> complex predicates it may be nontrivial to transform it back to scalar
> masks and get an efficient code. Also the same vector may be used as
> both a mask and an integer vector. Things become more complex if you
> additionally have broadcasts and vector pack/unpack code. It also
> should be transformed into a scalar masks manipulations somehow.

Hmm, I don't see how vector masks are more difficult to operate with.

> Also according to vector ABI integer mask should be used for mask
> operand in case of masked vector call.

What ABI?  The function signature of the intrinsics?  How would that
come into play here?

> Current implementation of masked loads, masked stores and bool
> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
> really call it a canonical representation for all targets?

No idea - we'll revisit when another targets adds a similar capability.

> Using scalar masks everywhere should probably cause the same conversion
> problem for SSE I listed above though.
>
> Talking about a canonical representation, shouldn't we use some
> special masks representation and not mixing it with integer and vector
> of integers then? Only in this case target would be able to
> efficiently expand it into a corresponding rtl.

That was my idea of vector<bool> ... but I didn't explore it and see where
it will cause issues.

Fact is GCC already copes with vector masks generated by vector compares
just fine everywhere and I'd rather leave it as that.

>>
>> Indeed.  I don't remember my exact comments during the talk at the Cauldron
>> but the scheme used there was sth like
>>
>>   mask = GEN_MASK <vec1 < vec2>;
>>   b = a + 1;
>>   x = VEC_COND <mask, a, b>
>>
>> to model conditional execution already at the if-conversion stage (for
>> all scalar
>> stmts made executed unconditionally rather than just the PHI results).  I was
>> asking for the condition to be removed from GEN_MASK (patch 1 has this
>> fixed now AFAICS).  And I also asked why it was necessary to do this "lowering"
>> here and not simply do
>>
>> mask = vec1 < vec2;  // regular vector mask!
>> b = a + 1;
>> x = VEC_COND <mask, a, b>
>>
>> and have the lowering to an integer mask done later.  You'd still
>> change if-conversion
>> to predicate _all_ statements, not just those with side-effects.  So I
>> think there
>> still needs to be a new target hook to trigger this, similar to how
>> the target capabilities
>> trigger the masked load/store path in if-conversion.
>
> I think you mix scalar masks with a loop reminders optimization. I'm
> not going to do other changes in if-conversion other then in this
> posted patch to support scalar masks. Statements predication will be
> used to vectorize loop reminders. And not all of them, only reduction
> definitions. This will be independent from scalar masks and will work
> for vector masks also. And these changes are not going to be in
> if-conversion.

Maybe I misremember.  Didn't look at the patch in detail yet.

Richard.

>
> Thanks,
> Ilya
>
>>
>> But I don't like changing our IL so much as to allow 'integer' masks everywhere.
>>
>> Richard.
>>
>>
>>>
>>> Jeff

Ilya Enkovich Aug. 21, 2015, 12:17 p.m. UTC | #5

2015-08-21 14:00 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Fri, Aug 21, 2015 at 12:49 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2015-08-21 11:15 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law <law@redhat.com> wrote:
>>>> On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> This patch intoriduces a new vectorizer hook use_scalar_mask_p which
>>>>> affects code generated by if-conversion pass (and affects patterns in later
>>>>> patches).
>>>>>
>>>>> Thanks,
>>>>> Ilya
>>>>> --
>>>>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>>>>
>>>>>         * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
>>>>>         * doc/tm.texi.in: Regenerated.
>>>>>         * target.def (use_scalar_mask_p): New.
>>>>>         * tree-if-conv.c: Include target.h.
>>>>>         (predicate_mem_writes): Don't convert boolean predicates into
>>>>>         integer when scalar masks are used.
>>>>
>>>> Presumably this is how you prevent the generation of scalar masks rather
>>>> than boolean masks on targets which don't have the former?
>>>>
>>>> I hate to ask, but how painful would it be to go from a boolean to integer
>>>> masks later such as during expansion?  Or vice-versa.
>>>>
>>>> WIthout a deep knowledge of the entire patchkit, it feels like we're
>>>> introducing target stuff in a place where we don't want it and that we'd be
>>>> better served with a canonical representation through gimple, then dropping
>>>> into something more target specific during gimple->rtl expansion.
>>
>> I want a work with bitmasks to be expressed in a natural way using
>> regular integer operations. Currently all masks manipulations are
>> emulated via vector statements (mostly using a bunch of vec_cond). For
>> complex predicates it may be nontrivial to transform it back to scalar
>> masks and get an efficient code. Also the same vector may be used as
>> both a mask and an integer vector. Things become more complex if you
>> additionally have broadcasts and vector pack/unpack code. It also
>> should be transformed into a scalar masks manipulations somehow.
>
> Hmm, I don't see how vector masks are more difficult to operate with.

There are just no instructions for that but you have to pretend you
have to get code vectorized.

>
>> Also according to vector ABI integer mask should be used for mask
>> operand in case of masked vector call.
>
> What ABI?  The function signature of the intrinsics?  How would that
> come into play here?

Not intrinsics. I mean OpenMP vector functions which require integer
arg for a mask in case of 512-bit vector.

>
>> Current implementation of masked loads, masked stores and bool
>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>> really call it a canonical representation for all targets?
>
> No idea - we'll revisit when another targets adds a similar capability.

AVX-512 is such target. Current representation forces multiple scalar
mask -> vector mask and back transformations which are artificially
introduced by current bool patterns and are hard to optimize out.

>
>> Using scalar masks everywhere should probably cause the same conversion
>> problem for SSE I listed above though.
>>
>> Talking about a canonical representation, shouldn't we use some
>> special masks representation and not mixing it with integer and vector
>> of integers then? Only in this case target would be able to
>> efficiently expand it into a corresponding rtl.
>
> That was my idea of vector<bool> ... but I didn't explore it and see where
> it will cause issues.
>
> Fact is GCC already copes with vector masks generated by vector compares
> just fine everywhere and I'd rather leave it as that.

Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
additional vec_cond. I don't think vectorizer ever generates vector
comparison.

And I wouldn't say it's fine 'everywhere' because there is a single
target utilizing them. Masked loads and stored for AVX-512 just don't
work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
512-bit vector then we get an ugly inefficient code. The question is
where to fight with this inefficiency: in RTL or in GIMPLE. I want to
fight with it where it appears, i.e. in GIMPLE by preventing bool ->
int conversions applied everywhere even if target doesn't need it.

If we don't want to support both types of masks in GIMPLE then it's
more reasonable to make bool -> int conversion in expand for targets
requiring it, rather than do it for everyone and then leave it to
target to transform it back and try to get rid of all those redundant
transformations. I'd give vector<bool> a chance to become a canonical
mask representation for that.


Thanks,
Ilya

>
>>>
>>> Indeed.  I don't remember my exact comments during the talk at the Cauldron
>>> but the scheme used there was sth like
>>>
>>>   mask = GEN_MASK <vec1 < vec2>;
>>>   b = a + 1;
>>>   x = VEC_COND <mask, a, b>
>>>
>>> to model conditional execution already at the if-conversion stage (for
>>> all scalar
>>> stmts made executed unconditionally rather than just the PHI results).  I was
>>> asking for the condition to be removed from GEN_MASK (patch 1 has this
>>> fixed now AFAICS).  And I also asked why it was necessary to do this "lowering"
>>> here and not simply do
>>>
>>> mask = vec1 < vec2;  // regular vector mask!
>>> b = a + 1;
>>> x = VEC_COND <mask, a, b>
>>>
>>> and have the lowering to an integer mask done later.  You'd still
>>> change if-conversion
>>> to predicate _all_ statements, not just those with side-effects.  So I
>>> think there
>>> still needs to be a new target hook to trigger this, similar to how
>>> the target capabilities
>>> trigger the masked load/store path in if-conversion.
>>
>> I think you mix scalar masks with a loop reminders optimization. I'm
>> not going to do other changes in if-conversion other then in this
>> posted patch to support scalar masks. Statements predication will be
>> used to vectorize loop reminders. And not all of them, only reduction
>> definitions. This will be independent from scalar masks and will work
>> for vector masks also. And these changes are not going to be in
>> if-conversion.
>
> Maybe I misremember.  Didn't look at the patch in detail yet.
>
> Richard.
>
>>
>> Thanks,
>> Ilya
>>
>>>
>>> But I don't like changing our IL so much as to allow 'integer' masks everywhere.
>>>
>>> Richard.
>>>
>>>
>>>>
>>>> Jeff

Jeff Law Aug. 21, 2015, 3:44 p.m. UTC | #6

On 08/21/2015 02:15 AM, Richard Biener wrote:
>
> Indeed.  I don't remember my exact comments during the talk at the
> Cauldron but the scheme used there was sth like
>
> mask = GEN_MASK <vec1 < vec2>; b = a + 1; x = VEC_COND <mask, a, b>
>
> to model conditional execution already at the if-conversion stage
> (for all scalar stmts made executed unconditionally rather than just
> the PHI results).  I was asking for the condition to be removed from
> GEN_MASK (patch 1 has this fixed now AFAICS).  And I also asked why
> it was necessary to do this "lowering" here and not simply do
>
> mask = vec1 < vec2;  // regular vector mask! b = a + 1; x = VEC_COND
> <mask, a, b>
>
> and have the lowering to an integer mask done later.  You'd still
> change if-conversion to predicate _all_ statements, not just those
> with side-effects.  So I think there still needs to be a new target
> hook to trigger this, similar to how the target capabilities trigger
> the masked load/store path in if-conversion.
>
> But I don't like changing our IL so much as to allow 'integer' masks
> everywhere.
Right.  I'd be *much* less concerned with a hook that tells the 
if-converter to predicate everything and for the expander to do the 
lowering.

We'd still have the same representation through gimple, what changes is 
how many statements have predication and the lowering from gimple to RTL.

Contrast to a introducing a totally new way to represent predication in 
if-conversion and beyond, which just seems like a nightmare.

jeff

Jeff Law Aug. 25, 2015, 9:26 p.m. UTC | #7

On 08/21/2015 06:17 AM, Ilya Enkovich wrote:
>>
>> Hmm, I don't see how vector masks are more difficult to operate with.
>
> There are just no instructions for that but you have to pretend you
> have to get code vectorized.
>
>>
>>> Also according to vector ABI integer mask should be used for mask
>>> operand in case of masked vector call.
>>
>> What ABI?  The function signature of the intrinsics?  How would that
>> come into play here?
>
> Not intrinsics. I mean OpenMP vector functions which require integer
> arg for a mask in case of 512-bit vector.
That's what I assumed -- you can pass in a mask as an argument and it's 
supposed to be a simple integer, right?


>
>>
>>> Current implementation of masked loads, masked stores and bool
>>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>>> really call it a canonical representation for all targets?
>>
>> No idea - we'll revisit when another targets adds a similar capability.
>
> AVX-512 is such target. Current representation forces multiple scalar
> mask -> vector mask and back transformations which are artificially
> introduced by current bool patterns and are hard to optimize out.
I'm a bit surprised they're so prevalent and hard to optimize away. 
ISTM PRE ought to handle this kind of thing with relative ease.


>> Fact is GCC already copes with vector masks generated by vector compares
>> just fine everywhere and I'd rather leave it as that.
>
> Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
> additional vec_cond. I don't think vectorizer ever generates vector
> comparison.
>
> And I wouldn't say it's fine 'everywhere' because there is a single
> target utilizing them. Masked loads and stored for AVX-512 just don't
> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
> 512-bit vector then we get an ugly inefficient code. The question is
> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
> int conversions applied everywhere even if target doesn't need it.
You should expect pushback anytime target dependencies are added to 
gimple, even if it's stuff in the vectorizer, which is infested with 
target dependencies.

>
> If we don't want to support both types of masks in GIMPLE then it's
> more reasonable to make bool -> int conversion in expand for targets
> requiring it, rather than do it for everyone and then leave it to
> target to transform it back and try to get rid of all those redundant
> transformations. I'd give vector<bool> a chance to become a canonical
> mask representation for that.
Might be worth some experimentation.

Jeff

Jeff Law Aug. 25, 2015, 9:42 p.m. UTC | #8

On 08/21/2015 04:49 AM, Ilya Enkovich wrote:
>
> I want a work with bitmasks to be expressed in a natural way using
> regular integer operations. Currently all masks manipulations are
> emulated via vector statements (mostly using a bunch of vec_cond). For
> complex predicates it may be nontrivial to transform it back to scalar
> masks and get an efficient code. Also the same vector may be used as
> both a mask and an integer vector. Things become more complex if you
> additionally have broadcasts and vector pack/unpack code. It also
> should be transformed into a scalar masks manipulations somehow.
Or why not model the conversion at the gimple level using a 
CONVERT_EXPR?   In fact, the more I think about it, that seems to make 
more sense to me.

We pick a canonical form for the mask, whatever it may be.  We use that 
canonical form and model conversions between it and the other form via 
CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant 
conversions.  If it's not up to the task, we should really look into why 
and resolve.

Yes, that does mean we have two forms which I'm not terribly happy about 
and it means some target dependencies on what the masked vector 
operation looks like (ie, does it accept a simple integer or vector 
mask), but I'm starting to wonder if, as distasteful as I find it, it's 
the right thing to do.

>>
>> But I don't like changing our IL so much as to allow 'integer' masks everywhere.
I'm warming up to that idea...

jeff

Ilya Enkovich Aug. 26, 2015, 10:50 a.m. UTC | #9

2015-08-26 0:26 GMT+03:00 Jeff Law <law@redhat.com>:
> On 08/21/2015 06:17 AM, Ilya Enkovich wrote:
>>>
>>>
>>> Hmm, I don't see how vector masks are more difficult to operate with.
>>
>>
>> There are just no instructions for that but you have to pretend you
>> have to get code vectorized.
>>
>>>
>>>> Also according to vector ABI integer mask should be used for mask
>>>> operand in case of masked vector call.
>>>
>>>
>>> What ABI?  The function signature of the intrinsics?  How would that
>>> come into play here?
>>
>>
>> Not intrinsics. I mean OpenMP vector functions which require integer
>> arg for a mask in case of 512-bit vector.
>
> That's what I assumed -- you can pass in a mask as an argument and it's
> supposed to be a simple integer, right?

Depending on target ABI requires either vector mask or a simple integer value.

>
>
>>
>>>
>>>> Current implementation of masked loads, masked stores and bool
>>>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>>>> really call it a canonical representation for all targets?
>>>
>>>
>>> No idea - we'll revisit when another targets adds a similar capability.
>>
>>
>> AVX-512 is such target. Current representation forces multiple scalar
>> mask -> vector mask and back transformations which are artificially
>> introduced by current bool patterns and are hard to optimize out.
>
> I'm a bit surprised they're so prevalent and hard to optimize away. ISTM PRE
> ought to handle this kind of thing with relative ease.

Most of vector comparisons are UNSPEC. And I doubt PRE may actually
help much even if get rid of UNSPEC somehow. Is there really a
redundancy in:

if ((v1 cmp v2) && (v3 cmp v4))
  load

v1 cmp v2 -> mask1
select mask1 vec_cst_-1 vec_cst_0 -> vec_mask1
v3 cmp v4 -> mask2
select mask2 vec_mask1 vec_cst_0 -> vec_mask2
vec_mask2 NE vec_cst_0 -> mask3
load by mask3

It looks to me more like a i386 specific instruction selection problem.

Ilya

>
>
>>> Fact is GCC already copes with vector masks generated by vector compares
>>> just fine everywhere and I'd rather leave it as that.
>>
>>
>> Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
>> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
>> additional vec_cond. I don't think vectorizer ever generates vector
>> comparison.
>>
>> And I wouldn't say it's fine 'everywhere' because there is a single
>> target utilizing them. Masked loads and stored for AVX-512 just don't
>> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
>> 512-bit vector then we get an ugly inefficient code. The question is
>> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
>> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
>> int conversions applied everywhere even if target doesn't need it.
>
> You should expect pushback anytime target dependencies are added to gimple,
> even if it's stuff in the vectorizer, which is infested with target
> dependencies.
>
>>
>> If we don't want to support both types of masks in GIMPLE then it's
>> more reasonable to make bool -> int conversion in expand for targets
>> requiring it, rather than do it for everyone and then leave it to
>> target to transform it back and try to get rid of all those redundant
>> transformations. I'd give vector<bool> a chance to become a canonical
>> mask representation for that.
>
> Might be worth some experimentation.
>
> Jeff

Ilya Enkovich Aug. 26, 2015, 11:13 a.m. UTC | #10

2015-08-26 0:42 GMT+03:00 Jeff Law <law@redhat.com>:
> On 08/21/2015 04:49 AM, Ilya Enkovich wrote:
>>
>>
>> I want a work with bitmasks to be expressed in a natural way using
>> regular integer operations. Currently all masks manipulations are
>> emulated via vector statements (mostly using a bunch of vec_cond). For
>> complex predicates it may be nontrivial to transform it back to scalar
>> masks and get an efficient code. Also the same vector may be used as
>> both a mask and an integer vector. Things become more complex if you
>> additionally have broadcasts and vector pack/unpack code. It also
>> should be transformed into a scalar masks manipulations somehow.
>
> Or why not model the conversion at the gimple level using a CONVERT_EXPR?
> In fact, the more I think about it, that seems to make more sense to me.
>
> We pick a canonical form for the mask, whatever it may be.  We use that
> canonical form and model conversions between it and the other form via
> CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant conversions.
> If it's not up to the task, we should really look into why and resolve.
>
> Yes, that does mean we have two forms which I'm not terribly happy about and
> it means some target dependencies on what the masked vector operation looks
> like (ie, does it accept a simple integer or vector mask), but I'm starting
> to wonder if, as distasteful as I find it, it's the right thing to do.

If we have some special representation for masks in GIMPLE then we
might not need any conversions. We could ask a target to define a MODE
for this type and use it directly everywhere: directly compare into
it, use it directly for masked loads and stores, AND, IOR, EQ etc. If
that type is reserved for masks usage then you previous suggestion to
transform masks into target specific form at GIMPLE->RTL phase should
work fine. This would allow to support only a single masks
representation in GIMPLE.

Thanks,
Ilya

>
>>>
>>> But I don't like changing our IL so much as to allow 'integer' masks
>>> everywhere.
>
> I'm warming up to that idea...
>
> jeff
>

Richard Biener Aug. 26, 2015, 1:02 p.m. UTC | #11

On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-08-21 14:00 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Fri, Aug 21, 2015 at 12:49 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> 2015-08-21 11:15 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law <law@redhat.com> wrote:
>>>>> On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This patch intoriduces a new vectorizer hook use_scalar_mask_p which
>>>>>> affects code generated by if-conversion pass (and affects patterns in later
>>>>>> patches).
>>>>>>
>>>>>> Thanks,
>>>>>> Ilya
>>>>>> --
>>>>>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>>>>>
>>>>>>         * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
>>>>>>         * doc/tm.texi.in: Regenerated.
>>>>>>         * target.def (use_scalar_mask_p): New.
>>>>>>         * tree-if-conv.c: Include target.h.
>>>>>>         (predicate_mem_writes): Don't convert boolean predicates into
>>>>>>         integer when scalar masks are used.
>>>>>
>>>>> Presumably this is how you prevent the generation of scalar masks rather
>>>>> than boolean masks on targets which don't have the former?
>>>>>
>>>>> I hate to ask, but how painful would it be to go from a boolean to integer
>>>>> masks later such as during expansion?  Or vice-versa.
>>>>>
>>>>> WIthout a deep knowledge of the entire patchkit, it feels like we're
>>>>> introducing target stuff in a place where we don't want it and that we'd be
>>>>> better served with a canonical representation through gimple, then dropping
>>>>> into something more target specific during gimple->rtl expansion.
>>>
>>> I want a work with bitmasks to be expressed in a natural way using
>>> regular integer operations. Currently all masks manipulations are
>>> emulated via vector statements (mostly using a bunch of vec_cond). For
>>> complex predicates it may be nontrivial to transform it back to scalar
>>> masks and get an efficient code. Also the same vector may be used as
>>> both a mask and an integer vector. Things become more complex if you
>>> additionally have broadcasts and vector pack/unpack code. It also
>>> should be transformed into a scalar masks manipulations somehow.
>>
>> Hmm, I don't see how vector masks are more difficult to operate with.
>
> There are just no instructions for that but you have to pretend you
> have to get code vectorized.

Huh?  Bitwise ops should be readily available.

>>
>>> Also according to vector ABI integer mask should be used for mask
>>> operand in case of masked vector call.
>>
>> What ABI?  The function signature of the intrinsics?  How would that
>> come into play here?
>
> Not intrinsics. I mean OpenMP vector functions which require integer
> arg for a mask in case of 512-bit vector.

How do you declare those?

>>
>>> Current implementation of masked loads, masked stores and bool
>>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>>> really call it a canonical representation for all targets?
>>
>> No idea - we'll revisit when another targets adds a similar capability.
>
> AVX-512 is such target. Current representation forces multiple scalar
> mask -> vector mask and back transformations which are artificially
> introduced by current bool patterns and are hard to optimize out.

I dislike the bool patterns anyway and we should try to remove those
and make the vectorizer handle them in other ways (they have single-use
issues anyway).  I don't remember exactly what caused us to add them
but one reason was there wasn't a vector type for 'bool' (but I don't see how
it should be necessary to ask "get me a vector type for 'bool'").

>>
>>> Using scalar masks everywhere should probably cause the same conversion
>>> problem for SSE I listed above though.
>>>
>>> Talking about a canonical representation, shouldn't we use some
>>> special masks representation and not mixing it with integer and vector
>>> of integers then? Only in this case target would be able to
>>> efficiently expand it into a corresponding rtl.
>>
>> That was my idea of vector<bool> ... but I didn't explore it and see where
>> it will cause issues.
>>
>> Fact is GCC already copes with vector masks generated by vector compares
>> just fine everywhere and I'd rather leave it as that.
>
> Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
> additional vec_cond. I don't think vectorizer ever generates vector
> comparison.

Ok, well that's an implementation detail then.  Are you sure about AND and IOR?
The comment above vect_recog_bool_pattern says

        Assuming size of TYPE is the same as size of all comparisons
        (otherwise some casts would be added where needed), the above
        sequence we create related pattern stmts:
        S1'  a_T = x1 CMP1 y1 ? 1 : 0;
        S3'  c_T = x2 CMP2 y2 ? a_T : 0;
        S4'  d_T = x3 CMP3 y3 ? 1 : 0;
        S5'  e_T = c_T | d_T;
        S6'  f_T = e_T;

thus has vector mask |

> And I wouldn't say it's fine 'everywhere' because there is a single
> target utilizing them. Masked loads and stored for AVX-512 just don't
> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
> 512-bit vector then we get an ugly inefficient code. The question is
> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
> int conversions applied everywhere even if target doesn't need it.
>
> If we don't want to support both types of masks in GIMPLE then it's
> more reasonable to make bool -> int conversion in expand for targets
> requiring it, rather than do it for everyone and then leave it to
> target to transform it back and try to get rid of all those redundant
> transformations. I'd give vector<bool> a chance to become a canonical
> mask representation for that.

Well, you are missing the case of

   bool b = a < b;
   int x = (int)b;

where the bool is used as integer (and thus an integer mask would have to be
"expanded").  When the bool is a mask in itself the integer use is either free
or a matter of a widening/shortening operation.

Richard.

>
> Thanks,
> Ilya
>
>>
>>>>
>>>> Indeed.  I don't remember my exact comments during the talk at the Cauldron
>>>> but the scheme used there was sth like
>>>>
>>>>   mask = GEN_MASK <vec1 < vec2>;
>>>>   b = a + 1;
>>>>   x = VEC_COND <mask, a, b>
>>>>
>>>> to model conditional execution already at the if-conversion stage (for
>>>> all scalar
>>>> stmts made executed unconditionally rather than just the PHI results).  I was
>>>> asking for the condition to be removed from GEN_MASK (patch 1 has this
>>>> fixed now AFAICS).  And I also asked why it was necessary to do this "lowering"
>>>> here and not simply do
>>>>
>>>> mask = vec1 < vec2;  // regular vector mask!
>>>> b = a + 1;
>>>> x = VEC_COND <mask, a, b>
>>>>
>>>> and have the lowering to an integer mask done later.  You'd still
>>>> change if-conversion
>>>> to predicate _all_ statements, not just those with side-effects.  So I
>>>> think there
>>>> still needs to be a new target hook to trigger this, similar to how
>>>> the target capabilities
>>>> trigger the masked load/store path in if-conversion.
>>>
>>> I think you mix scalar masks with a loop reminders optimization. I'm
>>> not going to do other changes in if-conversion other then in this
>>> posted patch to support scalar masks. Statements predication will be
>>> used to vectorize loop reminders. And not all of them, only reduction
>>> definitions. This will be independent from scalar masks and will work
>>> for vector masks also. And these changes are not going to be in
>>> if-conversion.
>>
>> Maybe I misremember.  Didn't look at the patch in detail yet.
>>
>> Richard.
>>
>>>
>>> Thanks,
>>> Ilya
>>>
>>>>
>>>> But I don't like changing our IL so much as to allow 'integer' masks everywhere.
>>>>
>>>> Richard.
>>>>
>>>>
>>>>>
>>>>> Jeff

Richard Biener Aug. 26, 2015, 1:08 p.m. UTC | #12

On Wed, Aug 26, 2015 at 1:13 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-08-26 0:42 GMT+03:00 Jeff Law <law@redhat.com>:
>> On 08/21/2015 04:49 AM, Ilya Enkovich wrote:
>>>
>>>
>>> I want a work with bitmasks to be expressed in a natural way using
>>> regular integer operations. Currently all masks manipulations are
>>> emulated via vector statements (mostly using a bunch of vec_cond). For
>>> complex predicates it may be nontrivial to transform it back to scalar
>>> masks and get an efficient code. Also the same vector may be used as
>>> both a mask and an integer vector. Things become more complex if you
>>> additionally have broadcasts and vector pack/unpack code. It also
>>> should be transformed into a scalar masks manipulations somehow.
>>
>> Or why not model the conversion at the gimple level using a CONVERT_EXPR?
>> In fact, the more I think about it, that seems to make more sense to me.
>>
>> We pick a canonical form for the mask, whatever it may be.  We use that
>> canonical form and model conversions between it and the other form via
>> CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant conversions.
>> If it's not up to the task, we should really look into why and resolve.
>>
>> Yes, that does mean we have two forms which I'm not terribly happy about and
>> it means some target dependencies on what the masked vector operation looks
>> like (ie, does it accept a simple integer or vector mask), but I'm starting
>> to wonder if, as distasteful as I find it, it's the right thing to do.
>
> If we have some special representation for masks in GIMPLE then we
> might not need any conversions. We could ask a target to define a MODE
> for this type and use it directly everywhere: directly compare into
> it, use it directly for masked loads and stores, AND, IOR, EQ etc. If
> that type is reserved for masks usage then you previous suggestion to
> transform masks into target specific form at GIMPLE->RTL phase should
> work fine. This would allow to support only a single masks
> representation in GIMPLE.

But we can already do all this with the integer vector masks we have.
If you think that the vectorizer generated

  mask = VEC_COND <v1 < v2 ? { -1,...} : { 0, ...} >

is ugly then we can remove that implementation detail and use

  mask = v1 < v2;

directly.  Note that the VEC_COND form was invented to avoid
the need to touch RTL expansion for vector compares (IIRC).
Or it pre-dated specifying what compares generate on GIMPLE.

Richard.

> Thanks,
> Ilya
>
>>
>>>>
>>>> But I don't like changing our IL so much as to allow 'integer' masks
>>>> everywhere.
>>
>> I'm warming up to that idea...
>>
>> jeff
>>

Jakub Jelinek Aug. 26, 2015, 1:16 p.m. UTC | #13

On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
> > AVX-512 is such target. Current representation forces multiple scalar
> > mask -> vector mask and back transformations which are artificially
> > introduced by current bool patterns and are hard to optimize out.
> 
> I dislike the bool patterns anyway and we should try to remove those
> and make the vectorizer handle them in other ways (they have single-use
> issues anyway).  I don't remember exactly what caused us to add them
> but one reason was there wasn't a vector type for 'bool' (but I don't see how
> it should be necessary to ask "get me a vector type for 'bool'").

That was just one of the reasons.  The other reason is that even if we would
choose some vector of integer type as vector of bool, the question is what
type.  E.g. if you use vector of chars, you almost always get terrible
vectorized code, except for the AVX-512 you really want an integral type
that has the size of the types you are comparing.
And I'd say this is very much related to the need to do some type promotions
or demotions on the scalar code meant to be vectorized (but only the copy
for vectorizations), so that we have as few different scalar type sizes in
the loop as possible, because widening / narrowing vector conversions aren't
exactly cheap and a single char operation in a loop otherwise full of long
long operations might unnecessarily turn a vf=2 (or 4 or 8) loop into
vf=16 (or 32 or 64), increasing it a lot.

	Jakub

Richard Biener Aug. 26, 2015, 1:21 p.m. UTC | #14

On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
>> > AVX-512 is such target. Current representation forces multiple scalar
>> > mask -> vector mask and back transformations which are artificially
>> > introduced by current bool patterns and are hard to optimize out.
>>
>> I dislike the bool patterns anyway and we should try to remove those
>> and make the vectorizer handle them in other ways (they have single-use
>> issues anyway).  I don't remember exactly what caused us to add them
>> but one reason was there wasn't a vector type for 'bool' (but I don't see how
>> it should be necessary to ask "get me a vector type for 'bool'").
>
> That was just one of the reasons.  The other reason is that even if we would
> choose some vector of integer type as vector of bool, the question is what
> type.  E.g. if you use vector of chars, you almost always get terrible
> vectorized code, except for the AVX-512 you really want an integral type
> that has the size of the types you are comparing.

Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always
first compute the vector type for the comparison itself (which is "fixed") and
thus we can compute the vector type of any bitwise op on it as well.

> And I'd say this is very much related to the need to do some type promotions
> or demotions on the scalar code meant to be vectorized (but only the copy
> for vectorizations), so that we have as few different scalar type sizes in
> the loop as possible, because widening / narrowing vector conversions aren't
> exactly cheap and a single char operation in a loop otherwise full of long
> long operations might unnecessarily turn a vf=2 (or 4 or 8) loop into
> vf=16 (or 32 or 64), increasing it a lot.

That's true but unrelated.  With conditions this gets to optimizing where
the promotion/demotion happens (which depends on how the result is used).

The current pattern approach has the issue that it doesn't work for multiple
uses in the condition bitops which is bad as well.

But it couldn't have been _only_ the vector type computation that made us
invent the patterns, no?  Do you remember anything else?

Thanks,
Richard.


>
>         Jakub

Jakub Jelinek Aug. 26, 2015, 1:35 p.m. UTC | #15

On Wed, Aug 26, 2015 at 03:21:52PM +0200, Richard Biener wrote:
> On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
> >> > AVX-512 is such target. Current representation forces multiple scalar
> >> > mask -> vector mask and back transformations which are artificially
> >> > introduced by current bool patterns and are hard to optimize out.
> >>
> >> I dislike the bool patterns anyway and we should try to remove those
> >> and make the vectorizer handle them in other ways (they have single-use
> >> issues anyway).  I don't remember exactly what caused us to add them
> >> but one reason was there wasn't a vector type for 'bool' (but I don't see how
> >> it should be necessary to ask "get me a vector type for 'bool'").
> >
> > That was just one of the reasons.  The other reason is that even if we would
> > choose some vector of integer type as vector of bool, the question is what
> > type.  E.g. if you use vector of chars, you almost always get terrible
> > vectorized code, except for the AVX-512 you really want an integral type
> > that has the size of the types you are comparing.
> 
> Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always
> first compute the vector type for the comparison itself (which is "fixed") and
> thus we can compute the vector type of any bitwise op on it as well.

Sure, but if you then immediately vector narrow it to a V*QI vector because
it is stored originally into a bool/_Bool variable, and then again when it
is used in say a COND_EXPR widen it again, you get really poor code.
So, what the bool pattern code does is kind of poor man's type
promotion/demotion pass for bool only, at least for the common cases.

PR50596 has been the primary reason to introduce the bool patterns.
If there is a better type promotion/demotion pass on a copy of the loop,
sure, we can get rid of it (but figure out also what to do for SLP).

	Jakub

Richard Biener Aug. 26, 2015, 2:34 p.m. UTC | #16

On Wed, Aug 26, 2015 at 3:35 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Aug 26, 2015 at 03:21:52PM +0200, Richard Biener wrote:
>> On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
>> >> > AVX-512 is such target. Current representation forces multiple scalar
>> >> > mask -> vector mask and back transformations which are artificially
>> >> > introduced by current bool patterns and are hard to optimize out.
>> >>
>> >> I dislike the bool patterns anyway and we should try to remove those
>> >> and make the vectorizer handle them in other ways (they have single-use
>> >> issues anyway).  I don't remember exactly what caused us to add them
>> >> but one reason was there wasn't a vector type for 'bool' (but I don't see how
>> >> it should be necessary to ask "get me a vector type for 'bool'").
>> >
>> > That was just one of the reasons.  The other reason is that even if we would
>> > choose some vector of integer type as vector of bool, the question is what
>> > type.  E.g. if you use vector of chars, you almost always get terrible
>> > vectorized code, except for the AVX-512 you really want an integral type
>> > that has the size of the types you are comparing.
>>
>> Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always
>> first compute the vector type for the comparison itself (which is "fixed") and
>> thus we can compute the vector type of any bitwise op on it as well.
>
> Sure, but if you then immediately vector narrow it to a V*QI vector because
> it is stored originally into a bool/_Bool variable, and then again when it
> is used in say a COND_EXPR widen it again, you get really poor code.
> So, what the bool pattern code does is kind of poor man's type
> promotion/demotion pass for bool only, at least for the common cases.

Yeah, I just looked at the code but in the end everything should be fixable
in the place we compute STMT_VINFO_VECTYPE.  The code just
looks at the LHS type plus at the narrowest type (for vectorization factor).
It should get re-structured to get the vector types from the operands
(much like code-generation will eventually fall back to).

> PR50596 has been the primary reason to introduce the bool patterns.
> If there is a better type promotion/demotion pass on a copy of the loop,
> sure, we can get rid of it (but figure out also what to do for SLP).

Yeah, of course.  Basic-block SLP just asks for the vectype during SLP
analysis AFAIK.

I suppose we want sth like get_result_vectype (gimple) which can look
at operands as well and can be used from both places.

After all we do want to fix the non-single-use issue somehow and getting
rid of the patterns sounds good to me anyway...

Not sure if I can get to the above for GCC 6, but at least putting it on my
TODO...

Richard.

>         Jakub

Ilya Enkovich Aug. 26, 2015, 2:38 p.m. UTC | #17

2015-08-26 16:02 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2015-08-21 14:00 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>
>>> Hmm, I don't see how vector masks are more difficult to operate with.
>>
>> There are just no instructions for that but you have to pretend you
>> have to get code vectorized.
>
> Huh?  Bitwise ops should be readily available.

Right bitwise ops are available, but there is no comparison into a
vector and no masked loads and stores using vector masks (when we
speak about 512-bit vectors).

>
>>>
>>>> Also according to vector ABI integer mask should be used for mask
>>>> operand in case of masked vector call.
>>>
>>> What ABI?  The function signature of the intrinsics?  How would that
>>> come into play here?
>>
>> Not intrinsics. I mean OpenMP vector functions which require integer
>> arg for a mask in case of 512-bit vector.
>
> How do you declare those?

Something like this:

#pragma omp declare simd inbranch
int foo(int*);

>
>>>
>>>> Current implementation of masked loads, masked stores and bool
>>>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>>>> really call it a canonical representation for all targets?
>>>
>>> No idea - we'll revisit when another targets adds a similar capability.
>>
>> AVX-512 is such target. Current representation forces multiple scalar
>> mask -> vector mask and back transformations which are artificially
>> introduced by current bool patterns and are hard to optimize out.
>
> I dislike the bool patterns anyway and we should try to remove those
> and make the vectorizer handle them in other ways (they have single-use
> issues anyway).  I don't remember exactly what caused us to add them
> but one reason was there wasn't a vector type for 'bool' (but I don't see how
> it should be necessary to ask "get me a vector type for 'bool'").
>
>>>
>>>> Using scalar masks everywhere should probably cause the same conversion
>>>> problem for SSE I listed above though.
>>>>
>>>> Talking about a canonical representation, shouldn't we use some
>>>> special masks representation and not mixing it with integer and vector
>>>> of integers then? Only in this case target would be able to
>>>> efficiently expand it into a corresponding rtl.
>>>
>>> That was my idea of vector<bool> ... but I didn't explore it and see where
>>> it will cause issues.
>>>
>>> Fact is GCC already copes with vector masks generated by vector compares
>>> just fine everywhere and I'd rather leave it as that.
>>
>> Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
>> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
>> additional vec_cond. I don't think vectorizer ever generates vector
>> comparison.
>
> Ok, well that's an implementation detail then.  Are you sure about AND and IOR?
> The comment above vect_recog_bool_pattern says
>
>         Assuming size of TYPE is the same as size of all comparisons
>         (otherwise some casts would be added where needed), the above
>         sequence we create related pattern stmts:
>         S1'  a_T = x1 CMP1 y1 ? 1 : 0;
>         S3'  c_T = x2 CMP2 y2 ? a_T : 0;
>         S4'  d_T = x3 CMP3 y3 ? 1 : 0;
>         S5'  e_T = c_T | d_T;
>         S6'  f_T = e_T;
>
> thus has vector mask |

I think in practice it would look like:

S4'  d_T = x3 CMP3 y3 ? 1 : c_T;

Thus everything is usually hidden in vec_cond. But my concern is
mostly about types used for that.

>
>> And I wouldn't say it's fine 'everywhere' because there is a single
>> target utilizing them. Masked loads and stored for AVX-512 just don't
>> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
>> 512-bit vector then we get an ugly inefficient code. The question is
>> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
>> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
>> int conversions applied everywhere even if target doesn't need it.
>>
>> If we don't want to support both types of masks in GIMPLE then it's
>> more reasonable to make bool -> int conversion in expand for targets
>> requiring it, rather than do it for everyone and then leave it to
>> target to transform it back and try to get rid of all those redundant
>> transformations. I'd give vector<bool> a chance to become a canonical
>> mask representation for that.
>
> Well, you are missing the case of
>
>    bool b = a < b;
>    int x = (int)b;

This case seems to require no changes and just be transformed into vec_cond.

Thanks,
Ilya

>
> where the bool is used as integer (and thus an integer mask would have to be
> "expanded").  When the bool is a mask in itself the integer use is either free
> or a matter of a widening/shortening operation.
>
> Richard.
>

Richard Biener Aug. 26, 2015, 2:56 p.m. UTC | #18

On Wed, Aug 26, 2015 at 4:38 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-08-26 16:02 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> 2015-08-21 14:00 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>
>>>> Hmm, I don't see how vector masks are more difficult to operate with.
>>>
>>> There are just no instructions for that but you have to pretend you
>>> have to get code vectorized.
>>
>> Huh?  Bitwise ops should be readily available.
>
> Right bitwise ops are available, but there is no comparison into a
> vector and no masked loads and stores using vector masks (when we
> speak about 512-bit vectors).
>
>>
>>>>
>>>>> Also according to vector ABI integer mask should be used for mask
>>>>> operand in case of masked vector call.
>>>>
>>>> What ABI?  The function signature of the intrinsics?  How would that
>>>> come into play here?
>>>
>>> Not intrinsics. I mean OpenMP vector functions which require integer
>>> arg for a mask in case of 512-bit vector.
>>
>> How do you declare those?
>
> Something like this:
>
> #pragma omp declare simd inbranch
> int foo(int*);

The 'inbranch' is the thing that matters?  And all of foo is then
implicitely predicated?

>>
>>>>
>>>>> Current implementation of masked loads, masked stores and bool
>>>>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>>>>> really call it a canonical representation for all targets?
>>>>
>>>> No idea - we'll revisit when another targets adds a similar capability.
>>>
>>> AVX-512 is such target. Current representation forces multiple scalar
>>> mask -> vector mask and back transformations which are artificially
>>> introduced by current bool patterns and are hard to optimize out.
>>
>> I dislike the bool patterns anyway and we should try to remove those
>> and make the vectorizer handle them in other ways (they have single-use
>> issues anyway).  I don't remember exactly what caused us to add them
>> but one reason was there wasn't a vector type for 'bool' (but I don't see how
>> it should be necessary to ask "get me a vector type for 'bool'").
>>
>>>>
>>>>> Using scalar masks everywhere should probably cause the same conversion
>>>>> problem for SSE I listed above though.
>>>>>
>>>>> Talking about a canonical representation, shouldn't we use some
>>>>> special masks representation and not mixing it with integer and vector
>>>>> of integers then? Only in this case target would be able to
>>>>> efficiently expand it into a corresponding rtl.
>>>>
>>>> That was my idea of vector<bool> ... but I didn't explore it and see where
>>>> it will cause issues.
>>>>
>>>> Fact is GCC already copes with vector masks generated by vector compares
>>>> just fine everywhere and I'd rather leave it as that.
>>>
>>> Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
>>> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
>>> additional vec_cond. I don't think vectorizer ever generates vector
>>> comparison.
>>
>> Ok, well that's an implementation detail then.  Are you sure about AND and IOR?
>> The comment above vect_recog_bool_pattern says
>>
>>         Assuming size of TYPE is the same as size of all comparisons
>>         (otherwise some casts would be added where needed), the above
>>         sequence we create related pattern stmts:
>>         S1'  a_T = x1 CMP1 y1 ? 1 : 0;
>>         S3'  c_T = x2 CMP2 y2 ? a_T : 0;
>>         S4'  d_T = x3 CMP3 y3 ? 1 : 0;
>>         S5'  e_T = c_T | d_T;
>>         S6'  f_T = e_T;
>>
>> thus has vector mask |
>
> I think in practice it would look like:
>
> S4'  d_T = x3 CMP3 y3 ? 1 : c_T;
>
> Thus everything is usually hidden in vec_cond. But my concern is
> mostly about types used for that.
>
>>
>>> And I wouldn't say it's fine 'everywhere' because there is a single
>>> target utilizing them. Masked loads and stored for AVX-512 just don't
>>> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
>>> 512-bit vector then we get an ugly inefficient code. The question is
>>> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
>>> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
>>> int conversions applied everywhere even if target doesn't need it.
>>>
>>> If we don't want to support both types of masks in GIMPLE then it's
>>> more reasonable to make bool -> int conversion in expand for targets
>>> requiring it, rather than do it for everyone and then leave it to
>>> target to transform it back and try to get rid of all those redundant
>>> transformations. I'd give vector<bool> a chance to become a canonical
>>> mask representation for that.
>>
>> Well, you are missing the case of
>>
>>    bool b = a < b;
>>    int x = (int)b;
>
> This case seems to require no changes and just be transformed into vec_cond.

Ok, the example was too simple but I meant that a bool has a non-conditional
use.

Ok, so I still believe we don't want two ways to express things on GIMPLE if
possible.  Yes, the vectorizer already creates only vector stmts that
are supported
by the hardware.  So it's a matter of deciding on the GIMPLE representation
for the "mask".  I'd rather use vector<bool> (and the target assigning
an integer
mode to it) than an 'int' in GIMPLE statements.  Because that makes the
type constraints on GIMPLE very weak and exposes those 'ints' to all kind
of optimization passes.

Thus if we change the result type requirement of vector comparisons from
signed integer vectors to bool vectors the vectorizer can still go for
promoting that bool vector to a vector of ints via a VEC_COND_EXPR
and the expander can special-case that if the target has a vector comparison
producing a vector mask.

So, can you give that vector<bool> some thought?  Note that to assign
sth else than a vector mode to it needs adjustments in stor-layout.c.
I'm pretty sure we don't want vector BImodes.

Richard.

> Thanks,
> Ilya
>
>>
>> where the bool is used as integer (and thus an integer mask would have to be
>> "expanded").  When the bool is a mask in itself the integer use is either free
>> or a matter of a widening/shortening operation.
>>
>> Richard.
>>

Jakub Jelinek Aug. 26, 2015, 3:10 p.m. UTC | #19

On Wed, Aug 26, 2015 at 04:56:23PM +0200, Richard Biener wrote:
> >> How do you declare those?
> >
> > Something like this:
> >
> > #pragma omp declare simd inbranch
> > int foo(int*);
> 
> The 'inbranch' is the thing that matters?  And all of foo is then
> implicitely predicated?

If it is
#pragma omp declare simd notinbranch,
then only the non-predicated version is emitted and thus it is usable only
in vectorized loops inside of non-conditional contexts.
If it is
#pragma omp declare simd inbranch,
then only the predicated version is emitted, there is an extra argument
(either V*QI if I remember well, or for AVX-512 short/int/long bitmask),
if the caller wants to use it in non-conditional contexts, it just passes
all ones mask.  For
#pragma omp declare simd
(neither inbranch nor notinbranch), two versions are emitted, one predicated
and one non-predicated.

	Jakub

Ilya Enkovich Aug. 26, 2015, 3:51 p.m. UTC | #20

2015-08-26 17:56 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Wed, Aug 26, 2015 at 4:38 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2015-08-26 16:02 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> 2015-08-21 14:00 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>
>>>>> Hmm, I don't see how vector masks are more difficult to operate with.
>>>>
>>>> There are just no instructions for that but you have to pretend you
>>>> have to get code vectorized.
>>>
>>> Huh?  Bitwise ops should be readily available.
>>
>> Right bitwise ops are available, but there is no comparison into a
>> vector and no masked loads and stores using vector masks (when we
>> speak about 512-bit vectors).
>>
>>>
>>>>>
>>>>>> Also according to vector ABI integer mask should be used for mask
>>>>>> operand in case of masked vector call.
>>>>>
>>>>> What ABI?  The function signature of the intrinsics?  How would that
>>>>> come into play here?
>>>>
>>>> Not intrinsics. I mean OpenMP vector functions which require integer
>>>> arg for a mask in case of 512-bit vector.
>>>
>>> How do you declare those?
>>
>> Something like this:
>>
>> #pragma omp declare simd inbranch
>> int foo(int*);
>
> The 'inbranch' is the thing that matters?  And all of foo is then
> implicitely predicated?

That's right. And a vector version of foo gets a mask as an additional arg.

>
>>>
>>> Well, you are missing the case of
>>>
>>>    bool b = a < b;
>>>    int x = (int)b;
>>
>> This case seems to require no changes and just be transformed into vec_cond.
>
> Ok, the example was too simple but I meant that a bool has a non-conditional
> use.

Right. In such cases I think it's reasonable to replace it with a
select similar to what we now have but without whole bool tree
transformed.

>
> Ok, so I still believe we don't want two ways to express things on GIMPLE if
> possible.  Yes, the vectorizer already creates only vector stmts that
> are supported
> by the hardware.  So it's a matter of deciding on the GIMPLE representation
> for the "mask".  I'd rather use vector<bool> (and the target assigning
> an integer
> mode to it) than an 'int' in GIMPLE statements.  Because that makes the
> type constraints on GIMPLE very weak and exposes those 'ints' to all kind
> of optimization passes.
>
> Thus if we change the result type requirement of vector comparisons from
> signed integer vectors to bool vectors the vectorizer can still go for
> promoting that bool vector to a vector of ints via a VEC_COND_EXPR
> and the expander can special-case that if the target has a vector comparison
> producing a vector mask.
>
> So, can you give that vector<bool> some thought?

Yes, I want to try it. But getting rid of bool patterns would mean
support for all targets currently supporting vec_cond. Would it be OK
to have vector<bool> mask co-exist with bool patterns for some time?
Thus first step would be to require vector<bool> for MASK_LOAD and
MASK_STORE and support it for i386 (the only user of MASK_LOAD and
MASK_STORE).

>Note that to assign
> sth else than a vector mode to it needs adjustments in stor-layout.c.
> I'm pretty sure we don't want vector BImodes.

I can directly build a vector type with specified mode to avoid it. Smth. like:

mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
mask_type = make_vector_type (bool_type_node, nunits, mask_mode);

Thanks,
Ilya

>
> Richard.
>

Jeff Law Aug. 26, 2015, 4:30 p.m. UTC | #21

On 08/26/2015 05:13 AM, Ilya Enkovich wrote:
> 2015-08-26 0:42 GMT+03:00 Jeff Law <law@redhat.com>:
>> On 08/21/2015 04:49 AM, Ilya Enkovich wrote:
>>>
>>>
>>> I want a work with bitmasks to be expressed in a natural way using
>>> regular integer operations. Currently all masks manipulations are
>>> emulated via vector statements (mostly using a bunch of vec_cond). For
>>> complex predicates it may be nontrivial to transform it back to scalar
>>> masks and get an efficient code. Also the same vector may be used as
>>> both a mask and an integer vector. Things become more complex if you
>>> additionally have broadcasts and vector pack/unpack code. It also
>>> should be transformed into a scalar masks manipulations somehow.
>>
>> Or why not model the conversion at the gimple level using a CONVERT_EXPR?
>> In fact, the more I think about it, that seems to make more sense to me.
>>
>> We pick a canonical form for the mask, whatever it may be.  We use that
>> canonical form and model conversions between it and the other form via
>> CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant conversions.
>> If it's not up to the task, we should really look into why and resolve.
>>
>> Yes, that does mean we have two forms which I'm not terribly happy about and
>> it means some target dependencies on what the masked vector operation looks
>> like (ie, does it accept a simple integer or vector mask), but I'm starting
>> to wonder if, as distasteful as I find it, it's the right thing to do.
>
> If we have some special representation for masks in GIMPLE then we
> might not need any conversions. We could ask a target to define a MODE
> for this type and use it directly everywhere: directly compare into
> it, use it directly for masked loads and stores, AND, IOR, EQ etc. If
> that type is reserved for masks usage then you previous suggestion to
> transform masks into target specific form at GIMPLE->RTL phase should
> work fine. This would allow to support only a single masks
> representation in GIMPLE.
Possibly, but you mentioned that you may need to use the masks in both 
forms depending on the exact context.  If so, then I think we need to 
model a conversion between the two forms.


Jeff

Richard Biener Aug. 27, 2015, 7:55 a.m. UTC | #22

On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-08-26 17:56 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Wed, Aug 26, 2015 at 4:38 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> 2015-08-26 16:02 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>> 2015-08-21 14:00 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>
>>>>>> Hmm, I don't see how vector masks are more difficult to operate with.
>>>>>
>>>>> There are just no instructions for that but you have to pretend you
>>>>> have to get code vectorized.
>>>>
>>>> Huh?  Bitwise ops should be readily available.
>>>
>>> Right bitwise ops are available, but there is no comparison into a
>>> vector and no masked loads and stores using vector masks (when we
>>> speak about 512-bit vectors).
>>>
>>>>
>>>>>>
>>>>>>> Also according to vector ABI integer mask should be used for mask
>>>>>>> operand in case of masked vector call.
>>>>>>
>>>>>> What ABI?  The function signature of the intrinsics?  How would that
>>>>>> come into play here?
>>>>>
>>>>> Not intrinsics. I mean OpenMP vector functions which require integer
>>>>> arg for a mask in case of 512-bit vector.
>>>>
>>>> How do you declare those?
>>>
>>> Something like this:
>>>
>>> #pragma omp declare simd inbranch
>>> int foo(int*);
>>
>> The 'inbranch' is the thing that matters?  And all of foo is then
>> implicitely predicated?
>
> That's right. And a vector version of foo gets a mask as an additional arg.
>
>>
>>>>
>>>> Well, you are missing the case of
>>>>
>>>>    bool b = a < b;
>>>>    int x = (int)b;
>>>
>>> This case seems to require no changes and just be transformed into vec_cond.
>>
>> Ok, the example was too simple but I meant that a bool has a non-conditional
>> use.
>
> Right. In such cases I think it's reasonable to replace it with a
> select similar to what we now have but without whole bool tree
> transformed.
>
>>
>> Ok, so I still believe we don't want two ways to express things on GIMPLE if
>> possible.  Yes, the vectorizer already creates only vector stmts that
>> are supported
>> by the hardware.  So it's a matter of deciding on the GIMPLE representation
>> for the "mask".  I'd rather use vector<bool> (and the target assigning
>> an integer
>> mode to it) than an 'int' in GIMPLE statements.  Because that makes the
>> type constraints on GIMPLE very weak and exposes those 'ints' to all kind
>> of optimization passes.
>>
>> Thus if we change the result type requirement of vector comparisons from
>> signed integer vectors to bool vectors the vectorizer can still go for
>> promoting that bool vector to a vector of ints via a VEC_COND_EXPR
>> and the expander can special-case that if the target has a vector comparison
>> producing a vector mask.
>>
>> So, can you give that vector<bool> some thought?
>
> Yes, I want to try it. But getting rid of bool patterns would mean
> support for all targets currently supporting vec_cond. Would it be OK
> to have vector<bool> mask co-exist with bool patterns for some time?

No, I'd like to remove the bool patterns anyhow - the vectorizer should be able
to figure out the correct vector type (or mask type) from the uses.  Currently
it simply looks at the stmts LHS type but as all stmt operands already have
vector types it can as well compute the result type from those.  We'd want to
have a helper function that does this result type computation as I figure it
will be needed in multiple places.

This is now on my personal TODO list (but that's already quite long for GCC 6),
so if you manage to get to that...  see
tree-vect-loop.c:vect_determine_vectorization_factor
which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
vector type set from data-ref analysis already - there 'bool' loads
correctly get
VNQImode).  There is a basic-block / SLP part as well that would need to use
the helper function (eventually with some SLP discovery order issue).

> Thus first step would be to require vector<bool> for MASK_LOAD and
> MASK_STORE and support it for i386 (the only user of MASK_LOAD and
> MASK_STORE).

You can certainly try that first, but as soon as you hit complications with
needing to adjust bool patterns then I'd rather get rid of them.

>>Note that to assign
>> sth else than a vector mode to it needs adjustments in stor-layout.c.
>> I'm pretty sure we don't want vector BImodes.
>
> I can directly build a vector type with specified mode to avoid it. Smth. like:
>
> mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
> mask_type = make_vector_type (bool_type_node, nunits, mask_mode);

Hmm, indeed, that might be a (good) solution.  Btw, in this case
target attribute
boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
on the active target).  There would also be no way for the user to
declare vector<bool>
in source (which is good because of that target attribute issue...).

So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
and adjusting
truth_type_for is the way to go.

I suggest you try modifying those parts first according to this scheme
that will most
likely uncover issues we missed.

Thanks,
Richard.

> Thanks,
> Ilya
>
>>
>> Richard.
>>

[Scalar,masks,2/x] Use bool masks in if-conversion

Commit Message

Comments

Patch