PR target/68991: Add vector_memory_operand and "Bm" constraint

Message ID 20151230205333.GA1877@gmail.com
State New

Commit Message

H.J. Lu Dec. 30, 2015, 8:53 p.m. UTC
SSE vector arithmetic and logic instructions only accept aligned memory
operands.  This patch adds vector_memory_operand and the "Bm" constraint for
aligned SSE memory operands.  They are applied to the SSE any_logic patterns.

OK for trunk and release branches if there are no regressions?

H.J.
---
gcc/

	PR target/68991
	* config/i386/constraints.md (Bm): New constraint.
	* config/i386/predicates.md (vector_memory_operand): New
	predicate.
	* config/i386/sse.md: Replace xm with xBm in any_logic patterns.

gcc/testsuite/

	PR target/68991
	* g++.dg/pr68991.C: New test.
---
 gcc/config/i386/constraints.md |   5 ++
 gcc/config/i386/predicates.md  |   7 ++
 gcc/config/i386/sse.md         |   8 +-
 gcc/testsuite/g++.dg/pr68991.C | 191 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 207 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr68991.C

Comments

Uros Bizjak Dec. 31, 2015, 9:14 a.m. UTC | #1
On Wed, Dec 30, 2015 at 9:53 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> SSE vector arithmetic and logic instructions only accept aligned memory
> operands.  This patch adds vector_memory_operand and the "Bm" constraint for
> aligned SSE memory operands.  They are applied to the SSE any_logic patterns.
>
> OK for trunk and release branches if there are no regressions?

This patch is just papering over a deeper problem, as Jakub said in the PR [1]:

--q--
GCC uses the ix86_legitimate_combined_insn target hook to disallow
misaligned memory into certain SSE instructions.
(subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0)
is not misaligned memory, it is a subreg of a pseudo register, so it is fine.
If the replacement of the pseudo register with memory happens in some
other pass, then it probably either should use the
legitimate_combined_insn target hook or some other one.  I think we
have already a PR where that happens during live range shrinking.
--/q--

Please figure out where memory replacement happens. There are several
other SSE insns (please grep the .md for "ssememalign" attribute) that
are affected by this problem, so fixing a couple of patterns won't
solve the problem completely.

Looks like an infrastructure problem to me; targets should have the last
say on the validity of the insns.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68991#c5

Uros.
H.J. Lu Dec. 31, 2015, 3:29 p.m. UTC | #2
On Thu, Dec 31, 2015 at 1:14 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Wed, Dec 30, 2015 at 9:53 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> SSE vector arithmetic and logic instructions only accept aligned memory
>> operands.  This patch adds vector_memory_operand and the "Bm" constraint for
>> aligned SSE memory operands.  They are applied to the SSE any_logic patterns.
>>
>> OK for trunk and release branches if there are no regressions?
>
> This patch is just papering over a deeper problem, as Jakub said in the PR [1]:
>
> --q--
> GCC uses the ix86_legitimate_combined_insn target hook to disallow
> misaligned memory into certain SSE instructions.
> (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0)
> is not misaligned memory, it is a subreg of a pseudo register, so it is fine.
> If the replacement of the pseudo register with memory happens in some
> other pass, then it probably either should use the
> legitimate_combined_insn target hook or some other one.  I think we
> have already a PR where that happens during live range shrinking.
> --/q--
>
> Please figure out where memory replacement happens. There are several
> other SSE insns (please grep the .md for "ssememalign" attribute) that
> are affected by this problem, so fixing a couple of patterns won't
> solve the problem completely.

LRA turns

insn 64 63 108 6 (set (reg:V4SI 148 [ vect__28.85 ])
        (xor:V4SI (reg:V4SI 149)
            (subreg:V4SI (reg:TI 147 [ MEM[(const struct bitset
&)FeatureEntry_2(D)] ]) 0))) foo.ii:26 3454 {*xorv4si3}
     (expr_list:REG_DEAD (reg:V4SI 149)
        (expr_list:REG_DEAD (reg:TI 147 [ MEM[(const struct bitset
&)FeatureEntry_2(D)] ])
            (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
                        (const_int -16 [0xfffffffffffffff0])) [3
MEM[(unsigned int *)&D.2851]+0 S16 A128])
                (nil)))))

into

(insn 64 63 108 6 (set (reg:V4SI 21 xmm0 [orig:148 vect__28.85 ] [148])
        (xor:V4SI (reg:V4SI 21 xmm0 [149])
            (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117])
[6 MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
foo.ii:26 3454 {*xorv4si3}
     (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
                (const_int -16 [0xfffffffffffffff0])) [3 MEM[(unsigned
int *)&D.2851]+0 S16 A128])
        (nil)))

since

(mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117]) [6
MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))

satisfies the "m" constraint.  I don't think LRA should call
ix86_legitimate_combined_insn to validate constraints on
an instruction.

> Looks like an infrastructure problem to me; targets should have the last
> say on the validity of the insns.
>

We can replace the 'm' constraint with the 'Bm' constraint in sse.md, or we
can add a new target hook, similar to ix86_legitimate_combined_insn, to
validate an instruction with hard registers.
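
A sketch of the second option, reusing the pre-AVX alignment check that
ix86_legitimate_combined_insn already performs (the hook name and its
wiring into LRA are hypothetical, not existing GCC code):

static bool
ix86_legitimate_reloaded_insn (rtx_insn *insn)
{
  int i;

  /* Extract the operands of the already-recognized insn.  */
  extract_insn (insn);

  for (i = 0; i < recog_data.n_operands; i++)
    {
      rtx op = recog_data.operand[i];
      machine_mode mode = GET_MODE (op);

      /* For pre-AVX, reject misaligned vector memory where the
         instruction doesn't support it, mirroring the check in
         ix86_legitimate_combined_insn.  */
      if (!TARGET_AVX
          && MEM_P (op)
          && VECTOR_MODE_P (mode)
          && misaligned_operand (op, mode))
        {
          unsigned int min_align = get_attr_ssememalign (insn);
          if (min_align == 0 || MEM_ALIGN (op) < min_align)
            return false;
        }
    }

  return true;
}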
Jakub Jelinek Dec. 31, 2015, 4:12 p.m. UTC | #3
On Thu, Dec 31, 2015 at 07:29:21AM -0800, H.J. Lu wrote:
> > This patch is just papering over a deeper problem, as Jakub said in the PR [1]:
> >
> > --q--
> > GCC uses the ix86_legitimate_combined_insn target hook to disallow
> > misaligned memory into certain SSE instructions.
> > (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0)
> > is not misaligned memory, it is a subreg of a pseudo register, so it is fine.
> > If the replacement of the pseudo register with memory happens in some
> > other pass, then it probably either should use the
> > legitimate_combined_insn target hook or some other one.  I think we
> > have already a PR where that happens during live range shrinking.
> > --/q--
> >
> > Please figure out where memory replacement happens. There are several
> > other SSE insns (please grep the .md for "ssememalign" attribute) that
> > are affected by this problem, so fixing a couple of patterns won't
> > solve the problem completely.
> 
> LRA turns
> 
> insn 64 63 108 6 (set (reg:V4SI 148 [ vect__28.85 ])
>         (xor:V4SI (reg:V4SI 149)
>             (subreg:V4SI (reg:TI 147 [ MEM[(const struct bitset
> &)FeatureEntry_2(D)] ]) 0))) foo.ii:26 3454 {*xorv4si3}
>      (expr_list:REG_DEAD (reg:V4SI 149)
>         (expr_list:REG_DEAD (reg:TI 147 [ MEM[(const struct bitset
> &)FeatureEntry_2(D)] ])
>             (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>                         (const_int -16 [0xfffffffffffffff0])) [3
> MEM[(unsigned int *)&D.2851]+0 S16 A128])
>                 (nil)))))
> 
> into
> 
> (insn 64 63 108 6 (set (reg:V4SI 21 xmm0 [orig:148 vect__28.85 ] [148])
>         (xor:V4SI (reg:V4SI 21 xmm0 [149])
>             (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117])
> [6 MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
> foo.ii:26 3454 {*xorv4si3}
>      (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>                 (const_int -16 [0xfffffffffffffff0])) [3 MEM[(unsigned
> int *)&D.2851]+0 S16 A128])
>         (nil)))
> 
> since
> 
> (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117]) [6
> MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
> 
> satisfies the "m" constraint.  I don't think LRA should call
> ix86_legitimate_combined_insn to validate constraints on
> an instruction.

I think best would be if we arrange (through some gen* extension?) that
the backend in *.md files could specify additional code to be run during
recognition (ideally have a way that it is used only on operands that are or
might have vector mode and are or might be MEM, so that it wouldn't be done
on every insn) and move the
          /* For pre-AVX disallow unaligned loads/stores where the
             instructions don't support it.  */
          if (!TARGET_AVX
              && VECTOR_MODE_P (mode)
              && misaligned_operand (op, mode))
            {
              unsigned int min_align = get_attr_ssememalign (insn);
              if (min_align == 0
                  || MEM_ALIGN (op) < min_align)
                return false;
            }
from the legitimate_combined_insn target hook into that.  Or run some target
hook during recog?  That way it would
be checked in all recog spots, rather than just the combiner.
Or indeed move to different predicates and/or constraints for operands that
disallow misaligned operands and perform this check in there.
Unless something in LRA is the only other spot that could try to "combine"
misaligned memory into vector instructions.
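
For concreteness, the recog-time variant might look roughly like this
(a sketch only: the hook below does not exist, and a real implementation
would sit inside recog itself rather than in a wrapper):

/* Hypothetical: let the target veto an otherwise-matching insn at
   recognition time, so every caller of recog inherits the check.  */
int
recog_with_target_check (rtx body, rtx_insn *insn, int *pnum_clobbers)
{
  int icode = recog (body, insn, pnum_clobbers);
  if (icode >= 0 && !targetm.legitimate_recognized_insn (insn))
    return -1;
  return icode;
}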

	Jakub
H.J. Lu Dec. 31, 2015, 5:09 p.m. UTC | #4
On Thu, Dec 31, 2015 at 8:12 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Dec 31, 2015 at 07:29:21AM -0800, H.J. Lu wrote:
>> > This patch is just papering over a deeper problem, as Jakub said in the PR [1]:
>> >
>> > --q--
>> > GCC uses the ix86_legitimate_combined_insn target hook to disallow
>> > misaligned memory into certain SSE instructions.
>> > (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0)
>> > is not misaligned memory, it is a subreg of a pseudo register, so it is fine.
>> > If the replacement of the pseudo register with memory happens in some
>> > other pass, then it probably either should use the
>> > legitimate_combined_insn target hook or some other one.  I think we
>> > have already a PR where that happens during live range shrinking.
>> > --/q--
>> >
>> > Please figure out where memory replacement happens. There are several
>> > other SSE insns (please grep the .md for "ssememalign" attribute) that
>> > are affected by this problem, so fixing a couple of patterns won't
>> > solve the problem completely.
>>
>> LRA turns
>>
>> insn 64 63 108 6 (set (reg:V4SI 148 [ vect__28.85 ])
>>         (xor:V4SI (reg:V4SI 149)
>>             (subreg:V4SI (reg:TI 147 [ MEM[(const struct bitset
>> &)FeatureEntry_2(D)] ]) 0))) foo.ii:26 3454 {*xorv4si3}
>>      (expr_list:REG_DEAD (reg:V4SI 149)
>>         (expr_list:REG_DEAD (reg:TI 147 [ MEM[(const struct bitset
>> &)FeatureEntry_2(D)] ])
>>             (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>>                         (const_int -16 [0xfffffffffffffff0])) [3
>> MEM[(unsigned int *)&D.2851]+0 S16 A128])
>>                 (nil)))))
>>
>> into
>>
>> (insn 64 63 108 6 (set (reg:V4SI 21 xmm0 [orig:148 vect__28.85 ] [148])
>>         (xor:V4SI (reg:V4SI 21 xmm0 [149])
>>             (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117])
>> [6 MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
>> foo.ii:26 3454 {*xorv4si3}
>>      (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>>                 (const_int -16 [0xfffffffffffffff0])) [3 MEM[(unsigned
>> int *)&D.2851]+0 S16 A128])
>>         (nil)))
>>
>> since
>>
>> (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117]) [6
>> MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
>>
>> satisfies the "m" constraint.  I don't think LRA should call
>> ix86_legitimate_combined_insn to validate constraints on
>> an instruction.
>
> I think best would be if we arrange (through some gen* extension?) that
> the backend in *.md files could specify additional code to be run during
> recognition (ideally have a way that it is used only on operands that are or
> might have vector mode and are or might be MEM, so that it wouldn't be done
> on every insn) and move the
>           /* For pre-AVX disallow unaligned loads/stores where the
>              instructions don't support it.  */
>           if (!TARGET_AVX
>               && VECTOR_MODE_P (mode)
>               && misaligned_operand (op, mode))
>             {
>               unsigned int min_align = get_attr_ssememalign (insn);
>               if (min_align == 0
>                   || MEM_ALIGN (op) < min_align)
>                 return false;
>             }
> from the legitimate_combined_insn target hook into that.  Or run some target
> hook during recog?  That way it would
> be checked in all recog spots, rather than just the combiner.
> Or indeed move to different predicates and/or constraints for operands that
> disallow misaligned operands and perform this check in there.
> Unless something in LRA is the only other spot that could try to "combine"
> misaligned memory into vector instructions.

The backend should provide accurate constraint information to the
register allocator so that the proper operand is selected.
Uros Bizjak Jan. 2, 2016, 10:32 a.m. UTC | #5
On Thu, Dec 31, 2015 at 4:29 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Dec 31, 2015 at 1:14 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Wed, Dec 30, 2015 at 9:53 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> SSE vector arithmetic and logic instructions only accept aligned memory
>>> operands.  This patch adds vector_memory_operand and the "Bm" constraint for
>>> aligned SSE memory operands.  They are applied to the SSE any_logic patterns.
>>>
>>> OK for trunk and release branches if there are no regressions?
>>
>> This patch is just papering over a deeper problem, as Jakub said in the PR [1]:
>>
>> --q--
>> GCC uses the ix86_legitimate_combined_insn target hook to disallow
>> misaligned memory into certain SSE instructions.
>> (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0)
>> is not misaligned memory, it is a subreg of a pseudo register, so it is fine.
>> If the replacement of the pseudo register with memory happens in some
>> other pass, then it probably either should use the
>> legitimate_combined_insn target hook or some other one.  I think we
>> have already a PR where that happens during live range shrinking.
>> --/q--
>>
>> Please figure out where memory replacement happens. There are several
>> other SSE insns (please grep the .md for "ssememalign" attribute) that
>> are affected by this problem, so fixing a couple of patterns won't
>> solve the problem completely.
>
> LRA turns
>
> insn 64 63 108 6 (set (reg:V4SI 148 [ vect__28.85 ])
>         (xor:V4SI (reg:V4SI 149)
>             (subreg:V4SI (reg:TI 147 [ MEM[(const struct bitset
> &)FeatureEntry_2(D)] ]) 0))) foo.ii:26 3454 {*xorv4si3}
>      (expr_list:REG_DEAD (reg:V4SI 149)
>         (expr_list:REG_DEAD (reg:TI 147 [ MEM[(const struct bitset
> &)FeatureEntry_2(D)] ])
>             (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>                         (const_int -16 [0xfffffffffffffff0])) [3
> MEM[(unsigned int *)&D.2851]+0 S16 A128])
>                 (nil)))))
>
> into
>
> (insn 64 63 108 6 (set (reg:V4SI 21 xmm0 [orig:148 vect__28.85 ] [148])
>         (xor:V4SI (reg:V4SI 21 xmm0 [149])
>             (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117])
> [6 MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
> foo.ii:26 3454 {*xorv4si3}
>      (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>                 (const_int -16 [0xfffffffffffffff0])) [3 MEM[(unsigned
> int *)&D.2851]+0 S16 A128])
>         (nil)))
>
> since
>
> (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117]) [6
> MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
>
> satisfies the "m" constraint.  I don't think LRA should call
> ix86_legitimate_combined_insn to validate constraints on
> an instruction.

Hm...

If LRA doesn't assume that the generic "m" constraint implies at least
natural alignment of the propagated operand, then your patch is the way to
go. But in this case, *every* SSE vector memory constraint should be
changed to Bm.

Let's ask Vladimir for details.

Uros.
Richard Biener Jan. 2, 2016, 11:58 a.m. UTC | #6
On January 2, 2016 11:32:33 AM GMT+01:00, Uros Bizjak <ubizjak@gmail.com> wrote:
>On Thu, Dec 31, 2015 at 4:29 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Dec 31, 2015 at 1:14 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Wed, Dec 30, 2015 at 9:53 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> SSE vector arithmetic and logic instructions only accept aligned memory
>>>> operands.  This patch adds vector_memory_operand and the "Bm" constraint
>>>> for aligned SSE memory operands.  They are applied to the SSE any_logic
>>>> patterns.
>>>>
>>>> OK for trunk and release branches if there are no regressions?
>>>
>>> This patch is just papering over a deeper problem, as Jakub said in the PR [1]:
>>>
>>> --q--
>>> GCC uses the ix86_legitimate_combined_insn target hook to disallow
>>> misaligned memory into certain SSE instructions.
>>> (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0)
>>> is not misaligned memory, it is a subreg of a pseudo register, so it is fine.
>>> If the replacement of the pseudo register with memory happens in some
>>> other pass, then it probably either should use the
>>> legitimate_combined_insn target hook or some other one.  I think we
>>> have already a PR where that happens during live range shrinking.
>>> --/q--
>>>
>>> Please figure out where memory replacement happens. There are several
>>> other SSE insns (please grep the .md for "ssememalign" attribute) that
>>> are affected by this problem, so fixing a couple of patterns won't
>>> solve the problem completely.
>>
>> LRA turns
>>
>> insn 64 63 108 6 (set (reg:V4SI 148 [ vect__28.85 ])
>>         (xor:V4SI (reg:V4SI 149)
>>             (subreg:V4SI (reg:TI 147 [ MEM[(const struct bitset
>> &)FeatureEntry_2(D)] ]) 0))) foo.ii:26 3454 {*xorv4si3}
>>      (expr_list:REG_DEAD (reg:V4SI 149)
>>         (expr_list:REG_DEAD (reg:TI 147 [ MEM[(const struct bitset
>> &)FeatureEntry_2(D)] ])
>>             (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>>                         (const_int -16 [0xfffffffffffffff0])) [3
>> MEM[(unsigned int *)&D.2851]+0 S16 A128])
>>                 (nil)))))
>>
>> into
>>
>> (insn 64 63 108 6 (set (reg:V4SI 21 xmm0 [orig:148 vect__28.85 ] [148])
>>         (xor:V4SI (reg:V4SI 21 xmm0 [149])
>>             (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117])
>> [6 MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
>> foo.ii:26 3454 {*xorv4si3}
>>      (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>>                 (const_int -16 [0xfffffffffffffff0])) [3 MEM[(unsigned
>> int *)&D.2851]+0 S16 A128])
>>         (nil)))
>>
>> since
>>
>> (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117]) [6
>> MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
>>
>> satisfies the "m" constraint.  I don't think LRA should call
>> ix86_legitimate_combined_insn to validate constraints on
>> an instruction.
>
>Hm...
>
>If LRA doesn't assume that the generic "m" constraint implies at least
>natural alignment of the propagated operand, then your patch is the way to
>go.

I don't think it even considers alignment. Archs where alignment validity depends on the actual instruction should model this with proper constraints.

>But in this case, *every* SSE vector memory constraint should be
>changed to Bm.

I'd say so ...

>
>Let's ask Vladimir for details.
>
>Uros.
H.J. Lu Jan. 2, 2016, 6:26 p.m. UTC | #7
On Sat, Jan 2, 2016 at 3:58 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On January 2, 2016 11:32:33 AM GMT+01:00, Uros Bizjak <ubizjak@gmail.com> wrote:
>>On Thu, Dec 31, 2015 at 4:29 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Dec 31, 2015 at 1:14 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> On Wed, Dec 30, 2015 at 9:53 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> SSE vector arithmetic and logic instructions only accept aligned memory
>>>>> operands.  This patch adds vector_memory_operand and the "Bm" constraint
>>>>> for aligned SSE memory operands.  They are applied to the SSE any_logic
>>>>> patterns.
>>>>>
>>>>> OK for trunk and release branches if there are no regressions?
>>>>
>>>> This patch is just papering over a deeper problem, as Jakub said in the PR [1]:
>>>>
>>>> --q--
>>>> GCC uses the ix86_legitimate_combined_insn target hook to disallow
>>>> misaligned memory into certain SSE instructions.
>>>> (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0)
>>>> is not misaligned memory, it is a subreg of a pseudo register, so it is fine.
>>>> If the replacement of the pseudo register with memory happens in some
>>>> other pass, then it probably either should use the
>>>> legitimate_combined_insn target hook or some other one.  I think we
>>>> have already a PR where that happens during live range shrinking.
>>>> --/q--
>>>>
>>>> Please figure out where memory replacement happens. There are several
>>>> other SSE insns (please grep the .md for "ssememalign" attribute) that
>>>> are affected by this problem, so fixing a couple of patterns won't
>>>> solve the problem completely.
>>>
>>> LRA turns
>>>
>>> insn 64 63 108 6 (set (reg:V4SI 148 [ vect__28.85 ])
>>>         (xor:V4SI (reg:V4SI 149)
>>>             (subreg:V4SI (reg:TI 147 [ MEM[(const struct bitset
>>> &)FeatureEntry_2(D)] ]) 0))) foo.ii:26 3454 {*xorv4si3}
>>>      (expr_list:REG_DEAD (reg:V4SI 149)
>>>         (expr_list:REG_DEAD (reg:TI 147 [ MEM[(const struct bitset
>>> &)FeatureEntry_2(D)] ])
>>>             (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>>>                         (const_int -16 [0xfffffffffffffff0])) [3
>>> MEM[(unsigned int *)&D.2851]+0 S16 A128])
>>>                 (nil)))))
>>>
>>> into
>>>
>>> (insn 64 63 108 6 (set (reg:V4SI 21 xmm0 [orig:148 vect__28.85 ] [148])
>>>         (xor:V4SI (reg:V4SI 21 xmm0 [149])
>>>             (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117])
>>> [6 MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
>>> foo.ii:26 3454 {*xorv4si3}
>>>      (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>>>                 (const_int -16 [0xfffffffffffffff0])) [3 MEM[(unsigned
>>> int *)&D.2851]+0 S16 A128])
>>>         (nil)))
>>>
>>> since
>>>
>>> (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117]) [6
>>> MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
>>>
>>> satisfies the "m" constraint.  I don't think LRA should call
>>> ix86_legitimate_combined_insn to validate constraints on
>>> an instruction.
>>
>>Hm...
>>
>>If LRA doesn't assume that the generic "m" constraint implies at least
>>natural alignment of the propagated operand, then your patch is the way to
>>go.
>
> I don't think it even considers alignment. Archs where alignment validity depends on the actual instruction should model this with proper constraints.
>
>>But in this case, *every* SSE vector memory constraint should be
>>changed to Bm.
>
> I'd say so ...

The "Bm" constraint should be applied only to non-move SSE
instructions with a 16-byte memory operand.
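
The failure mode itself can be seen in a reduced sketch (an illustration,
not the PR testcase; whether the misaligned load actually gets folded
depends on the pass involved):

/* With -O3 -msse2 -mno-avx the loop vectorizes.  The struct is only
   4-byte aligned, so if the 16-byte load of *src is folded directly
   into a pre-AVX SSE logic instruction, which requires a 16-byte
   aligned memory operand, the code faults at run time; the load has
   to go through movups/movdqu instead.  */
struct bitset128 { unsigned int w[4]; };  /* alignment is 4, not 16 */

void
clear_bits (struct bitset128 *dst, const struct bitset128 *src)
{
  for (int i = 0; i < 4; i++)
    dst->w[i] &= ~src->w[i];    /* Bits &= ~Value, as in the PR.  */
}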

Patch

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index b46d32b..ef4566a 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -148,6 +148,7 @@ 
 ;; We use the B prefix to denote any number of internal operands:
 ;;  f  FLAGS_REG
 ;;  g  GOT memory operand.
+;;  m  Vector memory operand
 ;;  s  Sibcall memory operand, not valid for TARGET_X32
 ;;  w  Call memory operand, not valid for TARGET_X32
 ;;  z  Constant call address operand.
@@ -160,6 +161,10 @@ 
   "@internal GOT memory operand."
   (match_operand 0 "GOT_memory_operand"))
 
+(define_constraint "Bm"
+  "@internal Vector memory operand."
+  (match_operand 0 "vector_memory_operand"))
+
 (define_constraint "Bs"
   "@internal Sibcall memory operand."
   (ior (and (not (match_test "TARGET_X32"))
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 17249e7..1f1368b 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -951,6 +951,13 @@ 
        (match_test "INTEGRAL_MODE_P (GET_MODE (op))")
        (match_test "op == CONSTM1_RTX (GET_MODE (op))")))
 
+; Return true when OP is operand acceptable for vector memory operand.
+; Only AVX can have misaligned memory operand.
+(define_predicate "vector_memory_operand"
+  (and (match_code "mem")
+       (ior (match_test "TARGET_AVX")
+	    (match_test "MEM_ALIGN (op) >= GET_MODE_ALIGNMENT (mode)"))))
+
 ; Return true when OP is operand acceptable for standard SSE move.
 (define_predicate "vector_move_operand"
   (ior (match_operand 0 "nonimmediate_operand")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6740edf..39b8d05 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3240,7 +3240,7 @@ 
   [(set (match_operand:VF_128_256 0 "register_operand" "=x,v")
 	(any_logic:VF_128_256
 	  (match_operand:VF_128_256 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VF_128_256 2 "nonimmediate_operand" "xBm,vm")))]
   "TARGET_SSE && <mask_avx512vl_condition>
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
 {
@@ -3489,7 +3489,7 @@ 
   [(set (match_operand:TF 0 "register_operand" "=x,x")
 	(any_logic:TF
 	  (match_operand:TF 1 "nonimmediate_operand" "%0,x")
-	  (match_operand:TF 2 "nonimmediate_operand" "xm,xm")))]
+	  (match_operand:TF 2 "nonimmediate_operand" "xBm,xm")))]
   "TARGET_SSE
    && ix86_binary_operator_ok (<CODE>, TFmode, operands)"
 {
@@ -11361,7 +11361,7 @@ 
   [(set (match_operand:VI48_AVX_AVX512F 0 "register_operand" "=x,v")
 	(any_logic:VI48_AVX_AVX512F
 	  (match_operand:VI48_AVX_AVX512F 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VI48_AVX_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VI48_AVX_AVX512F 2 "nonimmediate_operand" "xBm,vm")))]
   "TARGET_SSE && <mask_mode512bit_condition>
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
 {
@@ -11457,7 +11457,7 @@ 
   [(set (match_operand:VI12_AVX_AVX512F 0 "register_operand" "=x,v")
 	(any_logic: VI12_AVX_AVX512F
 	  (match_operand:VI12_AVX_AVX512F 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VI12_AVX_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VI12_AVX_AVX512F 2 "nonimmediate_operand" "xBm,vm")))]
   "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
 {
   static char buf[64];
diff --git a/gcc/testsuite/g++.dg/pr68991.C b/gcc/testsuite/g++.dg/pr68991.C
new file mode 100644
index 0000000..6cd9a63
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr68991.C
@@ -0,0 +1,191 @@ 
+// { dg-do compile { target i?86-*-* x86_64-*-* } }
+// { dg-options "-std=c++11 -O3 -msse2 -mno-avx -fno-exceptions -fno-rtti -fdump-rtl-final" }
+
+typedef unsigned int size_type;
+
+#define _GLIBCXX_BITSET_BITS_PER_WORD  (__CHAR_BIT__ * __SIZEOF_INT__)
+#define _GLIBCXX_BITSET_WORDS(__n) \
+  ((__n) / _GLIBCXX_BITSET_BITS_PER_WORD + \
+   ((__n) % _GLIBCXX_BITSET_BITS_PER_WORD == 0 ? 0 : 1))
+
+namespace std
+{
+  template<size_type _Nw>
+    struct _Base_bitset
+    {
+      typedef unsigned int _WordT;
+      _WordT 		_M_w[_Nw];
+
+      _WordT&
+      _M_hiword()
+      { return _M_w[_Nw - 1]; }
+
+      void
+      _M_do_and(const _Base_bitset<_Nw>& __x)
+      {
+	for (size_type __i = 0; __i < _Nw; __i++)
+	  _M_w[__i] &= __x._M_w[__i];
+      }
+
+      void
+      _M_do_flip()
+      {
+	for (size_type __i = 0; __i < _Nw; __i++)
+	  _M_w[__i] = ~_M_w[__i];
+      }
+
+      bool
+      _M_is_equal(const _Base_bitset<_Nw>& __x) const
+      {
+	for (size_type __i = 0; __i < _Nw; ++__i)
+	  if (_M_w[__i] != __x._M_w[__i])
+	    return false;
+	return true;
+      }
+
+      bool
+      _M_is_any() const
+      {
+	for (size_type __i = 0; __i < _Nw; __i++)
+	  if (_M_w[__i] != static_cast<_WordT>(0))
+	    return true;
+	return false;
+      }
+    };
+
+  template<size_type _Extrabits>
+    struct _Sanitize
+    {
+      typedef unsigned int _WordT;
+
+      static void
+      _S_do_sanitize(_WordT& __val)
+      { __val &= ~((~static_cast<_WordT>(0)) << _Extrabits); }
+    };
+
+  template<size_type _Nb>
+    class bitset
+    : private _Base_bitset<_GLIBCXX_BITSET_WORDS(_Nb)>
+    {
+    private:
+      typedef _Base_bitset<_GLIBCXX_BITSET_WORDS(_Nb)> _Base;
+      typedef unsigned int _WordT;
+
+      void
+      _M_do_sanitize()
+      { 
+	typedef _Sanitize<_Nb % _GLIBCXX_BITSET_BITS_PER_WORD> __sanitize_type;
+	__sanitize_type::_S_do_sanitize(this->_M_hiword());
+      }
+
+    public:
+      class reference
+      {
+	friend class bitset;
+
+	_WordT*	_M_wp;
+	size_type 	_M_bpos;
+	
+      public:
+	reference&
+	flip()
+	{
+	  *_M_wp ^= _Base::_S_maskbit(_M_bpos);
+	  return *this;
+	}
+      };
+
+      bitset<_Nb>&
+      operator&=(const bitset<_Nb>& __rhs)
+      {
+	this->_M_do_and(__rhs);
+	return *this;
+      }
+
+      bitset<_Nb>&
+      flip() 
+      {
+	this->_M_do_flip();
+	this->_M_do_sanitize();
+	return *this;
+      }
+      
+      bitset<_Nb>
+      operator~() const 
+      { return bitset<_Nb>(*this).flip(); }
+
+      bool
+      operator==(const bitset<_Nb>& __rhs) const 
+      { return this->_M_is_equal(__rhs); }
+
+      bool
+      any() const 
+      { return this->_M_is_any(); }
+    };
+
+  template<size_type _Nb>
+    inline bitset<_Nb>
+    operator&(const bitset<_Nb>& __x, const bitset<_Nb>& __y) 
+    {
+      bitset<_Nb> __result(__x);
+      __result &= __y;
+      return __result;
+    }
+}
+template<typename T>
+class ArrayRef {
+public:
+    typedef const T *iterator;
+
+private:
+    const T *Data;
+    size_type Length;
+
+public:
+    iterator begin() const { return Data; }
+    iterator end() const { return Data + Length; }
+};
+
+const unsigned MAX_SUBTARGET_FEATURES = 128;
+class FeatureBitset : public std::bitset<MAX_SUBTARGET_FEATURES> {
+};
+
+struct SubtargetFeatureKV {
+  FeatureBitset Value;
+  FeatureBitset Implies;
+};
+
+struct SubtargetInfoKV {
+  const void *Value;
+};
+class SubtargetFeatures {
+public:
+    FeatureBitset ToggleFeature(FeatureBitset Bits,
+				const SubtargetFeatureKV *,
+				ArrayRef<SubtargetFeatureKV> FeatureTable);
+};
+
+static
+void ClearImpliedBits(FeatureBitset &Bits,
+		      const SubtargetFeatureKV *FeatureEntry,
+		      ArrayRef<SubtargetFeatureKV> FeatureTable) {
+  for (auto &FE : FeatureTable) {
+    if ((FE.Implies & FeatureEntry->Value).any()) {
+      Bits &= ~FE.Value;
+      ClearImpliedBits(Bits, &FE, FeatureTable);
+    }
+  }
+}
+
+FeatureBitset
+SubtargetFeatures::ToggleFeature(FeatureBitset Bits,
+				 const SubtargetFeatureKV *FeatureEntry,
+				 ArrayRef<SubtargetFeatureKV> FeatureTable) {
+    if ((Bits & FeatureEntry->Value) == FeatureEntry->Value) {
+      Bits &= ~FeatureEntry->Value;
+      ClearImpliedBits(Bits, FeatureEntry, FeatureTable);
+    }
+  return Bits;
+}
+
+// { dg-final { scan-rtl-dump-not "S16 A32\[^\n\]*\\\*xorv4si3" "final" } }