Fix vextract* masked patterns (PR target/93069)

Hi!

The AVX512F documentation clearly states that instructions whose
destination is memory support only merging-masking, not zero-masking,
and the assembler enforces that.
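
To make the two masking modes concrete, here is a minimal scalar model in C.
It is only a sketch: model_vextract32x8 and enum masking are hypothetical
names for illustration, not GCC code or intrinsics.  With merging-masking the
masked-out destination elements keep their old contents, while with
zero-masking they are cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model, not GCC or intrinsics code: extract the
   256-bit half selected by IMM (0 = low, 1 = high) from 16 32-bit
   lanes and write 8 lanes under control of an 8-bit mask.  */
enum masking { MERGE_MASKING, ZERO_MASKING };

static void
model_vextract32x8 (int32_t dst[8], const int32_t src[16], int imm,
                    uint8_t mask, enum masking kind)
{
  assert (imm == 0 || imm == 1);
  for (int i = 0; i < 8; i++)
    {
      if (mask & (1u << i))
        dst[i] = src[8 * imm + i];   /* Selected lane: copy.  */
      else if (kind == ZERO_MASKING)
        dst[i] = 0;                  /* Masked-out lane: cleared.  */
      /* MERGE_MASKING: masked-out lane keeps its old contents.  */
    }
}
```

In this model a memory destination corresponds to calling the function with
MERGE_MASKING only, which is the restriction the assembler enforces.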

The testcase in this patch fails to assemble because of
Error: unsupported masking for `vextracti32x8'
on
        vextracti32x8   $0x0, %zmm1, -64(%rsp){%k1}{z}

For the vector extraction patterns, we apparently have four groups.  There
are 7 *_maskm patterns that only accept memory destinations and require the
merge-masking source to be rtx_equal_p to the destination.  There are the 7
corresponding *<mask_name> patterns that allow a memory destination only
for the non-masked cases (through <store_mask_constraint>).  Then there are
2 *<mask_name> patterns (the lo ssehalf V16FI and lo ssehalf VI8F_256 ones)
which do allow a memory destination even for masked cases; they are the
cause of the testsuite failure, because we must not allow the C constraint
if the destination is m.  Finally, there is one pair of patterns (separate *
and *_mask, hi ssehalf VI4F_256) which has another issue, for which I don't
have a testcase though: if it matched zero-masking with a register
destination, it wouldn't emit the needed {z} into the assembly.
The attached patch fixes only those 3 issues and is perhaps more suitable
for backporting.
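
The missing {z} issue described above can be illustrated with a small C
sketch of how the masking suffix after the destination operand is built.
format_mask_suffix is a hypothetical helper, not anything in GCC; it only
shows that a template which always prints just the mask register loses the
{z} marker when zero-masking was matched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper, not part of GCC: format the AT&T-syntax masking
   suffix that follows a destination operand.  Returns the number of
   characters written, as reported by snprintf.  */
static int
format_mask_suffix (char *buf, size_t len, int kreg, bool zero_masking)
{
  assert (buf != NULL && len > 0);
  assert (kreg >= 1 && kreg <= 7);   /* %k0 cannot act as a writemask.  */
  return snprintf (buf, len, "{%%k%d}%s", kreg,
                   zero_masking ? "{z}" : "");
}
```

Passing false for zero_masking unconditionally corresponds to the old
hard-coded %{%3%} in the template.
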
But even with that fixed, we are missing 3 further *_maskm patterns.  More
importantly, I find the split into 3 separate patterns after subst (*_maskm
for masking with memory destination, *_mask for masking with register
destination and * for non-masking) unnecessarily complex and harder for
reload.  So the patch included below (not the attached one) instead kills
all *_maskm patterns and splits the *<mask_name> patterns into * and *_mask
by hand instead of through subst.  The *_mask ones make sure that with v
destination they use 0C, while with m destination they use 0, and as
condition enforce that either the destination is not a MEM, or that the
destination and the corresponding merging-masking source operand are
rtx_equal_p.
If we had those 3 missing *_maskm patterns, this patch would actually result
in both a shorter sse.md and a shorter machine description after subst (e.g.
the length of tmp-mddump.md).  As we don't have them, the patch makes sse.md
16 lines longer, but tmp-mddump.md is still shorter.
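
The rule the new *_mask patterns enforce can be sketched as a small C
predicate.  This is only a model under simplifying assumptions: struct
operand, operands_equal and masked_extract_ok are hypothetical stand-ins
(GCC's rtx and rtx_equal_p are far richer), and the model describes operands
after the constraints have already been satisfied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an RTL operand.  */
enum operand_kind { OP_REG, OP_MEM, OP_CONST_ZERO };

struct operand
{
  enum operand_kind kind;
  int id;   /* Register number or an abstract address id.  */
};

/* Simplified analogue of rtx_equal_p: same kind and same id.  */
static bool
operands_equal (const struct operand *a, const struct operand *b)
{
  return a->kind == b->kind && a->id == b->id;
}

/* Register destination ("v"): the merge source may be the destination
   itself ("0") or constant zero ("C").
   Memory destination ("m"): only "0" is allowed, i.e. merging-masking
   into the very same memory location.  */
static bool
masked_extract_ok (const struct operand *dest,
                   const struct operand *merge_src)
{
  assert (dest != NULL && merge_src != NULL);
  assert (dest->kind != OP_CONST_ZERO);
  if (dest->kind == OP_MEM)
    return operands_equal (dest, merge_src);
  return operands_equal (dest, merge_src)
         || merge_src->kind == OP_CONST_ZERO;
}
```

A memory destination paired with constant zero is exactly the combination
that produced the unsupported {z} store in the failing testcase.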

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk (and is
the shorter patch ok for backports)?

2019-12-30  Jakub Jelinek  <jakub@redhat.com>

	PR target/93069
	* config/i386/subst.md (store_mask_constraint, store_mask_predicate):
	Remove.
	* config/i386/sse.md (avx512dq_vextract<shuffletype>64x2_1_maskm,
	avx512f_vextract<shuffletype>32x4_1_maskm,
	vec_extract_lo_<mode>_maskm, vec_extract_hi_<mode>_maskm): Remove.
	(<mask_codefor>avx512dq_vextract<shuffletype>64x2_1<mask_name>): Split
	into ...
	(*avx512dq_vextract<shuffletype>64x2_1,
	avx512dq_vextract<shuffletype>64x2_1_mask): ... these new
	define_insns.  Even in the masked variant allow memory output but in
	that case use 0 rather than 0C constraint on the source of masked-out
	elts.
	(<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name>): Split
	into ...
	(*avx512f_vextract<shuffletype>32x4_1,
	avx512f_vextract<shuffletype>32x4_1_mask): ... these new define_insns.
	Even in the masked variant allow memory output but in that case use
	0 rather than 0C constraint on the source of masked-out elts.
	(vec_extract_lo_<mode><mask_name>): Split into ...
	(vec_extract_lo_<mode>, vec_extract_lo_<mode>_mask): ... these new
	define_insns.  Even in the masked variant allow memory output but in
	that case use 0 rather than 0C constraint on the source of masked-out
	elts.
	(vec_extract_hi_<mode><mask_name>): Split into ...
	(vec_extract_hi_<mode>, vec_extract_hi_<mode>_mask): ... these new
	define_insns.  Even in the masked variant allow memory output but in
	that case use 0 rather than 0C constraint on the source of masked-out
	elts.

	* gcc.target/i386/avx512vl-pr93069.c: New test.
	* gcc.dg/vect/pr93069.c: New test.


	Jakub
2019-12-30  Jakub Jelinek  <jakub@redhat.com>

	PR target/93069
	* config/i386/sse.md (vec_extract_lo_<mode><mask_name>): Use
	<store_mask_constraint> instead of m in output operand constraint.
	(vec_extract_hi_<mode><mask_name>): Use <mask_operand2> instead of
	%{%3%}.

	* gcc.target/i386/avx512vl-pr93069.c: New test.
	* gcc.dg/vect/pr93069.c: New test.

--- gcc/config/i386/sse.md.jj	2019-12-27 18:16:48.146431083 +0100
+++ gcc/config/i386/sse.md	2019-12-28 14:43:29.181456611 +0100
@@ -8782,7 +8782,8 @@
 })
 
 (define_insn "vec_extract_lo_<mode><mask_name>"
-  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,v,m")
+  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>"
+					  "=v,v,<store_mask_constraint>")
 	(vec_select:<ssehalfvecmode>
 	  (match_operand:V16FI 1 "<store_mask_predicate>"
 				 "v,<store_mask_constraint>,v")
@@ -8834,7 +8835,8 @@
 })
 
 (define_insn "vec_extract_lo_<mode><mask_name>"
-  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=v,v,m")
+  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>"
+					  "=v,v,<store_mask_constraint>")
 	(vec_select:<ssehalfvecmode>
 	  (match_operand:VI8F_256 1 "<store_mask_predicate>"
 				    "v,<store_mask_constraint>,v")
@@ -8844,7 +8846,7 @@
    && (<mask_applied> || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if (<mask_applied>)
-    return "vextract<shuffletype>64x2\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}";
+    return "vextract<shuffletype>64x2\t{$0x0, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x0}";
   else
     return "#";
 }
--- gcc/testsuite/gcc.target/i386/avx512vl-pr93069.c.jj	2019-12-28 16:31:30.118695074 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr93069.c	2019-12-28 16:32:16.920990539 +0100
@@ -0,0 +1,12 @@
+/* PR target/93069 */
+/* { dg-do assemble { target vect_simd_clones } } */
+/* { dg-options "-O2 -fopenmp-simd -mtune=skylake-avx512" } */
+/* { dg-additional-options "-mavx512vl" { target avx512vl } } */
+/* { dg-additional-options "-mavx512dq" { target avx512dq } } */
+
+#pragma omp declare simd
+int
+foo (int x, int y)
+{
+  return x == 0 ? x : y;
+}
--- gcc/testsuite/gcc.dg/vect/pr93069.c.jj	2019-12-28 16:31:01.822121036 +0100
+++ gcc/testsuite/gcc.dg/vect/pr93069.c	2019-12-28 16:30:35.503517205 +0100
@@ -0,0 +1,10 @@
+/* PR target/93069 */
+/* { dg-do assemble { target vect_simd_clones } } */
+/* { dg-options "-O2 -fopenmp-simd" } */
+
+#pragma omp declare simd
+int
+foo (int x, int y)
+{
+  return x == 0 ? x : y;
+}


Message-ID: <20191229234622.GT10088@tucnak>
Date: Mon, 30 Dec 2019 00:46:22 +0100
From: Jakub Jelinek <jakub@redhat.com>
To: Uros Bizjak <ubizjak@gmail.com>, Jeff Law <law@redhat.com>, hjl.tools@gmail.com
Cc: gcc-patches@gcc.gnu.org