Message ID | 20160329170521.GM3017@tucnak.redhat.com |
---|---|
State | New |
Headers | show |
On 03/29/2016 11:05 AM, Jakub Jelinek wrote: > Hi! > > The various blendm expanders look like: > (define_insn "<avx512>_blendm<mode>" > [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v") > (vec_merge:V48_AVX512VL > (match_operand:V48_AVX512VL 2 "nonimmediate_operand" "vm") > (match_operand:V48_AVX512VL 1 "register_operand" "v") One could argue this ordering is just asking for trouble. Ultimately, I'll defer to Uros and Kirill. Jeff
On Tue, Mar 29, 2016 at 11:44:15AM -0600, Jeff Law wrote: > On 03/29/2016 11:05 AM, Jakub Jelinek wrote: > >Hi! > > > >The various blendm expanders look like: > >(define_insn "<avx512>_blendm<mode>" > > [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v") > > (vec_merge:V48_AVX512VL > > (match_operand:V48_AVX512VL 2 "nonimmediate_operand" "vm") > > (match_operand:V48_AVX512VL 1 "register_operand" "v") > One could argue this ordering is just asking for trouble. I bet the reason for this ordering are both the x86 intrinsics and the HW behavior (see e.g. the order of arguments in the insn template of the define_insn, etc.). I think VEC_MERGE's definition on which argument you pick the elements from for 0 bits in the mask vs. 1 bits in the mask is the exact opposite of what the x86 HW wants and the intrinsics expect. Jakub
On 29 Mar 19:49, Jakub Jelinek wrote: > On Tue, Mar 29, 2016 at 11:44:15AM -0600, Jeff Law wrote: > > On 03/29/2016 11:05 AM, Jakub Jelinek wrote: > > >Hi! > > > > > >The various blendm expanders look like: > > >(define_insn "<avx512>_blendm<mode>" > > > [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v") > > > (vec_merge:V48_AVX512VL > > > (match_operand:V48_AVX512VL 2 "nonimmediate_operand" "vm") > > > (match_operand:V48_AVX512VL 1 "register_operand" "v") > > One could argue this ordering is just asking for trouble. > > I bet the reason for this ordering are both the x86 intrinsics and > the HW behavior (see e.g. the order of arguments in the insn template > of the define_insn, etc.). > I think VEC_MERGE's definition on which argument you pick the elements from > for 0 bits in the mask vs. 1 bits in the mask is the exact opposite of what > the x86 HW wants and the intrinsics expect. I think that order of arguments in built-in might be changed easily. This doesn't affect intrinsics at all, because of that. I can hardly recall, but my bet is that this order was dictated by: ix86_expand_sse_movcc where order of blends args should corresond on AVX*. If we want to fix the order in the pattern we should swap op_[true|false] here: case V64QImode: gen = gen_avx512bw_blendmv64qi; break; case V32HImode: gen = gen_avx512bw_blendmv32hi; break; case V16SImode: gen = gen_avx512f_blendmv16si; break; case V8DImode: gen = gen_avx512f_blendmv8di; break; case V8DFmode: gen = gen_avx512f_blendmv8df; break; case V16SFmode: gen = gen_avx512f_blendmv16sf; break; Jakub, nay be add comment in the patch on blendm patterns emphasizing this non-regular order? Otherwise, patch is OK to me. -- Thanks, K > > Jakub
On Wed, Mar 30, 2016 at 04:53:48PM +0300, Kirill Yukhin wrote: > I think that order of arguments in built-in might be changed easily. > This doesn't affect intrinsics at all, because of that. > > I can hardly recall, but my bet is that this order was dictated by: > ix86_expand_sse_movcc where order of blends args should corresond > on AVX*. Having the AVX512* blends have different order from AVX{,2} blends would be bad though, so if we want to change the order, we'd have to change it everywhere. > Jakub, nay be add comment in the patch on blendm patterns emphasizing > this non-regular order? Ok, will do. Jakub
--- gcc/config/i386/i386.c.jj 2016-03-23 10:41:12.000000000 +0100 +++ gcc/config/i386/i386.c 2016-03-27 22:32:51.748280358 +0200 @@ -46930,7 +46930,7 @@ half: { tmp = gen_reg_rtx (mode); emit_insn (gen_rtx_SET (tmp, gen_rtx_VEC_DUPLICATE (mode, val))); - emit_insn (gen_blendm (target, tmp, target, + emit_insn (gen_blendm (target, target, tmp, force_reg (mmode, gen_int_mode (1 << elt, mmode)))); } --- gcc/testsuite/gcc.dg/torture/pr70421.c.jj 2016-03-29 09:25:37.015469084 +0200 +++ gcc/testsuite/gcc.dg/torture/pr70421.c 2016-03-29 09:25:13.000000000 +0200 @@ -0,0 +1,22 @@ +/* PR target/70421 */ +/* { dg-do run } */ +/* { dg-additional-options "-Wno-psabi -w" } */ + +typedef unsigned V __attribute__ ((vector_size (64))); + +unsigned __attribute__ ((noinline, noclone)) +foo (unsigned x, V u, V v) +{ + v[1] ^= v[2]; + x ^= ((V) v)[u[0]]; + return x; +} + +int +main () +{ + unsigned x = foo (0x10, (V) { 1 }, (V) { 0x100, 0x1000, 0x10000 }); + if (x != 0x11010) + __builtin_abort (); + return 0; +} --- gcc/testsuite/gcc.target/i386/avx512f-pr70421.c.jj 2016-03-29 09:26:23.380837148 +0200 +++ gcc/testsuite/gcc.target/i386/avx512f-pr70421.c 2016-03-29 09:27:37.066832846 +0200 @@ -0,0 +1,15 @@ +/* PR target/70421 */ +/* { dg-do run } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-options "-O2 -mavx512f" } */ + +#include "avx512f-check.h" + +#define main() do_main() +#include "../../gcc.dg/torture/pr70421.c" + +static void +avx512f_test (void) +{ + do_main (); +}