Message ID | 0fccef6b-2885-d91d-ff05-b23c2315fc7d@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [1/7,V4] ifn/optabs: Support vector load/store with length | expand |
"Kewen.Lin" <linkw@linux.ibm.com> writes: > @@ -5167,6 +5167,24 @@ mode @var{n}. > > This pattern is not allowed to @code{FAIL}. > > +@cindex @code{lenload@var{m}} instruction pattern > +@item @samp{lenload@var{m}} > +Perform a vector load with length from memory operand 1 of mode @var{m} > +into register operand 0. Length is provided in register operand 2 with > +appropriate mode which should afford the maximal required precision of > +any available lengths. I think we need to say in more detail what “load with length” actually means. How about: Load the number of bytes specified by operand 2 from memory operand 1 into register operand 0, setting the other bytes of operand 0 to undefined values. Operands 0 and 1 have mode @var{m}. Operand 2 has whichever integer mode the target prefers. > +@cindex @code{lenstore@var{m}} instruction pattern > +@item @samp{lenstore@var{m}} > +Perform a vector store with length from register operand 1 of mode @var{m} > +into memory operand 0. Length is provided in register operand 2 with > +appropriate mode which should afford the maximal required precision of > +any available lengths. Similarly here: Store the number of bytes specified by operand 2 from nonmemory operand 1 into memory operand 0, leaving the other bytes of operand 0 unchanged. Operands 0 and 1 have mode @var{m}. Operand 2 has whichever integer mode the target prefers. > @@ -2478,7 +2480,7 @@ expand_call_mem_ref (tree type, gcall *stmt, int index) > return fold_build2 (MEM_REF, type, addr, build_int_cst (alias_ptr_type, 0)); > } > > -/* Expand MASK_LOAD{,_LANES} call STMT using optab OPTAB. */ > +/* Expand MASK_LOAD{,_LANES} and LEN_LOAD call STMT using optab OPTAB. */ s/and/or/. > > static void > expand_mask_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) Think it would be worth generalising the name, e.g. to expand_partial_load_optab_fn, and adding a #define for expand_mask_load_optab_fn before the other two #defines. Same comments for stores. > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > index 1d190d492ff..ed6561f296a 100644 > --- a/gcc/internal-fn.def > +++ b/gcc/internal-fn.def > @@ -49,11 +49,13 @@ along with GCC; see the file COPYING3. If not see > - load_lanes: currently just vec_load_lanes > - mask_load_lanes: currently just vec_mask_load_lanes > - gather_load: used for {mask_,}gather_load > + - len_load: currently just lenload > > - mask_store: currently just maskstore > - store_lanes: currently just vec_store_lanes > - mask_store_lanes: currently just vec_mask_store_lanes > - scatter_store: used for {mask_,}scatter_store > + - len_store: currently just lenstore > > - unary: a normal unary optab, such as vec_reverse_<mode> > - binary: a normal binary optab, such as vec_interleave_lo_<mode> > @@ -127,6 +129,8 @@ DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load) > DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE, > mask_gather_load, gather_load) > > +DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, lenload, len_load) > + > DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store) > DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0, > mask_scatter_store, scatter_store) > @@ -136,6 +140,8 @@ DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes) > DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0, > vec_mask_store_lanes, mask_store_lanes) > > +DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, lenstore, len_store) > + > DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while) > DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW, > check_raw_ptrs, check_ptrs) > diff --git a/gcc/optabs.def b/gcc/optabs.def > index 0c64eb52a8d..9fe4ac1840d 100644 > --- a/gcc/optabs.def > +++ b/gcc/optabs.def > @@ -435,3 +435,5 @@ OPTAB_D (check_war_ptrs_optab, "check_war_ptrs$a") > OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE) > OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES) > OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a") > +OPTAB_D (lenload_optab, "lenload$a") > +OPTAB_D (lenstore_optab, "lenstore$a") Sorry, I should have picked up on this last time, but I think we should be consistent about whether there's an underscore after “len” or not. I realise this is just replicating what happens for IFN_MASK_LOAD/ “maskload” and IFN_MASK_STORE/“maskstore”, but it's something I kept tripping over when implementing those for SVE. Personally I think it is easier to read with the underscore, so this would be “len_load_optab” and “len_load$a” (or “len_load_$a”, there's no real consistency on that). Same for stores. Thanks, Richard
Hi! On Mon, Jun 22, 2020 at 08:59:48PM +0100, Richard Sandiford wrote: > "Kewen.Lin" <linkw@linux.ibm.com> writes: > > @@ -5167,6 +5167,24 @@ mode @var{n}. > > > > This pattern is not allowed to @code{FAIL}. > > > > +@cindex @code{lenload@var{m}} instruction pattern > > +@item @samp{lenload@var{m}} > > +Perform a vector load with length from memory operand 1 of mode @var{m} > > +into register operand 0. Length is provided in register operand 2 with > > +appropriate mode which should afford the maximal required precision of > > +any available lengths. > > I think we need to say in more detail what “load with length” actually > means. How about: > > Load the number of bytes specified by operand 2 from memory operand 1 > into register operand 0, setting the other bytes of operand 0 to > undefined values. Operands 0 and 1 have mode @var{m}. Operand 2 has > whichever integer mode the target prefers. The Power instructions set the other bytes to 0. There is no great way to zero them out with other insns either, so lenloadM should do it, from our viewpoint. (The problem case is when the length is not constant). > > +@cindex @code{lenstore@var{m}} instruction pattern > > +@item @samp{lenstore@var{m}} > > +Perform a vector store with length from register operand 1 of mode @var{m} > > +into memory operand 0. Length is provided in register operand 2 with > > +appropriate mode which should afford the maximal required precision of > > +any available lengths. > > Similarly here: > > Store the number of bytes specified by operand 2 from nonmemory operand 1 > into memory operand 0, leaving the other bytes of operand 0 unchanged. > Operands 0 and 1 have mode @var{m}. Operand 2 has whichever integer > mode the target prefers. That is what our insns do, yes, so that is fine with me :-) > Sorry, I should have picked up on this last time, but I think we should > be consistent about whether there's an underscore after “len” or not. > I realise this is just replicating what happens for IFN_MASK_LOAD/ > “maskload” and IFN_MASK_STORE/“maskstore”, but it's something I kept > tripping over when implementing those for SVE. > > Personally I think it is easier to read with the underscore, so this > would be “len_load_optab” and “len_load$a” (or “len_load_$a”, > there's no real consistency on that). Same for stores. With a second underscore is better readable than with just one (the mode does not apply to just the part after the underscore). Thanks, Segher
On Mon, 22 Jun 2020, Richard Sandiford wrote: > "Kewen.Lin" <linkw@linux.ibm.com> writes: > > @@ -5167,6 +5167,24 @@ mode @var{n}. > > > > This pattern is not allowed to @code{FAIL}. > > > > +@cindex @code{lenload@var{m}} instruction pattern > > +@item @samp{lenload@var{m}} > > +Perform a vector load with length from memory operand 1 of mode @var{m} > > +into register operand 0. Length is provided in register operand 2 with > > +appropriate mode which should afford the maximal required precision of > > +any available lengths. > > I think we need to say in more detail what “load with length” actually > means. How about: > > Load the number of bytes specified by operand 2 from memory operand 1 > into register operand 0, setting the other bytes of operand 0 to > undefined values. Operands 0 and 1 have mode @var{m}. Operand 2 has > whichever integer mode the target prefers. IMHO it should also say whether length may be >= the mode byte size (should we say "units" instead of bytes everywhere?), and if so what the behavior is. That is, whether explicit masking is required or if there's implicit masking or whether values >= the mode byte size will load all bytes. Does the number of bytes have to be a multiple of the vector component size? That is, is there any difference between lenloadv4si and lenloadv8hi? > > +@cindex @code{lenstore@var{m}} instruction pattern > > +@item @samp{lenstore@var{m}} > > +Perform a vector store with length from register operand 1 of mode @var{m} > > +into memory operand 0. Length is provided in register operand 2 with > > +appropriate mode which should afford the maximal required precision of > > +any available lengths. > > Similarly here: > > Store the number of bytes specified by operand 2 from nonmemory operand 1 > into memory operand 0, leaving the other bytes of operand 0 unchanged. > Operands 0 and 1 have mode @var{m}. Operand 2 has whichever integer > mode the target prefers. > > > @@ -2478,7 +2480,7 @@ expand_call_mem_ref (tree type, gcall *stmt, int index) > > return fold_build2 (MEM_REF, type, addr, build_int_cst (alias_ptr_type, 0)); > > } > > > > -/* Expand MASK_LOAD{,_LANES} call STMT using optab OPTAB. */ > > +/* Expand MASK_LOAD{,_LANES} and LEN_LOAD call STMT using optab OPTAB. */ > > s/and/or/. > > > > > static void > > expand_mask_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > > Think it would be worth generalising the name, e.g. to > expand_partial_load_optab_fn, and adding a #define for > expand_mask_load_optab_fn before the other two #defines. > > Same comments for stores. > > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > > index 1d190d492ff..ed6561f296a 100644 > > --- a/gcc/internal-fn.def > > +++ b/gcc/internal-fn.def > > @@ -49,11 +49,13 @@ along with GCC; see the file COPYING3. If not see > > - load_lanes: currently just vec_load_lanes > > - mask_load_lanes: currently just vec_mask_load_lanes > > - gather_load: used for {mask_,}gather_load > > + - len_load: currently just lenload > > > > - mask_store: currently just maskstore > > - store_lanes: currently just vec_store_lanes > > - mask_store_lanes: currently just vec_mask_store_lanes > > - scatter_store: used for {mask_,}scatter_store > > + - len_store: currently just lenstore > > > > - unary: a normal unary optab, such as vec_reverse_<mode> > > - binary: a normal binary optab, such as vec_interleave_lo_<mode> > > @@ -127,6 +129,8 @@ DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load) > > DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE, > > mask_gather_load, gather_load) > > > > +DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, lenload, len_load) > > + > > DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store) > > DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0, > > mask_scatter_store, scatter_store) > > @@ -136,6 +140,8 @@ DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes) > > DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0, > > vec_mask_store_lanes, mask_store_lanes) > > > > +DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, lenstore, len_store) > > + > > DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while) > > DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW, > > check_raw_ptrs, check_ptrs) > > diff --git a/gcc/optabs.def b/gcc/optabs.def > > index 0c64eb52a8d..9fe4ac1840d 100644 > > --- a/gcc/optabs.def > > +++ b/gcc/optabs.def > > @@ -435,3 +435,5 @@ OPTAB_D (check_war_ptrs_optab, "check_war_ptrs$a") > > OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE) > > OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES) > > OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a") > > +OPTAB_D (lenload_optab, "lenload$a") > > +OPTAB_D (lenstore_optab, "lenstore$a") > > Sorry, I should have picked up on this last time, but I think we should > be consistent about whether there's an underscore after “len” or not. > I realise this is just replicating what happens for IFN_MASK_LOAD/ > “maskload” and IFN_MASK_STORE/“maskstore”, but it's something I kept > tripping over when implementing those for SVE. > > Personally I think it is easier to read with the underscore, so this > would be “len_load_optab” and “len_load$a” (or “len_load_$a”, > there's no real consistency on that). Same for stores. > > Thanks, > Richard >
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 2c67c818da5..dd0d3ec203b 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5167,6 +5167,24 @@ mode @var{n}. This pattern is not allowed to @code{FAIL}. +@cindex @code{lenload@var{m}} instruction pattern +@item @samp{lenload@var{m}} +Perform a vector load with length from memory operand 1 of mode @var{m} +into register operand 0. Length is provided in register operand 2 with +appropriate mode which should afford the maximal required precision of +any available lengths. + +This pattern is not allowed to @code{FAIL}. + +@cindex @code{lenstore@var{m}} instruction pattern +@item @samp{lenstore@var{m}} +Perform a vector store with length from register operand 1 of mode @var{m} +into memory operand 0. Length is provided in register operand 2 with +appropriate mode which should afford the maximal required precision of +any available lengths. + +This pattern is not allowed to @code{FAIL}. + @cindex @code{vec_perm@var{m}} instruction pattern @item @samp{vec_perm@var{m}} Output a (variable) vector permutation. Operand 0 is the destination diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index 5e9aa60721e..3d590517e5d 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -104,10 +104,12 @@ init_internal_fns () #define load_lanes_direct { -1, -1, false } #define mask_load_lanes_direct { -1, -1, false } #define gather_load_direct { 3, 1, false } +#define len_load_direct { -1, -1, false } #define mask_store_direct { 3, 2, false } #define store_lanes_direct { 0, 0, false } #define mask_store_lanes_direct { 0, 0, false } #define scatter_store_direct { 3, 1, false } +#define len_store_direct { 3, 3, false } #define unary_direct { 0, 0, true } #define binary_direct { 0, 0, true } #define ternary_direct { 0, 0, true } @@ -2478,7 +2480,7 @@ expand_call_mem_ref (tree type, gcall *stmt, int index) return fold_build2 (MEM_REF, type, addr, build_int_cst (alias_ptr_type, 0)); } -/* Expand MASK_LOAD{,_LANES} call STMT using optab OPTAB. */ +/* Expand MASK_LOAD{,_LANES} and LEN_LOAD call STMT using optab OPTAB. */ static void expand_mask_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) @@ -2497,6 +2499,8 @@ expand_mask_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) if (optab == vec_mask_load_lanes_optab) icode = get_multi_vector_move (type, optab); + else if (optab == lenload_optab) + icode = direct_optab_handler (optab, TYPE_MODE (type)); else icode = convert_optab_handler (optab, TYPE_MODE (type), TYPE_MODE (TREE_TYPE (maskt))); @@ -2507,15 +2511,20 @@ expand_mask_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); create_output_operand (&ops[0], target, TYPE_MODE (type)); create_fixed_operand (&ops[1], mem); - create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); + if (optab == lenload_optab) + create_convert_operand_from (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)), + TYPE_UNSIGNED (TREE_TYPE (maskt))); + else + create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); expand_insn (icode, 3, ops); if (!rtx_equal_p (target, ops[0].value)) emit_move_insn (target, ops[0].value); } #define expand_mask_load_lanes_optab_fn expand_mask_load_optab_fn +#define expand_len_load_optab_fn expand_mask_load_optab_fn -/* Expand MASK_STORE{,_LANES} call STMT using optab OPTAB. */ +/* Expand MASK_STORE{,_LANES} and LEN_STORE call STMT using optab OPTAB. */ static void expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) @@ -2532,6 +2541,8 @@ expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) if (optab == vec_mask_store_lanes_optab) icode = get_multi_vector_move (type, optab); + else if (optab == lenstore_optab) + icode = direct_optab_handler (optab, TYPE_MODE (type)); else icode = convert_optab_handler (optab, TYPE_MODE (type), TYPE_MODE (TREE_TYPE (maskt))); @@ -2542,11 +2553,16 @@ expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) reg = expand_normal (rhs); create_fixed_operand (&ops[0], mem); create_input_operand (&ops[1], reg, TYPE_MODE (type)); - create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); + if (optab == lenstore_optab) + create_convert_operand_from (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)), + TYPE_UNSIGNED (TREE_TYPE (maskt))); + else + create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); expand_insn (icode, 3, ops); } #define expand_mask_store_lanes_optab_fn expand_mask_store_optab_fn +#define expand_len_store_optab_fn expand_mask_store_optab_fn static void expand_ABNORMAL_DISPATCHER (internal_fn, gcall *) @@ -3128,10 +3144,12 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types, #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_gather_load_optab_supported_p convert_optab_supported_p +#define direct_len_load_optab_supported_p direct_optab_supported_p #define direct_mask_store_optab_supported_p direct_optab_supported_p #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_scatter_store_optab_supported_p convert_optab_supported_p +#define direct_len_store_optab_supported_p direct_optab_supported_p #define direct_while_optab_supported_p convert_optab_supported_p #define direct_fold_extract_optab_supported_p direct_optab_supported_p #define direct_fold_left_optab_supported_p direct_optab_supported_p @@ -3498,6 +3516,7 @@ internal_load_fn_p (internal_fn fn) case IFN_MASK_LOAD_LANES: case IFN_GATHER_LOAD: case IFN_MASK_GATHER_LOAD: + case IFN_LEN_LOAD: return true; default: @@ -3517,6 +3536,7 @@ internal_store_fn_p (internal_fn fn) case IFN_MASK_STORE_LANES: case IFN_SCATTER_STORE: case IFN_MASK_SCATTER_STORE: + case IFN_LEN_STORE: return true; default: @@ -3577,6 +3597,7 @@ internal_fn_stored_value_index (internal_fn fn) case IFN_MASK_STORE: case IFN_SCATTER_STORE: case IFN_MASK_SCATTER_STORE: + case IFN_LEN_STORE: return 3; default: diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 1d190d492ff..ed6561f296a 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -49,11 +49,13 @@ along with GCC; see the file COPYING3. If not see - load_lanes: currently just vec_load_lanes - mask_load_lanes: currently just vec_mask_load_lanes - gather_load: used for {mask_,}gather_load + - len_load: currently just lenload - mask_store: currently just maskstore - store_lanes: currently just vec_store_lanes - mask_store_lanes: currently just vec_mask_store_lanes - scatter_store: used for {mask_,}scatter_store + - len_store: currently just lenstore - unary: a normal unary optab, such as vec_reverse_<mode> - binary: a normal binary optab, such as vec_interleave_lo_<mode> @@ -127,6 +129,8 @@ DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load) DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE, mask_gather_load, gather_load) +DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, lenload, len_load) + DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store) DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0, mask_scatter_store, scatter_store) @@ -136,6 +140,8 @@ DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes) DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0, vec_mask_store_lanes, mask_store_lanes) +DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, lenstore, len_store) + DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while) DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW, check_raw_ptrs, check_ptrs) diff --git a/gcc/optabs.def b/gcc/optabs.def index 0c64eb52a8d..9fe4ac1840d 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -435,3 +435,5 @@ OPTAB_D (check_war_ptrs_optab, "check_war_ptrs$a") OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE) OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES) OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a") +OPTAB_D (lenload_optab, "lenload$a") +OPTAB_D (lenstore_optab, "lenstore$a")