Message ID | 380e1418-f262-7a09-e24a-2eb14817662f@linux.ibm.com |
---|---|
State | New |
Series | [1/7,v5] ifn/optabs: Support vector load/store with length |
Things have moved on due to the IRC conversation, but…

"Kewen.Lin" <linkw@linux.ibm.com> writes:
> on 2020/6/23 3:59 AM, Richard Sandiford wrote:
>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>> @@ -5167,6 +5167,24 @@ mode @var{n}.
>>>
>>> This pattern is not allowed to @code{FAIL}.
>>>
>>> +@cindex @code{lenload@var{m}} instruction pattern
>>> +@item @samp{lenload@var{m}}
>>> +Perform a vector load with length from memory operand 1 of mode @var{m}
>>> +into register operand 0. Length is provided in register operand 2 with
>>> +appropriate mode which should afford the maximal required precision of
>>> +any available lengths.
>>
>> I think we need to say in more detail what “load with length” actually
>> means. How about:
>>
>> Load the number of bytes specified by operand 2 from memory operand 1
>> into register operand 0, setting the other bytes of operand 0 to
>> undefined values. Operands 0 and 1 have mode @var{m}. Operand 2 has
>> whichever integer mode the target prefers.
>>
>
> Thanks for nice wordings! Updated, for "... to undefined values" I changed it
> to "... to undefined values or zeros" as Segher's comments to match the behavior
> on Power.

“set … to undefined values” means that the values are not defined by
the optab interface. In other words, the target can set the bytes
to whatever it wants, and gimple code can't make any assumptions about
what the values of the bytes are.

So setting the bytes to zero (as Power does) would conform to the
interface. So would leaving the bytes in operand 0 untouched.
So would using an instruction that really does leave the other
bytes with undefined values, etc.

So I think we should keep it as just “… to undefined values”.

The alternative would be to define the interface so that targets *must*
ensure that the other bytes are zeros. But at the moment, the only
intended use of the optabs and ifns is for autovectorisation, and the
vectoriser won't care about the values of “inactive” bytes/lanes.
Forcing the target to set them to a specific value like zero would be
unnecessarily restrictive.

Thanks,
Richard
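
The "undefined values" wording can be illustrated with a small scalar model. This is only a sketch, not GCC code: `model_len_load`, `sum_active_bytes`, the `inactive_policy` enum and the 16-byte vector size are all hypothetical names and choices made for illustration. It shows three target behaviours that would each conform to the proposed interface, and a consumer that stays correct under all of them because it never reads the inactive bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed vector size for the model; not taken from the patch.  */
enum { VEC_BYTES = 16 };

/* Hypothetical behaviours a target could choose for bytes past LEN.
   All three conform to "setting the other bytes to undefined values".  */
enum inactive_policy { INACTIVE_ZERO, INACTIVE_UNTOUCHED, INACTIVE_ALL_ONES };

/* Scalar model of a length-controlled load: copy LEN bytes from MEM
   into DEST and treat the remaining bytes according to POLICY.  */
static void
model_len_load (uint8_t dest[VEC_BYTES], const uint8_t *mem, size_t len,
                enum inactive_policy policy)
{
  assert (len <= VEC_BYTES);
  memcpy (dest, mem, len);
  if (policy == INACTIVE_ZERO)
    memset (dest + len, 0, VEC_BYTES - len);
  else if (policy == INACTIVE_ALL_ONES)
    memset (dest + len, 0xff, VEC_BYTES - len);
  /* INACTIVE_UNTOUCHED: bytes past LEN keep their previous contents.  */
}

/* A consumer that makes no assumption about inactive bytes: it only
   reads the first LEN bytes, so its result is policy-independent.  */
static unsigned
sum_active_bytes (const uint8_t vec[VEC_BYTES], size_t len)
{
  unsigned sum = 0;
  for (size_t i = 0; i < len; i++)
    sum += vec[i];
  return sum;
}
```

Code that instead inspected `dest[len]` would see a different value under each policy, which is the kind of assumption the proposed wording rules out for gimple.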
On Tue, Jun 23, 2020 at 11:53 AM Richard Sandiford <richard.sandiford@arm.com> wrote:
> [...]
> The alternative would be to define the interface so that targets *must*
> ensure that the other bytes are zeros. But at the moment, the only
> intended use of the optabs and ifns is for autovectorisation, and the
> vectoriser won't care about the values of “inactive” bytes/lanes.
> Forcing the target to set them to a specific value like zero would be
> unnecessarily restrictive.

Actually it _does_ care. This is supposed to be used for fully masked
loops and 'unspecified values' would require us to explicitly zero
them for any FP op because of possible sNaN representations. It
also precludes us from bitwise ORing in an appropriately masked
vector of 1s to make integer division happy (OK, no vector ISA supports
integer division).

So unless we have evidence that there exists an ISA that does _not_
zero the excess bits I'd rather specify it does.

Richard.
Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Jun 23, 2020 at 11:53 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>> [...]
>> The alternative would be to define the interface so that targets *must*
>> ensure that the other bytes are zeros. But at the moment, the only
>> intended use of the optabs and ifns is for autovectorisation, and the
>> vectoriser won't care about the values of “inactive” bytes/lanes.
>> Forcing the target to set them to a specific value like zero would be
>> unnecessarily restrictive.
>
> Actually it _does_ care.

I'd argue it doesn't, but for essentially the same reasons :-)

> This is supposed to be used for fully masked
> loops and 'unspecified values' would require us to explicitly zero
> them for any FP op because of possible sNaN representations. It
> also precludes us from bitwise ORing in an appropriately masked
> vector of 1s to make integer division happy (OK, no vector ISA supports
> integer division).

Zeros would be a problem for FP division too.

And even if we require loads to set inactive lanes to zero, we couldn't
infer from that that any given FP addition (say) won't raise an exception.
E.g. the inputs could be the result of converting integers and adding
them could trigger an inexact exception. Or the values could be the
result of simple bitcasts, giving arbitrary FP values. (AIUI, current
bfloat code works this way.)

The vectoriser currently only allows potentially-trapping FP operations
on partial vectors if the target provides an appropriate IFN_COND_*
function. (That's one of the main use cases for those functions.)
In other cases it requires the loop to operate on full vectors.
This should be relaxed in future to support inbranch partial
vectorisation of simd calls.

This means that the current patch series will/should simply punt for
“length”-based loop control if the loop contains FP operations that
(as far as gimple is concerned) might trap.

If we're thinking about how to relax that, then IMO it will need to be
done either at the level of each FP operation or by some kind of “global”
vectorisation subpass that introduces known-safe values for inactive
lanes. The first would be easier, the second would be more optimal.

I don't think that's specific to “length” vectorisation though.
The same concerns apply to if-converted loops that operate on
full vectors. I think the approach would be essentially the same
for both.

In that scenario, removing zeroing of an IFN_LEN_LOAD would “just”
be an optimisation, and could potentially be left to RTL code if
necessary. (But see my main point below.)

SVE supports integer division btw. :-)

> So unless we have evidence that there exists an ISA that does _not_
> zero the excess bits I'd rather specify it does.

I think the known architectures that might use this are:

- MVE
- Power
- RVV

MVE and Power both set inactive lanes to zero. But I'm not sure about RVV.
AIUI, for RVV the approach instead would be to reduce the effective vector
length for the final iteration of the vector loop, and I'm not sure
whether in that situation it makes sense to say that the other elements
still exist and are guaranteed to be zero.

I'm the last person who should be speculating on that though. Let's see
whether Jim has any comments.

In summary, I'm not saying we should never define the inactive values
to be zero. I just think that we should leave it until it matters.
And I don't think it does/should matter for the current patch series.

IFN_MASK_LOAD has been around for quite a long time now and we've never
had to define the values of inactive lanes there.

Thanks,
Richard
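
The role of the IFN_COND_* functions mentioned above can be sketched with a scalar model. This is an illustration under assumptions, not GCC's implementation: `model_cond_add` and `model_len_to_mask` are hypothetical names, and the model only captures the per-lane "condition ? a + b : fallback" behaviour, so that the addition is never evaluated on inactive lanes regardless of what those lanes contain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of a conditional FP addition: active lanes compute
   a + b, inactive lanes take the fallback value and the addition is
   not evaluated for them.  */
static void
model_cond_add (float *result, const bool *active, const float *a,
                const float *b, const float *fallback, size_t nlanes)
{
  for (size_t i = 0; i < nlanes; i++)
    result[i] = active[i] ? a[i] + b[i] : fallback[i];
}

/* Express a length as a lane mask: the first LEN lanes are active.
   This is how a length-controlled loop could be viewed by code that
   reasons in terms of masks.  */
static void
model_len_to_mask (bool *active, size_t len, size_t nlanes)
{
  assert (len <= nlanes);
  for (size_t i = 0; i < nlanes; i++)
    active[i] = i < len;
}
```

With this shape, whether a preceding load left zeros, old values or anything else in the inactive lanes does not affect which additions are performed.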
On Tue, Jun 23, 2020 at 5:21 AM Richard Sandiford <richard.sandiford@arm.com> wrote:
> MVE and Power both set inactive lanes to zero. But I'm not sure about RVV.
> AIUI, for RVV the approach instead would be to reduce the effective vector
> length for the final iteration of the vector loop, and I'm not sure
> whether in that situation it makes sense to say that the other elements
> still exist and are guaranteed to be zero.
>
> I'm the last person who should be speculating on that though. Let's see
> whether Jim has any comments.

The RVV spec supports two policies for tail elements, i.e. elements
beyond the current vector length. They can be undisturbed or
agnostic. In the undisturbed case, the tail elements retain their
old values. In the agnostic case, the implementation can choose to
either retain their old values, or set them to all ones, and this
choice can be different from lane to lane. The latter case is useful
because registers may be wider than the execution unit, and current
vector length may not be a multiple of the width of the execution
unit. So for instance if the vector registers can hold 8 elements,
and the execution unit works on 4 elements at a time, and the current
vector length is 2, then it might make sense to leave the last four
elements unmodified to avoid an iteration across the registers, but
the third and fourth elements might be set to all ones because you
have to write to them anyways. The choice is left up to the
implementation because we have multiple parties designing vector
units, and some are targeted at the low-cost embedded market, and some
are targeted at high performance, and they couldn't agree on a single
best way to implement this. The software is expected to choose agnostic
only if it doesn't care about what happens to tail elements, and
undisturbed if it wants to preserve them. The value of all ones was
chosen to discourage software developers from trying to use the values
in tail elements. The choice of undisturbed or agnostic can be
changed every time you set the current vector length and type.

In most cases, I think RVV programs will use agnostic for tail
elements, since we can change the vector length at will, and it will
be rare that we will care about elements beyond the current vector
length.

Tail elements can't cause exceptions so there is no need to worry
about whether those elements hold valid values.

Jim
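
The two tail policies described above can be modelled in a few lines of scalar C. This is a sketch under assumptions rather than a description of any real implementation: `model_vector_write`, `tail_policy` and the `ones_lanes` bitmask are hypothetical, with `ones_lanes` standing in for the implementation's per-lane choice in the agnostic case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum tail_policy { TAIL_UNDISTURBED, TAIL_AGNOSTIC };

/* Scalar model of a vector write of VL elements into a register with
   VLMAX lanes.  Under TAIL_UNDISTURBED every tail lane keeps its old
   value.  Under TAIL_AGNOSTIC each tail lane either keeps its old
   value or becomes all ones; bit I of ONES_LANES models the
   implementation's choice for lane I.  */
static void
model_vector_write (uint32_t *dest, const uint32_t *src, size_t vl,
                    size_t vlmax, enum tail_policy policy,
                    uint32_t ones_lanes)
{
  assert (vl <= vlmax && vlmax <= 32);
  for (size_t i = 0; i < vl; i++)
    dest[i] = src[i];
  if (policy == TAIL_AGNOSTIC)
    for (size_t i = vl; i < vlmax; i++)
      if (ones_lanes & (UINT32_C (1) << i))
        dest[i] = UINT32_MAX;
}
```

The example from the message (8 lanes, a 4-wide execution unit, vector length 2) corresponds to `vl = 2`, `vlmax = 8` and `ones_lanes = 0x0c`: lanes 2 and 3 become all ones, while lanes 4 to 7 are left alone.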
Jim Wilson <jimw@sifive.com> writes:
> [...]
> The RVV spec supports two policies for tail elements, i.e. elements
> beyond the current vector length. They can be undisturbed or
> agnostic. In the undisturbed case, the tail elements retain their
> old values. In the agnostic case, the implementation can choose to
> either retain their old values, or set them to all ones, and this
> choice can be different from lane to lane.
> [...]
> In most cases, I think RVV programs will use agnostic for tail
> elements, since we can change the vector length at will, and it will
> be rare that we will care about elements beyond the current vector
> length.
>
> Tail elements can't cause exceptions so there is no need to worry
> about whether those elements hold valid values.

Thanks for the info. Based on that, I guess GCC should leave the values
of extra inactive lanes undefined for now, so that the agnostic case is
supported.

Maybe in future we could have IFN_LEN_* versions of arithmetic
operations too, similar to the IFN_COND_* ones, so that they explicitly
ignore the inactive elements.

Richard
Hi!

On Tue, Jun 23, 2020 at 01:20:53PM +0100, Richard Sandiford wrote:
> SVE supports integer division btw. :-)

So does Power (ISA 3.1, power10).

> In summary, I'm not saying we should never define the inactive values
> to be zero. I just think that we should leave it until it matters.
> And I don't think it does/should matter for the current patch series.

I am perfectly happy with that. Thanks for looking at it!

> IFN_MASK_LOAD has been around for quite a long time now and we've never
> had to define the values of inactive lanes there.

Yeah, but typically the insns that consume the values loaded will use
the same masks again, so that may not be such a strong point.

Segher
Thanks for the update. I agree with the summary of the IRC discussion
except for…

"Kewen.Lin" <linkw@linux.ibm.com> writes:
> Hi Richard S./Richi/Jim/Segher,
>
> Thanks a lot for your comments to make this patch more solid.
>
> Based on our discussion, for the vector load/store with length
> optab, the length unit would be measured in lanes by default.
> For the targets which support length measured in bytes like Power,
> they should only define VnQI modes to wrap the other same size
> vector modes. If the length is larger than total lane/byte count
> of the given mode, it's taken to load all lanes/bytes implicitly.

…this last bit. IMO the behaviour of the optab should be undefined
when the supplied length is greater than the number of lanes.

I think that also makes things better for the lxvl implementation,
which ignores the upper 56 bits of the length. It sounds like the
above semantics would instead require Power to saturate the value
at 255 before shifting it.

Richard

> For the remaining lanes/bytes which aren't specified by length,
> they would be taken as undefined value. For length in bytes,
> it's required that the byte count should be a multiple of the
> element size (wrapped vector), otherwise it's undefined.
>
> This patch has been updated as attached.
>
> 2/7 for rs6000 optab definition has been updated to use V16QI.
> 5/7 for vectorizer change has been updated accordingly.
>
> -----
>
> v6: Updated optab descriptions.
>
> v5:
>   - Updated lenload/lenstore optab to len_load/len_store and the docs.
>   - Rename expand_mask_{load,store}_optab_fn to expand_partial_{load,store}_optab_fn
>   - Added/updated macros for expand_mask_{load,store}_optab_fn
>     and expand_len_{load,store}_optab_fn
>
> v4: Update len_load_direct/len_store_direct to align with direct optab.
>
> v3: Get rid of length mode hook.
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
> 2020-MM-DD  Kewen Lin  <linkw@gcc.gnu.org>
>
>     * doc/md.texi (len_load_@var{m}): Document.
>     (len_store_@var{m}): Likewise.
>     * internal-fn.c (len_load_direct): New macro.
>     (len_store_direct): Likewise.
>     (expand_len_load_optab_fn): Likewise.
>     (expand_len_store_optab_fn): Likewise.
>     (direct_len_load_optab_supported_p): Likewise.
>     (direct_len_store_optab_supported_p): Likewise.
>     (expand_mask_load_optab_fn): New macro. Original renamed to ...
>     (expand_partial_load_optab_fn): ... here. Add handlings for
>     len_load_optab.
>     (expand_mask_store_optab_fn): New macro. Original renamed to ...
>     (expand_partial_store_optab_fn): ... here. Add handlings for
>     len_store_optab.
>     (internal_load_fn_p): Handle IFN_LEN_LOAD.
>     (internal_store_fn_p): Handle IFN_LEN_STORE.
>     (internal_fn_stored_value_index): Handle IFN_LEN_STORE.
>     * internal-fn.def (LEN_LOAD): New internal function.
>     (LEN_STORE): Likewise.
>     * optabs.def (len_load_optab, len_store_optab): New optab.
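
The point about lxvl and over-long lengths can be made concrete with a small model. This is a hedged sketch, not a statement of the Power ISA: it assumes, following the remarks above, that the instruction takes its byte count from the top 8 bits of a 64-bit register (hence the shift by 56), and it additionally assumes that counts above 16 are capped at 16. All function names are hypothetical.

```c
#include <stdint.h>

/* Assumed model of the instruction: only the top 8 bits of the 64-bit
   length register are read, and the byte count is capped at 16.  */
static unsigned
model_lxvl_bytes (uint64_t length_reg)
{
  unsigned nb = (unsigned) (length_reg >> 56);
  return nb > 16 ? 16 : nb;
}

/* Length preparation if over-long lengths are undefined behaviour:
   a plain shift, which discards the upper 56 bits of LEN.  */
static uint64_t
prepare_len_shift_only (uint64_t len)
{
  return len << 56;
}

/* Length preparation if over-long lengths had to mean "load all
   bytes": LEN must first be saturated at 255 so that the shift does
   not discard the bits that make it large.  */
static uint64_t
prepare_len_saturating (uint64_t len)
{
  return (len > 255 ? 255 : len) << 56;
}
```

Under these assumptions a length of 256 loads nothing with the shift-only preparation, because the set bit is shifted out, but loads all 16 bytes with the saturating one. Leaving the over-long case undefined means the extra comparison and select are not needed.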
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2c67c818da5..23918136345 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5167,6 +5167,24 @@ mode @var{n}.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{len_load_@var{m}} instruction pattern
+@item @samp{len_load_@var{m}}
+Load the number of bytes specified by operand 2 from memory operand 1
+into register operand 0, setting the other bytes of operand 0 to
+undefined values or zeros. Operands 0 and 1 have mode @var{m}.
+Operand 2 has whichever integer mode the target prefers.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{len_store_@var{m}} instruction pattern
+@item @samp{len_store_@var{m}}
+Store the number of bytes specified by operand 2 from nonmemory operand 1
+into memory operand 0, leaving the other bytes of operand 0 unchanged.
+Operands 0 and 1 have mode @var{m}. Operand 2 has whichever integer
+mode the target prefers.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_perm@var{m}} instruction pattern
 @item @samp{vec_perm@var{m}}
 Output a (variable) vector permutation. Operand 0 is the destination
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 5e9aa60721e..f9e851069a5 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -104,10 +104,12 @@ init_internal_fns ()
 #define load_lanes_direct { -1, -1, false }
 #define mask_load_lanes_direct { -1, -1, false }
 #define gather_load_direct { 3, 1, false }
+#define len_load_direct { -1, -1, false }
 #define mask_store_direct { 3, 2, false }
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
 #define scatter_store_direct { 3, 1, false }
+#define len_store_direct { 3, 3, false }
 #define unary_direct { 0, 0, true }
 #define binary_direct { 0, 0, true }
 #define ternary_direct { 0, 0, true }
@@ -2478,10 +2480,10 @@ expand_call_mem_ref (tree type, gcall *stmt, int index)
   return fold_build2 (MEM_REF, type, addr, build_int_cst (alias_ptr_type, 0));
 }
 
-/* Expand MASK_LOAD{,_LANES} call STMT using optab OPTAB.  */
+/* Expand MASK_LOAD{,_LANES} or LEN_LOAD call STMT using optab OPTAB.  */
 
 static void
-expand_mask_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
   class expand_operand ops[3];
   tree type, lhs, rhs, maskt;
@@ -2497,6 +2499,8 @@ expand_mask_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 
   if (optab == vec_mask_load_lanes_optab)
     icode = get_multi_vector_move (type, optab);
+  else if (optab == len_load_optab)
+    icode = direct_optab_handler (optab, TYPE_MODE (type));
   else
     icode = convert_optab_handler (optab, TYPE_MODE (type),
                                    TYPE_MODE (TREE_TYPE (maskt)));
@@ -2507,18 +2511,24 @@ expand_mask_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
   target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   create_output_operand (&ops[0], target, TYPE_MODE (type));
   create_fixed_operand (&ops[1], mem);
-  create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
+  if (optab == len_load_optab)
+    create_convert_operand_from (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)),
+                                 TYPE_UNSIGNED (TREE_TYPE (maskt)));
+  else
+    create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
   expand_insn (icode, 3, ops);
   if (!rtx_equal_p (target, ops[0].value))
     emit_move_insn (target, ops[0].value);
 }
 
+#define expand_mask_load_optab_fn expand_partial_load_optab_fn
 #define expand_mask_load_lanes_optab_fn expand_mask_load_optab_fn
+#define expand_len_load_optab_fn expand_partial_load_optab_fn
 
-/* Expand MASK_STORE{,_LANES} call STMT using optab OPTAB.  */
+/* Expand MASK_STORE{,_LANES} or LEN_STORE call STMT using optab OPTAB.  */
 
 static void
-expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
   class expand_operand ops[3];
   tree type, lhs, rhs, maskt;
@@ -2532,6 +2542,8 @@ expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 
   if (optab == vec_mask_store_lanes_optab)
     icode = get_multi_vector_move (type, optab);
+  else if (optab == len_store_optab)
+    icode = direct_optab_handler (optab, TYPE_MODE (type));
   else
     icode = convert_optab_handler (optab, TYPE_MODE (type),
                                    TYPE_MODE (TREE_TYPE (maskt)));
@@ -2542,11 +2554,17 @@ expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
   reg = expand_normal (rhs);
   create_fixed_operand (&ops[0], mem);
   create_input_operand (&ops[1], reg, TYPE_MODE (type));
-  create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
+  if (optab == len_store_optab)
+    create_convert_operand_from (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)),
+                                 TYPE_UNSIGNED (TREE_TYPE (maskt)));
+  else
+    create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
   expand_insn (icode, 3, ops);
 }
 
+#define expand_mask_store_optab_fn expand_partial_store_optab_fn
 #define expand_mask_store_lanes_optab_fn expand_mask_store_optab_fn
+#define expand_len_store_optab_fn expand_partial_store_optab_fn
 
 static void
 expand_ABNORMAL_DISPATCHER (internal_fn, gcall *)
@@ -3128,10 +3146,12 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_gather_load_optab_supported_p convert_optab_supported_p
+#define direct_len_load_optab_supported_p direct_optab_supported_p
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
+#define direct_len_store_optab_supported_p direct_optab_supported_p
 #define direct_while_optab_supported_p convert_optab_supported_p
 #define direct_fold_extract_optab_supported_p direct_optab_supported_p
 #define direct_fold_left_optab_supported_p direct_optab_supported_p
@@ -3498,6 +3518,7 @@ internal_load_fn_p (internal_fn fn)
     case IFN_MASK_LOAD_LANES:
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
+    case IFN_LEN_LOAD:
       return true;
 
     default:
@@ -3517,6 +3538,7 @@ internal_store_fn_p (internal_fn fn)
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_STORE:
       return true;
 
     default:
@@ -3577,6 +3599,7 @@ internal_fn_stored_value_index (internal_fn fn)
     case IFN_MASK_STORE:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_STORE:
       return 3;
 
     default:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 1d190d492ff..17dac128e83 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -49,11 +49,13 @@ along with GCC; see the file COPYING3.  If not see
    - load_lanes: currently just vec_load_lanes
    - mask_load_lanes: currently just vec_mask_load_lanes
    - gather_load: used for {mask_,}gather_load
+   - len_load: currently just len_load
 
    - mask_store: currently just maskstore
    - store_lanes: currently just vec_store_lanes
    - mask_store_lanes: currently just vec_mask_store_lanes
    - scatter_store: used for {mask_,}scatter_store
+   - len_store: currently just len_store
 
    - unary: a normal unary optab, such as vec_reverse_<mode>
    - binary: a normal binary optab, such as vec_interleave_lo_<mode>
@@ -127,6 +129,8 @@ DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load)
 DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
                        mask_gather_load, gather_load)
 
+DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
+
 DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
                        mask_scatter_store, scatter_store)
@@ -136,6 +140,8 @@ DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0,
                        vec_mask_store_lanes, mask_store_lanes)
 
+DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
+
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
                        check_raw_ptrs, check_ptrs)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 0c64eb52a8d..78409aa1453 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -435,3 +435,5 @@ OPTAB_D (check_war_ptrs_optab, "check_war_ptrs$a")
 OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
 OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
+OPTAB_D (len_load_optab, "len_load_$a")
+OPTAB_D (len_store_optab, "len_store_$a")
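
To show how the documented `len_load` and `len_store` semantics fit together in a loop, here is a scalar model of a length-controlled loop. It is a sketch only: `model_len_load`, `model_len_store`, `add_one_to_bytes` and the 16-byte vector size are hypothetical and are not part of the patch. Lengths are in bytes, as in the md.texi text, and the store leaves memory past the length unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed vector size for the model; not taken from the patch.  */
enum { VEC_BYTES = 16 };

/* Model of len_load: copy LEN bytes from MEM into REG.  Bytes of REG
   past LEN are unspecified; this model happens to leave them alone.  */
static void
model_len_load (uint8_t reg[VEC_BYTES], const uint8_t *mem, size_t len)
{
  assert (len <= VEC_BYTES);
  memcpy (reg, mem, len);
}

/* Model of len_store: write the first LEN bytes of REG to MEM and
   leave all other bytes of MEM unchanged.  */
static void
model_len_store (uint8_t *mem, const uint8_t reg[VEC_BYTES], size_t len)
{
  assert (len <= VEC_BYTES);
  memcpy (mem, reg, len);
}

/* Add 1 to each of N bytes.  Every iteration but the last uses the
   full vector; the last one passes the remaining byte count.  */
static void
add_one_to_bytes (uint8_t *dst, const uint8_t *src, size_t n)
{
  for (size_t i = 0; i < n; i += VEC_BYTES)
    {
      size_t len = n - i < VEC_BYTES ? n - i : VEC_BYTES;
      uint8_t reg[VEC_BYTES] = { 0 };
      model_len_load (reg, src + i, len);
      /* The arithmetic runs on all lanes; results in inactive lanes
         are never stored, so their input values do not matter.  */
      for (size_t lane = 0; lane < VEC_BYTES; lane++)
        reg[lane] = (uint8_t) (reg[lane] + 1);
      model_len_store (dst + i, reg, len);
    }
}
```

Because integer addition cannot trap, the loop body is free to compute on the inactive lanes. The FP discussion earlier in the thread is about operations where that is not the case.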