Message ID | 559F5D7B.6070208@redhat.com |
---|---|
State | New |
Headers | show |
Hi Jeff! Thanks for your details comments. You asked: How are conditionals on vectors usually handled? You should try to mimick that and report, with detail, why that's not working. In current implementation of vectorization pass all comparisons are handled through COND_EXPR, i.e. vect-pattern pass transforms all comparisons producing bool values to conditional expressions like a[i] != 0 --> a[i]!=0? 1: 0 which vectorizers transforms to vect-cond-expr. So we don't have operations with vector operands and scalar (bool) result. To implement such operations I introduced target-hook. Could you propose another solution implementing it? Thanks. Yuri. 2015-07-10 8:51 GMT+03:00 Jeff Law <law@redhat.com>: > On 06/18/2015 08:32 AM, Yuri Rumyantsev wrote: >> >> Richard, >> >> Here is updated patch which does not include your proposal related to >> the target hook deletion. >> You wrote: >>> >>> I still don't understand why you need the new target hook. If we have a >>> masked >>> load/store then the mask is computed by an assignment with a >>> VEC_COND_EXPR >>> (in your example) and thus a test for a zero mask is expressible as just >>> >> > if (vect__ifc__41.17_167 == { 0, 0, 0, 0... }) >>> >>> >>> or am I missing something? >> >> Such vector compare produces vector and does not set up cc flags >> required for conditional branch (GIMPLE_COND). >> If we use such comparison for GIMPLE_COND we got error message, so I >> used target hook which does set up cc flags aka ptest instruction and >> I left this part. > > I think we need to look for something better. I really don't like the idea > of using a target hook like this within the gimple optimizers unless it's > absolutely necessary. > > How are conditionals on vectors usually handled? You should try to mimick > that and report, with detail, why that's not working. > > I'm also not happy with the mechanisms to determine whether or not we should > make this transformation. I'm willing to hash through other details and > come back to this issue once we've got some of the other stuff figured out. > I guess a new flag with the target adjusting is the fallback if we can't > come up with some reasonable way to select this on or off. > > The reason I don't like having the target files make this kind of decision > is it makes more gimple transformations target dependent. Based on the > history with RTL, that ultimately leads to an unwieldy mess. > > And yes, I know gimple isn't 100% target independent -- but that doesn't > mean we should keep adding more target dependencies. Each one we add needs > to be well vetted. > > > patch.3 > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 44a8624..e90de32 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -100,6 +100,8 @@ along with GCC; see the file COPYING3. If not see > #include "tree-iterator.h" > #include "tree-chkp.h" > #include "rtl-chkp.h" > +#include "stringpool.h" > +#include "tree-ssanames.h" > So ideally we'll figure out why you're unable to generate a suitable > conditional at the gimple level and the need for the x86 backend to generate > the vector zero test will go away. And if it does go away, we want those > #includes to disappear. > > Additionally, instead of including stringpool.h & tree-ssanames.h, include > "ssa.h" -- as a general rule. > > > static rtx legitimize_dllimport_symbol (rtx, bool); > static rtx legitimize_pe_coff_extern_decl (rtx, bool); > @@ -41100,6 +41102,47 @@ ix86_vectorize_builtin_gather (const_tree > mem_vectype, > return ix86_get_builtin (code); > } > > +/* Returns true if SOURCE type is supported by builtin ptest. > + NAME is lhs of created ptest call. All created statements are added > + to GS. */ > + > +static bool > +ix86_vectorize_build_zero_vector_test (tree source, tree name, > > Given the stated goal of not doing this in the target files, I'm not going > to actually look at this routine or any of the infrastructure for this > target hook. > > > diff --git a/gcc/params.def b/gcc/params.def > index 3e4ba3a..9e8de11 100644 > --- a/gcc/params.def > +++ b/gcc/params.def > @@ -1069,6 +1069,12 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK, > "Maximum number of conditional store pairs that can be sunk", > 2, 0, 0) > > +/* Enable inserion test on zero mask for masked stores if non-zero. */ > s/inserion/insertion/ > > +DEFPARAM (PARAM_ZERO_TEST_FOR_STORE_MASK, > + "zero-test-for-store-mask", > + "Enable insertion of test on zero mask for masked stores", > + 1, 0, 1) > I'm resisting the temptation to bike-shed... I don't like the name, but I > don't have a better one yet. Similarly for the short description. > > > +/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 2 "vect" } > } */ > +/* { dg-final { cleanup-tree-dump "vect" } } */ > cleanup-tree-dump is no longer needed. > > > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > index 7ba0d8f..e31479b 100644 > --- a/gcc/tree-vect-stmts.c > +++ b/gcc/tree-vect-stmts.c > @@ -2072,6 +2072,7 @@ vectorizable_mask_load_store (gimple stmt, > gimple_stmt_iterator *gsi, > { > tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE; > prev_stmt_info = NULL; > + LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true; > for (i = 0; i < ncopies; i++) > If we need to keep this field, can you initialize it just after the > STMT_VINFO_GATHER_P test after the /** Transform **/ comment. It seems > kind-of buried and hard to find in this location. > > I'm not really sure why Richi has objected to the field. Yea we can > re-analyze stuff in the vectorizer, but we'd be repeating a fair number of > tests (presumably we'd refactor the start of vectorizable_mask_load_store to > avoid duplication). That seems more wasteful than another field. > > > > { > unsigned align, misalign; > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c > index ff46a52..debb351 100644 > --- a/gcc/tree-vectorizer.c > +++ b/gcc/tree-vectorizer.c > @@ -92,7 +92,8 @@ along with GCC; see the file COPYING3. If not see > #include "dbgcnt.h" > #include "gimple-fold.h" > #include "tree-scalar-evolution.h" > - > +#include "stringpool.h" > +#include "tree-ssanames.h" > In general, rather than including stringpool.h and tree-ssanames.h, just > include "ssa.h". > > > > /* Loop or bb location. */ > source_location vect_location; > @@ -422,6 +423,184 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple > loop_vectorized_call) > free (bbs); > } > > +/* Helper for optimize_mask_stores: returns true if STMT sinking to end > + of BB is valid and false otherwise. */ > + > +static bool > +is_valid_sink (gimple stmt, gimple last_store) > +{ > + tree vdef; > + imm_use_iterator imm_it; > + gimple use_stmt; > + basic_block bb = gimple_bb (stmt); > + > + if ((vdef = gimple_vdef (stmt))) > + { > + int cross = 0; > + /* Check that ther are no store vuses in current bb. */ > + FOR_EACH_IMM_USE_STMT (use_stmt, imm_it, vdef) > + if (gimple_bb (use_stmt) == bb) > + cross++; > Seems to me that you should get rid of "cross" and just "return false" here. > There's no need to keep looking at the immediate uses once you've found one > in the same block. That also allows you to simplify this following code ... > > + > + if (cross == 0) > + return true; > Eliminate the conditional and return true here. > > > + } > + else if (gimple_vuse (stmt) == NULL_TREE) > + return true; > + else if (gimple_vuse (stmt) == gimple_vuse (last_store)) > + return true; > + return false; > +} > + > +/* The code below is trying to perform simple optimization - do not execute > + masked store statement if its mask is zero mask since loads that follow > + a masked store can be blocked. > + It put all masked stores statements with the same mask into the new bb > + with a check on zero mask. */ > I suspect you need to re-wrap the comment to 80 columns. > > > + > + /* Loop has masked stores. */ > + while (!worklist.is_empty ()) > + { > + gimple last, def_stmt, last_store; > + edge e, efalse; > + tree mask, val; > + basic_block store_bb, join_bb; > + gimple_stmt_iterator gsi_to; > + gimple_seq gs; > + tree arg3; > + tree vdef, new_vdef; > + gphi *phi; > + bool first_dump; > + > + last = worklist.pop (); > + mask = gimple_call_arg (last, 2); > Are there ever cases where the mask is a compile-time constant? I wouldn't > think so, but I'm totally unfamiliar with all the tree-vector code. > > > + val = make_ssa_name (integer_type_node); > + gs = NULL; > + /* Escape if target does not support test on zero mask. */ > + > + if (!targetm.vectorize.build_zero_vector_test (mask, val, &gs)) > + { > + if (dump_enabled_p ()) > + dump_printf (MSG_NOTE, "Target does not support ptest!\n"); > + return; > + } > + gcc_assert (gs != NULL); > As noted earlier, I think we need to revisit this and look for a way to do > this test without calling into bits in the target files. I'd like to see > the gimple you tried to generate and any error messages that were spit out > when you did so. > > > > > + > + /* Create new bb. */ > + e = split_block (bb, last); > + join_bb = e->dest; > + store_bb = create_empty_bb (bb); > + add_bb_to_loop (store_bb, loop); > When we split BB via split_block, is the newly created block (JOIN_BB) > automatically added to the loop? > > > > > > + e->flags = EDGE_TRUE_VALUE; > + efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE); > + /* Don't want to put STORE_BB to unlikely part and use > + 50% probability. */ > + store_bb->frequency = bb->frequency / 2; > + efalse->probability = REG_BR_PROB_BASE / 2; > + make_edge (store_bb, join_bb, EDGE_FALLTHRU); > Like Richi, I'd expect that in practice STORE_BB will be the more likely > taken, how much so, I don't know, but 50-50 is probably not right. I'd > think this would affect the ultimate block layout to some degree as well. > Do we want the straightline path to include STORE_BB (which implies a > branch-around STORE_BB and inverting the test from what you're doing now). > > > > > + /* Create new PHI node for vdef of the last masked store: > + .MEM_2 = VDEF <.MEM_1> > + will be converted to > + .MEM.3 = VDEF <.MEM_1> > + and new PHI node will be created in join bb > + .MEM_2 = PHI <.MEM_1, .MEM_3> > + */ > + vdef = gimple_vdef (last); > + gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME); > + new_vdef = make_ssa_name (gimple_vop (cfun), last); > + phi = create_phi_node (vdef, join_bb); > + add_phi_arg (phi, new_vdef, EDGE_SUCC (store_bb, 0), > UNKNOWN_LOCATION); > + gimple_set_vdef (last, new_vdef); > + first_dump = true; > Presumably you don't add the phi arg associated with the edge BB->JOIN_BB > because you don't know it until the end of the loop below, right? > > > Overall I think this is pretty close. I'd like you to focus on getting the > test for a zero store mask sorted out. > > Jeff > >
On Mon, Jul 20, 2015 at 5:05 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: > Hi Jeff! > > Thanks for your details comments. > > You asked: > How are conditionals on vectors usually handled? You should try to > mimick that and report, with detail, why that's not working. > > In current implementation of vectorization pass all comparisons are > handled through COND_EXPR, i.e. vect-pattern pass transforms all > comparisons producing bool values to conditional expressions like a[i] > != 0 --> a[i]!=0? 1: 0 which vectorizers transforms to > vect-cond-expr. So we don't have operations with vector operands and > scalar (bool) result. > To implement such operations I introduced target-hook. Could you > propose another solution implementing it? see below... > Thanks. > Yuri. > > 2015-07-10 8:51 GMT+03:00 Jeff Law <law@redhat.com>: >> On 06/18/2015 08:32 AM, Yuri Rumyantsev wrote: >>> >>> Richard, >>> >>> Here is updated patch which does not include your proposal related to >>> the target hook deletion. >>> You wrote: >>>> >>>> I still don't understand why you need the new target hook. If we have a >>>> masked >>>> load/store then the mask is computed by an assignment with a >>>> VEC_COND_EXPR >>>> (in your example) and thus a test for a zero mask is expressible as just >>>> >>> > if (vect__ifc__41.17_167 == { 0, 0, 0, 0... }) >>>> >>>> >>>> or am I missing something? >>> >>> Such vector compare produces vector and does not set up cc flags >>> required for conditional branch (GIMPLE_COND). >>> If we use such comparison for GIMPLE_COND we got error message, so I >>> used target hook which does set up cc flags aka ptest instruction and >>> I left this part. ... what error message did you get? I suppose you ran into IL checking complaining about error: vector comparison returning a boolean ? It looks like we forbade equality compares for consistency thus you'd need to use vec_ifc_tem_4 = VIEW_CONVERT_EXPR<TImode> (vect_ifc_41.17_167); if (vec_ifc_tem_4 == 0) ... instead. Thus view-convert to an integer mode of the same size as the vector. Or we allow equality compares of vectors. Not sure if expansion / targets handle them well though - you'd have to check. Richard. >> I think we need to look for something better. I really don't like the idea >> of using a target hook like this within the gimple optimizers unless it's >> absolutely necessary. >> >> How are conditionals on vectors usually handled? You should try to mimick >> that and report, with detail, why that's not working. >> >> I'm also not happy with the mechanisms to determine whether or not we should >> make this transformation. I'm willing to hash through other details and >> come back to this issue once we've got some of the other stuff figured out. >> I guess a new flag with the target adjusting is the fallback if we can't >> come up with some reasonable way to select this on or off. >> >> The reason I don't like having the target files make this kind of decision >> is it makes more gimple transformations target dependent. Based on the >> history with RTL, that ultimately leads to an unwieldy mess. >> >> And yes, I know gimple isn't 100% target independent -- but that doesn't >> mean we should keep adding more target dependencies. Each one we add needs >> to be well vetted. >> >> >> patch.3 >> >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index 44a8624..e90de32 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -100,6 +100,8 @@ along with GCC; see the file COPYING3. If not see >> #include "tree-iterator.h" >> #include "tree-chkp.h" >> #include "rtl-chkp.h" >> +#include "stringpool.h" >> +#include "tree-ssanames.h" >> So ideally we'll figure out why you're unable to generate a suitable >> conditional at the gimple level and the need for the x86 backend to generate >> the vector zero test will go away. And if it does go away, we want those >> #includes to disappear. >> >> Additionally, instead of including stringpool.h & tree-ssanames.h, include >> "ssa.h" -- as a general rule. >> >> >> static rtx legitimize_dllimport_symbol (rtx, bool); >> static rtx legitimize_pe_coff_extern_decl (rtx, bool); >> @@ -41100,6 +41102,47 @@ ix86_vectorize_builtin_gather (const_tree >> mem_vectype, >> return ix86_get_builtin (code); >> } >> >> +/* Returns true if SOURCE type is supported by builtin ptest. >> + NAME is lhs of created ptest call. All created statements are added >> + to GS. */ >> + >> +static bool >> +ix86_vectorize_build_zero_vector_test (tree source, tree name, >> >> Given the stated goal of not doing this in the target files, I'm not going >> to actually look at this routine or any of the infrastructure for this >> target hook. >> >> >> diff --git a/gcc/params.def b/gcc/params.def >> index 3e4ba3a..9e8de11 100644 >> --- a/gcc/params.def >> +++ b/gcc/params.def >> @@ -1069,6 +1069,12 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK, >> "Maximum number of conditional store pairs that can be sunk", >> 2, 0, 0) >> >> +/* Enable inserion test on zero mask for masked stores if non-zero. */ >> s/inserion/insertion/ >> >> +DEFPARAM (PARAM_ZERO_TEST_FOR_STORE_MASK, >> + "zero-test-for-store-mask", >> + "Enable insertion of test on zero mask for masked stores", >> + 1, 0, 1) >> I'm resisting the temptation to bike-shed... I don't like the name, but I >> don't have a better one yet. Similarly for the short description. >> >> >> +/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 2 "vect" } >> } */ >> +/* { dg-final { cleanup-tree-dump "vect" } } */ >> cleanup-tree-dump is no longer needed. >> >> >> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c >> index 7ba0d8f..e31479b 100644 >> --- a/gcc/tree-vect-stmts.c >> +++ b/gcc/tree-vect-stmts.c >> @@ -2072,6 +2072,7 @@ vectorizable_mask_load_store (gimple stmt, >> gimple_stmt_iterator *gsi, >> { >> tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE; >> prev_stmt_info = NULL; >> + LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true; >> for (i = 0; i < ncopies; i++) >> If we need to keep this field, can you initialize it just after the >> STMT_VINFO_GATHER_P test after the /** Transform **/ comment. It seems >> kind-of buried and hard to find in this location. >> >> I'm not really sure why Richi has objected to the field. Yea we can >> re-analyze stuff in the vectorizer, but we'd be repeating a fair number of >> tests (presumably we'd refactor the start of vectorizable_mask_load_store to >> avoid duplication). That seems more wasteful than another field. >> >> >> >> { >> unsigned align, misalign; >> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c >> index ff46a52..debb351 100644 >> --- a/gcc/tree-vectorizer.c >> +++ b/gcc/tree-vectorizer.c >> @@ -92,7 +92,8 @@ along with GCC; see the file COPYING3. If not see >> #include "dbgcnt.h" >> #include "gimple-fold.h" >> #include "tree-scalar-evolution.h" >> - >> +#include "stringpool.h" >> +#include "tree-ssanames.h" >> In general, rather than including stringpool.h and tree-ssanames.h, just >> include "ssa.h". >> >> >> >> /* Loop or bb location. */ >> source_location vect_location; >> @@ -422,6 +423,184 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple >> loop_vectorized_call) >> free (bbs); >> } >> >> +/* Helper for optimize_mask_stores: returns true if STMT sinking to end >> + of BB is valid and false otherwise. */ >> + >> +static bool >> +is_valid_sink (gimple stmt, gimple last_store) >> +{ >> + tree vdef; >> + imm_use_iterator imm_it; >> + gimple use_stmt; >> + basic_block bb = gimple_bb (stmt); >> + >> + if ((vdef = gimple_vdef (stmt))) >> + { >> + int cross = 0; >> + /* Check that ther are no store vuses in current bb. */ >> + FOR_EACH_IMM_USE_STMT (use_stmt, imm_it, vdef) >> + if (gimple_bb (use_stmt) == bb) >> + cross++; >> Seems to me that you should get rid of "cross" and just "return false" here. >> There's no need to keep looking at the immediate uses once you've found one >> in the same block. That also allows you to simplify this following code ... >> >> + >> + if (cross == 0) >> + return true; >> Eliminate the conditional and return true here. >> >> >> + } >> + else if (gimple_vuse (stmt) == NULL_TREE) >> + return true; >> + else if (gimple_vuse (stmt) == gimple_vuse (last_store)) >> + return true; >> + return false; >> +} >> + >> +/* The code below is trying to perform simple optimization - do not execute >> + masked store statement if its mask is zero mask since loads that follow >> + a masked store can be blocked. >> + It put all masked stores statements with the same mask into the new bb >> + with a check on zero mask. */ >> I suspect you need to re-wrap the comment to 80 columns. >> >> >> + >> + /* Loop has masked stores. */ >> + while (!worklist.is_empty ()) >> + { >> + gimple last, def_stmt, last_store; >> + edge e, efalse; >> + tree mask, val; >> + basic_block store_bb, join_bb; >> + gimple_stmt_iterator gsi_to; >> + gimple_seq gs; >> + tree arg3; >> + tree vdef, new_vdef; >> + gphi *phi; >> + bool first_dump; >> + >> + last = worklist.pop (); >> + mask = gimple_call_arg (last, 2); >> Are there ever cases where the mask is a compile-time constant? I wouldn't >> think so, but I'm totally unfamiliar with all the tree-vector code. >> >> >> + val = make_ssa_name (integer_type_node); >> + gs = NULL; >> + /* Escape if target does not support test on zero mask. */ >> + >> + if (!targetm.vectorize.build_zero_vector_test (mask, val, &gs)) >> + { >> + if (dump_enabled_p ()) >> + dump_printf (MSG_NOTE, "Target does not support ptest!\n"); >> + return; >> + } >> + gcc_assert (gs != NULL); >> As noted earlier, I think we need to revisit this and look for a way to do >> this test without calling into bits in the target files. I'd like to see >> the gimple you tried to generate and any error messages that were spit out >> when you did so. >> >> >> >> >> + >> + /* Create new bb. */ >> + e = split_block (bb, last); >> + join_bb = e->dest; >> + store_bb = create_empty_bb (bb); >> + add_bb_to_loop (store_bb, loop); >> When we split BB via split_block, is the newly created block (JOIN_BB) >> automatically added to the loop? >> >> >> >> >> >> + e->flags = EDGE_TRUE_VALUE; >> + efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE); >> + /* Don't want to put STORE_BB to unlikely part and use >> + 50% probability. */ >> + store_bb->frequency = bb->frequency / 2; >> + efalse->probability = REG_BR_PROB_BASE / 2; >> + make_edge (store_bb, join_bb, EDGE_FALLTHRU); >> Like Richi, I'd expect that in practice STORE_BB will be the more likely >> taken, how much so, I don't know, but 50-50 is probably not right. I'd >> think this would affect the ultimate block layout to some degree as well. >> Do we want the straightline path to include STORE_BB (which implies a >> branch-around STORE_BB and inverting the test from what you're doing now). >> >> >> >> >> + /* Create new PHI node for vdef of the last masked store: >> + .MEM_2 = VDEF <.MEM_1> >> + will be converted to >> + .MEM.3 = VDEF <.MEM_1> >> + and new PHI node will be created in join bb >> + .MEM_2 = PHI <.MEM_1, .MEM_3> >> + */ >> + vdef = gimple_vdef (last); >> + gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME); >> + new_vdef = make_ssa_name (gimple_vop (cfun), last); >> + phi = create_phi_node (vdef, join_bb); >> + add_phi_arg (phi, new_vdef, EDGE_SUCC (store_bb, 0), >> UNKNOWN_LOCATION); >> + gimple_set_vdef (last, new_vdef); >> + first_dump = true; >> Presumably you don't add the phi arg associated with the edge BB->JOIN_BB >> because you don't know it until the end of the loop below, right? >> >> >> Overall I think this is pretty close. I'd like you to focus on getting the >> test for a zero store mask sorted out. >> >> Jeff >> >>
On 07/20/2015 09:05 AM, Yuri Rumyantsev wrote: > Hi Jeff! > > Thanks for your details comments. > > You asked: > How are conditionals on vectors usually handled? You should try to > mimick that and report, with detail, why that's not working. > > In current implementation of vectorization pass all comparisons are > handled through COND_EXPR, i.e. vect-pattern pass transforms all > comparisons producing bool values to conditional expressions like a[i] > != 0 --> a[i]!=0? 1: 0 which vectorizers transforms to > vect-cond-expr. So we don't have operations with vector operands and > scalar (bool) result. > To implement such operations I introduced target-hook. Could you > propose another solution implementing it? Is there any rationale given anywhere for the transformation into conditional expressions? ie, is there any reason why we can't have a GIMPLE_COND where the expression is a vector condition? Thanks, Jeff
Jeff, The goal of this transformation is to not execute masked store if possible,i.e. when is is equal to zero-vector. and conditional expression is not suitable for it - we must generate semi-hammock with conditional branch. Yuri. 2015-07-23 23:03 GMT+03:00 Jeff Law <law@redhat.com>: > On 07/20/2015 09:05 AM, Yuri Rumyantsev wrote: >> >> Hi Jeff! >> >> Thanks for your details comments. >> >> You asked: >> How are conditionals on vectors usually handled? You should try to >> mimick that and report, with detail, why that's not working. >> >> In current implementation of vectorization pass all comparisons are >> handled through COND_EXPR, i.e. vect-pattern pass transforms all >> comparisons producing bool values to conditional expressions like a[i] >> != 0 --> a[i]!=0? 1: 0 which vectorizers transforms to >> vect-cond-expr. So we don't have operations with vector operands and >> scalar (bool) result. >> To implement such operations I introduced target-hook. Could you >> propose another solution implementing it? > > Is there any rationale given anywhere for the transformation into > conditional expressions? ie, is there any reason why we can't have a > GIMPLE_COND where the expression is a vector condition? > > Thanks, > > Jeff >
On Thu, Jul 23, 2015 at 10:03 PM, Jeff Law <law@redhat.com> wrote: > On 07/20/2015 09:05 AM, Yuri Rumyantsev wrote: >> >> Hi Jeff! >> >> Thanks for your details comments. >> >> You asked: >> How are conditionals on vectors usually handled? You should try to >> mimick that and report, with detail, why that's not working. >> >> In current implementation of vectorization pass all comparisons are >> handled through COND_EXPR, i.e. vect-pattern pass transforms all >> comparisons producing bool values to conditional expressions like a[i] >> != 0 --> a[i]!=0? 1: 0 which vectorizers transforms to >> vect-cond-expr. So we don't have operations with vector operands and >> scalar (bool) result. >> To implement such operations I introduced target-hook. Could you >> propose another solution implementing it? > > Is there any rationale given anywhere for the transformation into > conditional expressions? ie, is there any reason why we can't have a > GIMPLE_COND where the expression is a vector condition? No rationale for equality compare which would have the semantic of having all elements equal or not equal. But you can't define a sensible ordering (that HW implements) for other compare operators and you obviously need a single boolean result, not a vector of element comparison results. I've already replied that I'm fine allowing ==/!= whole-vector compares. But one needs to check whether expansion does anything sensible with them (either expand to integer subreg compares or add optabs for the compares). Richard. > Thanks, > > Jeff >
On 07/24/2015 03:16 AM, Richard Biener wrote: >> Is there any rationale given anywhere for the transformation into >> conditional expressions? ie, is there any reason why we can't have a >> GIMPLE_COND where the expression is a vector condition? > > No rationale for equality compare which would have the semantic of > having all elements equal or not equal. But you can't define a sensible > ordering (that HW implements) for other compare operators and you > obviously need a single boolean result, not a vector of element comparison > results. Right. EQ/NE only as others just don't have any real meaning. > I've already replied that I'm fine allowing ==/!= whole-vector compares. > But one needs to check whether expansion does anything sensible > with them (either expand to integer subreg compares or add optabs > for the compares). Agreed, EQ/NE for whole vector compares only would be fine for me too under the same conditions. jeff
On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law <law@redhat.com> wrote: > On 07/24/2015 03:16 AM, Richard Biener wrote: >>> >>> Is there any rationale given anywhere for the transformation into >>> conditional expressions? ie, is there any reason why we can't have a >>> GIMPLE_COND where the expression is a vector condition? >> >> >> No rationale for equality compare which would have the semantic of >> having all elements equal or not equal. But you can't define a sensible >> ordering (that HW implements) for other compare operators and you >> obviously need a single boolean result, not a vector of element comparison >> results. > > Right. EQ/NE only as others just don't have any real meaning. > > >> I've already replied that I'm fine allowing ==/!= whole-vector compares. >> But one needs to check whether expansion does anything sensible >> with them (either expand to integer subreg compares or add optabs >> for the compares). > > Agreed, EQ/NE for whole vector compares only would be fine for me too under > the same conditions. Btw, you can already do this on GIMPLE by doing TImode vec_as_int = VIEW_CONVERT_EXPR <TImode> (vec_2); if (vec_as_int == 0) ... which is what the RTL will look like in the end. So not sure if making this higher-level in GIMPLE is good or required. Richard. > jeff
HI All, Here is updated patch which implements Richard proposal to use vector comparison with boolean result instead of target hook. Support for it was added to ix86_expand_branch. Any comments will be appreciated. Bootstrap and regression testing did not show any new failures. ChangeLog: 2015-08-06 Yuri Rumyantsev <ysrumyan@gmail.com> * config/i386/i386.c (ix86_expand_branch): Implement vector comparison with boolean result. * config/i386/sse.md (define_expand "cbranch<mode>4): Add define for vector comparion. * fold-const.c (fold_relational_const): Add handling of vector comparison with boolean result. * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM. * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): new macros. * tree-cfg.c (verify_gimple_comparison): Add test for vector comparion with boolean result. * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not propagate vector comparion with boolean result. * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize has_mask_store field of vect_info. * tree-vectorizer.c: Include files ssa.h, cfghooks.h and params.h. (is_valid_sink): New function. (optimize_mask_stores): New function. (vectorize_loops): Invoke optimaze_mask_stores for loops having masked stores. * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and correspondent macros. gcc/testsuite/ChangeLog: * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. 2015-07-27 11:48 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: > On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law <law@redhat.com> wrote: >> On 07/24/2015 03:16 AM, Richard Biener wrote: >>>> >>>> Is there any rationale given anywhere for the transformation into >>>> conditional expressions? ie, is there any reason why we can't have a >>>> GIMPLE_COND where the expression is a vector condition? >>> >>> >>> No rationale for equality compare which would have the semantic of >>> having all elements equal or not equal. But you can't define a sensible >>> ordering (that HW implements) for other compare operators and you >>> obviously need a single boolean result, not a vector of element comparison >>> results. >> >> Right. EQ/NE only as others just don't have any real meaning. >> >> >>> I've already replied that I'm fine allowing ==/!= whole-vector compares. >>> But one needs to check whether expansion does anything sensible >>> with them (either expand to integer subreg compares or add optabs >>> for the compares). >> >> Agreed, EQ/NE for whole vector compares only would be fine for me too under >> the same conditions. > > Btw, you can already do this on GIMPLE by doing > > TImode vec_as_int = VIEW_CONVERT_EXPR <TImode> (vec_2); > if (vec_as_int == 0) > ... > > which is what the RTL will look like in the end. So not sure if making this > higher-level in GIMPLE is good or required. > > Richard. > >> jeff
Hi Richard, Did you have a chance to look at updated patch? Thanks. Yuri. 2015-08-06 14:07 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>: > HI All, > > Here is updated patch which implements Richard proposal to use vector > comparison with boolean result instead of target hook. Support for it > was added to ix86_expand_branch. > > Any comments will be appreciated. > > Bootstrap and regression testing did not show any new failures. > > ChangeLog: > 2015-08-06 Yuri Rumyantsev <ysrumyan@gmail.com> > > * config/i386/i386.c (ix86_expand_branch): Implement vector > comparison with boolean result. > * config/i386/sse.md (define_expand "cbranch<mode>4): Add define > for vector comparion. > * fold-const.c (fold_relational_const): Add handling of vector > comparison with boolean result. > * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM. > * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): new macros. > * tree-cfg.c (verify_gimple_comparison): Add test for vector > comparion with boolean result. > * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not > propagate vector comparion with boolean result. > * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize > has_mask_store field of vect_info. > * tree-vectorizer.c: Include files ssa.h, cfghooks.h and params.h. > (is_valid_sink): New function. > (optimize_mask_stores): New function. > (vectorize_loops): Invoke optimaze_mask_stores for loops having masked > stores. > * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and > correspondent macros. > > gcc/testsuite/ChangeLog: > * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. > > > 2015-07-27 11:48 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >> On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law <law@redhat.com> wrote: >>> On 07/24/2015 03:16 AM, Richard Biener wrote: >>>>> >>>>> Is there any rationale given anywhere for the transformation into >>>>> conditional expressions? ie, is there any reason why we can't have a >>>>> GIMPLE_COND where the expression is a vector condition? >>>> >>>> >>>> No rationale for equality compare which would have the semantic of >>>> having all elements equal or not equal. But you can't define a sensible >>>> ordering (that HW implements) for other compare operators and you >>>> obviously need a single boolean result, not a vector of element comparison >>>> results. >>> >>> Right. EQ/NE only as others just don't have any real meaning. >>> >>> >>>> I've already replied that I'm fine allowing ==/!= whole-vector compares. >>>> But one needs to check whether expansion does anything sensible >>>> with them (either expand to integer subreg compares or add optabs >>>> for the compares). >>> >>> Agreed, EQ/NE for whole vector compares only would be fine for me too under >>> the same conditions. >> >> Btw, you can already do this on GIMPLE by doing >> >> TImode vec_as_int = VIEW_CONVERT_EXPR <TImode> (vec_2); >> if (vec_as_int == 0) >> ... >> >> which is what the RTL will look like in the end. So not sure if making this >> higher-level in GIMPLE is good or required. >> >> Richard. >> >>> jeff
On Thu, Aug 13, 2015 at 1:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: > Hi Richard, > > Did you have a chance to look at updated patch? Having a quick look now. Btw, you didn't try the simpler alternative of tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); build2 (EQ_EXPR, boolean_type_node, build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); ? That is, use the GIMPLE level equivalent of (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) ? That should be supported by the expander already, though again not sure if the target(s) have compares that match this. Btw, the tree-cfg.c hook wasn't what was agreed on - the restriction on EQ/NE_EXPR is missing. Operand type equality is tested anyway. Why do you need to restrict forward_propagate_into_comparison_1? Otherwise this looks better, but can you try with the VIEW_CONVERT as well? Thanks, Richard. > Thanks. > Yuri. > > 2015-08-06 14:07 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>: >> HI All, >> >> Here is updated patch which implements Richard proposal to use vector >> comparison with boolean result instead of target hook. Support for it >> was added to ix86_expand_branch. >> >> Any comments will be appreciated. >> >> Bootstrap and regression testing did not show any new failures. >> >> ChangeLog: >> 2015-08-06 Yuri Rumyantsev <ysrumyan@gmail.com> >> >> * config/i386/i386.c (ix86_expand_branch): Implement vector >> comparison with boolean result. >> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define >> for vector comparion. >> * fold-const.c (fold_relational_const): Add handling of vector >> comparison with boolean result. >> * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM. >> * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): new macros. >> * tree-cfg.c (verify_gimple_comparison): Add test for vector >> comparion with boolean result. >> * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not >> propagate vector comparion with boolean result. >> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize >> has_mask_store field of vect_info. >> * tree-vectorizer.c: Include files ssa.h, cfghooks.h and params.h. >> (is_valid_sink): New function. >> (optimize_mask_stores): New function. >> (vectorize_loops): Invoke optimaze_mask_stores for loops having masked >> stores. >> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and >> correspondent macros. >> >> gcc/testsuite/ChangeLog: >> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. >> >> >> 2015-07-27 11:48 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>> On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law <law@redhat.com> wrote: >>>> On 07/24/2015 03:16 AM, Richard Biener wrote: >>>>>> >>>>>> Is there any rationale given anywhere for the transformation into >>>>>> conditional expressions? ie, is there any reason why we can't have a >>>>>> GIMPLE_COND where the expression is a vector condition? >>>>> >>>>> >>>>> No rationale for equality compare which would have the semantic of >>>>> having all elements equal or not equal. But you can't define a sensible >>>>> ordering (that HW implements) for other compare operators and you >>>>> obviously need a single boolean result, not a vector of element comparison >>>>> results. >>>> >>>> Right. EQ/NE only as others just don't have any real meaning. >>>> >>>> >>>>> I've already replied that I'm fine allowing ==/!= whole-vector compares. >>>>> But one needs to check whether expansion does anything sensible >>>>> with them (either expand to integer subreg compares or add optabs >>>>> for the compares). >>>> >>>> Agreed, EQ/NE for whole vector compares only would be fine for me too under >>>> the same conditions. >>> >>> Btw, you can already do this on GIMPLE by doing >>> >>> TImode vec_as_int = VIEW_CONVERT_EXPR <TImode> (vec_2); >>> if (vec_as_int == 0) >>> ... >>> >>> which is what the RTL will look like in the end. So not sure if making this >>> higher-level in GIMPLE is good or required. >>> >>> Richard. >>> >>>> jeff
Hi Richard, I've come back to this optimization and try to implement your proposal for comparison: > Btw, you didn't try the simpler alternative of > > tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); > build2 (EQ_EXPR, boolean_type_node, > build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); > > ? That is, use the GIMPLE level equivalent of > (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) using the following code: vectype = TREE_TYPE (mask); ext_mode = mode_for_size (GET_MODE_BITSIZE (TYPE_MODE (vectype)), MODE_INT, 0); ext_type = lang_hooks.types.type_for_mode (ext_mode , 1); but I've got zero type for it. Should I miss something? Any help will be appreciated. Yuri. 2015-08-13 14:40 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: > On Thu, Aug 13, 2015 at 1:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >> Hi Richard, >> >> Did you have a chance to look at updated patch? > > Having a quick look now. Btw, you didn't try the simpler alternative of > > tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); > build2 (EQ_EXPR, boolean_type_node, > build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); > > ? That is, use the GIMPLE level equivalent of > > (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) > > ? That should be supported by the expander already, though again not sure if > the target(s) have compares that match this. > > Btw, the tree-cfg.c hook wasn't what was agreed on - the restriction > on EQ/NE_EXPR > is missing. Operand type equality is tested anyway. > > Why do you need to restrict forward_propagate_into_comparison_1? > > Otherwise this looks better, but can you try with the VIEW_CONVERT as well? > > Thanks, > Richard. > > >> Thanks. >> Yuri. >> >> 2015-08-06 14:07 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>: >>> HI All, >>> >>> Here is updated patch which implements Richard proposal to use vector >>> comparison with boolean result instead of target hook. Support for it >>> was added to ix86_expand_branch. >>> >>> Any comments will be appreciated. >>> >>> Bootstrap and regression testing did not show any new failures. >>> >>> ChangeLog: >>> 2015-08-06 Yuri Rumyantsev <ysrumyan@gmail.com> >>> >>> * config/i386/i386.c (ix86_expand_branch): Implement vector >>> comparison with boolean result. >>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define >>> for vector comparion. >>> * fold-const.c (fold_relational_const): Add handling of vector >>> comparison with boolean result. >>> * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM. >>> * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): new macros. >>> * tree-cfg.c (verify_gimple_comparison): Add test for vector >>> comparion with boolean result. >>> * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not >>> propagate vector comparion with boolean result. >>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize >>> has_mask_store field of vect_info. >>> * tree-vectorizer.c: Include files ssa.h, cfghooks.h and params.h. >>> (is_valid_sink): New function. >>> (optimize_mask_stores): New function. >>> (vectorize_loops): Invoke optimaze_mask_stores for loops having masked >>> stores. >>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and >>> correspondent macros. >>> >>> gcc/testsuite/ChangeLog: >>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. >>> >>> >>> 2015-07-27 11:48 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>>> On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law <law@redhat.com> wrote: >>>>> On 07/24/2015 03:16 AM, Richard Biener wrote: >>>>>>> >>>>>>> Is there any rationale given anywhere for the transformation into >>>>>>> conditional expressions? ie, is there any reason why we can't have a >>>>>>> GIMPLE_COND where the expression is a vector condition? >>>>>> >>>>>> >>>>>> No rationale for equality compare which would have the semantic of >>>>>> having all elements equal or not equal. But you can't define a sensible >>>>>> ordering (that HW implements) for other compare operators and you >>>>>> obviously need a single boolean result, not a vector of element comparison >>>>>> results. >>>>> >>>>> Right. EQ/NE only as others just don't have any real meaning. >>>>> >>>>> >>>>>> I've already replied that I'm fine allowing ==/!= whole-vector compares. >>>>>> But one needs to check whether expansion does anything sensible >>>>>> with them (either expand to integer subreg compares or add optabs >>>>>> for the compares). >>>>> >>>>> Agreed, EQ/NE for whole vector compares only would be fine for me too under >>>>> the same conditions. >>>> >>>> Btw, you can already do this on GIMPLE by doing >>>> >>>> TImode vec_as_int = VIEW_CONVERT_EXPR <TImode> (vec_2); >>>> if (vec_as_int == 0) >>>> ... >>>> >>>> which is what the RTL will look like in the end. So not sure if making this >>>> higher-level in GIMPLE is good or required. >>>> >>>> Richard. >>>> >>>>> jeff
Hi All! I prepared another patch which performs insertion additional check on zero mask for masked stores if only parameter PARAM_ZERO_TEST_FOR_MASK_STORE has non-zero value. My attempt to use approach proposed by Richard with simpler alternative for comparison - use scalar type for 256-bit was not successful and I returned to vectori comparison with scalar Boolean result. ChangeLog: 2015-11-05 Yuri Rumyantsev <ysrumyan@gmail.com> * config/i386/i386.c: Add conditional initialization of PARAM_ZERO_TEST_FOR_MASK_STORE. (ix86_expand_branch): Implement vector comparison with boolean result. * config/i386/i386.h: New macros TARGET_OPTIMIZE_MASK_STORE. * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand for vector comparion with eq/ne only. * config/i386/x86-tune.def: New macros X86_TUNE_OPTIMIZE_MASK_STORE. * fold-const.c (fold_relational_const): Add handling of vector comparison with boolean result. * params.def (PARAM_ZERO_TEST_FOR_MASK_STORE): New DEFPARAM. * params.h (ENABLE_ZERO_TEST_FOR_MASK_STORE): New macros. * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow comparison of vector operands with boolean result for EQ/NE only. (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison. (verify_gimple_cond): Likewise. * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not combine vector comparison with boolean result and VEC_COND_EXPR that has vector result. * tree-vect-loop.c (is_valid_sink): New function. (optimize_mask_stores): Likewise. * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize has_mask_store field of vect_info. * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for vectorized loops having masked stores. * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and correspondent macros. (optimize_mask_stores): Add prototype. gcc/testsuite/ChangeLog: * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. * gcc.target/i386/avx2-vect-mask-store-move2.c: Likewise. 2015-11-02 18:24 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>: > Hi Richard, > > I've come back to this optimization and try to implement your proposal > for comparison: >> Btw, you didn't try the simpler alternative of >> >> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); >> build2 (EQ_EXPR, boolean_type_node, >> build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); >> >> ? That is, use the GIMPLE level equivalent of >> (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) > > using the following code: > > vectype = TREE_TYPE (mask); > ext_mode = mode_for_size (GET_MODE_BITSIZE (TYPE_MODE (vectype)), > MODE_INT, 0); > ext_type = lang_hooks.types.type_for_mode (ext_mode , 1); > > but I've got zero type for it. Should I miss something? > > Any help will be appreciated. > Yuri. > > > 2015-08-13 14:40 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >> On Thu, Aug 13, 2015 at 1:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >>> Hi Richard, >>> >>> Did you have a chance to look at updated patch? >> >> Having a quick look now. Btw, you didn't try the simpler alternative of >> >> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); >> build2 (EQ_EXPR, boolean_type_node, >> build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); >> >> ? That is, use the GIMPLE level equivalent of >> >> (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) >> >> ? That should be supported by the expander already, though again not sure if >> the target(s) have compares that match this. >> >> Btw, the tree-cfg.c hook wasn't what was agreed on - the restriction >> on EQ/NE_EXPR >> is missing. Operand type equality is tested anyway. >> >> Why do you need to restrict forward_propagate_into_comparison_1? >> >> Otherwise this looks better, but can you try with the VIEW_CONVERT as well? >> >> Thanks, >> Richard. >> >> >>> Thanks. >>> Yuri. >>> >>> 2015-08-06 14:07 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>: >>>> HI All, >>>> >>>> Here is updated patch which implements Richard proposal to use vector >>>> comparison with boolean result instead of target hook. Support for it >>>> was added to ix86_expand_branch. >>>> >>>> Any comments will be appreciated. >>>> >>>> Bootstrap and regression testing did not show any new failures. >>>> >>>> ChangeLog: >>>> 2015-08-06 Yuri Rumyantsev <ysrumyan@gmail.com> >>>> >>>> * config/i386/i386.c (ix86_expand_branch): Implement vector >>>> comparison with boolean result. >>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define >>>> for vector comparion. >>>> * fold-const.c (fold_relational_const): Add handling of vector >>>> comparison with boolean result. >>>> * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM. >>>> * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): new macros. >>>> * tree-cfg.c (verify_gimple_comparison): Add test for vector >>>> comparion with boolean result. >>>> * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not >>>> propagate vector comparion with boolean result. >>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize >>>> has_mask_store field of vect_info. >>>> * tree-vectorizer.c: Include files ssa.h, cfghooks.h and params.h. >>>> (is_valid_sink): New function. >>>> (optimize_mask_stores): New function. >>>> (vectorize_loops): Invoke optimaze_mask_stores for loops having masked >>>> stores. >>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and >>>> correspondent macros. >>>> >>>> gcc/testsuite/ChangeLog: >>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. >>>> >>>> >>>> 2015-07-27 11:48 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>>>> On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law <law@redhat.com> wrote: >>>>>> On 07/24/2015 03:16 AM, Richard Biener wrote: >>>>>>>> >>>>>>>> Is there any rationale given anywhere for the transformation into >>>>>>>> conditional expressions? ie, is there any reason why we can't have a >>>>>>>> GIMPLE_COND where the expression is a vector condition? >>>>>>> >>>>>>> >>>>>>> No rationale for equality compare which would have the semantic of >>>>>>> having all elements equal or not equal. But you can't define a sensible >>>>>>> ordering (that HW implements) for other compare operators and you >>>>>>> obviously need a single boolean result, not a vector of element comparison >>>>>>> results. >>>>>> >>>>>> Right. EQ/NE only as others just don't have any real meaning. >>>>>> >>>>>> >>>>>>> I've already replied that I'm fine allowing ==/!= whole-vector compares. >>>>>>> But one needs to check whether expansion does anything sensible >>>>>>> with them (either expand to integer subreg compares or add optabs >>>>>>> for the compares). >>>>>> >>>>>> Agreed, EQ/NE for whole vector compares only would be fine for me too under >>>>>> the same conditions. >>>>> >>>>> Btw, you can already do this on GIMPLE by doing >>>>> >>>>> TImode vec_as_int = VIEW_CONVERT_EXPR <TImode> (vec_2); >>>>> if (vec_as_int == 0) >>>>> ... >>>>> >>>>> which is what the RTL will look like in the end. So not sure if making this >>>>> higher-level in GIMPLE is good or required. >>>>> >>>>> Richard. >>>>> >>>>>> jeff
On Mon, Nov 2, 2015 at 4:24 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: > Hi Richard, > > I've come back to this optimization and try to implement your proposal > for comparison: >> Btw, you didn't try the simpler alternative of >> >> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); >> build2 (EQ_EXPR, boolean_type_node, >> build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); >> >> ? That is, use the GIMPLE level equivalent of >> (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) > > using the following code: > > vectype = TREE_TYPE (mask); > ext_mode = mode_for_size (GET_MODE_BITSIZE (TYPE_MODE (vectype)), > MODE_INT, 0); > ext_type = lang_hooks.types.type_for_mode (ext_mode , 1); > > but I've got zero type for it. Should I miss something? Use ext_type = build_nonstandard_integer_type (GET_MODE_PRECISION (ext_mode), 1); Richard. > Any help will be appreciated. > Yuri. > > > 2015-08-13 14:40 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >> On Thu, Aug 13, 2015 at 1:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >>> Hi Richard, >>> >>> Did you have a chance to look at updated patch? >> >> Having a quick look now. Btw, you didn't try the simpler alternative of >> >> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); >> build2 (EQ_EXPR, boolean_type_node, >> build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); >> >> ? That is, use the GIMPLE level equivalent of >> >> (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) >> >> ? That should be supported by the expander already, though again not sure if >> the target(s) have compares that match this. >> >> Btw, the tree-cfg.c hook wasn't what was agreed on - the restriction >> on EQ/NE_EXPR >> is missing. Operand type equality is tested anyway. >> >> Why do you need to restrict forward_propagate_into_comparison_1? >> >> Otherwise this looks better, but can you try with the VIEW_CONVERT as well? >> >> Thanks, >> Richard. >> >> >>> Thanks. >>> Yuri. >>> >>> 2015-08-06 14:07 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>: >>>> HI All, >>>> >>>> Here is updated patch which implements Richard proposal to use vector >>>> comparison with boolean result instead of target hook. Support for it >>>> was added to ix86_expand_branch. >>>> >>>> Any comments will be appreciated. >>>> >>>> Bootstrap and regression testing did not show any new failures. >>>> >>>> ChangeLog: >>>> 2015-08-06 Yuri Rumyantsev <ysrumyan@gmail.com> >>>> >>>> * config/i386/i386.c (ix86_expand_branch): Implement vector >>>> comparison with boolean result. >>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define >>>> for vector comparion. >>>> * fold-const.c (fold_relational_const): Add handling of vector >>>> comparison with boolean result. >>>> * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM. >>>> * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): new macros. >>>> * tree-cfg.c (verify_gimple_comparison): Add test for vector >>>> comparion with boolean result. >>>> * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not >>>> propagate vector comparion with boolean result. >>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize >>>> has_mask_store field of vect_info. >>>> * tree-vectorizer.c: Include files ssa.h, cfghooks.h and params.h. >>>> (is_valid_sink): New function. >>>> (optimize_mask_stores): New function. >>>> (vectorize_loops): Invoke optimaze_mask_stores for loops having masked >>>> stores. >>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and >>>> correspondent macros. >>>> >>>> gcc/testsuite/ChangeLog: >>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. >>>> >>>> >>>> 2015-07-27 11:48 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>>>> On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law <law@redhat.com> wrote: >>>>>> On 07/24/2015 03:16 AM, Richard Biener wrote: >>>>>>>> >>>>>>>> Is there any rationale given anywhere for the transformation into >>>>>>>> conditional expressions? ie, is there any reason why we can't have a >>>>>>>> GIMPLE_COND where the expression is a vector condition? >>>>>>> >>>>>>> >>>>>>> No rationale for equality compare which would have the semantic of >>>>>>> having all elements equal or not equal. But you can't define a sensible >>>>>>> ordering (that HW implements) for other compare operators and you >>>>>>> obviously need a single boolean result, not a vector of element comparison >>>>>>> results. >>>>>> >>>>>> Right. EQ/NE only as others just don't have any real meaning. >>>>>> >>>>>> >>>>>>> I've already replied that I'm fine allowing ==/!= whole-vector compares. >>>>>>> But one needs to check whether expansion does anything sensible >>>>>>> with them (either expand to integer subreg compares or add optabs >>>>>>> for the compares). >>>>>> >>>>>> Agreed, EQ/NE for whole vector compares only would be fine for me too under >>>>>> the same conditions. >>>>> >>>>> Btw, you can already do this on GIMPLE by doing >>>>> >>>>> TImode vec_as_int = VIEW_CONVERT_EXPR <TImode> (vec_2); >>>>> if (vec_as_int == 0) >>>>> ... >>>>> >>>>> which is what the RTL will look like in the end. So not sure if making this >>>>> higher-level in GIMPLE is good or required. >>>>> >>>>> Richard. >>>>> >>>>>> jeff
Richard, I tried it but 256-bit precision integer type is not yet supported. Yuri. 2015-11-06 15:56 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: > On Mon, Nov 2, 2015 at 4:24 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >> Hi Richard, >> >> I've come back to this optimization and try to implement your proposal >> for comparison: >>> Btw, you didn't try the simpler alternative of >>> >>> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); >>> build2 (EQ_EXPR, boolean_type_node, >>> build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); >>> >>> ? That is, use the GIMPLE level equivalent of >>> (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) >> >> using the following code: >> >> vectype = TREE_TYPE (mask); >> ext_mode = mode_for_size (GET_MODE_BITSIZE (TYPE_MODE (vectype)), >> MODE_INT, 0); >> ext_type = lang_hooks.types.type_for_mode (ext_mode , 1); >> >> but I've got zero type for it. Should I miss something? > > Use ext_type = build_nonstandard_integer_type (GET_MODE_PRECISION > (ext_mode), 1); > > Richard. > >> Any help will be appreciated. >> Yuri. >> >> >> 2015-08-13 14:40 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>> On Thu, Aug 13, 2015 at 1:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >>>> Hi Richard, >>>> >>>> Did you have a chance to look at updated patch? >>> >>> Having a quick look now. Btw, you didn't try the simpler alternative of >>> >>> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); >>> build2 (EQ_EXPR, boolean_type_node, >>> build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); >>> >>> ? That is, use the GIMPLE level equivalent of >>> >>> (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) >>> >>> ? That should be supported by the expander already, though again not sure if >>> the target(s) have compares that match this. >>> >>> Btw, the tree-cfg.c hook wasn't what was agreed on - the restriction >>> on EQ/NE_EXPR >>> is missing. Operand type equality is tested anyway. >>> >>> Why do you need to restrict forward_propagate_into_comparison_1? >>> >>> Otherwise this looks better, but can you try with the VIEW_CONVERT as well? >>> >>> Thanks, >>> Richard. >>> >>> >>>> Thanks. >>>> Yuri. >>>> >>>> 2015-08-06 14:07 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>: >>>>> HI All, >>>>> >>>>> Here is updated patch which implements Richard proposal to use vector >>>>> comparison with boolean result instead of target hook. Support for it >>>>> was added to ix86_expand_branch. >>>>> >>>>> Any comments will be appreciated. >>>>> >>>>> Bootstrap and regression testing did not show any new failures. >>>>> >>>>> ChangeLog: >>>>> 2015-08-06 Yuri Rumyantsev <ysrumyan@gmail.com> >>>>> >>>>> * config/i386/i386.c (ix86_expand_branch): Implement vector >>>>> comparison with boolean result. >>>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define >>>>> for vector comparion. >>>>> * fold-const.c (fold_relational_const): Add handling of vector >>>>> comparison with boolean result. >>>>> * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM. >>>>> * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): new macros. >>>>> * tree-cfg.c (verify_gimple_comparison): Add test for vector >>>>> comparion with boolean result. >>>>> * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not >>>>> propagate vector comparion with boolean result. >>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize >>>>> has_mask_store field of vect_info. >>>>> * tree-vectorizer.c: Include files ssa.h, cfghooks.h and params.h. >>>>> (is_valid_sink): New function. >>>>> (optimize_mask_stores): New function. >>>>> (vectorize_loops): Invoke optimaze_mask_stores for loops having masked >>>>> stores. >>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and >>>>> correspondent macros. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. >>>>> >>>>> >>>>> 2015-07-27 11:48 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>>>>> On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law <law@redhat.com> wrote: >>>>>>> On 07/24/2015 03:16 AM, Richard Biener wrote: >>>>>>>>> >>>>>>>>> Is there any rationale given anywhere for the transformation into >>>>>>>>> conditional expressions? ie, is there any reason why we can't have a >>>>>>>>> GIMPLE_COND where the expression is a vector condition? >>>>>>>> >>>>>>>> >>>>>>>> No rationale for equality compare which would have the semantic of >>>>>>>> having all elements equal or not equal. But you can't define a sensible >>>>>>>> ordering (that HW implements) for other compare operators and you >>>>>>>> obviously need a single boolean result, not a vector of element comparison >>>>>>>> results. >>>>>>> >>>>>>> Right. EQ/NE only as others just don't have any real meaning. >>>>>>> >>>>>>> >>>>>>>> I've already replied that I'm fine allowing ==/!= whole-vector compares. >>>>>>>> But one needs to check whether expansion does anything sensible >>>>>>>> with them (either expand to integer subreg compares or add optabs >>>>>>>> for the compares). >>>>>>> >>>>>>> Agreed, EQ/NE for whole vector compares only would be fine for me too under >>>>>>> the same conditions. >>>>>> >>>>>> Btw, you can already do this on GIMPLE by doing >>>>>> >>>>>> TImode vec_as_int = VIEW_CONVERT_EXPR <TImode> (vec_2); >>>>>> if (vec_as_int == 0) >>>>>> ... >>>>>> >>>>>> which is what the RTL will look like in the end. So not sure if making this >>>>>> higher-level in GIMPLE is good or required. >>>>>> >>>>>> Richard. >>>>>> >>>>>>> jeff
On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: > Richard, > > I tried it but 256-bit precision integer type is not yet supported. What's the symptom? The compare cannot be expanded? Just add a pattern then. After all we have modes up to XImode. Richard. > Yuri. > > > 2015-11-06 15:56 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >> On Mon, Nov 2, 2015 at 4:24 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >>> Hi Richard, >>> >>> I've come back to this optimization and try to implement your proposal >>> for comparison: >>>> Btw, you didn't try the simpler alternative of >>>> >>>> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); >>>> build2 (EQ_EXPR, boolean_type_node, >>>> build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); >>>> >>>> ? That is, use the GIMPLE level equivalent of >>>> (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) >>> >>> using the following code: >>> >>> vectype = TREE_TYPE (mask); >>> ext_mode = mode_for_size (GET_MODE_BITSIZE (TYPE_MODE (vectype)), >>> MODE_INT, 0); >>> ext_type = lang_hooks.types.type_for_mode (ext_mode , 1); >>> >>> but I've got zero type for it. Should I miss something? >> >> Use ext_type = build_nonstandard_integer_type (GET_MODE_PRECISION >> (ext_mode), 1); >> >> Richard. >> >>> Any help will be appreciated. >>> Yuri. >>> >>> >>> 2015-08-13 14:40 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>>> On Thu, Aug 13, 2015 at 1:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >>>>> Hi Richard, >>>>> >>>>> Did you have a chance to look at updated patch? >>>> >>>> Having a quick look now. Btw, you didn't try the simpler alternative of >>>> >>>> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); >>>> build2 (EQ_EXPR, boolean_type_node, >>>> build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); >>>> >>>> ? That is, use the GIMPLE level equivalent of >>>> >>>> (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) >>>> >>>> ? That should be supported by the expander already, though again not sure if >>>> the target(s) have compares that match this. >>>> >>>> Btw, the tree-cfg.c hook wasn't what was agreed on - the restriction >>>> on EQ/NE_EXPR >>>> is missing. Operand type equality is tested anyway. >>>> >>>> Why do you need to restrict forward_propagate_into_comparison_1? >>>> >>>> Otherwise this looks better, but can you try with the VIEW_CONVERT as well? >>>> >>>> Thanks, >>>> Richard. >>>> >>>> >>>>> Thanks. >>>>> Yuri. >>>>> >>>>> 2015-08-06 14:07 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>: >>>>>> HI All, >>>>>> >>>>>> Here is updated patch which implements Richard proposal to use vector >>>>>> comparison with boolean result instead of target hook. Support for it >>>>>> was added to ix86_expand_branch. >>>>>> >>>>>> Any comments will be appreciated. >>>>>> >>>>>> Bootstrap and regression testing did not show any new failures. >>>>>> >>>>>> ChangeLog: >>>>>> 2015-08-06 Yuri Rumyantsev <ysrumyan@gmail.com> >>>>>> >>>>>> * config/i386/i386.c (ix86_expand_branch): Implement vector >>>>>> comparison with boolean result. >>>>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define >>>>>> for vector comparion. >>>>>> * fold-const.c (fold_relational_const): Add handling of vector >>>>>> comparison with boolean result. >>>>>> * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM. >>>>>> * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): new macros. >>>>>> * tree-cfg.c (verify_gimple_comparison): Add test for vector >>>>>> comparion with boolean result. >>>>>> * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not >>>>>> propagate vector comparion with boolean result. >>>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize >>>>>> has_mask_store field of vect_info. >>>>>> * tree-vectorizer.c: Include files ssa.h, cfghooks.h and params.h. >>>>>> (is_valid_sink): New function. >>>>>> (optimize_mask_stores): New function. >>>>>> (vectorize_loops): Invoke optimaze_mask_stores for loops having masked >>>>>> stores. >>>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and >>>>>> correspondent macros. >>>>>> >>>>>> gcc/testsuite/ChangeLog: >>>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. >>>>>> >>>>>> >>>>>> 2015-07-27 11:48 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>>>>>> On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law <law@redhat.com> wrote: >>>>>>>> On 07/24/2015 03:16 AM, Richard Biener wrote: >>>>>>>>>> >>>>>>>>>> Is there any rationale given anywhere for the transformation into >>>>>>>>>> conditional expressions? ie, is there any reason why we can't have a >>>>>>>>>> GIMPLE_COND where the expression is a vector condition? >>>>>>>>> >>>>>>>>> >>>>>>>>> No rationale for equality compare which would have the semantic of >>>>>>>>> having all elements equal or not equal. But you can't define a sensible >>>>>>>>> ordering (that HW implements) for other compare operators and you >>>>>>>>> obviously need a single boolean result, not a vector of element comparison >>>>>>>>> results. >>>>>>>> >>>>>>>> Right. EQ/NE only as others just don't have any real meaning. >>>>>>>> >>>>>>>> >>>>>>>>> I've already replied that I'm fine allowing ==/!= whole-vector compares. >>>>>>>>> But one needs to check whether expansion does anything sensible >>>>>>>>> with them (either expand to integer subreg compares or add optabs >>>>>>>>> for the compares). >>>>>>>> >>>>>>>> Agreed, EQ/NE for whole vector compares only would be fine for me too under >>>>>>>> the same conditions. >>>>>>> >>>>>>> Btw, you can already do this on GIMPLE by doing >>>>>>> >>>>>>> TImode vec_as_int = VIEW_CONVERT_EXPR <TImode> (vec_2); >>>>>>> if (vec_as_int == 0) >>>>>>> ... >>>>>>> >>>>>>> which is what the RTL will look like in the end. So not sure if making this >>>>>>> higher-level in GIMPLE is good or required. >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> jeff
2015-11-10 15:33 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: > On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >> Richard, >> >> I tried it but 256-bit precision integer type is not yet supported. > > What's the symptom? The compare cannot be expanded? Just add a pattern then. > After all we have modes up to XImode. I suppose problem may be in: gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128) which doesn't allow to create constants of bigger size. Changing it to maximum vector size (512) would mean we increase wide_int structure size significantly. New patterns are probably also needed. Ilya > > Richard. > >> Yuri. >> >>
On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > 2015-11-10 15:33 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >>> Richard, >>> >>> I tried it but 256-bit precision integer type is not yet supported. >> >> What's the symptom? The compare cannot be expanded? Just add a pattern then. >> After all we have modes up to XImode. > > I suppose problem may be in: > > gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128) > > which doesn't allow to create constants of bigger size. Changing it > to maximum vector size (512) would mean we increase wide_int structure > size significantly. New patterns are probably also needed. Yes, new patterns are needed but wide-int should be fine (we only need to create a literal zero AFACS). The "new pattern" would be equality/inequality against zero compares only. Richard. > Ilya > >> >> Richard. >> >>> Yuri. >>> >>>
2015-11-10 17:46 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: > On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> 2015-11-10 15:33 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >>>> Richard, >>>> >>>> I tried it but 256-bit precision integer type is not yet supported. >>> >>> What's the symptom? The compare cannot be expanded? Just add a pattern then. >>> After all we have modes up to XImode. >> >> I suppose problem may be in: >> >> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128) >> >> which doesn't allow to create constants of bigger size. Changing it >> to maximum vector size (512) would mean we increase wide_int structure >> size significantly. New patterns are probably also needed. > > Yes, new patterns are needed but wide-int should be fine (we only need to create > a literal zero AFACS). The "new pattern" would be equality/inequality > against zero > compares only. Currently 256bit integer creation fails because wide_int for max and min values cannot be created. It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but it increases WIDE_INT_MAX_ELTS and thus increases wide_int structure. If we use 512 for MAX_BITSIZE_MODE_ANY_INT then wide_int structure would grow by 48 bytes (16 bytes if use 256 for MAX_BITSIZE_MODE_ANY_INT). Is it OK for such narrow usage? Ilya > > Richard. > >> Ilya >> >>> >>> Richard. >>> >>>> Yuri. >>>> >>>>
On Nov 10, 2015, at 6:56 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > 2015-11-10 17:46 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>> 2015-11-10 15:33 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >>>>> Richard, >>>>> >>>>> I tried it but 256-bit precision integer type is not yet supported. >>>> >>>> What's the symptom? The compare cannot be expanded? Just add a pattern then. >>>> After all we have modes up to XImode. >>> >>> I suppose problem may be in: >>> >>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128) >>> >>> which doesn't allow to create constants of bigger size. Changing it >>> to maximum vector size (512) would mean we increase wide_int structure >>> size significantly. New patterns are probably also needed. >> >> Yes, new patterns are needed but wide-int should be fine (we only need to create >> a literal zero AFACS). The "new pattern" would be equality/inequality >> against zero >> compares only. > > Currently 256bit integer creation fails because wide_int for max and > min values cannot be created. > It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but it increases > WIDE_INT_MAX_ELTS > and thus increases wide_int structure. If we use 512 for > MAX_BITSIZE_MODE_ANY_INT then > wide_int structure would grow by 48 bytes (16 bytes if use 256 for > MAX_BITSIZE_MODE_ANY_INT). Not answering for Richard, but the design of wide-int was that though the temporary space would grow, trees and rtl would not. Most wide-int values are short lived.
On Tue, Nov 10, 2015 at 3:56 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > 2015-11-10 17:46 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>> 2015-11-10 15:33 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >>>>> Richard, >>>>> >>>>> I tried it but 256-bit precision integer type is not yet supported. >>>> >>>> What's the symptom? The compare cannot be expanded? Just add a pattern then. >>>> After all we have modes up to XImode. >>> >>> I suppose problem may be in: >>> >>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128) >>> >>> which doesn't allow to create constants of bigger size. Changing it >>> to maximum vector size (512) would mean we increase wide_int structure >>> size significantly. New patterns are probably also needed. >> >> Yes, new patterns are needed but wide-int should be fine (we only need to create >> a literal zero AFACS). The "new pattern" would be equality/inequality >> against zero >> compares only. > > Currently 256bit integer creation fails because wide_int for max and > min values cannot be created. Hmm, indeed: #1 0x000000000072dab5 in wi::extended_tree<192>::extended_tree ( this=0x7fffffffd950, t=0x7ffff6a000b0) at /space/rguenther/src/svn/trunk/gcc/tree.h:5125 5125 gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N); but that's not that the constants fail to be created but #5 0x00000000010d8828 in build_nonstandard_integer_type (precision=512, unsignedp=65) at /space/rguenther/src/svn/trunk/gcc/tree.c:8051 8051 if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype))) (gdb) l 8046 fixup_unsigned_type (itype); 8047 else 8048 fixup_signed_type (itype); 8049 8050 ret = itype; 8051 if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype))) 8052 ret = type_hash_canon (tree_to_uhwi (TYPE_MAX_VALUE (itype)), itype); thus the integer type hashing being "interesting". tree_fits_uhwi_p fails because it does 7289 bool 7290 tree_fits_uhwi_p (const_tree t) 7291 { 7292 return (t != NULL_TREE 7293 && TREE_CODE (t) == INTEGER_CST 7294 && wi::fits_uhwi_p (wi::to_widest (t))); 7295 } and wi::to_widest () fails with doing 5121 template <int N> 5122 inline wi::extended_tree <N>::extended_tree (const_tree t) 5123 : m_t (t) 5124 { 5125 gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N); 5126 } fixing the hashing then runs into type_cache_hasher::equal doing tree_int_cst_equal which again uses to_widest (it should be easier and cheaper to do the compare on the actual tree representation, but well, seems to be just the first of various issues we'd run into). We eventually could fix the assert above (but then need to hope we assert when a computation overflows the narrower precision of widest_int) or use a special really_widest_int (ugh). > It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but it increases > WIDE_INT_MAX_ELTS > and thus increases wide_int structure. If we use 512 for > MAX_BITSIZE_MODE_ANY_INT then > wide_int structure would grow by 48 bytes (16 bytes if use 256 for > MAX_BITSIZE_MODE_ANY_INT). > Is it OK for such narrow usage? widest_int is used in some long-living structures (which is the reason for MAX_BITSIZE_MODE_ANY_INT in the first place). So I don't think so. Richard. > Ilya > >> >> Richard. >> >>> Ilya >>> >>>> >>>> Richard. >>>> >>>>> Yuri. >>>>> >>>>>
Richard, What we should do to cope with this problem (structure size increasing)? Should we return to vector comparison version? Thanks. Yuri. 2015-11-11 12:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: > On Tue, Nov 10, 2015 at 3:56 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> 2015-11-10 17:46 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>>> 2015-11-10 15:33 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>>>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >>>>>> Richard, >>>>>> >>>>>> I tried it but 256-bit precision integer type is not yet supported. >>>>> >>>>> What's the symptom? The compare cannot be expanded? Just add a pattern then. >>>>> After all we have modes up to XImode. >>>> >>>> I suppose problem may be in: >>>> >>>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128) >>>> >>>> which doesn't allow to create constants of bigger size. Changing it >>>> to maximum vector size (512) would mean we increase wide_int structure >>>> size significantly. New patterns are probably also needed. >>> >>> Yes, new patterns are needed but wide-int should be fine (we only need to create >>> a literal zero AFACS). The "new pattern" would be equality/inequality >>> against zero >>> compares only. >> >> Currently 256bit integer creation fails because wide_int for max and >> min values cannot be created. > > Hmm, indeed: > > #1 0x000000000072dab5 in wi::extended_tree<192>::extended_tree ( > this=0x7fffffffd950, t=0x7ffff6a000b0) > at /space/rguenther/src/svn/trunk/gcc/tree.h:5125 > 5125 gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N); > > but that's not that the constants fail to be created but > > #5 0x00000000010d8828 in build_nonstandard_integer_type (precision=512, > unsignedp=65) at /space/rguenther/src/svn/trunk/gcc/tree.c:8051 > 8051 if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype))) > (gdb) l > 8046 fixup_unsigned_type (itype); > 8047 else > 8048 fixup_signed_type (itype); > 8049 > 8050 ret = itype; > 8051 if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype))) > 8052 ret = type_hash_canon (tree_to_uhwi (TYPE_MAX_VALUE > (itype)), itype); > > thus the integer type hashing being "interesting". tree_fits_uhwi_p > fails because > it does > > 7289 bool > 7290 tree_fits_uhwi_p (const_tree t) > 7291 { > 7292 return (t != NULL_TREE > 7293 && TREE_CODE (t) == INTEGER_CST > 7294 && wi::fits_uhwi_p (wi::to_widest (t))); > 7295 } > > and wi::to_widest () fails with doing > > 5121 template <int N> > 5122 inline wi::extended_tree <N>::extended_tree (const_tree t) > 5123 : m_t (t) > 5124 { > 5125 gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N); > 5126 } > > fixing the hashing then runs into type_cache_hasher::equal doing > tree_int_cst_equal > which again uses to_widest (it should be easier and cheaper to do the compare on > the actual tree representation, but well, seems to be just the first > of various issues > we'd run into). > > We eventually could fix the assert above (but then need to hope we assert > when a computation overflows the narrower precision of widest_int) or use > a special really_widest_int (ugh). > >> It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but it increases >> WIDE_INT_MAX_ELTS >> and thus increases wide_int structure. If we use 512 for >> MAX_BITSIZE_MODE_ANY_INT then >> wide_int structure would grow by 48 bytes (16 bytes if use 256 for >> MAX_BITSIZE_MODE_ANY_INT). >> Is it OK for such narrow usage? > > widest_int is used in some long-living structures (which is the reason for > MAX_BITSIZE_MODE_ANY_INT in the first place). So I don't think so. > > Richard. > >> Ilya >> >>> >>> Richard. >>> >>>> Ilya >>>> >>>>> >>>>> Richard. >>>>> >>>>>> Yuri. >>>>>> >>>>>>
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 44a8624..e90de32 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -100,6 +100,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-iterator.h" #include "tree-chkp.h" #include "rtl-chkp.h" +#include "stringpool.h" +#include "tree-ssanames.h" So ideally we'll figure out why you're unable to generate a suitable conditional at the gimple level and the need for the x86 backend to generate the vector zero test will go away. And if it does go away, we want those #includes to disappear. Additionally, instead of including stringpool.h & tree-ssanames.h, include "ssa.h" -- as a general rule. static rtx legitimize_dllimport_symbol (rtx, bool); static rtx legitimize_pe_coff_extern_decl (rtx, bool); @@ -41100,6 +41102,47 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype, return ix86_get_builtin (code); } +/* Returns true if SOURCE type is supported by builtin ptest. + NAME is lhs of created ptest call. All created statements are added + to GS. */ + +static bool +ix86_vectorize_build_zero_vector_test (tree source, tree name, Given the stated goal of not doing this in the target files, I'm not going to actually look at this routine or any of the infrastructure for this target hook. diff --git a/gcc/params.def b/gcc/params.def index 3e4ba3a..9e8de11 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -1069,6 +1069,12 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK, "Maximum number of conditional store pairs that can be sunk", 2, 0, 0) +/* Enable inserion test on zero mask for masked stores if non-zero. */ s/inserion/insertion/ +DEFPARAM (PARAM_ZERO_TEST_FOR_STORE_MASK, + "zero-test-for-store-mask", + "Enable insertion of test on zero mask for masked stores", + 1, 0, 1) I'm resisting the temptation to bike-shed... I don't like the name, but I don't have a better one yet. Similarly for the short description. +/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 2 "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ cleanup-tree-dump is no longer needed. diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 7ba0d8f..e31479b 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -2072,6 +2072,7 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi, { tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE; prev_stmt_info = NULL; + LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true; for (i = 0; i < ncopies; i++) If we need to keep this field, can you initialize it just after the STMT_VINFO_GATHER_P test after the /** Transform **/ comment. It seems kind-of buried and hard to find in this location. I'm not really sure why Richi has objected to the field. Yea we can re-analyze stuff in the vectorizer, but we'd be repeating a fair number of tests (presumably we'd refactor the start of vectorizable_mask_load_store to avoid duplication). That seems more wasteful than another field. { unsigned align, misalign; diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index ff46a52..debb351 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -92,7 +92,8 @@ along with GCC; see the file COPYING3. If not see #include "dbgcnt.h" #include "gimple-fold.h" #include "tree-scalar-evolution.h" - +#include "stringpool.h" +#include "tree-ssanames.h" In general, rather than including stringpool.h and tree-ssanames.h, just include "ssa.h".