Message ID | 525E2B2E.6010505@redhat.com |
---|---|
State | New |
On Wed, Oct 16, 2013 at 7:59 AM, Jeff Law <law@redhat.com> wrote:
> On 07/08/13 11:45, Kai Tietz wrote:
>> Hello,
>>
>> These passes implement type demotion (type sinking into statements) for some gimple statements. I limited the initial implementation to the set of multiply, addition, subtraction, and binary and/or/xor. Additionally this pass adds some rules to sink type-cast sequences - e.g. (int) (short) x; with char as the type of x. This special handling of such type-sequence simplification is necessary to avoid overly complex cast sequences caused by the unsigned-type conversions this pass uses to avoid undefined overflow behaviour.
>>
>> I will send a separate patch with some test cases to demonstrate and verify the operation of this new optimization. Just one sample I will cite here to demonstrate the operation of the type-demotion pass.
>
> So I think we want to come back to this patch and make some decisions about how to go forward.
>
> I've been reviewing all the discussions back through last year and I think the single biggest thing we need to realize is that there's nothing in this patch that *couldn't* be handled by tree-ssa-forwprop with a suitable amount of following the use-def chains through casts in various transformations and determining when those casts can be safely ignored. However, I don't think extending tree-ssa-forwprop in this way is wise.
>
> As I see it, the value in this patch is that it allows us to avoid that nonsense by essentially reformulating this as a problem of moving and reformulating the casts in the IL.
>
> I see two primary effects of type sinking.

Note it was called type demotion ;)

> First and probably the most important in my mind is that by sinking a cast through its uses the various transformations we already perform are more likely to apply *without* needing to handle optimizing through typecasts explicitly.
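[The cast-sequence rule Kai mentions can be pictured in plain C. This is a hypothetical editorial sketch, not code from the patch: for a signed char x, the intermediate cast in (int)(short)x is redundant, because char -> short -> int are all value-preserving widenings.]

```c
#include <assert.h>

/* Hypothetical illustration (not from the patch): for a signed char x,
   (int)(short)x and (int)x agree for every value, so the intermediate
   cast in the sequence can be dropped.  */
static int widen_twice (signed char x) { return (int) (short) x; }
static int widen_once (signed char x) { return (int) x; }
```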
I would say it is desirable to express arithmetic in the smallest possible types (see what premature optimization the C family frontends do to narrow operations again after C integer promotion is applied). You need some kind of range information to do this, thus either integrate it into VRP (there is already code there that does this) or use range information from VRP which we now preserve.

> The second primary effect is, given two casts where the first indirectly feeds the second (ie, the first feeds some statement, which then feeds the second cast), if we're able to sink the first cast, we end up with the first cast directly feeding the second cast. When this occurs one of the two casts can often be eliminated. Sadly, I didn't keep any of those test files, but I regularly saw them in GCC bootstraps.

This transformation is applied both by fold-const.c and by SSA forwprop (our GIMPLE combiner). Doing it in yet another pass looks wrong (and it isn't only type demotion but can also be promotion).

> Kai, you mentioned you had more tests, but I never saw those posted. I pulled together a handful of tests from the PRs & discussion threads. Some may be duplicates of yours (mine are attached). If you could post yours as well, it'd be helpful. Not all of mine are helped by this patch, but at least two are.
>
> There was some question about what to do with PROMOTE_MODE based targets. On those targets there's a lot of value in getting something into word mode and doing everything in word mode. I think that can/should be a separate issue -- ISTM that ought to be handled fairly late in the tree optimizers. I don't see a strong argument for gating this patch on catering to PROMOTE_MODE.

In contrast to the desire of expressing operations in the smallest required type there is the desire of exposing the effect of PROMOTE_MODE on GIMPLE instead of only during RTL expansion.
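[The narrowing Richard describes can be shown in a small C sketch. This is a hypothetical editorial illustration, not from the patch: C promotes both chars to int and adds in int, but when the result is immediately narrowed back, the low 8 bits of the int addition depend only on the low 8 bits of the operands, so the whole operation can be demoted to the narrow type.]

```c
#include <assert.h>

/* Hypothetical illustration: what the C promotion rules prescribe
   (widen, add in int, narrow on store) versus the demoted form that
   does the whole addition in 8 bits.  */
static signed char add_promoted (signed char a, signed char b)
{
  int wide = (int) a + (int) b;   /* add in int after C promotion */
  return (signed char) wide;      /* narrowing store */
}

static signed char add_demoted (signed char a, signed char b)
{
  /* 8-bit modular addition; no int-width arithmetic needed */
  return (signed char) (unsigned char) ((unsigned char) a + (unsigned char) b);
}
```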
This is because the extensions and truncations (sext and zext) PROMOTE_MODE introduces are easier to optimize away when range information is available (see the attempts to address this at RTL expansion time from Kugan from Linaro).

> Similarly, I know there's a type hoisting patch that's also queued up. I think it should be handled separately as well.

I think we need to paint a picture of the final result - what is the main objective of the various(?!) passes in question? Where do we do the same kind of transformation already?

AFAIK we have many places that optimize a series of conversions (that's a combine-like problem), I'd call that conversion contraction. We do have a set of foldings in the C family frontends, in fold and in forwprop that try to un-do the effect of C integer promotions (that's also a combine-like problem). We have no pass that tries to promote or demote the types of variables using a data-flow approach (VRP comes closest, but the transform is again pattern-matching, thus combine-like).

I do not object to adding this kind of pass, but I suggest looking at the targets' desires when implementing it - which eventually means honoring PROMOTE_MODE (be careful about pass placement here - you want this after loop optimizations like vectorization but possibly before induction variable optimization).

Richard.

>> ------end------
>>
>> You will notice that by the typedemote2 pass the 's += t + 0x12345600;' expression gets simplified to 's += t + 0;'.
>>
>> I have added an additional early pass "typedemote1" to this patch for simple cases where types can easily be sunk into a statement without a special unsigned cast for the overflow case. Jakub asked for it. Tests have shown that this pass does optimizations in pretty few cases. As an example in the testsuite see for example the pr46867.c testcase. The second pass (I put it behind the first vrp pass to avoid testcase conflicts) uses 'unsigned'-type casts to avoid undefined overflow behavior.
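[Why the 's += t + 0x12345600' sample simplifies to 's += t + 0' can be checked in plain C. This is a hypothetical editorial sketch of the sample quoted above, not patch code: the sum is stored back into a signed char, so only its low 8 bits survive, and 0x12345600 has a zero low byte, so adding it cannot change those bits.]

```c
#include <assert.h>

/* Hypothetical illustration: only the low 8 bits of the sum survive the
   store into signed char, and 0x12345600 has a zero low byte, so the
   large constant contributes nothing to the stored result.  */
static signed char with_constant (signed char a, signed char b)
{
  int s = a, t = b;
  s += t + 0x12345600;
  return (signed char) s;
}

static signed char without_constant (signed char a, signed char b)
{
  int s = a, t = b;
  s += t + 0;
  return (signed char) s;
}
```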
>> This pass has many more hits in standard code, but seems to trigger some regressions in the vect pass:
>>
>> List of regressions:
>>
>> FAIL: gcc.dg/vect/slp-cond-3.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
>> FAIL: gcc.dg/vect/vect-reduc-dot-s8b.c -flto scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1
>> FAIL: gcc.dg/vect/slp-cond-3.c -flto scan-tree-dump-times vect "vectorizing stmts using SLP" 1
>> FAIL: gcc.target/i386/rotate-1.c scan-assembler ro[lr]b
>
> Have you analyzed these failures? Are they a result of the casts ending up in inconvenient locations or is there something more subtle going on here? We certainly need to understand precisely what's going on with these regressions before we can move forward.
>
> As for the patch itself. Obviously it needs updates as a result of David's work on the pass manager class, and other changes on the trunk since July.
>
> tree-ssa-te.c needs a file block comment which describes what the patch is trying to accomplish.
>
> Are all the includes necessary? cfgloop.h, hash-table.h, others? We're trying to clean things up; new files ought to be relatively clean as they go in rather than making things worse :-)
>
> build_and_add_sum is poorly named. It handles far more operations than its name suggests. Similarly its embedded comments still refer to "addition statement" and clearly need updating. It may need updating with the streamlined code to build up statements as well.
>
> Hmm, I don't think I'm going to call out all the updates to deal with codes in the trunk codebase -- I'm going to assume you'll fix those as part of trying to get this running on the trunk again.
>
> The code to find an insertion point seems to duplicate two large blobs of code, one when we insert relative to op2, the other when we insert relative to op1. ISTM that code should be refactored to avoid the duplication.
> Once that refactoring is done, build_and_add_sum is going to be pretty damn simple and easy to understand.
>
> Is there a good reason why we have a recursive dead code eliminator in this pass? ie, is there a good reason why we don't just leave that stuff in the IL and let our standard DCE take care of it? (remove_stmt_chain and its caller is what I'm referring to). Duplicating DCE, even a trivial one like this, is prone to long term maintenance problems. Do we gain something by cleaning up after ourselves? Are the conditions under which we optimize so strict that we don't have to perform all the sanity checks before removing a statement that we find in tree-ssa-dce.c?
>
> gen_cast seems to return either an INTEGER_CST or an expression -- it doesn't return an assignment. It seems like the comment at the start of that function needs to be updated. Shouldn't the dumping be conditional on TDF_DETAILS?
>
> I'm a bit surprised we don't have an equivalent of demote_cast_p somewhere. If not, it feels like something generic enough that we'd want it elsewhere so that it can be re-used.
>
> In demote_into, use (T) in the comment to refer to the type rather than (typ). The former is used elsewhere in this file and all over fold-const.c and other places. It's a minor nit, but the consistency helps folks reading the code in the future.
>
> Please use consistent grouping in the switch statement in demote_into:
>
>   + case MULT_EXPR:
>   + case PLUS_EXPR: case MINUS_EXPR:
>
> I think GNU standards recommend each on its own line. If you're going to group them, then ISTM all three belong on the same line. Whatever style you use, please use it consistently in that function.
>
> Presumably you don't group these with their related logicals because the arithmetic ops have to deal with overflow?
>
> I'm going to assume the conditions under which you apply the transformations and any adjustments you make are correct.
> It might help to add some comments to the handling of MULT, PLUS & MINUS. You've got 3 strategies in there, but no comments as to why you use a particular one at any given time. Similarly for MIN_EXPR and MAX_EXPR.
>
>   + /* (NEWTYPE) (X >> Y) can be transformed to
>   +    ((NEWTYPE) X) >> Y, if NEWTYPE == X and
>   +    SIGN(NEWTYPE) == SIGN(X). */
>
> The comments here are a bit confusing -- you're talking about equality of a type and an object. I'm guessing you really mean NEWTYPE == TYPEOF (X)? I think this occurred elsewhere as well; the mixing of objects and types is a bit too free IMHO. Better to be explicit than assume the reader will make the leap that you're talking about TYPEOF (X).
>
> Again, it seems like dumping should be dependent on TDF_DETAILS.
>
> In execute_type_demote_int, you've got two big conditionals. The first checks the lhs, the second the rhs. I'd separate them with one line of vertical whitespace and add a comment before each conditional explaining in English what each is doing. I can figure it out from reading the code, but it'd be easier to just read a well written comment.
>
> There's a comment in execute_type_demote_int
>
>   /* OCCURS_IN_ABNORMAL_PHI */
>
> What does that comment mean? It doesn't seem to directly relate to any of the code in that file.
>
> I'm going to assume your iterator is correct in the cases where you're able to optimize and that you don't end up skipping statements unexpectedly.
>
> You compute CDI_DOMINATORS, but I don't see you use dominators -- you certainly do use post-domination.
>
> Seems like a lot of stuff but I think this is pretty close. The biggest issue in my mind is we need a clear understanding of the vectorizer regressions.
>
> We can kick this around more when we talk later today.
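[The shift rule quoted in the review can be checked in plain C. This is a hypothetical editorial sketch, not patch code: a sign-preserving widening cast commutes with a right shift, which is the well-formed reading of the quoted comment; a narrowing cast does not commute in general, which is why the sign/width checks matter.]

```c
#include <assert.h>

/* Hypothetical check: for int x, (long)(x >> y) equals ((long) x) >> y
   because the widening is sign-preserving; the cast can move past the
   shift in either direction.  */
static long shift_then_widen (int x, int y) { return (long) (x >> y); }
static long widen_then_shift (int x, int y) { return ((long) x) >> y; }
```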
>
> Cheers,
> jeff
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
> new file mode 100644
> index 0000000..ba395e4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +signed char a[1024], b[1024];
> +
> +void
> +baz (void)
> +{
> +  int i, s, t;
> +  for (i = 0; i < 1024; i++)
> +    { s = a[i]; t = b[i]; s += t + 0x12345600; a[i] = s; }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "305419776" 0 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c
> new file mode 100644
> index 0000000..8185c69
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int foo (const unsigned char *tmp, int i, int val)
> +{
> +  return (unsigned char)(((tmp[i] + val)>0xFF)?0xFF:(((tmp[i] + val)<0)?0:(tmp[i] + val)));
> +}
> +
> +int bar (const unsigned char *tmp, int i, int val)
> +{
> +  int x = (((tmp[i] + val)>0xFF)?0xFF:(((tmp[i] + val)<0)?0:(tmp[i] + val)));
> +  return (unsigned char)x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "optimized" { xfail *-*-* } } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c
> new file mode 100644
> index 0000000..15a493e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int foo (const unsigned char *a, int b, int c)
> +{
> +  int x = (unsigned char) (a[b] + c);
> +  int y = a[b] + c;
> +  int z = (unsigned char) y;
> +  return x == z;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "return 1;" 1 "optimized" { xfail *-*-* } } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c
> new file mode 100644
> index 0000000..f185526
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int f1 (int a, int b, int c)
> +{
> +  if ((a + b) == (c + a))
> +    return 1;
> +  return 0;
> +}
> +
> +int f2 (int a, int b, int c)
> +{
> +  if ((a ^ b) == (a ^ c))
> +    return 1;
> +  return 0;
> +}
> +
> +int f3 (int a, int b)
> +{
> +  if (-a == (b - a))
> +    return 1;
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "\\+" "optimized" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-not "\\^" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "\\-" "optimized" { xfail *-*-* } } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/i386/pr45397-4.c b/gcc/testsuite/gcc.target/i386/pr45397-4.c
> new file mode 100644
> index 0000000..9bf66b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr45397-4.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized -mavx" } */
> +
> +signed char a[1024], b[1024];
> +
> +void
> +foo (void)
> +{
> +  int i, s, t;
> +  for (i = 0; i < 1024; i++)
> +    { s = a[i]; t = b[i]; s += t; a[i] = s; }
> +}
> +
> +void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    a[i] += b[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "VIEW_CONVERT_EXPR" 6 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "VEC_PACK_TRUNC_EXPR" "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/i386/pr47477-1.c b/gcc/testsuite/gcc.target/i386/pr47477-1.c
> new file mode 100644
> index 0000000..a70ce87
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr47477-1.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized -m32" } */
> +
> +void *
> +add (void *a, void *b)
> +{
> +  return (void *)(__INTPTR_TYPE__) ((long long)(__INTPTR_TYPE__) a + ((long long)(__INTPTR_TYPE__) b & ~1L));
> +}
> +
> +void *
> +bar (void *a, void *b)
> +{
> +  long long tmp = (long long)(__INTPTR_TYPE__) a + ((long long)(__INTPTR_TYPE__) b & ~1L);
> +  return (void *)(__INTPTR_TYPE__) tmp;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\(unsigned int\\)" 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "& 4294967294" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\(void \\*\\)" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "\\(long long int\\)" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "\\(int\\)" "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
On 10/16/13 03:31, Richard Biener wrote:
>> I see two primary effects of type sinking.
>
> Note it was called type demotion ;)

;) It's a mental block of mine; it's been called type hoisting/sinking in various contexts and I see parallels between the code motion algorithms and how the type promotion/demotion exposes unnecessary type conversions. So I keep calling them hoisting/sinking. I'll try to use promotion/demotion.

>> First and probably the most important in my mind is that by sinking a cast through its uses the various transformations we already perform are more likely to apply *without* needing to handle optimizing through typecasts explicitly.
>
> I would say it is desirable to express arithmetic in the smallest possible types (see what premature optimization the C family frontends do to narrow operations again after C integer promotion is applied).

I don't see this as the major benefit of type demotion. Yes, there is some value in shrinking constants and the like, but in my experience the benefits are relatively small and often get lost in things like partial register stalls on x86, the PA and probably others (yes, the PA has partial register stalls, it's just that nobody used that term).

What I really want to get at here is avoiding having a large number of optimizers looking back through the use-def chains and attempting to elide typecasts in the middle of a chain of statements of interest.

> You need some kind of range information to do this, thus either integrate it into VRP (there is already code there that does this) or use range information from VRP which we now preserve.

If the primary goal is to shrink types, then yes, you want to use whatever information you can, including VRP. But that's not the primary goal in my mind, at least not at this stage.

There's no reason why this pass couldn't utilize VRP information to provide more opportunities to demote types and achieve the goal you want. But I'd consider that a follow-on opportunity.
>> The second primary effect is, given two casts where the first indirectly feeds the second (ie, the first feeds some statement, which then feeds the second cast), if we're able to sink the first cast, we end up with the first cast directly feeding the second cast. When this occurs one of the two casts can often be eliminated. Sadly, I didn't keep any of those test files, but I regularly saw them in GCC bootstraps.
>
> This transformation is applied both by fold-const.c and by SSA forwprop (our GIMPLE combiner). Doing it in yet another pass looks wrong (and it isn't only type demotion but can also be promotion).

Yes, I know. And we need to get this back down to a single implementation. I don't much care which of the 3 implementations we keep, but it really should just be one and it needs to be reusable.

I probably should have stated this differently -- the second primary effect is to expose more cases where type conversions can be eliminated via type promotion/demotion. I don't much care which of the 3 blobs of code we use to eliminate the conversions -- I do care that we've got a consistent way to promote/demote conversions to expose the unnecessary type conversions.

> In contrast to the desire of expressing operations in the smallest required type there is the desire of exposing the effect of PROMOTE_MODE on GIMPLE instead of only during RTL expansion. This is because the extensions and truncations (sext and zext) PROMOTE_MODE introduces are easier to optimize away when range information is available (see the attempts to address this at RTL expansion time from Kugan from Linaro).

Right. I'm aware of this work and the problem he's trying to solve and have been loosely watching it -- primarily for the persistent VRP information.

>> Similarly, I know there's a type hoisting patch that's also queued up. I think it should be handled separately as well.
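[The "cast indirectly feeding a cast" effect can be pictured in a small C sketch. This is a hypothetical editorial illustration, not from the patch or from GCC bootstraps: once the first (widening) cast is sunk past the intermediate addition, it meets the second (narrowing) cast and the pair collapses.]

```c
#include <assert.h>

/* Hypothetical illustration: the first cast indirectly feeds the
   second through the addition; sinking the first cast makes the two
   casts adjacent, after which the pair can be elided.  */
static signed char with_casts (signed char c)
{
  short t = (short) c;      /* first cast */
  int u = t + 1;            /* intermediate statement */
  return (signed char) u;   /* second cast */
}

static signed char without_casts (signed char c)
{
  return (signed char) (c + 1);   /* the cast pair has collapsed */
}
```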
> I think we need to paint a picture of the final result - what is the main objective of the various(?!) passes in question? Where do we do the same kind of transformation already?

I thought we'd done this at a high level already. At the heart of this work is to:

  1. Isolate, to the fullest extent possible, code which promotes and demotes types. We have this stuff all over the place right now and it's very ad-hoc.

  2. Promote/demote types to allow our optimizers to not concern themselves with walking back through type conversions when applying optimizations.

  3. Promote/demote types to expose unnecessary type conversions.

If we look at #2 and #3 we can expect that we'd want a structure which allows for a simplification/optimization step to occur after types are promoted or demoted. ie, a pipeline that looks like:

  promote types -> optimize1 -> demote types -> optimize2

Now where that little mini pipeline lands is still a big question to me. optimize1 may be a fairly significant hunk of our pipeline. optimize2 probably isn't (may just be a final tree-ssa-forwprop pass).

> We have no pass that tries to promote or demote the types of variables using a data-flow approach (VRP comes closest, but the transform is again pattern-matching, thus combine-like). I do not object to adding this kind of pass, but I suggest looking at the targets' desires when implementing it - which eventually means honoring PROMOTE_MODE (be careful about pass placement here - you want this after loop optimizations like vectorization but possibly before induction variable optimization).

Placement is one of the biggest questions in my mind. If I think about something like the old SGI compiler, they did a very early promotion, then lowered/demoted and got reasonable results with it.
As far as dealing with the target dependencies, there's no clear "this is best". I vaguely recall discussions with Kai where we decided that handling PROMOTE_MODE was relatively easy from a coding standpoint -- it's more a matter of where that fits into the entire optimization pipeline. I could make arguments either way.

Jeff
On Wed, Oct 16, 2013 at 6:33 PM, Jeff Law <law@redhat.com> wrote:
> On 10/16/13 03:31, Richard Biener wrote:
>>> I see two primary effects of type sinking.
>>
>> Note it was called type demotion ;)
>
> ;) It's a mental block of mine; it's been called type hoisting/sinking in various contexts and I see parallels between the code motion algorithms and how the type promotion/demotion exposes unnecessary type conversions. So I keep calling them hoisting/sinking. I'll try to use promotion/demotion.
>
>>> First and probably the most important in my mind is that by sinking a cast through its uses the various transformations we already perform are more likely to apply *without* needing to handle optimizing through typecasts explicitly.
>>
>> I would say it is desirable to express arithmetic in the smallest possible types (see what premature optimization the C family frontends do to narrow operations again after C integer promotion is applied).
>
> I don't see this as the major benefit of type demotion. Yes, there is some value in shrinking constants and the like, but in my experience the benefits are relatively small and often get lost in things like partial register stalls on x86, the PA and probably others (yes, the PA has partial register stalls, it's just that nobody used that term).
>
> What I really want to get at here is avoiding having a large number of optimizers looking back through the use-def chains and attempting to elide typecasts in the middle of a chain of statements of interest.

Hmm, off the top of my head only forwprop and VRP look back through use-def chains to elide typecasts. And they do that to optimize those casts, thus it is their job ...? Other cases are around, but those are of the sort "is op1 available in type X and/or can I safely cast it to type X?" - that code isn't going to be simplified by generic promotion / demotion because that code isn't going to know what type pass Y wants in the end.
Abstracting functions that can answer those questions instead of repeating N variants of it would of course be nice. Likewise reducing the number of places we perform promotion / demotion (remove it from frontend code and fold, add it in the GIMPLE combiner).

Also making the GIMPLE combiner available as a utility to apply to a single statement (see my very original GIMPLE-fold proposal) would be very useful.

As for promotion / demotion (if you are not talking about applying PROMOTE_MODE, which rather forces promotion of variables and requires inserting compensation code), you want to optimize

  op1 = (T) op1';
  op2 = (T) op2';
  x = op1 OP op2;    (*)
  y = (T2) x;

to either carry out OP in type T2 or in a type derived from the types of op1' and op2'. For the simple case combine-like pattern matching is ok. It gets more complicated if there is a series of statements here (*), but even that case is handled by iteratively applying the combiner patterns (which forwprop does).

If you split out promotion / demotion into a separate pass then you introduce pass ordering issues, as combining may introduce promotion / demotion opportunities and the other way around. That wouldn't apply to a pass lowering GIMPLE to fully honor PROMOTE_MODE.

>> You need some kind of range information to do this, thus either integrate it into VRP (there is already code there that does this) or use range information from VRP which we now preserve.
>
> If the primary goal is to shrink types, then yes, you want to use whatever information you can, including VRP. But that's not the primary goal in my mind, at least not at this stage.
>
> There's no reason why this pass couldn't utilize VRP information to provide more opportunities to demote types and achieve the goal you want. But I'd consider that a follow-on opportunity.
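[The op1/op2 pattern Richard shows can be instantiated in plain C. This is a hypothetical editorial sketch with T = int, T2 = unsigned char and OP = '+' (names are illustrative): the whole computation can be carried out directly in T2, since unsigned char arithmetic is modular.]

```c
#include <assert.h>

/* Hypothetical instance of the quoted pattern:
     op1 = (T) op1';  op2 = (T) op2';  x = op1 OP op2;  y = (T2) x;
   with T = int, T2 = unsigned char, OP = '+'.  */
static unsigned char widened_form (unsigned char a, unsigned char b)
{
  int op1 = (int) a;          /* op1 = (T) op1' */
  int op2 = (int) b;          /* op2 = (T) op2' */
  int x = op1 + op2;          /* x = op1 OP op2 */
  return (unsigned char) x;   /* y = (T2) x */
}

static unsigned char narrow_form (unsigned char a, unsigned char b)
{
  return (unsigned char) (a + b);   /* OP carried out in T2 */
}
```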
>>> The second primary effect is, given two casts where the first indirectly feeds the second (ie, the first feeds some statement, which then feeds the second cast), if we're able to sink the first cast, we end up with the first cast directly feeding the second cast. When this occurs one of the two casts can often be eliminated. Sadly, I didn't keep any of those test files, but I regularly saw them in GCC bootstraps.
>>
>> This transformation is applied both by fold-const.c and by SSA forwprop (our GIMPLE combiner). Doing it in yet another pass looks wrong (and it isn't only type demotion but can also be promotion).
>
> Yes, I know. And we need to get this back down to a single implementation. I don't much care which of the 3 implementations we keep, but it really should just be one and it needs to be reusable.
>
> I probably should have stated this differently -- the second primary effect is to expose more cases where type conversions can be eliminated via type promotion/demotion. I don't much care which of the 3 blobs of code we use to eliminate the conversions -- I do care that we've got a consistent way to promote/demote conversions to expose the unnecessary type conversions.

Sure.

>> In contrast to the desire of expressing operations in the smallest required type there is the desire of exposing the effect of PROMOTE_MODE on GIMPLE instead of only during RTL expansion. This is because the extensions and truncations (sext and zext) PROMOTE_MODE introduces are easier to optimize away when range information is available (see the attempts to address this at RTL expansion time from Kugan from Linaro).
>
> Right. I'm aware of this work and the problem he's trying to solve and have been loosely watching it -- primarily for the persistent VRP information.
>
>>> Similarly, I know there's a type hoisting patch that's also queued up. I think it should be handled separately as well.
>>
>> I think we need to paint a picture of the final result - what is the main objective of the various(?!) passes in question? Where do we do the same kind of transformation already?
>
> I thought we'd done this at a high level already. At the heart of this work is to:
>
>   1. Isolate, to the fullest extent possible, code which promotes and demotes types. We have this stuff all over the place right now and it's very ad-hoc.
>
>   2. Promote/demote types to allow our optimizers to not concern themselves with walking back through type conversions when applying optimizations.
>
>   3. Promote/demote types to expose unnecessary type conversions.
>
> If we look at #2 and #3 we can expect that we'd want a structure which allows for a simplification/optimization step to occur after types are promoted or demoted. ie, a pipeline that looks like:
>
>   promote types -> optimize1 -> demote types -> optimize2
>
> Now where that little mini pipeline lands is still a big question to me. optimize1 may be a fairly significant hunk of our pipeline. optimize2 probably isn't (may just be a final tree-ssa-forwprop pass).
>
>> We have no pass that tries to promote or demote the types of variables using a data-flow approach (VRP comes closest, but the transform is again pattern-matching, thus combine-like). I do not object to adding this kind of pass, but I suggest looking at the targets' desires when implementing it - which eventually means honoring PROMOTE_MODE (be careful about pass placement here - you want this after loop optimizations like vectorization but possibly before induction variable optimization).
>
> Placement is one of the biggest questions in my mind. If I think about something like the old SGI compiler, they did a very early promotion, then lowered/demoted and got reasonable results with it.
If we remove the ad-hoc frontend code and strip down fold, then an early combine phase (before CSE wrecks single-use cases) will more reliably handle what the frontends and fold do. Conveniently the first forwprop is already placed very early.

> As far as dealing with the target dependencies, there's no clear "this is best". I vaguely recall discussions with Kai where we decided that handling PROMOTE_MODE was relatively easy from a coding standpoint -- it's more a matter of where that fits into the entire optimization pipeline. I could make arguments either way.

One thing is honoring PROMOTE_MODE for deciding what types to promote/demote to; another thing is applying PROMOTE_MODE at some point during GIMPLE optimizations with the goal of removing its handling from RTL expansion (I'd really like to move most of RTL expansion's side-effects such as PROMOTE_MODE or the strict-align bitfield memory stuff to GIMPLE).

Richard.

> Jeff
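[One way to picture what exposing PROMOTE_MODE on GIMPLE means at the value level - a hypothetical editorial sketch, not GCC code: on a promoting target a subword value is carried sign-extended in a word, and sign extension is idempotent, so re-extending an already-extended value is exactly the kind of redundant (sext) that range information lets a GIMPLE-level pass delete.]

```c
#include <assert.h>

/* Hypothetical sketch: once a signed char lives sign-extended in an
   int "register", extending it again changes nothing - a redundant
   (sext) that range information can prove away.  */
static int extend_once (signed char c) { return (int) c; }
static int extend_twice (signed char c) { return (int) (signed char) (int) c; }
```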
On 10/17/13 04:41, Richard Biener wrote:
>> I don't see this as the major benefit of type demotion. Yes, there is some value in shrinking constants and the like, but in my experience the benefits are relatively small and often get lost in things like partial register stalls on x86, the PA and probably others (yes, the PA has partial register stalls, it's just that nobody used that term).
>>
>> What I really want to get at here is avoiding having a large number of optimizers looking back through the use-def chains and attempting to elide typecasts in the middle of a chain of statements of interest.
>
> Hmm, off the top of my head only forwprop and VRP look back through use-def chains to elide typecasts. And they do that to optimize those casts, thus it is their job ...? Other cases are around, but those are of the sort "is op1 available in type X and/or can I safely cast it to type X?" - that code isn't going to be simplified by generic promotion / demotion because that code isn't going to know what type pass Y wants in the end.

I strongly suspect that if we were to look hard at why various optimizations weren't being applied in cases where intuitively we think they should, we'd find that type conversions are often the culprit. And so we'd go off fixing the vectorizer, DOM, and god knows what else to start looking through the type conversions. I want to stop this before it starts.

I'm *certain* that to do this well, we're going to need a mess of additional cases in tree-ssa-forwprop.c based on my prior investigations. A large part of the reason I stopped that work was that I could already see the code was ultimately going to be an utter mess.

> Abstracting functions that can answer those questions instead of repeating N variants of it would of course be nice.

Or we can move the type conversions out of the way so they don't impact our optimizers.
>
> Likewise reducing the number of places we perform promotion / demotion
> (remove it from frontend code and fold, add it in the GIMPLE combiner).
>
> Also making the GIMPLE combiner available as a utility to apply
> to a single statement (see my very original GIMPLE-fold proposal)
> would be very useful.

I strongly believe the gimple combiner is not the place to handle promotion/demotion, based on already working through some of these issues privately.  It was that investigative work which led me to look more closely at what Kai was doing with the promotion/demotion work.

>
> As for promotion / demotion (if you are not talking about applying
> PROMOTE_MODE which rather forces promotion of variables and
> requires inserting compensation code), you want to optimize
>
>   op1 = (T) op1';
>   op2 = (T) op2';
>   x = op1 OP op2;  (*)
>   y = (T2) x;
>
> to either carry out OP in type T2 or in a type derived from the types
> of op1' and op2'.

That's part of the benefit, but you also want to be able to look at where op1' and op2' came from and possibly do something even more significant than just changing the type of OP.  Getting the casts out of the way makes that a lot easier.  And that's one of the reasons why you want both promotion and demotion; both expose those kinds of opportunities.

>
> For the simple case combine-like pattern matching is ok.  It gets
> more complicated if there are a series of statements here (*), but
> even that case is handled by iteratively applying the combiner
> patterns (which forwprop does).

Right, but you're still missing the point that every time a type conversion appears in a stream of interesting statements you have to special-case the optimization to deal with the type conversions.

With Kai's work that special casing goes away, and thus our existing reassociation & forwprop passes do a better job without needing a ton of special cases.
>
> If you split out promotion / demotion into a separate pass then
> you introduce pass ordering issues as combining may introduce
> promotion / demotion opportunities and the other way around.

Right, which is why you promote, optimize, demote, optimize.  Both promotion and demotion have the potential to expose optimizable sequences.

It's not perfect, but it's a hell of a lot better than what we do now.

>
> If we remove the ad-hoc frontend code and strip down fold then an
> early combine phase (before CSE wrecks single-use cases) will
> more reliably handle what frontends and fold do.  Conveniently the first
> forwprop is already placed very early.

But again, you're burdening every transformation in forwprop with being aware that there may be type conversions mid-stream and having to deal with them.  So consider a slightly different approach where we promote, run forwprop, demote, run forwprop, all before PRE/DOM, etc. wreck the single-use cases.

>> As far as dealing with the target dependencies, there's no clear "this
>> is best".  I vaguely recall discussions with Kai where we decided that
>> handling PROMOTE_MODE was relatively easy from a coding standpoint --
>> it's more a matter of where does that fit into the entire optimization
>> pipeline.  I could make arguments either way.
>
> One thing is honoring PROMOTE_MODE for deciding what types
> to promote/demote to, another thing is applying PROMOTE_MODE
> at some point during GIMPLE optimizations with the goal of removing
> its handling from RTL expansion (I'd really like to move most of
> RTL expansion's side effects such as PROMOTE_MODE or
> strict-align bitfield memory stuff to GIMPLE).

Can we please deal with PROMOTE_MODE independently from Kai's initial work?  Kai's work may make it easier to implement what you want, but Kai's work has significant value independently of using it to reimplement PROMOTE_MODE in a better place in the pipeline.

jeff
On Thu, Oct 17, 2013 at 9:32 PM, Jeff Law <law@redhat.com> wrote:
> On 10/17/13 04:41, Richard Biener wrote:
>>> I don't see this as the major benefit of type demotion.  Yes, there is
>>> some value in shrinking constants and the like, but in my experience
>>> the benefits are relatively small and often get lost in things like
>>> partial register stalls on x86, the PA and probably others (yes, the
>>> PA has partial register stalls, it's just that nobody used that term).
>>>
>>> What I really want to get at here is avoiding having a large number of
>>> optimizers looking back through the use-def chains and attempting to
>>> elide typecasts in the middle of a chain of statements of interest.
>>
>> Hmm, off the top of my head only forwprop and VRP look back through
>> use-def chains to elide typecasts.  And they do that to optimize those
>> casts, thus it is their job ...?  Other cases are around, but those
>> are of the sorts of "is op1 available in type X and/or can I safely cast
>> it to type X?" that code isn't going to be simplified by generic
>> promotion / demotion because that code isn't going to know what
>> type pass Y in the end wants.
>
> I strongly suspect if we were to look hard at why various optimizations
> weren't being applied in cases where intuitively we think they should,
> we'd find that type conversions are often the culprit.
>
> And so we'd go off fixing the vectorizer, DOM, and god knows what else
> to start looking through the type conversions.  I want to stop this
> before it starts.
>
> I'm *certain* that to do this well, we're going to need a mess of
> additional cases in tree-ssa-forwprop.c based on my prior
> investigations.  A large part of the reason I stopped with that work was
> I could already see the code was ultimately going to be an utter mess.
>
>> Abstracting functions that can answer those questions instead of
>> repeating N variants of it would of course be nice.
>
> Or we can move the type conversions out of the way so they don't impact
> our optimizers.

You can't move type conversion "out of the way" in most cases as GIMPLE is strongly typed and data sources and sinks can obviously not be "promoted" (nor can function arguments).  So you'll very likely not be able to remove the code from the optimizers; it will just trigger less often, maybe.

>> Likewise reducing the number of places we perform promotion / demotion
>> (remove it from frontend code and fold, add it in the GIMPLE combiner).
>>
>> Also making the GIMPLE combiner available as a utility to apply
>> to a single statement (see my very original GIMPLE-fold proposal)
>> would be very useful.
>
> I strongly believe the gimple combiner is not the place to handle
> promotion/demotion based on already working through some of these issues
> privately.  It was that investigative work which led me to look more
> closely at what Kai was doing with the promotion/demotion work.
>
>> As for promotion / demotion (if you are not talking about applying
>> PROMOTE_MODE which rather forces promotion of variables and
>> requires inserting compensation code), you want to optimize
>>
>>   op1 = (T) op1';
>>   op2 = (T) op2';
>>   x = op1 OP op2;  (*)
>>   y = (T2) x;
>>
>> to either carry out OP in type T2 or in a type derived from the types
>> of op1' and op2'.
>
> That's part of the benefit, but you also want to be able to look at
> where op1' and op2' came from and possibly do something even more
> significant than just changing the type of OP.  Getting the casts out of
> the way makes that a lot easier.  And that's one of the reasons why you
> want both promotion and demotion; both expose those kinds of
> opportunities.
>
>> For the simple case combine-like pattern matching is ok.  It gets
>> more complicated if there are a series of statements here (*), but
>> even that case is handled by iteratively applying the combiner
>> patterns (which forwprop does).
>
> Right, but you're still missing the point that every time a type
> conversion appears in a stream of interesting statements you have to
> special-case the optimization to deal with the type conversions.
>
> With Kai's work that special casing goes away and thus our existing
> reassociation & forwprop passes do a better job without needing a ton of
> special cases.

See above - you can't remove the special casing.

>> If you split out promotion / demotion into a separate pass then
>> you introduce pass ordering issues as combining may introduce
>> promotion / demotion opportunities and the other way around.
>
> Right, which is why you promote, optimize, demote, optimize.  Both
> promotion and demotion have the potential to expose optimizable
> sequences.
>
> It's not perfect, but it's a hell of a lot better than what we do now.

I'm not sure ;)  Keep an eye on compile-time.

>> If we remove the ad-hoc frontend code and strip down fold then an
>> early combine phase (before CSE wrecks single-use cases) will
>> more reliably handle what frontends and fold do.  Conveniently the
>> first forwprop is already placed very early.
>
> But again, you're burdening every transformation in forwprop with being
> aware that there may be type conversions mid-stream and having to deal
> with them.  So consider a slightly different approach where we promote,
> run forwprop, demote, run forwprop, all before PRE/DOM, etc. wreck the
> single-use cases.

Fact is that conversions mid-stream cannot simply be ignored.  If we can remove them then a combiner pattern can possibly remove them, which will make the transform that only works without them trigger subsequently.

The proposed patch doesn't add a single testcase, nor does it remove any special code from other optimizations, so it is hard to see what it tries to enable that doesn't already work.

>>> As far as dealing with the target dependencies, there's no clear "this
>>> is best".
>>> I vaguely recall discussions with Kai where we decided that handling
>>> PROMOTE_MODE was relatively easy from a coding standpoint -- it's more
>>> a matter of where does that fit into the entire optimization pipeline.
>>> I could make arguments either way.
>>
>> One thing is honoring PROMOTE_MODE for deciding what types
>> to promote/demote to, another thing is applying PROMOTE_MODE
>> at some point during GIMPLE optimizations with the goal of removing
>> its handling from RTL expansion (I'd really like to move most of
>> RTL expansion's side effects such as PROMOTE_MODE or
>> strict-align bitfield memory stuff to GIMPLE).
>
> Can we please deal with PROMOTE_MODE independently from Kai's initial
> work?  Kai's work may make it easier to implement what you want, but
> Kai's work has significant value independently of using it to
> reimplement PROMOTE_MODE in a better place in the pipeline.

I think it is related in a way, because PROMOTE_MODE has the issue that it introduces tons of unnecessary casts if done naively.  So the pass, if it works properly, has to show that if we apply PROMOTE_MODE as "cost model" it will remove most of the unnecessary sign-/zero-extensions (and you'll quickly find out that with strongly typed GIMPLE this gets interesting).

Richard.

> jeff
>
On Fri, Oct 18, 2013 at 12:06:35PM +0200, Richard Biener wrote:
> You can't move type conversion "out of the way" in most cases as
> GIMPLE is strongly typed and data sources and sinks can obviously not
> be "promoted" (nor can function arguments).  So you'll very likely not
> be able to remove the code from the optimizers, it will only maybe
> trigger less often.

My take on the type demotion and promotion is that we badly need it and the question is just in which pass to do it.

The benefit of type demotion is code canonicalization and removing unnecessary computation that e.g. only affects the upper bits that are going to be thrown away anyway; the disadvantage of type demotion of signed operations is that we need to perform them in an unsigned type instead, and thus we can't perform some loop optimizations based on undefined behavior etc.  See e.g.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397#c0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397#c1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397#c8
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397#c10
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47477#c16

for some testcases where type demotion can improve generated code.  If types are demoted, upper bits of constants go away, SCCVN can find equivalences between SSA_NAMEs that wouldn't be considered before, etc.

But given the issue with signed operation type demotion, I think before loop optimizations we should only be doing type demotions that don't result in defining previously undefined behavior operations.

I guess passes like forwprop, gimple-fold etc.
Say in: unsigned int a, b, c, d, e, f; unsigned char h, i, j; void foo (void) { unsigned int k = a * 2 + b + 0x12340000; unsigned int l = c * 4 + d + 0x23456700; unsigned int m = e * 5 + f, n = k + l - m, o = k - l + m, p = -k + 1; h = n; i = o; j = p; } k, l, m all have multiple imm uses, but still pretty much everything in this function could be demoted to unsigned char, the two large constants could go away as additions of zero, etc. Perhaps that can be seen as little benefit, but what if the above is all s/unsigned int/unsigned long long/;s/unsigned char/unsigned int/ on 32-bit target? RTL subreg pass might help a little bit, but that is too late. For the demotion which changes undefined overflow operations to defined ones, I wonder when is the last pass that usefully makes use of that information, if e.g. we could do the full type demotion already before vectorization somewhere in the loop optimization queue, or if that is still too early. Where type demotion and promotion is very important is IMHO vectorization, the code we generate for mixed types vectorization is just huge and terrible. If we can help it by not computing useless upper bits, or on the other side sometimes not doing parts of computations in smaller types, which lead to all the other computations on wider types to be done with bigger vectorization factor, we could improve generated code quality. I wonder if for vectorizations we couldn't use the same thing I wrote recently for if-conversion, for bbs potentially suitable for vectorization (with the right loop form etc.), that is, if we don't do full type demotion before vectorization, check if we'd demote anything and if so, work only on the vectorization only loop copy (or create it), and then try to do some type promotion to minimize number of type sizes in the loop, see the http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47477#c16 (admittedly artificial) testcase for what I mean. 
After demotion, we could replace the cast of short to char and back just with an "and" (for zero extension) or a signed shift right + shift left (for sign extension), etc.

And, finally, the question is if we generate good code if we just expand RTL from the demoted types (we'd better, because the user could have written his code in the narrower types from the beginning (well, C implicit promotions make that harder, but fold-const already demotes some computations that appear in a single statement)), or if there are advantages to promoting some types, what algorithm to use for that, what cost model, what target hooks, etc.

Jakub
On Tue, Oct 22, 2013 at 4:27 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Oct 18, 2013 at 12:06:35PM +0200, Richard Biener wrote:
>> You can't move type conversion "out of the way" in most cases as
>> GIMPLE is strongly typed and data sources and sinks can obviously not
>> be "promoted" (nor can function arguments).  So you'll very likely not
>> be able to remove the code from the optimizers, it will only maybe
>> trigger less often.
>
> My take on the type demotion and promotion is that we badly need it and
> the question is just in which pass to do it.
>
> The benefit of type demotion is code canonicalization and removing
> unnecessary computation that e.g. only affects the upper bits that are
> going to be thrown away anyway, the disadvantage of type demotion of
> signed operations is that we need to perform them in unsigned type
> instead and thus we can't perform some loop optimizations based on
> undefined behavior etc.  See e.g.
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397#c0
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397#c1
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397#c8
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397#c10
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47477#c16
> for some testcases where type demotion can improve generated code.
> If types are demoted, upper bits of constants go away, SCCVN can find
> equivalences between SSA_NAMEs that wouldn't be considered before, etc.

Indeed, demotion for this reason is good and important (we may also be able to remove the code in the frontends and in fold that "shorten" operations).

> But given the issue with signed operation type demotion, I think before
> loop optimizations we should only be doing type demotions that don't
> result in defining previously undefined behavior operations.

But the demotion pass could fill in range information which may allow recovering parts of the undefinedness.

> I guess passes like forwprop, gimple-fold etc.
> could easily handle the easy cases, where there is a tree of
> has_single_use SSA_NAMEs that can be demoted, but handling a more
> complicated web would be harder.  Say in:
> unsigned int a, b, c, d, e, f; unsigned char h, i, j;
> void
> foo (void)
> {
>   unsigned int k = a * 2 + b + 0x12340000;
>   unsigned int l = c * 4 + d + 0x23456700;
>   unsigned int m = e * 5 + f, n = k + l - m, o = k - l + m, p = -k + 1;
>   h = n; i = o; j = p;
> }
> k, l, m all have multiple imm uses, but still pretty much everything in
> this function could be demoted to unsigned char, the two large constants
> could go away as additions of zero, etc.  Perhaps that can be seen as
> little benefit, but what if the above is all
> s/unsigned int/unsigned long long/;s/unsigned char/unsigned int/ on a
> 32-bit target?  RTL subreg pass might help a little bit, but that is
> too late.

Yeah, I'd like to see testcases like this with the expected outcome.

> For the demotion which changes undefined overflow operations to defined
> ones, I wonder when is the last pass that usefully makes use of that
> information, if e.g. we could do the full type demotion already before
> vectorization somewhere in the loop optimization queue, or if that is
> still too early.

The most important user is number-of-iterations analysis.

> Where type demotion and promotion is very important is IMHO
> vectorization, the code we generate for mixed types vectorization is
> just huge and terrible.  If we can help it by not computing useless
> upper bits, or on the other side sometimes not doing parts of
> computations in smaller types, which lead to all the other computations
> on wider types to be done with bigger vectorization factor, we could
> improve generated code quality.
>
> I wonder if for vectorizations we couldn't use the same thing I wrote
> recently for if-conversion, for bbs potentially suitable for
> vectorization (with the right loop form etc.), that is, if we don't do
> full type demotion before vectorization, check if we'd demote anything
> and if so, work only on the vectorization-only loop copy (or create it),
> and then try to do some type promotion to minimize the number of type
> sizes in the loop, see the
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47477#c16 (admittedly
> artificial) testcase for what I mean.  After demotion, we could replace
> the cast of short to char and back just with and (for zero extension)
> or signed shift right + shift left (for sign extension), etc.
>
> And, finally, the question is if we generate good code if we just
> expand RTL from the demoted types (we'd better, because the user could
> have written his code in the narrower types from the beginning (well,
> C implicit promotions make that harder, but fold-const already demotes
> some computations that appear in a single statement)), or if there are
> advantages of promoting some types, what algorithm to use for that,
> what cost model, what target hooks etc.

I guess that experiments will show that not doing the promotion again will regress things.  Ideally we'd promote in a target-specific way, just like what expand would do (and then adjust targets that don't do aggressive promotion for the now-aggressive demotion done earlier).

Richard.

> Jakub
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
new file mode 100644
index 0000000..ba395e4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+signed char a[1024], b[1024];
+
+void
+baz (void)
+{
+  int i, s, t;
+  for (i = 0; i < 1024; i++)
+    { s = a[i]; t = b[i]; s += t + 0x12345600; a[i] = s; }
+}
+
+/* { dg-final { scan-tree-dump-times "305419776" 0 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c
new file mode 100644
index 0000000..8185c69
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo (const unsigned char *tmp, int i, int val)
+{
+  return (unsigned char)(((tmp[i] + val)>0xFF)?0xFF:(((tmp[i] + val)<0)?0:(tmp[i] + val)));
+}
+
+int bar (const unsigned char *tmp, int i, int val)
+{
+  int x = (((tmp[i] + val)>0xFF)?0xFF:(((tmp[i] + val)<0)?0:(tmp[i] + val)));
+  return (unsigned char)x;
+}
+
+
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "optimized" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c
new file mode 100644
index 0000000..15a493e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+int foo (const unsigned char *a, int b, int c)
+{
+  int x = (unsigned char) (a[b] + c);
+  int y = a[b] + c;
+  int z = (unsigned char) y;
+  return x == z;
+}
+
+
+/* { dg-final { scan-tree-dump-times "return 1;" 1 "optimized" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c
new file mode 100644
index 0000000..f185526
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+
+int f1 (int a, int b, int c)
+{
+  if ((a + b) == (c + a))
+    return 1;
+  return 0;
+}
+
+int f2 (int a, int b, int c)
+{
+  if ((a ^ b) == (a ^ c))
+    return 1;
+  return 0;
+}
+
+
+int f3 (int a, int b)
+{
+  if (-a == (b - a))
+    return 1;
+  return 0;
+}
+
+
+
+/* { dg-final { scan-tree-dump-not "\\+" "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-not "\\^" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\-" "optimized" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/pr45397-4.c b/gcc/testsuite/gcc.target/i386/pr45397-4.c
new file mode 100644
index 0000000..9bf66b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr45397-4.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -mavx" } */
+
+
+signed char a[1024], b[1024];
+
+void
+foo (void)
+{
+  int i, s, t;
+  for (i = 0; i < 1024; i++)
+    { s = a[i]; t = b[i]; s += t; a[i] = s; }
+}
+
+void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    a[i] += b[i];
+}
+
+
+
+/* { dg-final { scan-tree-dump-times "VIEW_CONVERT_EXPR" 6 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "VEC_PACK_TRUNC_EXPR" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/pr47477-1.c b/gcc/testsuite/gcc.target/i386/pr47477-1.c
new file mode 100644
index 0000000..a70ce87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr47477-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -m32" } */
+
+
+void *
+add (void *a, void *b)
+{
+  return (void *)(__INTPTR_TYPE__)
+    ((long long)(__INTPTR_TYPE__) a + ((long long)(__INTPTR_TYPE__) b & ~1L));
+}
+
+void *
+bar (void *a, void *b)
+{
+  long long tmp = (long long)(__INTPTR_TYPE__) a + ((long long)(__INTPTR_TYPE__) b & ~1L);
+  return (void *)(__INTPTR_TYPE__) tmp;
+}
+
+
+
+
+/* { dg-final { scan-tree-dump-times "\\(unsigned int\\)" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "& 4294967294" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\(void \\*\\)" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(long long int\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(int\\)" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+