Message ID: 2417129.VoWsWu5CJz@polaris
State: New
Eric Botcazou <ebotcazou@adacore.com> writes:
> There seems to be a sufficiently large consensus that volatile asms should not
> be treated as full optimization barriers by the compiler.  This patch is a
> first step towards implementing this conclusion and aims only at addressing
> the code quality regressions introduced by
> http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=193802
> on the 4.8 branch and mainline for volatile asms.
>
> It introduces a new, temporary predicate really_volatile_insn_p and invokes it
> from the 3 places in cse.c, cselib.c and dse.c which were touched above.  But
> this comes with a side effect: the "blockage" standard pattern needs to be
> adjusted to always use an UNSPEC_VOLATILE.  That's already the case for all
> the architectures that define it, but 21 architectures (aarch64 arc avr bfin
> cr16 cris epiphany h8300 m32c mcore mmix mn10300 moxie msp430 nds32 picochip
> rl78 rx score v850 xtensa) don't define it and therefore need to be adjusted.
>
> Tested on x86_64-suse-linux and by building cc1 and compiling a relevant
> testcase for the 21 aforementioned architectures.

Thanks for doing this.  FWIW I agree it's probably the best stop-gap fix.
But the implication seems to be that unspec_volatile and volatile asms
are volatile in different ways.  IMO they're volatile in the same way
and the problems for volatile asms apply to unspec_volatile too.

E.g. although cse.c will flush the table for unspec_volatile, it isn't
the case that unspec_volatile forces a containing function to save all
call-saved registers.  That would be excessive for a plain blockage
instruction.  So again we seem to be assuming one thing in places like
cse.c and another in the register allocator.
Code that uses the DF framework will also assume that registers are not
implicitly clobbered by an unspec_volatile:

    case ASM_OPERANDS:
    case UNSPEC_VOLATILE:
    case TRAP_IF:
    case ASM_INPUT:
      {
	/* Traditional and volatile asm instructions must be considered to use
	   and clobber all hard registers, all pseudo-registers and all of
	   memory.  So must TRAP_IF and UNSPEC_VOLATILE operations.

	   Consider for instance a volatile asm that changes the fpu rounding
	   mode.  An insn should not be moved across this even if it only uses
	   pseudo-regs because it might give an incorrectly rounded result.

	   However, flow.c's liveness computation did *not* do this, giving the
	   reasoning as " ?!? Unfortunately, marking all hard registers as live
	   causes massive problems for the register allocator and marking all
	   pseudos as live creates mountains of uninitialized variable
	   warnings."

	   In order to maintain the status quo with regard to liveness and
	   uses, we do what flow.c did and just mark any regs we can find in
	   ASM_OPERANDS as used.  In global asm insns are scanned and
	   regs_asm_clobbered is filled out.

	   For all ASM_OPERANDS, we must traverse the vector of input operands.
	   We can not just fall through here since then we would be confused by
	   the ASM_INPUT rtx inside ASM_OPERANDS, which do not indicate
	   traditional asms unlike their normal usage.  */
	if (code == ASM_OPERANDS)
	  {
	    int j;
	    for (j = 0; j < ASM_OPERANDS_INPUT_LENGTH (x); j++)
	      df_uses_record (collection_rec, &ASM_OPERANDS_INPUT (x, j),
			      DF_REF_REG_USE, bb, insn_info, flags);
	    return;
	  }
	break;
      }

Also, ira-lives.c (which tracks the liveness of both pseudo and hard
registers) doesn't mention "volatile" at all.

So most passes assume that no pseudos or hard registers will be
implicitly clobbered by unspec_volatile, just like for a volatile asm.
And IMO that's right.  I think the rule should be the same for volatile
asms and unspec_volatiles, and the same for registers as it already is
for memory: if the instruction clobbers something, it should say so
explicitly.
Volatile itself should:

  (a) prevent deletion or duplication of the operation
  (b) prevent reordering wrt other volatiles
  (c) prevent the operation from being considered equivalent to any other
      operation (even if it's structurally identical and has the same inputs)

but nothing beyond that.

So in terms of where we go from here, I'd hope we'd add something like
fake hard registers to track any processor mode of interest, such as FP
rounding mode once that is modelled properly.  Then anything that relies
on a specific mode would use that register and anything that changes the
mode would set the register, meaning we have a proper dependency chain.

I think we should do the same for whatever unspec_volatiles caused the
RTL CSE & co. to be so conservative.  Trying to leave it implicit seems
too error-prone, because then you have to make sure that every rtl pass
agrees on what the implicit behaviour is.

Thanks,
Richard
> Thanks for doing this.  FWIW I agree it's probably the best stop-gap fix.
> But the implication seems to be that unspec_volatile and volatile asms
> are volatile in different ways.  IMO they're volatile in the same way
> and the problems for volatile asms apply to unspec_volatile too.

I disagree; we need a simple way for the RTL middle-end as well as the
back-ends to block most optimizations across a specific point (e.g. a
non-local label as in HP's fix) and UNSPEC_VOLATILE is the best
candidate, at least in the short term.

> E.g. although cse.c will flush the table for unspec_volatile,
> it isn't the case that unspec_volatile forces a containing function
> to save all call-saved registers.  That would be excessive for a plain
> blockage instruction.  So again we seem to be assuming one thing in places
> like cse.c and another in the register allocator.  Code that uses the DF
> framework will also assume that registers are not implicitly clobbered
> by an unspec_volatile:
> [...]
> Also, ira-lives.c (which tracks the liveness of both pseudo and hard
> registers) doesn't mention "volatile" at all.

Yes, the definition of a blockage instruction is somewhat vague and I
agree that it shouldn't cause registers to be spilled.  But it needs to
block most, if not all, optimizations.

> So most passes assume that no pseudos or hard registers will be
> implicitly clobbered by unspec_volatile, just like for a volatile asm.
> And IMO that's right.  I think the rule should be the same for volatile
> asms and unspec_volatiles, and the same for registers as it already is for
> memory: if the instruction clobbers something, it should say so explicitly.

IMO that would buy us nothing and, on the contrary, would add complexity
where there currently isn't any.  We really need a simple blockage
instruction.
> Volatile itself should:
>
>   (a) prevent deletion or duplication of the operation
>   (b) prevent reordering wrt other volatiles
>   (c) prevent the operation from being considered equivalent to any other
>       operation (even if it's structurally identical and has the same inputs)
>
> but nothing beyond that.

Maybe UNSPEC_VOLATILE is a misnomer then and we should allow volatile
UNSPECs along the above lines.
Eric Botcazou <ebotcazou@adacore.com> writes:
>> Thanks for doing this.  FWIW I agree it's probably the best stop-gap fix.
>> But the implication seems to be that unspec_volatile and volatile asms
>> are volatile in different ways.  IMO they're volatile in the same way
>> and the problems for volatile asms apply to unspec_volatile too.
>
> I disagree; we need a simple way for the RTL middle-end as well as the
> back-ends to block most optimizations across a specific point (e.g. a
> non-local label as in HP's fix) and UNSPEC_VOLATILE is the best
> candidate, at least in the short term.

I don't agree it's the best candidate if...

>> E.g. although cse.c will flush the table for unspec_volatile,
>> it isn't the case that unspec_volatile forces a containing function
>> to save all call-saved registers.  That would be excessive for a plain
>> blockage instruction.  So again we seem to be assuming one thing in places
>> like cse.c and another in the register allocator.  Code that uses the DF
>> framework will also assume that registers are not implicitly clobbered
>> by an unspec_volatile:
>> [...]
>> Also, ira-lives.c (which tracks the liveness of both pseudo and hard
>> registers) doesn't mention "volatile" at all.
>
> Yes, the definition of a blockage instruction is somewhat vague and I
> agree that it shouldn't cause registers to be spilled.  But it needs to
> block most, if not all, optimizations.

...it's so loosely defined.  If non-local labels are the specific problem,
I think it'd be better to limit the flush to that.

I'm back to throwing examples around, sorry, but take the MIPS testcase:

    volatile int x = 1;

    void foo (void)
    {
      x = 1;
      __builtin_mips_set_fcsr (0);
      x = 2;
    }

where __builtin_mips_set_fcsr is a handy way of getting unspec_volatile.
(I'm not interested in what the function does here.)
Even at -O2, the cse.c code successfully prevents %hi(x) from being
shared, as you'd expect:

        li      $3,1          # 0x1
        lui     $2,%hi(x)
        sw      $3,%lo(x)($2)
        move    $2,$0
        ctc1    $2,$31
        li      $3,2          # 0x2
        lui     $2,%hi(x)
        sw      $3,%lo(x)($2)
        j       $31
        nop

But put it in a loop:

    void frob (void)
    {
      for (;;)
        {
          x = 1;
          __builtin_mips_set_fcsr (0);
          x = 2;
        }
    }

and we get the rather bizarre code:

        lui     $2,%hi(x)
        li      $6,1          # 0x1
        move    $5,$0
        move    $4,$2
        li      $3,2          # 0x2
        .align  3
.L3:
        sw      $6,%lo(x)($2)
        ctc1    $5,$31
        sw      $3,%lo(x)($4)
        j       .L3
        lui     $2,%hi(x)

Here the _second_ %hi(x), the 1 and the 2 have been hoisted but the first
%hi(x) is reloaded each time.  So what's the correct behaviour here?
Should the hoisting of the second %hi(x) have been disabled because the
loop contains an unspec_volatile?  What about the 1 (from the first store)
and the 2?

If instead it was:

    for (i = 0; i < 100; i++)

would converting to a hardware do-loop be acceptable?

>> So most passes assume that no pseudos or hard registers will be
>> implicitly clobbered by unspec_volatile, just like for a volatile asm.
>> And IMO that's right.  I think the rule should be the same for volatile
>> asms and unspec_volatiles, and the same for registers as it already is for
>> memory: if the instruction clobbers something, it should say so explicitly.
>
> IMO that would buy us nothing and, on the contrary, would add complexity where
> there currently isn't.  We really need a simple blockage instruction.
>
>> Volatile itself should:
>>
>>   (a) prevent deletion or duplication of the operation
>>   (b) prevent reordering wrt other volatiles
>>   (c) prevent the operation from being considered equivalent to any other
>>       operation (even if it's structurally identical and has the same inputs)
>>
>> but nothing beyond that.
>
> Maybe UNSPEC_VOLATILE is a misnomer then and we should allow volatile UNSPECs
> along the above lines.
That'd be fine with me, especially since with the patch it sounds like
using volatile asm would produce better code than a built-in function
that expands to an unspec_volatile, whereas IMO the opposite should
always be true.

But I still think we need a similar list for what unspec_volatile
means now, if we decide to keep something with the current meaning.

Thanks,
Richard
On Mon, Mar 3, 2014 at 9:01 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Eric Botcazou <ebotcazou@adacore.com> writes:
>>> Thanks for doing this.  FWIW I agree it's probably the best stop-gap fix.
>>> But the implication seems to be that unspec_volatile and volatile asms
>>> are volatile in different ways.  IMO they're volatile in the same way
>>> and the problems for volatile asms apply to unspec_volatile too.
>>
>> I disagree; we need a simple way for the RTL middle-end as well as the
>> back-ends to block most optimizations across a specific point (e.g. a
>> non-local label as in HP's fix) and UNSPEC_VOLATILE is the best
>> candidate, at least in the short term.
>
> I don't agree it's the best candidate if...
>
>>> E.g. although cse.c will flush the table for unspec_volatile,
>>> it isn't the case that unspec_volatile forces a containing function
>>> to save all call-saved registers.  That would be excessive for a plain
>>> blockage instruction.  So again we seem to be assuming one thing in places
>>> like cse.c and another in the register allocator.  Code that uses the DF
>>> framework will also assume that registers are not implicitly clobbered
>>> by an unspec_volatile:
>>> [...]
>>> Also, ira-lives.c (which tracks the liveness of both pseudo and hard
>>> registers) doesn't mention "volatile" at all.
>>
>> Yes, the definition of a blockage instruction is somewhat vague and I
>> agree that it shouldn't cause registers to be spilled.  But it needs to
>> block most, if not all, optimizations.
>
> ...it's so loosely defined.  If non-local labels are the specific problem,
> I think it'd be better to limit the flush to that.

Non-local labels should block most optimizations by the fact that they
are a receiver of control flow and thus should have an abnormal edge
coming into them.  If that's not the case (no abnormal edge) then that's
the bug to fix.

Otherwise I agree with Richard.
Please sit down and _exactly_ define what 'volatile' in an asm provides
for guarantees compared to non-volatile asms.  Likewise do so for
volatile UNSPECs.

A volatile shouldn't be a cheap way out of properly enumerating all
uses, defs and clobbers of a stmt.  If volatile is used to say that the
insn has additional uses/defs or clobbers beyond those explicitly given,
the only reason that may be valid is because we cannot explicitly
enumerate those.  But we should fix that instead (for example with the
special register idea or by adding a middle-end wide "special"
"blockage" that you can use/def/clobber).

To better assess the problem at hand, can you enumerate the cases where
you need that special easy "blockage" instruction?  With testcases
please?

Note that on GIMPLE even volatiles are not strictly ordered if they
don't have a dependence that orders them (that doesn't mean that any
existing transform deliberately re-orders them, but as shown with the
loop example below such re-ordering can happen as a side-effect of a
valid transform).

> I'm back to throwing examples around, sorry, but take the MIPS testcase:
>
>     volatile int x = 1;
>
>     void foo (void)
>     {
>       x = 1;
>       __builtin_mips_set_fcsr (0);
>       x = 2;
>     }
>
> where __builtin_mips_set_fcsr is a handy way of getting unspec_volatile.
> (I'm not interested in what the function does here.)  Even at -O2,
> the cse.c code successfully prevents %hi(x) from being shared,
> as you'd expect:
>
>         li      $3,1          # 0x1
>         lui     $2,%hi(x)
>         sw      $3,%lo(x)($2)
>         move    $2,$0
>         ctc1    $2,$31
>         li      $3,2          # 0x2
>         lui     $2,%hi(x)
>         sw      $3,%lo(x)($2)
>         j       $31
>         nop
>
> But put it in a loop:
>
>     void frob (void)
>     {
>       for (;;)
>         {
>           x = 1;
>           __builtin_mips_set_fcsr (0);
>           x = 2;
>         }
>     }
>
> and we get the rather bizarre code:
>
>         lui     $2,%hi(x)
>         li      $6,1          # 0x1
>         move    $5,$0
>         move    $4,$2
>         li      $3,2          # 0x2
>         .align  3
> .L3:
>         sw      $6,%lo(x)($2)
>         ctc1    $5,$31
>         sw      $3,%lo(x)($4)
>         j       .L3
>         lui     $2,%hi(x)
>
> Here the _second_ %hi(x), the 1 and the 2 have been hoisted but the first
> %hi(x) is reloaded each time.  So what's the correct behaviour here?
> Should the hoisting of the second %hi(x) have been disabled because the
> loop contains an unspec_volatile?  What about the 1 (from the first store)
> and the 2?
>
> If instead it was:
>
>     for (i = 0; i < 100; i++)
>
> would converting to a hardware do-loop be acceptable?
>
>>> So most passes assume that no pseudos or hard registers will be
>>> implicitly clobbered by unspec_volatile, just like for a volatile asm.
>>> And IMO that's right.  I think the rule should be the same for volatile
>>> asms and unspec_volatiles, and the same for registers as it already is for
>>> memory: if the instruction clobbers something, it should say so explicitly.
>>
>> IMO that would buy us nothing and, on the contrary, would add complexity where
>> there currently isn't.  We really need a simple blockage instruction.
>>
>>> Volatile itself should:
>>>
>>>   (a) prevent deletion or duplication of the operation
>>>   (b) prevent reordering wrt other volatiles
>>>   (c) prevent the operation from being considered equivalent to any other
>>>       operation (even if it's structurally identical and has the same inputs)
>>>
>>> but nothing beyond that.
>>
>> Maybe UNSPEC_VOLATILE is a misnomer then and we should allow volatile UNSPECs
>> along the above lines.
>
> That'd be fine with me, especially since with the patch it sounds like
> using volatile asm would produce better code than a built-in function
> that expands to an unspec_volatile, whereas IMO the opposite should
> always be true.
>
> But I still think we need a similar list for what unspec_volatile
> means now, if we decide to keep something with the current meaning.
>
> Thanks,
> Richard
> non-local labels should block most optimizations by the fact they
> are a receiver of control flow and thus should have an abnormal
> edge coming into them.  If that's not the case (no abnormal edge)
> then that's the bug to fix.

It's (of course) more complicated; you need to look at HP's fix and
testcase to see why we need a full optimization barrier.  See also the
prologue and epilogue of many architectures, which also need a blockage
when they are establishing or destroying the frame.

> Otherwise I agree with Richard.  Please sit down and _exactly_ define
> what 'volatile' in an asm provides for guarantees compared to non-volatile
> asms.  Likewise do so for volatile UNSPECs.

Too late, we apparently all agree about what volatile asms and future
volatile UNSPECs mean. :-)  The remaining point is UNSPEC_VOLATILE, but
the discussion can be deferred until the next stage 1.

> A volatile shouldn't be a cheap way out of properly enumerating all
> uses, defs and clobbers of a stmt.  If volatile is used to say that the
> insn has additional uses/defs or clobbers beyond those explicitly given,
> the only reason that may be valid is because we cannot explicitly
> enumerate those.  But we should fix that instead (for example with
> the special register idea or by adding a middle-end wide "special"
> "blockage" that you can use/def/clobber).

For the time being this special blockage is UNSPEC_VOLATILE for RTL.
Eric Botcazou <ebotcazou@adacore.com> writes:
>> non-local labels should block most optimizations by the fact they
>> are a receiver of control flow and thus should have an abnormal
>> edge coming into them.  If that's not the case (no abnormal edge)
>> then that's the bug to fix.
>
> It's (of course) more complicated; you need to look at HP's fix and
> testcase to see why we need a full optimization barrier.  See also the
> prologue and epilogue of many architectures, which also need a blockage
> when they are establishing or destroying the frame.

But the prologue/epilogue case often doesn't need to be a full blockage.
We could move a load-immediate instruction -- or even an access to
known-global memory -- before the allocation or after the deallocation.
This can actually be important on architectures that use load-multiple
to restore the return register and want the prefetcher to see the target
address as early as possible.

So I think the prologue and epilogue is one case where we really do want
to spell out what's clobbered by the allocation and deallocation.

Thanks,
Richard
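Spelling out the epilogue's effects could look roughly like this in a machine description (a hypothetical sketch, not part of the posted patch; the pattern name, mode and register are placeholders): the stack deallocation names the stack pointer as set and block memory as clobbered, so only genuinely dependent accesses are ordered against it, while load-immediates and known-global loads can still move past it.

```
;; Hypothetical sketch: an explicit stack-deallocation insn instead of a
;; full blockage.  SP_REG stands for the target's stack pointer number.
(define_insn "stack_deallocate"
  [(set (reg:SI SP_REG)
        (plus:SI (reg:SI SP_REG)
                 (match_operand:SI 0 "const_int_operand")))
   ;; The wildcard memory clobber keeps accesses to the freed frame from
   ;; being moved below the deallocation; nothing else is constrained.
   (clobber (mem:BLK (scratch)))]
  ""
  "add\\tsp,%0")
```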
> ...it's so loosely defined.  If non-local labels are the specific problem,
> I think it'd be better to limit the flush to that.

No, there was an "e.g." written, so non-local labels are not the only
problem.

> I'm back to throwing examples around, sorry, but take the MIPS testcase:
>
>     volatile int x = 1;
>
>     void foo (void)
>     {
>       x = 1;
>       __builtin_mips_set_fcsr (0);
>       x = 2;
>     }
>
> where __builtin_mips_set_fcsr is a handy way of getting unspec_volatile.
> (I'm not interested in what the function does here.)  Even at -O2,
> the cse.c code successfully prevents %hi(x) from being shared,
> as you'd expect:
>
>         li      $3,1          # 0x1
>         lui     $2,%hi(x)
>         sw      $3,%lo(x)($2)
>         move    $2,$0
>         ctc1    $2,$31
>         li      $3,2          # 0x2
>         lui     $2,%hi(x)
>         sw      $3,%lo(x)($2)
>         j       $31
>         nop
>
> But put it in a loop:
>
>     void frob (void)
>     {
>       for (;;)
>         {
>           x = 1;
>           __builtin_mips_set_fcsr (0);
>           x = 2;
>         }
>     }
>
> and we get the rather bizarre code:
>
>         lui     $2,%hi(x)
>         li      $6,1          # 0x1
>         move    $5,$0
>         move    $4,$2
>         li      $3,2          # 0x2
>         .align  3
> .L3:
>         sw      $6,%lo(x)($2)
>         ctc1    $5,$31
>         sw      $3,%lo(x)($4)
>         j       .L3
>         lui     $2,%hi(x)
>
> Here the _second_ %hi(x), the 1 and the 2 have been hoisted but the first
> %hi(x) is reloaded each time.  So what's the correct behaviour here?
> Should the hoisting of the second %hi(x) have been disabled because the
> loop contains an unspec_volatile?  What about the 1 (from the first store)
> and the 2?

Well, I personally wouldn't spend much time on the code generated in a
loop containing an UNSPEC_VOLATILE.  If an instruction or a builtin is
supposed to be performance-sensitive, then by all means don't use an
UNSPEC_VOLATILE and properly model it instead!
Eric Botcazou <ebotcazou@adacore.com> writes:
>> ...it's so loosely defined.  If non-local labels are the specific problem,
>> I think it'd be better to limit the flush to that.
>
> No, there was an "e.g." written, so non-local labels are not the only
> problem.

What are the others though?  As discussed in the other subthread, I don't
think prologue and epilogue barriers are quite the same.

>> I'm back to throwing examples around, sorry, but take the MIPS testcase:
>>
>>     volatile int x = 1;
>>
>>     void foo (void)
>>     {
>>       x = 1;
>>       __builtin_mips_set_fcsr (0);
>>       x = 2;
>>     }
>>
>> where __builtin_mips_set_fcsr is a handy way of getting unspec_volatile.
>> (I'm not interested in what the function does here.)  Even at -O2,
>> the cse.c code successfully prevents %hi(x) from being shared,
>> as you'd expect:
>>
>>         li      $3,1          # 0x1
>>         lui     $2,%hi(x)
>>         sw      $3,%lo(x)($2)
>>         move    $2,$0
>>         ctc1    $2,$31
>>         li      $3,2          # 0x2
>>         lui     $2,%hi(x)
>>         sw      $3,%lo(x)($2)
>>         j       $31
>>         nop
>>
>> But put it in a loop:
>>
>>     void frob (void)
>>     {
>>       for (;;)
>>         {
>>           x = 1;
>>           __builtin_mips_set_fcsr (0);
>>           x = 2;
>>         }
>>     }
>>
>> and we get the rather bizarre code:
>>
>>         lui     $2,%hi(x)
>>         li      $6,1          # 0x1
>>         move    $5,$0
>>         move    $4,$2
>>         li      $3,2          # 0x2
>>         .align  3
>> .L3:
>>         sw      $6,%lo(x)($2)
>>         ctc1    $5,$31
>>         sw      $3,%lo(x)($4)
>>         j       .L3
>>         lui     $2,%hi(x)
>>
>> Here the _second_ %hi(x), the 1 and the 2 have been hoisted but the first
>> %hi(x) is reloaded each time.  So what's the correct behaviour here?
>> Should the hoisting of the second %hi(x) have been disabled because the
>> loop contains an unspec_volatile?  What about the 1 (from the first store)
>> and the 2?
>
> Well, I personally wouldn't spend much time on the code generated in a
> loop containing an UNSPEC_VOLATILE.  If an instruction or a builtin is
> supposed to be performance-sensitive, then by all means don't use an
> UNSPEC_VOLATILE and properly model it instead!

That doesn't really answer the question though.
What's the correct behaviour for an unspec volatile in a loop?  I don't
think it's what we did in the example above, since it doesn't seem
self-consistent.  And "not spending too much time" is again a bit vague
in terms of saying what's right and what's wrong.

My point is that if the construct is well-defined enough to handle the
important things we want it to handle, the answer should be known to
somebody, even if it isn't to me. :-)

Thanks,
Richard
On Mar 1, 2014, at 6:36 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> It introduces a new, temporary predicate really_volatile_insn_p
"really" is a really horrible name.  Hint: if a CS domain-specific
wikipedia were to describe what you were doing, what would the page be
called?  "really" is unlikely to be it.
> That doesn't really answer the question though.  What's the correct
> behaviour for an unspec volatile in a loop?  I don't think it's what
> we did in the example above, since it doesn't seem self-consistent.
> And "not spending too much time" is again a bit vague in terms of
> saying what's right and what's wrong.

"Not spending too much time" is a polite way to say I don't really
care. :-)  If you do, feel free to post a formal definition, an
implementation plan and maybe a patch at some point.
Eric Botcazou <ebotcazou@adacore.com> writes:
>> That doesn't really answer the question though.  What's the correct
>> behaviour for an unspec volatile in a loop?  I don't think it's what
>> we did in the example above, since it doesn't seem self-consistent.
>> And "not spending too much time" is again a bit vague in terms of
>> saying what's right and what's wrong.
>
> "Not spending too much time" is a polite way to say I don't really
> care. :-)  If you do, feel free to post a formal definition, an
> implementation plan and maybe a patch at some point.

Well, I still reckon unspec_volatile and volatile asm should be volatile
in the same way.  There should be no implicit magic clobbers.

Thanks,
Richard
Index: doc/md.texi =================================================================== --- doc/md.texi (revision 208241) +++ doc/md.texi (working copy) @@ -6228,7 +6228,7 @@ the values of operands 1 and 2. This pattern defines a pseudo insn that prevents the instruction scheduler and other passes from moving instructions and using register equivalences across the boundary defined by the blockage insn. -This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM. +This needs to be an UNSPEC_VOLATILE pattern. @cindex @code{memory_barrier} instruction pattern @item @samp{memory_barrier} Index: rtlanal.c =================================================================== --- rtlanal.c (revision 208241) +++ rtlanal.c (working copy) @@ -2136,10 +2136,11 @@ remove_node_from_expr_list (const_rtx no /* Nonzero if X contains any volatile instructions. These are instructions which may cause unpredictable machine state instructions, and thus no instructions or register uses should be moved or combined across them. - This includes only volatile asms and UNSPEC_VOLATILE instructions. */ + This includes only UNSPEC_VOLATILE instructions and, if WITH_ASMS is + true, also volatile asms. */ -int -volatile_insn_p (const_rtx x) +static int +volatile_insn_1 (const_rtx x, bool with_asms) { const RTX_CODE code = GET_CODE (x); switch (code) @@ -2164,7 +2165,7 @@ volatile_insn_p (const_rtx x) case ASM_INPUT: case ASM_OPERANDS: - if (MEM_VOLATILE_P (x)) + if (with_asms && MEM_VOLATILE_P (x)) return 1; default: @@ -2196,6 +2197,32 @@ volatile_insn_p (const_rtx x) return 0; } +/* Nonzero if X contains any volatile instructions. These are instructions + which may cause unpredictable machine state instructions, and thus no + instructions or register uses should be moved or combined across them. + This includes only volatile asms and UNSPEC_VOLATILE instructions. + + This is the historical version, now deprecated. 
*/ + +int +volatile_insn_p (const_rtx x) +{ + return volatile_insn_1 (x, true); +} + +/* Nonzero if X contains any volatile instructions. These are instructions + which may cause unpredictable machine state instructions, and thus no + instructions or register uses should be moved or combined across them. + This includes only UNSPEC_VOLATILE instructions. + + This is the new version. */ + +int +really_volatile_insn_p (const_rtx x) +{ + return volatile_insn_1 (x, false); +} + /* Nonzero if X contains any volatile memory references UNSPEC_VOLATILE operations or volatile ASM_OPERANDS expressions. */ Index: cse.c =================================================================== --- cse.c (revision 208241) +++ cse.c (working copy) @@ -5682,9 +5682,8 @@ cse_insn (rtx insn) invalidate (XEXP (dest, 0), GET_MODE (dest)); } - /* A volatile ASM or an UNSPEC_VOLATILE invalidates everything. */ - if (NONJUMP_INSN_P (insn) - && volatile_insn_p (PATTERN (insn))) + /* A volatile instruction invalidates everything. */ + if (NONJUMP_INSN_P (insn) && really_volatile_insn_p (PATTERN (insn))) flush_hash_table (); /* Don't cse over a call to setjmp; on some machines (eg VAX) Index: rtl.def =================================================================== --- rtl.def (revision 208241) +++ rtl.def (working copy) @@ -236,7 +236,9 @@ DEF_RTL_EXPR(ASM_OPERANDS, "asm_operands */ DEF_RTL_EXPR(UNSPEC, "unspec", "Ei", RTX_EXTRA) -/* Similar, but a volatile operation and one which may trap. */ +/* Similar, but a volatile operation and one which may trap. Moreover, it's a + full optimization barrier, i.e. no instructions may be moved and no register + (hard or pseudo) or memory equivalences may be used across it. 
*/ DEF_RTL_EXPR(UNSPEC_VOLATILE, "unspec_volatile", "Ei", RTX_EXTRA) /* ---------------------------------------------------------------------- Index: dse.c =================================================================== --- dse.c (revision 208241) +++ dse.c (working copy) @@ -2472,8 +2472,7 @@ scan_insn (bb_info_t bb_info, rtx insn) /* Cselib clears the table for this case, so we have to essentially do the same. */ - if (NONJUMP_INSN_P (insn) - && volatile_insn_p (PATTERN (insn))) + if (NONJUMP_INSN_P (insn) && really_volatile_insn_p (PATTERN (insn))) { add_wild_read (bb_info); insn_info->cannot_delete = true; Index: emit-rtl.c =================================================================== --- emit-rtl.c (revision 208241) +++ emit-rtl.c (working copy) @@ -329,21 +329,6 @@ get_reg_attrs (tree decl, int offset) return (reg_attrs *) *slot; } - -#if !HAVE_blockage -/* Generate an empty ASM_INPUT, which is used to block attempts to schedule, - and to block register equivalences to be seen across this insn. */ - -rtx -gen_blockage (void) -{ - rtx x = gen_rtx_ASM_INPUT (VOIDmode, ""); - MEM_VOLATILE_P (x) = true; - return x; -} -#endif - - /* Generate a new REG rtx. Make sure ORIGINAL_REGNO is set properly, and don't attempt to share with the various global pieces of rtl (such as frame_pointer_rtx). */ Index: cselib.c =================================================================== --- cselib.c (revision 208241) +++ cselib.c (working copy) @@ -2655,10 +2655,8 @@ cselib_process_insn (rtx insn) /* Forget everything at a CODE_LABEL, a volatile insn, or a setjmp. 
*/ if ((LABEL_P (insn) - || (CALL_P (insn) - && find_reg_note (insn, REG_SETJMP, NULL)) - || (NONJUMP_INSN_P (insn) - && volatile_insn_p (PATTERN (insn)))) + || (CALL_P (insn) && find_reg_note (insn, REG_SETJMP, NULL)) + || (NONJUMP_INSN_P (insn) && really_volatile_insn_p (PATTERN (insn)))) && !cselib_preserve_constants) { cselib_reset_table (next_uid); Index: emit-rtl.h =================================================================== --- emit-rtl.h (revision 208241) +++ emit-rtl.h (working copy) @@ -57,7 +57,6 @@ extern rtx replace_equiv_address (rtx, r /* Likewise, but the reference is not required to be valid. */ extern rtx replace_equiv_address_nv (rtx, rtx); -extern rtx gen_blockage (void); extern rtvec gen_rtvec (int, ...); extern rtx copy_insn_1 (rtx); extern rtx copy_insn (rtx); Index: rtl.h =================================================================== --- rtl.h (revision 208241) +++ rtl.h (working copy) @@ -2064,6 +2064,7 @@ extern void remove_reg_equal_equiv_notes extern int side_effects_p (const_rtx); extern int volatile_refs_p (const_rtx); extern int volatile_insn_p (const_rtx); +extern int really_volatile_insn_p (const_rtx x); extern int may_trap_p_1 (const_rtx, unsigned); extern int may_trap_p (const_rtx); extern int may_trap_or_fault_p (const_rtx); Index: config/m32c/m32c.md =================================================================== --- config/m32c/m32c.md (revision 208241) +++ config/m32c/m32c.md (working copy) @@ -37,7 +37,8 @@ (define_constants ]) (define_constants - [(UNS_PROLOGUE_END 1) + [(UNS_BLOCKAGE 0) + (UNS_PROLOGUE_END 1) (UNS_EPILOGUE_START 2) (UNS_EH_EPILOGUE 3) (UNS_PUSHM 4) @@ -64,6 +65,13 @@ (define_mode_attr bwl [(QI "b") (HI "w") (define_code_iterator eqne_cond [eq ne]) +(define_insn "blockage" + [(unspec_volatile [(const_int 0)] UNS_BLOCKAGE)] + "" + "" + [(set_attr "flags" "n")] +) + (define_insn "nop" [(const_int 0)] "" Index: config/rx/rx.md =================================================================== 
--- config/rx/rx.md (revision 208241) +++ config/rx/rx.md (working copy) @@ -75,6 +75,8 @@ (define_constants (UNSPEC_BUILTIN_WAIT 51) (UNSPEC_PID_ADDR 52) + + (UNSPEC_BLOCKAGE 53) ] ) @@ -2607,7 +2609,13 @@ (define_insn "mvfcp" ;;---------- Misc ------------------------ -;; Required by cfglayout.c... +(define_insn "blockage" + [(unspec_volatile [(const_int 0)] UNSPEC_BLOCKAGE)] + "" + "" + [(set_attr "length" "0")] +) + (define_insn "nop" [(const_int 0)] "" Index: config/avr/avr.md =================================================================== --- config/avr/avr.md (revision 208241) +++ config/avr/avr.md (working copy) @@ -83,6 +83,7 @@ (define_c_enum "unspecv" UNSPECV_SLEEP UNSPECV_WDR UNSPECV_DELAY_CYCLES + UNSPECV_BLOCKAGE ]) @@ -4876,6 +4877,13 @@ (define_insn "call_value_insn" (set_attr "length" "1,*,1,*") (set_attr "adjust_len" "*,call,*,call")]) +(define_insn "blockage" + [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)] + "" + "" + [(set_attr "length" "0")] +) + (define_insn "nop" [(const_int 0)] "" Index: config/nds32/nds32.md =================================================================== --- config/nds32/nds32.md (revision 208241) +++ config/nds32/nds32.md (working copy) @@ -1993,6 +1993,13 @@ (define_expand "epilogue" [(const_int 0) ;; nop instruction. +(define_insn "blockage" + [(unspec_volatile [(const_int 0)] UNSPEC_VOLATILE_BLOCKAGE)] + "" + "" + [(set_attr "length" "0")] +) + (define_insn "nop" [(const_int 0)] "" Index: config/nds32/nds32.c =================================================================== --- config/nds32/nds32.c (revision 208241) +++ config/nds32/nds32.c (working copy) @@ -4322,6 +4322,8 @@ nds32_valid_stack_push_pop (rtx op, bool elt = XVECEXP (op, 0, 0); /* Pick up register element. */ elt_reg = push_p ? SET_SRC (elt) : SET_DEST (elt); + if (!REG_P (elt_reg)) + return false; first_regno = REGNO (elt_reg); /* The 'push' operation is a kind of store operation. 
Index: config/nds32/constants.md
===================================================================
--- config/nds32/constants.md	(revision 208241)
+++ config/nds32/constants.md	(working copy)
@@ -41,6 +41,7 @@ (define_c_enum "unspec_volatile_element"
   UNSPEC_VOLATILE_MTUSR
   UNSPEC_VOLATILE_SETGIE_EN
   UNSPEC_VOLATILE_SETGIE_DIS
+  UNSPEC_VOLATILE_BLOCKAGE
 ])
 
 ;; ------------------------------------------------------------------------
Index: config/xtensa/xtensa.md
===================================================================
--- config/xtensa/xtensa.md	(revision 208241)
+++ config/xtensa/xtensa.md	(working copy)
@@ -36,6 +36,7 @@ (define_constants [
   (UNSPEC_TP 10)
   (UNSPEC_MEMW 11)
+  (UNSPECV_BLOCKAGE 0)
   (UNSPECV_SET_FP 1)
   (UNSPECV_ENTRY 2)
   (UNSPECV_S32RI 4)
@@ -1614,6 +1615,13 @@ (define_expand "epilogue"
   DONE;
 })
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")]
+)
+
 (define_insn "nop"
   [(const_int 0)]
   ""
Index: config/epiphany/epiphany.md
===================================================================
--- config/epiphany/epiphany.md	(revision 208241)
+++ config/epiphany/epiphany.md	(working copy)
@@ -52,7 +52,8 @@ (define_constants
   (UNSPEC_FP_MODE 1)
 
   (UNSPECV_GID 0)
-  (UNSPECV_GIE 1)])
+  (UNSPECV_GIE 1)
+  (UNSPECV_BLOCKAGE 2)])
 
 ;; Insn type.  Used to default other attribute values.
@@ -2805,6 +2806,14 @@ (define_expand "movmisalign<mode>"
   DONE;
 })
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")
+   (set_attr "type" "flow")]
+)
+
 (define_insn "nop"
   [(const_int 0)]
   ""
Index: config/moxie/moxie.md
===================================================================
--- config/moxie/moxie.md	(revision 208241)
+++ config/moxie/moxie.md	(working copy)
@@ -18,6 +18,9 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
+(define_c_enum "unspecv"
+  [UNSPECV_BLOCKAGE])
+
 ;; -------------------------------------------------------------------------
 ;; Moxie specific constraints, predicates and attributes
 ;; -------------------------------------------------------------------------
@@ -32,6 +35,13 @@ (define_attr "length" "" (const_int 2))
 ;; nop instruction
 ;; -------------------------------------------------------------------------
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")]
+)
+
 (define_insn "nop"
   [(const_int 0)]
   ""
Index: config/cris/cris.md
===================================================================
--- config/cris/cris.md	(revision 208241)
+++ config/cris/cris.md	(working copy)
@@ -94,6 +94,9 @@ (define_c_enum ""
 
  ;; Swap all 32 bits of the operand; 31 <=> 0, 30 <=> 1...
  CRIS_UNSPEC_SWAP_BITS
+
+ ;; Blockage
+ CRIS_UNSPECV_BLOCKAGE
 ])
 
 ;; Register numbers.
@@ -3816,6 +3819,14 @@ (define_insn "*expanded_call_value_v32"
   [(set_attr "cc" "clobber")
    (set_attr "slottable" "has_call_slot")])
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] CRIS_UNSPECV_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "cc" "none")
+   (set_attr "length" "0")]
+)
+
 ;; Used in debugging.  No use for the direct pattern; unfilled
 ;; delayed-branches are taken care of by other means.
Index: config/mn10300/mn10300.md
===================================================================
--- config/mn10300/mn10300.md	(revision 208241)
+++ config/mn10300/mn10300.md	(working copy)
@@ -42,6 +42,8 @@ (define_constants [
 
   (UNSPEC_LIW 8)
 ;; This is for the low overhead loop instructions.
   (UNSPEC_SETLB 9)
+
+  (UNSPECV_BLOCKAGE 0)
 ])
 
 (include "predicates.md")
@@ -1678,6 +1680,12 @@ (define_expand "untyped_call"
   DONE;
 })
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  ""
+  ""
+)
+
 (define_insn "nop"
   [(const_int 0)]
   ""
Index: config/aarch64/aarch64.md
===================================================================
--- config/aarch64/aarch64.md	(revision 208241)
+++ config/aarch64/aarch64.md	(working copy)
@@ -106,6 +106,7 @@ (define_c_enum "unspec" [
 
 (define_c_enum "unspecv" [
     UNSPECV_EH_RETURN		; Represent EH_RETURN
+    UNSPECV_BLOCKAGE		; Represent 'blockage' insn
   ]
 )
 
@@ -286,6 +287,14 @@ (define_insn "casesi_dispatch"
    (set_attr "type" "branch")]
 )
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")
+   (set_attr "type" "no_insn")]
+)
+
 (define_insn "nop"
   [(unspec [(const_int 0)] UNSPEC_NOP)]
   ""
Index: config/picochip/picochip.md
===================================================================
--- config/picochip/picochip.md	(revision 208241)
+++ config/picochip/picochip.md	(working copy)
@@ -111,6 +111,9 @@ (define_constants
 
   ; Internal TSTPORT instruction, used to generate a single TSTPORT
   ; instruction for use in the testport branch split.
   (UNSPEC_INTERNAL_TESTPORT 19)
+
+  ; Blockage instruction
+  (UNSPEC_BLOCKAGE 20)
  ]
 )
@@ -926,6 +929,12 @@ (define_expand "movmemhi"
 ;; NOP
 ;;===========================================================================
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPEC_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")])
+
 ;; No-operation (NOP)
 (define_insn "nop"
   [(const_int 0)]
Index: config/arc/arc.md
===================================================================
--- config/arc/arc.md	(revision 208241)
+++ config/arc/arc.md	(working copy)
@@ -124,6 +124,7 @@ (define_constants
   (VUNSPEC_SR 26) ; blockage insn for writing to an auxiliary register
   (VUNSPEC_TRAP_S 27) ; blockage insn for trap_s generation
   (VUNSPEC_UNIMP_S 28) ; blockage insn for unimp_s generation
+  (VUNSPEC_BLOCKAGE 29); blockage insn
 
   (R0_REG 0)
   (R1_REG 1)
@@ -3871,6 +3872,13 @@ (define_insn "call_value_prof"
   (set_attr "predicable" "yes,yes")
   (set_attr "length" "4,8")])
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")
+   (set_attr "type" "misc")])
+
 (define_insn "nop"
   [(const_int 0)]
   ""
Index: config/mcore/mcore.md
===================================================================
--- config/mcore/mcore.md	(revision 208241)
+++ config/mcore/mcore.md	(working copy)
@@ -22,6 +22,14 @@
 
 
+(define_c_enum "unspecv" [
+  UNSPECV_BLOCKAGE
+  UNSPECV_CONSTANT
+  UNSPECV_ALIGN
+  UNSPECV_TABLE
+  ]
+)
+
 ;; -------------------------------------------------------------------------
 ;; Attributes
 ;; -------------------------------------------------------------------------
@@ -1583,6 +1591,13 @@ (define_insn "call_value_struct"
 ;; Misc insns
 ;; ------------------------------------------------------------------------
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")]
+)
+
 (define_insn "nop"
   [(const_int 0)]
   ""
@@ -2924,7 +2939,7 @@ (define_peephole
 
 ;;; 4 byte integer in line.
 
 (define_insn "consttable_4"
-  [(unspec_volatile [(match_operand:SI 0 "general_operand" "=g")] 0)]
+  [(unspec_volatile [(match_operand:SI 0 "general_operand" "=g")] UNSPECV_CONSTANT)]
   ""
   "*
 {
@@ -2936,14 +2951,14 @@ (define_insn "consttable_4"
 
 ;;; align to a four byte boundary.
 
 (define_insn "align_4"
-  [(unspec_volatile [(const_int 0)] 1)]
+  [(unspec_volatile [(const_int 0)] UNSPECV_ALIGN)]
   ""
   ".align 2")
 
 ;;; Handle extra constant pool entries created during final pass.
 
 (define_insn "consttable_end"
-  [(unspec_volatile [(const_int 0)] 2)]
+  [(unspec_volatile [(const_int 0)] UNSPECV_TABLE)]
   ""
   "*
   return mcore_output_jump_label_table ();")
Index: config/score/score.md
===================================================================
--- config/score/score.md	(revision 208241)
+++ config/score/score.md	(working copy)
@@ -68,7 +68,9 @@ (define_constants
   (LCW 8)
   (LCE 9)
 
-  (SFFS 10)])
+  (SFFS 10)
+
+  (BLOCKAGE 11)])
 
 (define_attr "type"
   "unknown,branch,jump,call,load,store,cmp,arith,move,const,nop,mul,div,cndmv,fce,tce,fsr,tsr,fcr,tcr"
@@ -1595,6 +1597,13 @@ (define_insn "return_internal_score7"
   "(TARGET_SCORE7 || TARGET_SCORE7D)"
   "br%S0\t%0")
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")]
+)
+
 (define_insn "nop"
   [(const_int 0)]
   ""
Index: config/msp430/msp430.md
===================================================================
--- config/msp430/msp430.md	(revision 208241)
+++ config/msp430/msp430.md	(working copy)
@@ -38,7 +38,7 @@ (define_c_enum "unspec"
   UNS_GROW_AND_SWAP
   UNS_SWAP_AND_SHRINK
-
+
   UNS_DINT
   UNS_EINT
   UNS_PUSH_INTR
@@ -47,6 +47,8 @@ (define_c_enum "unspec"
   UNS_BIS_SR
   UNS_REFSYM_NEED_EXIT
+
+  UNS_BLOCKAGE
 ])
 
 (include "predicates.md")
@@ -1254,6 +1256,12 @@ (define_insn "*bitbranch<mode>4_z"
 ;;------------------------------------------------------------
 ;; Misc
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNS_BLOCKAGE)]
+  ""
+  ""
+)
+
 (define_insn "nop"
   [(const_int 0)]
   "1"
Index: config/rl78/rl78.md
===================================================================
--- config/rl78/rl78.md	(revision 208241)
+++ config/rl78/rl78.md	(working copy)
@@ -51,8 +51,15 @@
   (UNS_TRAMPOLINE_UNINIT 21)
   (UNS_NONLOCAL_GOTO 22)
 
+  (UNS_BLOCKAGE 23)
 ])
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNS_BLOCKAGE)]
+  ""
+  ""
+)
+
 (define_insn "nop"
   [(const_int 0)]
   ""
Index: config/h8300/h8300.md
===================================================================
--- config/h8300/h8300.md	(revision 208241)
+++ config/h8300/h8300.md	(working copy)
@@ -48,7 +48,8 @@
 (define_constants
   [(UNSPEC_INCDEC 0)
-   (UNSPEC_MONITOR 1)])
+   (UNSPEC_MONITOR 1)
+   (UNSPEC_BLOCKAGE 2)])
 
 (define_constants
   [(UNSPEC_MOVMD 100)
@@ -2533,6 +2534,14 @@ (define_insn "call_value"
 	 (const_int 2)
 	 (const_int 4)))])
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPEC_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "cc" "none")
+   (set_attr "length" "0")]
+)
+
 (define_insn "nop"
   [(const_int 0)]
   ""
Index: config/v850/v850.md
===================================================================
--- config/v850/v850.md	(revision 208241)
+++ config/v850/v850.md	(working copy)
@@ -44,6 +44,7 @@ (define_constants
   (CC_REGNUM 32)	; Condition code pseudo register
   (FCC_REGNUM 33)	; Floating Condition code pseudo register
   (UNSPEC_LOOP 200)	; loop counter
+  (UNSPECV_BLOCKAGE 201)	; blockage
 ]
 )
@@ -1824,6 +1825,13 @@ (define_insn "call_value_internal_long"
    (set_attr "cc" "clobber,clobber")]
 )
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")
+   (set_attr "cc" "none")])
+
 (define_insn "nop"
   [(const_int 0)]
   ""
Index: config/mmix/mmix.md
===================================================================
--- config/mmix/mmix.md	(revision 208241)
+++ config/mmix/mmix.md	(working copy)
@@ -24,11 +24,13 @@
 
 ;; See file "rtl.def" for documentation on define_insn, match_*, et al.
 
 ;; Uses of UNSPEC in this file:
-;; UNSPEC_VOLATILE:
-;;
-;; 0	sync_icache (sync icache before trampoline jump)
-;; 1	nonlocal_goto_receiver
-;;
+
+(define_c_enum "unspecv" [
+  UNSPECV_BLOCKAGE	;; blockage
+  UNSPECV_SYNC		;; sync_icache (sync icache before trampoline jump)
+  UNSPECV_NONLOCAL	;; nonlocal_goto_receiver
+  ]
+)
 
 ;; The order of insns is as in Node: Standard Names, with smaller modes
 ;; before bigger modes.
@@ -1088,6 +1090,12 @@ (define_expand "epilogue"
   ""
   "mmix_expand_epilogue ();")
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  ""
+  ""
+)
+
 (define_insn "nop"
   [(const_int 0)]
   ""
@@ -1119,7 +1127,7 @@ (define_insn "tablejump"
 ;; of "pop 0,0" until rO equals the saved value.  (If it goes lower, we
 ;; should die with a trap.)
 (define_expand "nonlocal_goto_receiver"
-  [(parallel [(unspec_volatile [(match_dup 1)] 1)
+  [(parallel [(unspec_volatile [(match_dup 1)] UNSPECV_NONLOCAL)
 	      (clobber (scratch:DI))
 	      (clobber (reg:DI MMIX_rJ_REGNUM))])
    (set (reg:DI MMIX_rJ_REGNUM) (match_dup 0))]
@@ -1146,7 +1154,7 @@ (define_expand "nonlocal_goto_receiver"
 ;; address and re-use them after the register stack unwind, so it's best
 ;; to form the address ourselves.
 (define_insn "*nonlocal_goto_receiver_expanded"
-  [(unspec_volatile [(match_operand:DI 1 "frame_pointer_operand" "Yf")] 1)
+  [(unspec_volatile [(match_operand:DI 1 "frame_pointer_operand" "Yf")] UNSPECV_NONLOCAL)
    (clobber (match_scratch:DI 0 "=&r"))
    (clobber (reg:DI MMIX_rJ_REGNUM))]
   ""
@@ -1233,7 +1241,7 @@ (define_insn "*nxor"
 
 (define_insn "sync_icache"
   [(unspec_volatile [(match_operand:DI 0 "memory_operand" "m")
-		     (match_operand:DI 1 "const_int_operand" "I")] 0)]
+		     (match_operand:DI 1 "const_int_operand" "I")] UNSPECV_SYNC)]
   ""
   "SYNCID %1,%0")
Index: config/cr16/cr16.md
===================================================================
--- config/cr16/cr16.md	(revision 208241)
+++ config/cr16/cr16.md	(working copy)
@@ -40,6 +40,12 @@ (define_constants
  ]
 )
 
+;; UNSPEC_VOLATILE usage
+(define_constants
+  [(UNSPECV_BLOCKAGE 0)
+  ]
+)
+
 ;; Attributes
 
 (define_attr "length" "" (const_int 2))
@@ -1053,6 +1059,12 @@ (define_insn "cr16_call_value_insn_jump"
   [(set_attr "length" "2")]
 )
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")]
+)
 
 ;; Nop
 (define_insn "nop"
Index: config/bfin/bfin.md
===================================================================
--- config/bfin/bfin.md	(revision 208241)
+++ config/bfin/bfin.md	(working copy)
@@ -147,7 +147,8 @@ (define_constants
   (UNSPEC_VOLATILE_LOAD_FUNCDESC 3)
   (UNSPEC_VOLATILE_STORE_EH_HANDLER 4)
   (UNSPEC_VOLATILE_DUMMY 5)
-  (UNSPEC_VOLATILE_STALL 6)])
+  (UNSPEC_VOLATILE_STALL 6)
+  (UNSPEC_VOLATILE_BLOCKAGE 7)])
 
 (define_constants
   [(MACFLAG_NONE 0)
@@ -2555,6 +2556,13 @@ (define_expand "cstoresi4"
   DONE;
 })
 
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPEC_VOLATILE_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")]
+)
+
 (define_insn "nop"
   [(const_int 0)]
   ""