Message ID | 56C79116.2050503@redhat.com |
---|---|
State | New |
Headers | show |
On Fri, Feb 19, 2016 at 11:03 PM, Bernd Schmidt <bschmidt@redhat.com> wrote: > In this PR, we generate unnecessarily bad code for code that declares a > global register var. Since global regs get added to fixed_regs, IRA never > considers them as candidates. However, we do seem to have proper data flow > information for them. In the testcase, the global reg dies, some operations > are done on temporary results, and the final result stored back in the > global reg. We can achieve the desired code generation by reusing the global > reg for those temporaries. > > Bootstrapped and tested on x86_64-linux. Ok? An argument could be made not > to use this for gcc-6 since global register vars are both not very important > and not very well represented in the testsuite. Do calls properly clobber them even if they are not in the set of call-clobbered regs? Are asm()s properly using/clobbering them? I think you are allowed to use them in asm()s without adding constraints for them? I'd say this is sth for GCC 7. Thanks, Richard. > > Bernd
On 02/19/2016 03:03 PM, Bernd Schmidt wrote: > In this PR, we generate unnecessarily bad code for code that declares a > global register var. Since global regs get added to fixed_regs, IRA > never considers them as candidates. However, we do seem to have proper > data flow information for them. In the testcase, the global reg dies, > some operations are done on temporary results, and the final result > stored back in the global reg. We can achieve the desired code > generation by reusing the global reg for those temporaries. > > Bootstrapped and tested on x86_64-linux. Ok? An argument could be made > not to use this for gcc-6 since global register vars are both not very > important and not very well represented in the testsuite. > > > Bernd > > global-regalloc.diff > > > PR rtl-optimization/44281 > * hard-reg-set.h (struct target_hard_regs): New field > x_fixed_nonglobal_reg_set. > (fixed_nonglobal_reg_set): New macro. > * reginfo.c (init_reg_sets_1): Initialize it. > * ira.c (setup_alloc_regs): Use fixed_nonglobal_reg_set instead > of fixed_reg_set. > > PR rtl-optimization/44281 > * gcc.target/i386/pr44281.c: New test. It sounds like Richi wants to table this to gcc-7, which I can understand. So I'm not going to dig into the patch itself, just touch on the high level stuff. Global register variables are currently documented as "The register is reserved entirely for this use, and will not be allocated for any other purpose". That would need to be fixed. When David W. and I were working through the docs on this stuff in 2015, I don't think that either of us considered the possibility that the register could be used for temporaries between a use of the global register var (where it dies) and an assignment to the global register var (rebirth). As long as our dataflow is good, using the global register var for a temporary in that window seems like a useful thing to do. So I think the biggest concern would be any kind of implicit use. I'm thinking of function calls where the callee expects the global register var to have the right value, perhaps asms as well (someone might reference the global var in the asm, but not list it in the sets/uses/clobbers because it wasn't really necessary in the past). Thoughts? jeff
Hi, On Mon, 22 Feb 2016, Jeff Law wrote: > > never considers them as candidates. However, we do seem to have proper > > data flow information for them. IMO one of the points of global reg vars, and its long-standing documentation to that effect, is that we do not have proper data flow information ... > temporary in that window seems like a useful thing to do. So I think > the biggest concern would be any kind of implicit use. I'm thinking of > function calls where the callee expects the global register var to have > the right value, perhaps asms as well (someone might reference the > global var in the asm, but not list it in the sets/uses/clobbers because > it wasn't really necessary in the past). ... exactly because of this. Some jit-like code uses global reg vars to reserve registers for the generated code. It would have to add -ffixed-<name> compilation options instead of only the global vars after the patch. I think the advantages of having "good RA" with global reg vars are dubious enough to not warrant breaking backward compatibility. Ciao, Michael.
On 02/22/2016 03:37 PM, Richard Biener wrote: > Do calls properly clobber them even if they are not in the set of > call-clobbered regs? > Are asm()s properly using/clobbering them? I think you are allowed to use them > in asm()s without adding constraints for them? Calls do, asms currently don't AFAICT. Not sure whether it's allowed to use them, but I think it should be straightforward to adjust df-scan. Michael Matz wrote: > Some jit-like code uses global reg vars to > reserve registers for the generated code. It would have to add > -ffixed-<name> compilation options instead of only the global vars after > the patch. I think the advantages of having "good RA" with global reg > vars are dubious enough to not warrant breaking backward compatibility. I don't see the need for new options - if we have proper live info for calls and asms, what other reason could there be for incomplete liveness information? RB: > I'd say this is sth for GCC 7. Not really objecting to that. Would you ack it if I extended df-scan for the asm case? Bernd
On 02/26/2016 08:41 AM, Bernd Schmidt wrote: > On 02/22/2016 03:37 PM, Richard Biener wrote: > >> Do calls properly clobber them even if they are not in the set of >> call-clobbered regs? >> Are asm()s properly using/clobbering them? I think you are allowed to >> use them >> in asm()s without adding constraints for them? > > Calls do, asms currently don't AFAICT. Not sure whether it's allowed to > use them, but I think it should be straightforward to adjust df-scan. The other case that came to mind was signal handlers. What happens if we're using the global register as a scratch, we hit a memory reference that faults and inside the signal handler the original code expects to be able to use the global register the user set up? If that's a valid use scenario, then there's probably all kinds of DF stuff that we'd need to expose. jeff
On 02/27/2016 08:12 PM, Richard Biener wrote: > > > Am Freitag, 26. Februar 2016 schrieb Jeff Law : > > The other case that came to mind was signal handlers. What happens > if we're using the global register as a scratch, we hit a memory > reference that faults and inside the signal handler the original > code expects to be able to use the global register the user set up? > > If that's a valid use scenario, then there's probably all kinds of > DF stuff that we'd need to expose. > > I'd say that's a valid assumption. Though maybe we want to be able to > change semantics with a flag here. A flag seems like overkill, I don't think people are likely to enable that ever. So what's the consensus here, closed wontfix? Bernd
On Mon, Feb 29, 2016 at 2:38 PM, Bernd Schmidt <bschmidt@redhat.com> wrote: > On 02/27/2016 08:12 PM, Richard Biener wrote: >> >> >> >> Am Freitag, 26. Februar 2016 schrieb Jeff Law : >> >> The other case that came to mind was signal handlers. What happens >> if we're using the global register as a scratch, we hit a memory >> reference that faults and inside the signal handler the original >> code expects to be able to use the global register the user set up? >> >> If that's a valid use scenario, then there's probably all kinds of >> DF stuff that we'd need to expose. >> >> I'd say that's a valid assumption. Though maybe we want to be able to >> change semantics with a flag here. > > > A flag seems like overkill, I don't think people are likely to enable that > ever. > > So what's the consensus here, closed wontfix? I think so, and maybe update documentation to reflect the discussion. Richard. > > Bernd > >
On 02/29/2016 08:18 AM, Richard Biener wrote: > On Mon, Feb 29, 2016 at 2:38 PM, Bernd Schmidt <bschmidt@redhat.com> wrote: >> On 02/27/2016 08:12 PM, Richard Biener wrote: >>> >>> >>> >>> Am Freitag, 26. Februar 2016 schrieb Jeff Law : >>> >>> The other case that came to mind was signal handlers. What happens >>> if we're using the global register as a scratch, we hit a memory >>> reference that faults and inside the signal handler the original >>> code expects to be able to use the global register the user set up? >>> >>> If that's a valid use scenario, then there's probably all kinds of >>> DF stuff that we'd need to expose. >>> >>> I'd say that's a valid assumption. Though maybe we want to be able to >>> change semantics with a flag here. >> >> >> A flag seems like overkill, I don't think people are likely to enable that >> ever. >> >> So what's the consensus here, closed wontfix? > > I think so, and maybe update documentation to reflect the discussion. Agreed (closed/wontfix). I think the signal handler case essentially kills the ability to use global regs as scratches. We could create another style declaration for a global-like register which has different semantics, but I don't think that's wise. I'd either add a comment summarizing the discussion or a pointer back to the discussion in the archives. jeff
Hi, On Fri, 26 Feb 2016, Bernd Schmidt wrote: > Calls do, asms currently don't AFAICT. Not sure whether it's allowed to > use them, but I think it should be straightforward to adjust df-scan. > > > Some jit-like code uses global reg vars to reserve registers for the > > generated code. It would have to add -ffixed-<name> compilation > > options instead of only the global vars after the patch. I think the > > advantages of having "good RA" with global reg vars are dubious enough > > to not warrant breaking backward compatibility. > > I don't see the need for new options - if we have proper live info for > calls and asms, what other reason could there be for incomplete liveness > information? [now moot, since wontfix, but still:] Well, but we don't have proper live info, as you already said yourself above. What I mean to say is, the following is currently proper use of global reg vars: ---- register uint64_t ugh __asm__("rbx"); //r11, whatever void write_into_ugh (void) { asm volatile ("mov 1, %%rbx" :::); assert (ugh == 1); } ---- %rbx would have to be implicitly used/clobbered by the asm. In addition it would have to be used by all function entries and exits (so that a function body where the global reg var is merely visible but not used doesn't accidentally clobber that register) and of course by calls. AFAICS this would fix the code in push_flag_into_global_reg_var, because then indeed you'd have proper live info, including proper deaths. But for everything a little more complicated than this it'd basically be the same as putting the reg into fixed_regs[]. FWIW: signal handlers need no consideration (if they were allowed to inspect/alter global reg vars we would have lost and no improvement on fixed_regs[] could be done). They are explicitely documented to not be able to access global reg vars. (They already can't accidentally clobber the register in question because signal handlers don't do that) So, I think it can be made to work, but not something for GCC 6, and if the additional bit for all calls/asms/function-entry-exits really is worth it ... I just don't know. Ciao, Michael.
On 02/29/2016 06:07 PM, Michael Matz wrote: > %rbx would have to be implicitly used/clobbered by the asm. In addition > it would have to be used by all function entries and exits (so that a > function body where the global reg var is merely visible but not used > doesn't accidentally clobber that register) and of course by calls. Nearly all this exists as of today. From df-scan.c: static void df_get_call_refs (struct df_collection_rec *collection_rec, [...] else if (global_regs[i]) { /* Calls to const functions cannot access any global registers and calls to pure functions cannot set them. All other calls may reference any of the global registers, so they are recorded as used. */ if (!RTL_CONST_CALL_P (insn_info->insn)) and static void df_get_entry_block_def_set (bitmap entry_block_defs) { [...] for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) { if (global_regs[i]) bitmap_set_bit (entry_block_defs, i); and /* Mark all global registers, and all registers used by the epilogue as being live at the end of the function since they may be referenced by our caller. */ for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) if (global_regs[i] || EPILOGUE_USES (i)) bitmap_set_bit (exit_block_uses, i); > FWIW: signal handlers need no consideration (if they were allowed to > inspect/alter global reg vars we would have lost and no improvement on > fixed_regs[] could be done). They are explicitely documented to not be > able to access global reg vars. (They already can't accidentally clobber > the register in question because signal handlers don't do that) Oh, they can't modify the register in question because the OS would restore it? Hadn't thought of that. Ok so maybe reopen and apply my patch for gcc-7, with a tweak for asms. Bernd
Hi, On Mon, 29 Feb 2016, Bernd Schmidt wrote: > On 02/29/2016 06:07 PM, Michael Matz wrote: > > > %rbx would have to be implicitly used/clobbered by the asm. In addition > > it would have to be used by all function entries and exits (so that a > > function body where the global reg var is merely visible but not used > > doesn't accidentally clobber that register) and of course by calls. > > Nearly all this exists as of today. From df-scan.c: Okay, that looks good, I agree (modulo the asms). > > FWIW: signal handlers need no consideration (if they were allowed to > > inspect/alter global reg vars we would have lost and no improvement on > > fixed_regs[] could be done). They are explicitely documented to not be > > able to access global reg vars. (They already can't accidentally clobber > > the register in question because signal handlers don't do that) > > Oh, they can't modify the register in question because the OS would > restore it? Yep. > Ok so maybe reopen and apply my patch for gcc-7, with a tweak for asms. That seems workable. At least I can't imagine other implicit uses of such registers. Ciao, Michael.
Michael Matz writes: > > > FWIW: signal handlers need no consideration (if they were allowed to > > > inspect/alter global reg vars we would have lost and no improvement on > > > fixed_regs[] could be done). They are explicitely documented to not be > > > able to access global reg vars. (They already can't accidentally clobber > > > the register in question because signal handlers don't do that) > > > > Oh, they can't modify the register in question because the OS would > > restore it? > > Yep. Well, almost. While it is true that a signal handler cannot *accidentally* clobber the register state of the interrupted thread, it can in fact access and update any part of that state via the ucontext_t passed to it. Doing so is uncommon, but not unheard of and not even that difficult -- I've done it myself in several different runtime systems. The code in a signal handler cannot assume that global register variables are in sync with the interrupted thread, or that plain assignments to them are reflected back, but that's not GCC's fault, nor is it GCC's job to make that happen. /Mikael
Hi, On Mon, 29 Feb 2016, Mikael Pettersson wrote: > Well, almost. While it is true that a signal handler cannot > *accidentally* clobber the register state of the interrupted thread, it > can in fact access and update any part of that state via the ucontext_t > passed to it. Doing so is uncommon, but not unheard of and not even > that difficult -- I've done it myself in several different runtime > systems. Yeah, well, sure. That's not clobbering the registers directly though, but setting it up so that the kernel does it on return :) If you do that, you have to have a special sig-handler anyway, lest it clobbers other registers that are currently in use by the interrupted piece of code. > The code in a signal handler cannot assume that global register > variables are in sync with the interrupted thread, or that plain > assignments to them are reflected back, but that's not GCC's fault, nor > is it GCC's job to make that happen. And it's documented to not happen (reliably anyway), so all is fine. Ciai, Michael.
On 02/19/2016 03:03 PM, Bernd Schmidt wrote: > In this PR, we generate unnecessarily bad code for code that declares a > global register var. Since global regs get added to fixed_regs, IRA > never considers them as candidates. However, we do seem to have proper > data flow information for them. In the testcase, the global reg dies, > some operations are done on temporary results, and the final result > stored back in the global reg. We can achieve the desired code > generation by reusing the global reg for those temporaries. > > Bootstrapped and tested on x86_64-linux. Ok? An argument could be made > not to use this for gcc-6 since global register vars are both not very > important and not very well represented in the testsuite. > > > Bernd > > global-regalloc.diff > > > PR rtl-optimization/44281 > * hard-reg-set.h (struct target_hard_regs): New field > x_fixed_nonglobal_reg_set. > (fixed_nonglobal_reg_set): New macro. > * reginfo.c (init_reg_sets_1): Initialize it. > * ira.c (setup_alloc_regs): Use fixed_nonglobal_reg_set instead > of fixed_reg_set. > > PR rtl-optimization/44281 > * gcc.target/i386/pr44281.c: New test. So my recollection of where we left things was that we needed to dataflow information fixed for implicit uses/sets in asms. With that infrastructure in place I think we'll be ready to resolve this old issue. Right? jeff
On Fri, Feb 19, 2016 at 11:03:02PM +0100, Bernd Schmidt wrote: > In this PR, we generate unnecessarily bad code for code that > declares a global register var. Since global regs get added to > fixed_regs, IRA never considers them as candidates. However, we do > seem to have proper data flow information for them. In the testcase, > the global reg dies, some operations are done on temporary results, > and the final result stored back in the global reg. We can achieve > the desired code generation by reusing the global reg for those > temporaries. > > Bootstrapped and tested on x86_64-linux. Ok? An argument could be > made not to use this for gcc-6 since global register vars are both > not very important and not very well represented in the testsuite. > PR rtl-optimization/44281 > * hard-reg-set.h (struct target_hard_regs): New field > x_fixed_nonglobal_reg_set. > (fixed_nonglobal_reg_set): New macro. > * reginfo.c (init_reg_sets_1): Initialize it. > * ira.c (setup_alloc_regs): Use fixed_nonglobal_reg_set instead > of fixed_reg_set. > > PR rtl-optimization/44281 > * gcc.target/i386/pr44281.c: New test. Unfortunately this patch (or whatever got actually committed) has broken the gcc.target/s390/pr679443.c test case, which is a bit fishy (see code snippet below). I assign most registers to global variables and then use some complicated arithmetics with the goal that the pointer stored in the first argument gets saved on the stack and reloaded to a different register. Before this patch the test case just needed three registers to do its work (r2, r3, r4). With the patch it currently causes an error in the reload pass error: unable to find a register to spill -- snip -- /* Block all registers except the first three argument registers. */ register long r0 asm ("r0"); register long r1 asm ("r1"); register long r5 asm ("r5"); register long r6 asm ("r6"); register long r7 asm ("r7"); register long r8 asm ("r8"); register long r9 asm ("r9"); register long r10 asm ("r10"); register long r11 asm ("r11"); ... void foo (struct s_t *ps, int c, int i) { /* Uses r2 as address register. */ ps->f1 = c; /* The calculation of the value is so expensive that it's cheaper to spill ps to the stack and reload it later (into a different register). ==> Uses r4 as address register.*/ ps->f2 = i + i % 3; /* If dead store elimination fails to detect that the address in r2 during the first assignment is an alias of the address in r4 during the second assignment, it eliminates the first assignment and the f1 field is not written (bug). */ } -- snip -- If a fourth register is available, the ICE goes away, but the pointer remains in r2, rendering the test case useless. So, what should be done about this? Ciao Dominik ^_^ ^_^
On 07/13/2016 05:29 PM, Dominik Vogt wrote: > Unfortunately this patch (or whatever got actually committed) has > broken the gcc.target/s390/pr679443.c test case, which is a bit > fishy (see code snippet below). I assign most registers to global > variables and then use some complicated arithmetics with the goal > that the pointer stored in the first argument gets saved on the > stack and reloaded to a different register. Before this patch the > test case just needed three registers to do its work (r2, r3, r4). > With the patch it currently causes an error in the reload pass > > error: unable to find a register to spill Might be useful to see the dump_reload output. > If a fourth register is available, the ICE goes away, but the > pointer remains in r2, rendering the test case useless. I don't think I quite understand what you're trying to do here, but it does look dodgy. Whatever this is testing would probably be better as a part of a unit test. Bernd
On Wed, Jul 13, 2016 at 07:43:13PM +0200, Bernd Schmidt wrote: > On 07/13/2016 05:29 PM, Dominik Vogt wrote: > > >Unfortunately this patch (or whatever got actually committed) has > >broken the gcc.target/s390/pr679443.c test case, which is a bit > >fishy (see code snippet below). I assign most registers to global > >variables and then use some complicated arithmetics with the goal > >that the pointer stored in the first argument gets saved on the > >stack and reloaded to a different register. Before this patch the > >test case just needed three registers to do its work (r2, r3, r4). > >With the patch it currently causes an error in the reload pass > > > > error: unable to find a register to spill > > Might be useful to see the dump_reload output. Attached. > >If a fourth register is available, the ICE goes away, but the > >pointer remains in r2, rendering the test case useless. > > I don't think I quite understand what you're trying to do here, Alias detection of the memory pointed to by the first register. There was some hard to trigger bug where writing a bitfield in a struct would also overwrite the unselected bits of the corresponding word. See here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443 > but it does look dodgy. It is. The problem is that I can't tell Gcc to move the address to a different register directly, so I had to do some voodoo to achieve that. The test case seemed to be the most robust approach ... > Whatever this is testing would probably be > better as a part of a unit test.o I've spent a lot of time to port the test to Power but was unable to reproduce the effect. Ciao Dominik ^_^ ^_^
On Thu, Jul 14, 2016 at 10:24:38AM +0100, Dominik Vogt wrote: > On Wed, Jul 13, 2016 at 07:43:13PM +0200, Bernd Schmidt wrote: > > On 07/13/2016 05:29 PM, Dominik Vogt wrote: > > > > >Unfortunately this patch (or whatever got actually committed) has > > >broken the gcc.target/s390/pr679443.c test case, which is a bit > > >fishy (see code snippet below). I assign most registers to global > > >variables and then use some complicated arithmetics with the goal > > >that the pointer stored in the first argument gets saved on the > > >stack and reloaded to a different register. Before this patch the > > >test case just needed three registers to do its work (r2, r3, r4). > > >With the patch it currently causes an error in the reload pass > > > > > > error: unable to find a register to spill > > > > Might be useful to see the dump_reload output. > > Attached. > > > >If a fourth register is available, the ICE goes away, but the > > >pointer remains in r2, rendering the test case useless. > > > > I don't think I quite understand what you're trying to do here, > > Alias detection of the memory pointed to by the first register. > There was some hard to trigger bug where writing a bitfield in a > struct would also overwrite the unselected bits of the > corresponding word. See here: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443 I've made a patch for the testcase: https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01232.html That fixes the problem for s390/s390x, but I cannot tell wether the patch to global register variable allocation has a problem or not. If you need any more information just give me a shout. Otherwise I'll not track this issue any further. Ciao Dominik ^_^ ^_^
PR rtl-optimization/44281 * hard-reg-set.h (struct target_hard_regs): New field x_fixed_nonglobal_reg_set. (fixed_nonglobal_reg_set): New macro. * reginfo.c (init_reg_sets_1): Initialize it. * ira.c (setup_alloc_regs): Use fixed_nonglobal_reg_set instead of fixed_reg_set. PR rtl-optimization/44281 * gcc.target/i386/pr44281.c: New test. Index: gcc/hard-reg-set.h =================================================================== --- gcc/hard-reg-set.h (revision 233451) +++ gcc/hard-reg-set.h (working copy) @@ -660,6 +660,12 @@ struct target_hard_regs { across calls even if we are willing to save and restore them. */ HARD_REG_SET x_call_fixed_reg_set; + /* Contains registers that are fixed use -- i.e. in fixed_reg_set -- but + only if they are not merely part of that set because they are global + regs. Global regs that are not otherwise fixed can still take part + in register allocation. */ + HARD_REG_SET x_fixed_nonglobal_reg_set; + /* Contains 1 for registers that are set or clobbered by calls. */ /* ??? Ideally, this would be just call_used_regs plus global_regs, but for someone's bright idea to have call_used_regs strictly include @@ -722,6 +728,8 @@ extern struct target_hard_regs *this_tar (this_target_hard_regs->x_fixed_regs) #define fixed_reg_set \ (this_target_hard_regs->x_fixed_reg_set) +#define fixed_nonglobal_reg_set \ + (this_target_hard_regs->x_fixed_nonglobal_reg_set) #define call_used_regs \ (this_target_hard_regs->x_call_used_regs) #define call_really_used_regs \ Index: gcc/ira.c =================================================================== --- gcc/ira.c (revision 233451) +++ gcc/ira.c (working copy) @@ -512,7 +512,7 @@ setup_alloc_regs (bool use_hard_frame_p) #ifdef ADJUST_REG_ALLOC_ORDER ADJUST_REG_ALLOC_ORDER; #endif - COPY_HARD_REG_SET (no_unit_alloc_regs, fixed_reg_set); + COPY_HARD_REG_SET (no_unit_alloc_regs, fixed_nonglobal_reg_set); if (! use_hard_frame_p) SET_HARD_REG_BIT (no_unit_alloc_regs, HARD_FRAME_POINTER_REGNUM); setup_class_hard_regs (); Index: gcc/reginfo.c =================================================================== --- gcc/reginfo.c (revision 233451) +++ gcc/reginfo.c (working copy) @@ -449,6 +449,7 @@ init_reg_sets_1 (void) } COPY_HARD_REG_SET (call_fixed_reg_set, fixed_reg_set); + COPY_HARD_REG_SET (fixed_nonglobal_reg_set, fixed_reg_set); /* Preserve global registers if called more than once. */ for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) Index: gcc/testsuite/gcc.target/i386/pr44281.c =================================================================== --- gcc/testsuite/gcc.target/i386/pr44281.c (revision 0) +++ gcc/testsuite/gcc.target/i386/pr44281.c (working copy) @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-std=gnu99 -O2" } */ +/* { dg-final { scan-assembler "salq\[ \\t\]+\\\$8, %rbx" } } */ + +#include <stdint.h> + +register uint64_t global_flag_stack __asm__("rbx"); + +void push_flag_into_global_reg_var(uint64_t a, uint64_t b) { + uint64_t flag = (a==b); + global_flag_stack <<= 8; + global_flag_stack |= flag; +}