Message ID | 20110924.020832.1683526855969392801.davem@davemloft.net |
---|---|
State | New |
On Sat, 24 Sep 2011, David Miller wrote:

> Hans, here is what I'm playing with right now against current
> trunk.

A spot-check review:

> I looked at the use cases for making use of the scale factor in the
> VIS %gsr register and it's used similar to how rounding modes are
> modified in the FPU control register.

It's more of a parameter actually, GSR.scale_factor is the
bit-shift count for the pack insns and GSR.alignaddr_offset the
byte-shift in the aligndata insns.

> You have a function, or family of functions, that want to operate with
> a certain scale factor. And at the top level the first thing you do
> is set the %gsr as you want it to be set.

Certainly an improvement, but...

> So I've added a GSR register to the sparc backend and then added
> __vis_write_gsr() and __vis_read_gsr() functions to facilitate the
> use cases I've seen.

I'd prefer it as a parameter to the builtins (expanding to two
insns, letting gcc get rid of the redundant ones; let the
initialization value be 0). I understand you're trying to keep
some kind of compatibility there, but additional builtins would
do the trick and fit nicely: the new builtins expanding to a set
of GSR (GSR field) followed by the "old" insn but fixed as in
this patch. Besides, the functions that use GSR still can't be
const in this patch. I guess they never can, when you think of
it, setting and/or using a register that can affect/be affected
something elsewhere, when that something is known to gcc. Oh well.

Another aspect would be to model the different GSR fields as
different registers; they're used completely differently and
just happen to be set with the same insn. That might help gcc
getting rid of redundant settings.

> This allowed me to describe to the compiler exactly what the alignaddr
> instructions do, and thus the unspecs for them are now gone.
>
> The pack and faligndata intrinsics still need to be unspec,

FWIW not "need"; IIUC at least faligndata *can* be a vec_select
of a vec_concat of the two vectors, but in practice I don't
think gcc can make use of it yet and all ports use unspec...

While on faligndata, see vec_realign_load_<mode> (sadly
undocumented at present); it'll enable the autovectorizer to...
autovectorize some more. (Right, I'm working on [yet] another
SIMD back-end, implemented as MIPS COP2 insns.)

> and thus I
> merely added GSR uses to those patterns which is enough to let the
> compiler get the dataflow right.

How about putting it inside the unspec vector? Those "use"
thingies always gives me the creeps; outside of an insn (no, not
here) they're sometimes lost and at least disconnected to the
insn. I think practically there's no difference here.

> This all seems sufficient for what things like Sun's medialib and your
> RAPP project want to do.
>
> I'll look into your other suggestion in PR48974, namely making use of
> fone VIS instructions.

One more: please consider adding a
if (TARGET_VIS) builtin_define ("__VIS__=something") so I as a
user theoretically wouldn't *have* to autoconfiscate for the
changes. ;)

> +  def_builtin_const ("__builtin_vis_fpack16", CODE_FOR_fpack16_vis,
> +                     v4qi_ftype_v4hi);
> +  def_builtin_const ("__builtin_vis_fpack32", CODE_FOR_fpack32_vis,
> +                     v8qi_ftype_v2si_v8qi);
> +  def_builtin_const ("__builtin_vis_fpackfix", CODE_FOR_fpackfix_vis,
> +                     v2hi_ftype_v2si);
>    def_builtin_const ("__builtin_vis_fexpand", CODE_FOR_fexpand_vis,
>                       v4hi_ftype_v4qi);

No, they (and aligndata) can't be const as long as they're
affected by something other than their parameters (GSR); pure
yes but not const. See extend.texi.

> +      def_builtin_const ("__builtin_vis_alignaddr", CODE_FOR_alignaddrdi_vis,
> +                         ptr_ftype_ptr_di);
> +      def_builtin_const ("__builtin_vis_alignaddrl", CODE_FOR_alignaddrldi_vis,
> +                         ptr_ftype_ptr_di);

Can't be neither pure nor const; affects something global (GSR).

BTW, vector header files are overrated, at least when there's no
compiler compatibility expected. They can even be in the way:
there's an ARM NEON PR being stalled because of concern that the
header could be used with another gcc version. Bah.
...ok I see visintrin.h is already in. Never mind then. :)

brgds, H-P
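To make the const-versus-pure point above concrete, here is a small portable C sketch. It is not SPARC code: `gsr_model`, `model_write_gsr` and `pack_lane` are hypothetical stand-ins invented for illustration, and the packing arithmetic is deliberately simplified rather than a faithful copy of fpack16. The point is only that a function whose result depends on state other than its arguments may be `pure` but not `const`, and a function that writes that state is neither.

```c
#include <stdint.h>

/* Hypothetical stand-in for %gsr; on real hardware this is a CPU register. */
static uint64_t gsr_model;

/* Stand-in for a GSR write. It has a side effect on global state,
   so it can be neither pure nor const. */
static void model_write_gsr(uint64_t value)
{
    gsr_model = value;
}

/* Simplified one-lane "pack": shift left by the scale field (bits 7:3),
   drop 7 fraction bits, clamp to 0..255. It reads gsr_model, so its
   result depends on more than its argument: pure is fine, const is not. */
__attribute__((pure))
static unsigned pack_lane(int16_t x)
{
    unsigned scale = (unsigned)(gsr_model >> 3) & 0x1f;
    uint64_t shifted;

    if (x < 0)
        return 0;                       /* negative values clamp to 0 */
    shifted = ((uint64_t)x << scale) >> 7;
    return shifted > 255 ? 255u : (unsigned)shifted;
}
```

If `pack_lane` were marked `const`, the compiler would be entitled to merge two calls with the same argument even when a GSR write sits between them, which is exactly the miscompilation the review warns about.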
From: Hans-Peter Nilsson <hp@bitrange.com>
Date: Sat, 24 Sep 2011 17:15:06 -0400 (EDT)

> It's more of a parameter actually, GSR.scale_factor is the
> bit-shift count for the pack insns and GSR.alignaddr_offset the
> byte-shift in the aligndata insns.

I realize this.

> I'd prefer it as a parameter to the builtins (expanding to two
> insns, letting gcc get rid of the redundant ones; let the
> initialization value be 0). I understand you're trying to keep
> some kind of compatibility there, but additional builtins would
> do the trick and fit nicely: the new builtins expanding to a set
> of GSR (GSR field) followed by the "old" insn but fixed as in
> this patch. Besides, the functions that use GSR still can't be
> const in this patch. I guess they never can, when you think of
> it, setting and/or using a register that can affect/be affected
> something elsewhere, when that something is known to gcc. Oh well.

I read this idea in your PR before I did this work and I disagree that
this is a better approach, because then I have to assume that you care
about all the other bits in the %gsr register.

So on the first set I'd have to read it, mask it out, then set the
scale bits. A needless waste of 20 to 30 cycles on UltraSPARC-III.

If you just call "__vis_write_gsr()" at the beginning of your kernel,
you can tell the compiler that you just want to set the scaling bits
and you don't care about the others at all.

> Another aspect would be to model the different GSR fields as
> different registers; they're used completely differently and
> just happen to be set with the same insn. That might help gcc
> getting rid of redundant settings.

Again, this doesn't allow the user to say "don't care" about the other
fields like a plain "__vis_write_gsr(2<<3)" call does.

You know what fields actually matter for your code.

> FWIW not "need"; IIUC at least faligndata *can* be a vec_select
> of a vec_concat of the two vectors, but in practice I don't
> think gcc can make use of it yet and all ports use unspec...
>
> While on faligndata, see vec_realign_load_<mode> (sadly
> undocumented at present); it'll enable the autovectorizer to...
> autovectorize some more. (Right, I'm working on [yet] another
> SIMD back-end, implemented as MIPS COP2 insns.)

Thanks for these suggestions.

> How about putting it inside the unspec vector? Those "use"
> thingies always gives me the creeps; outside of an insn (no, not
> here) they're sometimes lost and at least disconnected to the
> insn. I think practically there's no difference here.

The canonical thing to do is to put them outside of the unspec so that
is what I have done.

> One more: please consider adding a
> if (TARGET_VIS) builtin_define ("__VIS__=something") so I as a
> user theoretically wouldn't *have* to autoconfiscate for the
> changes. ;)

This is on my todo list as well, I'll try to emit some CPP define
compatible with what Sun uses. But, thanks for reminding me.

>> +  def_builtin_const ("__builtin_vis_fpack16", CODE_FOR_fpack16_vis,
>> +                     v4qi_ftype_v4hi);
>> +  def_builtin_const ("__builtin_vis_fpack32", CODE_FOR_fpack32_vis,
>> +                     v8qi_ftype_v2si_v8qi);
>> +  def_builtin_const ("__builtin_vis_fpackfix", CODE_FOR_fpackfix_vis,
>> +                     v2hi_ftype_v2si);
>>    def_builtin_const ("__builtin_vis_fexpand", CODE_FOR_fexpand_vis,
>>                       v4hi_ftype_v4qi);
>
> No, they (and aligndata) can't be const as long as they're
> affected by something other than their parameters (GSR); pure
> yes but not const. See extend.texi.

Good catch, I was thinking purely on the RTL level where we do show
the compiler all of the "inputs" but at the tree level this is not
visible. I'll fix that up for the next revision.

>> +      def_builtin_const ("__builtin_vis_alignaddr", CODE_FOR_alignaddrdi_vis,
>> +                         ptr_ftype_ptr_di);
>> +      def_builtin_const ("__builtin_vis_alignaddrl", CODE_FOR_alignaddrldi_vis,
>> +                         ptr_ftype_ptr_di);
>
> Can't be neither pure nor const; affects something global (GSR).

Gotcha. I'd like to revisit this at some point in the future though,
maybe we can legitimately at least mark these things pure.
On Sat, 24 Sep 2011, David Miller wrote:

> From: Hans-Peter Nilsson <hp@bitrange.com>
> Date: Sat, 24 Sep 2011 17:15:06 -0400 (EDT)
> > I'd prefer it as a parameter to the builtins (expanding to two
> > insns, letting gcc get rid of the redundant ones; let the
> > initialization value be 0). I understand you're trying to keep
> > some kind of compatibility there, but additional builtins would
> > do the trick and fit nicely: the new builtins expanding to a set
> > of GSR (GSR field) followed by the "old" insn but fixed as in
> > this patch. Besides, the functions that use GSR still can't be
> > const in this patch. I guess they never can, when you think of
> > it, setting and/or using a register that can affect/be affected
> > something elsewhere, when that something is known to gcc. Oh well.
>
> I read this idea in your PR before I did this work and I disagree that
> this is a better approach, because then I have to assume that you care
> about all the other bits in the %gsr register.

I don't understand what you mean here. Maybe it doesn't
matter... My suggestions come from observing what gcc did to
the "faked gsr modelling" I had to use with the current releases
(what moving and eliminating redundant variable settings used in
asms that it did; turned out acceptable FWIW, no redundant
reads), which would map directly to my suggestion. But I guess
you have a point in that your setting-gsr-then-using-builtins
maps better to the machine insns.

BTW, don't forget to clobber GSR at call insns!

> So on the first set I'd have to read it, mask it out, then set the
> scale bits. A needless waste of 20 to 30 cycles on UltraSPARC-III.

No, it doesn't have to be read. If the fields have (useful)
implicit initial values (like scale=7 and align=4) at the
beginning of any function, you wouldn't have to read and mask,
just set.

(Caveat: the port has to have a way to emit a gsr-setting even
if the supposed-initial-values are specified - like another
register or variable, or the initial-value machinery as I
suggested.)

> If you just call "__vis_write_gsr()" at the beginning of your kernel,
> you can tell the compiler that you just want to set the scaling bits
> and you don't care about the others at all.

Don't care how? They're certainly set by both __vis_write_gsr()
and alignaddr and used by faligndata. I guess my confusion is
that I don't see what aspect is "don't care" here that'd be
"care" with my suggestion.

> > Another aspect would be to model the different GSR fields as
> > different registers; they're used completely differently and
> > just happen to be set with the same insn. That might help gcc
> > getting rid of redundant settings.
>
> Again, this doesn't allow the user to say "don't care" about the other
> fields like a plain "__vis_write_gsr(2<<3)" call does.

But that'd set GSR.alignaddr_offset to 0 rather than "don't
care".

> You know what fields actually matter for your code.

A good reason to model them as different registers!

Still, this is a good start and much more workable (and
schedulable) than what's already there, thank you for that.
It doesn't add hurdles for a revisit, if the mechanism is found
unusable or the generated code pessimal!

brgds, H-P
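For readers less familiar with faligndata, which the thread keeps returning to, here is a portable C sketch of what "a vec_select of a vec_concat of the two vectors" amounts to. It assumes the usual description of the instruction, namely that the two 8-byte operands are concatenated and 8 bytes are taken starting at the byte offset held in GSR.alignaddr_offset, counted from the most significant byte. `faligndata_model` is a hypothetical name used only here, not a GCC builtin.

```c
#include <stdint.h>

/* Model of faligndata on two 64-bit operands.
   'hi' is the first operand, 'lo' the second; 'align' plays the role
   of GSR.alignaddr_offset (only its low 3 bits are used). */
static uint64_t faligndata_model(uint64_t hi, uint64_t lo, unsigned align)
{
    align &= 7;
    if (align == 0)
        return hi;                      /* avoid an undefined 64-bit shift */
    /* Take the low (8 - align) bytes of 'hi' followed by the top
       'align' bytes of 'lo'. */
    return (hi << (8 * align)) | (lo >> (64 - 8 * align));
}
```

For example, with operands 0x0011223344556677 and 0x8899aabbccddeeff and an offset of 3, the model returns 0x33445566778899aa, i.e. bytes 3 through 10 of the 16-byte concatenation.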
From: Hans-Peter Nilsson <hp@bitrange.com>
Date: Sat, 24 Sep 2011 18:37:33 -0400 (EDT)

> BTW, don't forget to clobber GSR at call insns!

This I explicitly want to avoid and is an explicit design decision.

Like I said the model is like setting the floating point rounding mode
for a family of functions. You set the floating point rounding mode
at the top level, run your kernel and all the helper functions in that
mode.

The %gsr scaling factor is to be used similarly.

You have to control all the functions that get called once you set the
%gsr before a calculation, and they either have to explicitly save and
restore the %gsr around changes to %gsr, or have been designed to use
the %gsr setting made by the callee.

The last thing I want to do is have to teach reload how to handle this
thing, it simply makes no sense to put that much engineering into it
if it is for zero or very little gain.

And it would explicitly prevent the kind of model I see as the most
reasonable for using this register, in that if we clobber it during a
call there is no way for the user to say not to save and restore %gsr
over a call.

>> So on the first set I'd have to read it, mask it out, then set the
>> scale bits. A needless waste of 20 to 30 cycles on UltraSPARC-III.
>
> No, it doesn't have to be read. If the fields have (useful)
> implicit initial values (like scale=7 and align=4) at the
> beginning of any function, you wouldn't have to read and mask,
> just set.

You can't just set. What about the VIS-2.0 byte-mask at the top
32-bits of the register, are you just going to clobber that when you
change the scale factor?

If we support treating the different fields as different registers we
have to preserve the setting of the other fields of %gsr when we
change one of them.

There are 5 fields currently defined:

1) align address <2:0>
2) scale factor <7:3>
3) interval rounding mode (VIS 2.0) <26:25>
4) interval mode enable <27>
5) Byte mask (VIS 2.0) <63:32>

And also this idea of using get_hard_reg_initial_val() to "optimize"
this kind of usage especially forces us to clobber the %gsr over
function calls which, as stated, I want to avoid if at all possible.

>> Again, this doesn't allow the user to say "don't care" about the other
>> fields like a plain "__vis_write_gsr(2<<3)" call does.
>
> But that'd set GSR.alignaddr_offset to 0 rather than "don't
> care".

Zero is equivalent to "don't care" in this situation if either
1) you aren't doing any falignaddr operations or 2) you are
then going to subsequently do an "alignaddr" to set that field
up.

Look at the medialib code, that's basically the usage model there and
I think it's quite reasonable.

> It doesn't add hurdles for a revisit, if the mechanism is found
> unusable or the generated code pessimal!

Absolutely, thanks for your review.
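The field layout listed above, and the read-mask-set sequence being argued about, can be written down as a small C sketch. The bit positions come from the list in this message; the macro and function names (`GSR_*`, `gsr_scale`, `gsr_set_scale_preserving`, `gsr_scale_only`) are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Field positions as listed in the message above. */
#define GSR_ALIGN_SHIFT   0                       /* align address  <2:0>   */
#define GSR_ALIGN_MASK    (UINT64_C(0x7)  << GSR_ALIGN_SHIFT)
#define GSR_SCALE_SHIFT   3                       /* scale factor   <7:3>   */
#define GSR_SCALE_MASK    (UINT64_C(0x1f) << GSR_SCALE_SHIFT)
#define GSR_IRND_SHIFT    25                      /* rounding mode  <26:25> */
#define GSR_IRND_MASK     (UINT64_C(0x3)  << GSR_IRND_SHIFT)
#define GSR_IM_SHIFT      27                      /* interval mode  <27>    */
#define GSR_IM_MASK       (UINT64_C(0x1)  << GSR_IM_SHIFT)
#define GSR_BMASK_SHIFT   32                      /* byte mask      <63:32> */
#define GSR_BMASK_MASK    (UINT64_C(0xffffffff) << GSR_BMASK_SHIFT)

static unsigned gsr_align(uint64_t gsr)
{
    return (unsigned)((gsr & GSR_ALIGN_MASK) >> GSR_ALIGN_SHIFT);
}

static unsigned gsr_scale(uint64_t gsr)
{
    return (unsigned)((gsr & GSR_SCALE_MASK) >> GSR_SCALE_SHIFT);
}

/* "Care about the other fields": needs the old value (a read),
   a mask, and then the write. */
static uint64_t gsr_set_scale_preserving(uint64_t old_gsr, unsigned scale)
{
    return (old_gsr & ~GSR_SCALE_MASK)
           | (((uint64_t)scale << GSR_SCALE_SHIFT) & GSR_SCALE_MASK);
}

/* "Don't care about the other fields": a single value to write,
   which leaves every other field at zero. */
static uint64_t gsr_scale_only(unsigned scale)
{
    return ((uint64_t)scale << GSR_SCALE_SHIFT) & GSR_SCALE_MASK;
}
```

`gsr_scale_only(2)` is the same value as the `2<<3` passed to `__vis_write_gsr()` in the thread, and it shows both sides of the argument: no read is needed, but the align field ends up as 0 rather than untouched.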
On Sat, 24 Sep 2011, David Miller wrote:

> From: Hans-Peter Nilsson <hp@bitrange.com>
> Date: Sat, 24 Sep 2011 18:37:33 -0400 (EDT)
>
> > BTW, don't forget to clobber GSR at call insns!
>
> This I explicitly want to avoid and is an explicit design decision.

Aha, now I get it; that's certainly key. Thanks for taking the
time. Yes, it's certainly more flexible to have the user set
GSR than allowing gcc to clobber it when seeing VIS intrinsics,
at the minor usability cost of the user having to keep track of
GSR separately to when used in the individual intrinsics.

> Like I said the model is like setting the floating point rounding mode
> for a family of functions.

Aha 2: I didn't interpret what you wrote as referring to the
model; I thought you meant the actual function (one of the
usages of the fpack insns being "fixed math"). Sure.

> Zero is equivalent to "don't care" in this situation if either
> 1) you aren't doing any falignaddr operations or 2) you are

(JFTR, "faligndata")

> then going to subsequently do an "alignaddr" to set that field
> up.

brgds, H-P
PS. gcc-4.7/changes.html?
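The usage model agreed on here (set the register once at the top level; helpers either inherit the setting or save and restore it around their own changes) can be sketched in portable C. Everything below is a simulation: `sim_gsr`, `sim_read_gsr` and `sim_write_gsr` are hypothetical stand-ins for %gsr and the `__vis_read_gsr()` / `__vis_write_gsr()` intrinsics, so the example runs on any host.

```c
#include <stdint.h>

static uint64_t sim_gsr;                /* hypothetical stand-in for %gsr */

static uint64_t sim_read_gsr(void)      { return sim_gsr; }
static void sim_write_gsr(uint64_t v)   { sim_gsr = v; }

/* Helper designed to run with whatever scale factor the top level chose.
   It only reports the scale field (bits 7:3) it observes. */
static unsigned helper_inherits(void)
{
    return (unsigned)(sim_read_gsr() >> 3) & 0x1f;
}

/* Helper that needs its own scale factor: it saves the register,
   changes it, does its work, and restores it before returning. */
static unsigned helper_private_scale(unsigned scale)
{
    uint64_t saved = sim_read_gsr();
    unsigned seen;

    sim_write_gsr((uint64_t)scale << 3);
    seen = helper_inherits();
    sim_write_gsr(saved);
    return seen;
}

/* Top level: set the register once, then call into the helpers. */
static unsigned kernel(void)
{
    unsigned a, b;

    sim_write_gsr(2u << 3);             /* scale = 2 for the whole kernel */
    a = helper_private_scale(5);        /* observes its own scale, 5      */
    b = helper_inherits();              /* still observes 2 afterwards    */
    return a * 100 + b;
}
```

Because nothing is clobbered at calls in this model, the discipline is entirely the programmer's responsibility, which is the "minor usability cost" mentioned above.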
From: Hans-Peter Nilsson <hp@bitrange.com>
Date: Sat, 24 Sep 2011 19:32:55 -0400 (EDT)

> PS. gcc-4.7/changes.html?

Also on my TODO list, and Eric made some noise about documenting these
improvements as well, thanks for the reminder.

I'll post and commit the current version of my %gsr changes after my
bootstrap/testsuite run finishes.
From: David Miller <davem@davemloft.net>
Date: Sat, 24 Sep 2011 20:05:19 -0400 (EDT)

> From: Hans-Peter Nilsson <hp@bitrange.com>
> Date: Sat, 24 Sep 2011 19:32:55 -0400 (EDT)
>
>> PS. gcc-4.7/changes.html?
>
> Also on my TODO list, and Eric made some noise about documenting these
> improvements as well, thanks for the reminder.

I just committed an update to the wwwdocs.
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index d62d5a1..f38ecda 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -329,7 +329,7 @@ char leaf_reg_remap[] =
   72, 73, 74, 75, 76, 77, 78, 79,
   80, 81, 82, 83, 84, 85, 86, 87,
   88, 89, 90, 91, 92, 93, 94, 95,
-  96, 97, 98, 99, 100};
+  96, 97, 98, 99, 100, 101, 102};
 
 /* Vector, indexed by hard register number, which contains 1 for a
    register that is allowable in a candidate for leaf
@@ -347,7 +347,7 @@ char sparc_leaf_regs[] =
   1, 1, 1, 1, 1, 1, 1, 1,
   1, 1, 1, 1, 1, 1, 1, 1,
   1, 1, 1, 1, 1, 1, 1, 1,
-  1, 1, 1, 1, 1};
+  1, 1, 1, 1, 1, 1, 1};
 
 struct GTY(()) machine_function
 {
@@ -4036,8 +4036,8 @@ static const int hard_32bit_mode_classes[] = {
   /* %fcc[0123] */
   CCFP_MODES, CCFP_MODES, CCFP_MODES, CCFP_MODES,
 
-  /* %icc */
-  CC_MODES
+  /* %icc, %sfp, %gsr */
+  CC_MODES, 0, S_MODES
 };
 
 static const int hard_64bit_mode_classes[] = {
@@ -4061,8 +4061,8 @@ static const int hard_64bit_mode_classes[] = {
   /* %fcc[0123] */
   CCFP_MODES, CCFP_MODES, CCFP_MODES, CCFP_MODES,
 
-  /* %icc */
-  CC_MODES
+  /* %icc, %sfp, %gsr */
+  CC_MODES, 0, S_MODES
 };
 
 int sparc_mode_class [NUM_MACHINE_MODES];
@@ -9168,14 +9168,18 @@ sparc_vis_init_builtins (void)
                                                  v4hi, v4hi, 0);
   tree si_ftype_v2si_v2si = build_function_type_list (intSI_type_node,
                                                       v2si, v2si, 0);
+  tree void_ftype_si = build_function_type_list (void_type_node,
+                                                 intSI_type_node, 0);
+  tree si_ftype_void = build_function_type_list (intSI_type_node,
+                                                 void_type_node, 0);
 
   /* Packing and expanding vectors. */
-  def_builtin ("__builtin_vis_fpack16", CODE_FOR_fpack16_vis,
-               v4qi_ftype_v4hi);
-  def_builtin ("__builtin_vis_fpack32", CODE_FOR_fpack32_vis,
-               v8qi_ftype_v2si_v8qi);
-  def_builtin ("__builtin_vis_fpackfix", CODE_FOR_fpackfix_vis,
-               v2hi_ftype_v2si);
+  def_builtin_const ("__builtin_vis_fpack16", CODE_FOR_fpack16_vis,
+                     v4qi_ftype_v4hi);
+  def_builtin_const ("__builtin_vis_fpack32", CODE_FOR_fpack32_vis,
+                     v8qi_ftype_v2si_v8qi);
+  def_builtin_const ("__builtin_vis_fpackfix", CODE_FOR_fpackfix_vis,
+                     v2hi_ftype_v2si);
   def_builtin_const ("__builtin_vis_fexpand", CODE_FOR_fexpand_vis,
                      v4hi_ftype_v4qi);
   def_builtin_const ("__builtin_vis_fpmerge", CODE_FOR_fpmerge_vis,
@@ -9198,27 +9202,33 @@ sparc_vis_init_builtins (void)
                      v2si_ftype_v4qi_v2hi);
 
   /* Data aligning. */
-  def_builtin ("__builtin_vis_faligndatav4hi", CODE_FOR_faligndatav4hi_vis,
-               v4hi_ftype_v4hi_v4hi);
-  def_builtin ("__builtin_vis_faligndatav8qi", CODE_FOR_faligndatav8qi_vis,
-               v8qi_ftype_v8qi_v8qi);
-  def_builtin ("__builtin_vis_faligndatav2si", CODE_FOR_faligndatav2si_vis,
-               v2si_ftype_v2si_v2si);
-  def_builtin ("__builtin_vis_faligndatadi", CODE_FOR_faligndatadi_vis,
-               di_ftype_di_di);
+  def_builtin_const ("__builtin_vis_faligndatav4hi", CODE_FOR_faligndatav4hi_vis,
+                     v4hi_ftype_v4hi_v4hi);
+  def_builtin_const ("__builtin_vis_faligndatav8qi", CODE_FOR_faligndatav8qi_vis,
+                     v8qi_ftype_v8qi_v8qi);
+  def_builtin_const ("__builtin_vis_faligndatav2si", CODE_FOR_faligndatav2si_vis,
+                     v2si_ftype_v2si_v2si);
+  def_builtin_const ("__builtin_vis_faligndatadi", CODE_FOR_faligndatadi_vis,
+                     di_ftype_di_di);
+
+  def_builtin ("__builtin_vis_write_gsr", CODE_FOR_wrgsr_vis,
+               void_ftype_si);
+  def_builtin ("__builtin_vis_read_gsr", CODE_FOR_rdgsr_vis,
+               si_ftype_void);
+
   if (TARGET_ARCH64)
     {
-      def_builtin ("__builtin_vis_alignaddr", CODE_FOR_alignaddrdi_vis,
-                   ptr_ftype_ptr_di);
-      def_builtin ("__builtin_vis_alignaddrl", CODE_FOR_alignaddrldi_vis,
-                   ptr_ftype_ptr_di);
+      def_builtin_const ("__builtin_vis_alignaddr", CODE_FOR_alignaddrdi_vis,
+                         ptr_ftype_ptr_di);
+      def_builtin_const ("__builtin_vis_alignaddrl", CODE_FOR_alignaddrldi_vis,
+                         ptr_ftype_ptr_di);
     }
   else
     {
-      def_builtin ("__builtin_vis_alignaddr", CODE_FOR_alignaddrsi_vis,
-                   ptr_ftype_ptr_si);
-      def_builtin ("__builtin_vis_alignaddrl", CODE_FOR_alignaddrlsi_vis,
-                   ptr_ftype_ptr_si);
+      def_builtin_const ("__builtin_vis_alignaddr", CODE_FOR_alignaddrsi_vis,
+                         ptr_ftype_ptr_si);
+      def_builtin_const ("__builtin_vis_alignaddrl", CODE_FOR_alignaddrlsi_vis,
+                         ptr_ftype_ptr_si);
     }
 
   /* Pixel distance. */
@@ -9289,32 +9299,47 @@ sparc_expand_builtin (tree exp, rtx target,
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   unsigned int icode = DECL_FUNCTION_CODE (fndecl);
   rtx pat, op[4];
-  enum machine_mode mode[4];
   int arg_count = 0;
+  bool nonvoid;
 
-  mode[0] = insn_data[icode].operand[0].mode;
-  if (!target
-      || GET_MODE (target) != mode[0]
-      || ! (*insn_data[icode].operand[0].predicate) (target, mode[0]))
-    op[0] = gen_reg_rtx (mode[0]);
-  else
-    op[0] = target;
+  nonvoid = TREE_TYPE (TREE_TYPE (fndecl)) != void_type_node;
+  if (nonvoid)
+    {
+      enum machine_mode tmode = insn_data[icode].operand[0].mode;
+      if (!target
+          || GET_MODE (target) != tmode
+          || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+        op[0] = gen_reg_rtx (tmode);
+      else
+        op[0] = target;
+    }
 
   FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
     {
+      const struct insn_operand_data *insn_op;
+
+      if (arg == error_mark_node)
+        return NULL_RTX;
+
       arg_count++;
-      mode[arg_count] = insn_data[icode].operand[arg_count].mode;
+      insn_op = &insn_data[icode].operand[arg_count - !nonvoid];
       op[arg_count] = expand_normal (arg);
 
       if (! (*insn_data[icode].operand[arg_count].predicate) (op[arg_count],
-                                                              mode[arg_count]))
-        op[arg_count] = copy_to_mode_reg (mode[arg_count], op[arg_count]);
+                                                              insn_op->mode))
+        op[arg_count] = copy_to_mode_reg (insn_op->mode, op[arg_count]);
     }
 
   switch (arg_count)
     {
+    case 0:
+      pat = GEN_FCN (icode) (op[0]);
+      break;
     case 1:
-      pat = GEN_FCN (icode) (op[0], op[1]);
+      if (nonvoid)
+        pat = GEN_FCN (icode) (op[0], op[1]);
+      else
+        pat = GEN_FCN (icode) (op[1]);
       break;
     case 2:
       pat = GEN_FCN (icode) (op[0], op[1], op[2]);
@@ -9331,7 +9356,10 @@ sparc_expand_builtin (tree exp, rtx target,
 
   emit_insn (pat);
 
-  return op[0];
+  if (nonvoid)
+    return op[0];
+  else
+    return const0_rtx;
 }
 
 static int
@@ -9416,7 +9444,8 @@ sparc_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
 
   if (ignore
       && icode != CODE_FOR_alignaddrsi_vis
-      && icode != CODE_FOR_alignaddrdi_vis)
+      && icode != CODE_FOR_alignaddrdi_vis
+      && icode != CODE_FOR_wrgsr_vis)
     return build_zero_cst (rtype);
 
   switch (icode)
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index afdca1e..77eff2e 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -691,7 +691,7 @@ extern enum cmodel sparc_cmodel;
    Register 100 is used as the integer condition code register.
    Register 101 is used as the soft frame pointer register. */
 
-#define FIRST_PSEUDO_REGISTER 102
+#define FIRST_PSEUDO_REGISTER 103
 
 #define SPARC_FIRST_FP_REG 32
 /* Additional V9 fp regs. */
@@ -704,6 +704,7 @@ extern enum cmodel sparc_cmodel;
 #define SPARC_FCC_REG 96
 /* Integer CC reg. We don't distinguish %icc from %xcc. */
 #define SPARC_ICC_REG 100
+#define SPARC_GSR_REG 102
 
 /* Nonzero if REGNO is an fp reg. */
 #define SPARC_FP_REG_P(REGNO) \
@@ -757,7 +758,7 @@ extern enum cmodel sparc_cmodel;
   0, 0, 0, 0, 0, 0, 0, 0, \
   0, 0, 0, 0, 0, 0, 0, 0, \
   \
-  0, 0, 0, 0, 0, 1}
+  0, 0, 0, 0, 0, 1, 1}
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -782,7 +783,7 @@ extern enum cmodel sparc_cmodel;
   1, 1, 1, 1, 1, 1, 1, 1, \
   1, 1, 1, 1, 1, 1, 1, 1, \
   \
-  1, 1, 1, 1, 1, 1}
+  1, 1, 1, 1, 1, 1, 1}
 
 /* Return number of consecutive hard regs needed starting at reg REGNO
    to hold something of mode MODE.
@@ -796,11 +797,12 @@ extern enum cmodel sparc_cmodel;
    included in the hard register count). */
 
 #define HARD_REGNO_NREGS(REGNO, MODE) \
-  (TARGET_ARCH64 \
-   ? ((REGNO) < 32 || (REGNO) == FRAME_POINTER_REGNUM \
-      ? (GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD \
-      : (GET_MODE_SIZE (MODE) + 3) / 4) \
-   : ((GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD))
+  ((REGNO) == SPARC_GSR_REG ? 1 : \
+   (TARGET_ARCH64 \
+    ? ((REGNO) < 32 || (REGNO) == FRAME_POINTER_REGNUM \
+       ? (GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD \
+       : (GET_MODE_SIZE (MODE) + 3) / 4) \
+    : ((GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)))
 
 /* Due to the ARCH64 discrepancy above we must override this next
    macro too. */
@@ -985,7 +987,7 @@ enum reg_class { NO_REGS, FPCC_REGS, I64_REGS, GENERAL_REGS, FP_REGS,
   {0, -1, -1, 0}, /* EXTRA_FP_REGS */ \
   {-1, -1, 0, 0x20}, /* GENERAL_OR_FP_REGS */ \
   {-1, -1, -1, 0x20}, /* GENERAL_OR_EXTRA_FP_REGS */ \
-  {-1, -1, -1, 0x3f}} /* ALL_REGS */
+  {-1, -1, -1, 0x7f}} /* ALL_REGS */
 
 /* The same information, inverted:
    Return the class number of the smallest class containing
@@ -1046,7 +1048,7 @@ extern enum reg_class sparc_regno_reg_class[FIRST_PSEUDO_REGISTER];
   88, 89, 90, 91, 92, 93, 94, 95, /* %f56-%f63 */ \
   39, 38, 37, 36, 35, 34, 33, 32, /* %f7-%f0 */ \
   96, 97, 98, 99, /* %fcc0-3 */ \
-  100, 0, 14, 30, 101} /* %icc, %g0, %o6, %i6, %sfp */
+  100, 0, 14, 30, 101, 102 } /* %icc, %g0, %o6, %i6, %sfp, %gsr */
 
 /* This is the order in which to allocate registers for
    leaf functions. If all registers can fit in the global and
@@ -1085,7 +1087,7 @@ extern enum reg_class sparc_regno_reg_class[FIRST_PSEUDO_REGISTER];
   88, 89, 90, 91, 92, 93, 94, 95, /* %f56-%f63 */ \
   39, 38, 37, 36, 35, 34, 33, 32, /* %f7-%f0 */ \
   96, 97, 98, 99, /* %fcc0-3 */ \
-  100, 0, 14, 30, 31, 101} /* %icc, %g0, %o6, %i6, %i7, %sfp */
+  100, 0, 14, 30, 31, 101, 102 } /* %icc, %g0, %o6, %i6, %i7, %sfp, %gsr */
 
 #define ADJUST_REG_ALLOC_ORDER order_regs_for_local_alloc ()
 
@@ -1724,7 +1726,7 @@ do { \
  "%f40", "%f41", "%f42", "%f43", "%f44", "%f45", "%f46", "%f47", \
  "%f48", "%f49", "%f50", "%f51", "%f52", "%f53", "%f54", "%f55", \
  "%f56", "%f57", "%f58", "%f59", "%f60", "%f61", "%f62", "%f63", \
- "%fcc0", "%fcc1", "%fcc2", "%fcc3", "%icc", "%sfp" }
+ "%fcc0", "%fcc1", "%fcc2", "%fcc3", "%icc", "%sfp", "%gsr" }
 
 /* Define additional names for use in asm clobbers and asm declarations. */
 
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 588caf3..200846e 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -58,7 +58,7 @@
    (UNSPEC_MUL8UL 46)
    (UNSPEC_MULDUL 47)
    (UNSPEC_ALIGNDATA 48)
-   (UNSPEC_ALIGNADDR 49)
+
    (UNSPEC_PDIST 50)
    (UNSPEC_EDGE8 51)
    (UNSPEC_EDGE8L 52)
@@ -66,7 +66,6 @@
    (UNSPEC_EDGE16L 54)
    (UNSPEC_EDGE32 55)
    (UNSPEC_EDGE32L 56)
-   (UNSPEC_ALIGNADDRL 57)
 
    (UNSPEC_SP_SET 60)
    (UNSPEC_SP_TEST 61)
@@ -176,6 +175,7 @@
    (FCC3_REG 99)
    (CC_REG 100)
    (SFP_REG 101)
+   (GSR_REG 102)
   ])
 
 (define_mode_iterator P [(SI "Pmode == SImode") (DI "Pmode == DImode")])
@@ -7752,7 +7752,8 @@
 (define_insn "fpack16_vis"
   [(set (match_operand:V4QI 0 "register_operand" "=f")
         (unspec:V4QI [(match_operand:V4HI 1 "register_operand" "e")]
-                     UNSPEC_FPACK16))]
+                     UNSPEC_FPACK16))
+   (use (reg:SI GSR_REG))]
   "TARGET_VIS"
   "fpack16\t%1, %0"
   [(set_attr "type" "fga")
@@ -7761,7 +7762,8 @@
 (define_insn "fpackfix_vis"
   [(set (match_operand:V2HI 0 "register_operand" "=f")
         (unspec:V2HI [(match_operand:V2SI 1 "register_operand" "e")]
-                     UNSPEC_FPACKFIX))]
+                     UNSPEC_FPACKFIX))
+   (use (reg:SI GSR_REG))]
   "TARGET_VIS"
   "fpackfix\t%1, %0"
   [(set_attr "type" "fga")
@@ -7771,7 +7773,8 @@
   [(set (match_operand:V8QI 0 "register_operand" "=e")
         (unspec:V8QI [(match_operand:V2SI 1 "register_operand" "e")
                       (match_operand:V8QI 2 "register_operand" "e")]
-                     UNSPEC_FPACK32))]
+                     UNSPEC_FPACK32))
+   (use (reg:SI GSR_REG))]
   "TARGET_VIS"
   "fpack32\t%1, %2, %0"
   [(set_attr "type" "fga")
@@ -7871,6 +7874,18 @@
   [(set_attr "type" "fpmul")
    (set_attr "fptype" "double")])
 
+(define_insn "wrgsr_vis"
+  [(set (reg:SI GSR_REG) (match_operand:SI 0 "arith_operand" "rI"))]
+  "TARGET_VIS"
+  "wr\t%%g0, %0, %%gsr"
+  [(set_attr "type" "multi")])
+
+(define_insn "rdgsr_vis"
+  [(set (match_operand:SI 0 "register_operand" "=r") (reg:SI GSR_REG))]
+  "TARGET_VIS"
+  "rd\t%%gsr, %0"
+  [(set_attr "type" "multi")])
+
 ;; Using faligndata only makes sense after an alignaddr since the choice of
 ;; bytes to take out of each operand is dependent on the results of the last
 ;; alignaddr.
@@ -7878,25 +7893,57 @@
   [(set (match_operand:V64I 0 "register_operand" "=e")
         (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
                       (match_operand:V64I 2 "register_operand" "e")]
-         UNSPEC_ALIGNDATA))]
+         UNSPEC_ALIGNDATA))
+   (use (reg:SI GSR_REG))]
   "TARGET_VIS"
   "faligndata\t%1, %2, %0"
   [(set_attr "type" "fga")
    (set_attr "fptype" "double")])
 
-(define_insn "alignaddr<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-        (unspec:P [(match_operand:P 1 "register_or_zero_operand" "rJ")
-                   (match_operand:P 2 "register_or_zero_operand" "rJ")]
-         UNSPEC_ALIGNADDR))]
+(define_insn "alignaddrsi_vis"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (plus:SI (match_operand:SI 1 "register_or_zero_operand" "rJ")
+                 (match_operand:SI 2 "register_or_zero_operand" "rJ")))
+   (set (reg:SI GSR_REG)
+        (ior:SI (and:SI (reg:SI GSR_REG) (const_int -8))
+                (and:SI (plus:SI (match_dup 1) (match_dup 2))
+                        (const_int 7))))]
   "TARGET_VIS"
   "alignaddr\t%r1, %r2, %0")
 
-(define_insn "alignaddrl<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-        (unspec:P [(match_operand:P 1 "register_or_zero_operand" "rJ")
-                   (match_operand:P 2 "register_or_zero_operand" "rJ")]
-         UNSPEC_ALIGNADDRL))]
+(define_insn "alignaddrdi_vis"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (plus:DI (match_operand:DI 1 "register_or_zero_operand" "rJ")
+                 (match_operand:DI 2 "register_or_zero_operand" "rJ")))
+   (set (reg:SI GSR_REG)
+        (ior:SI (and:SI (reg:SI GSR_REG) (const_int -8))
+                (and:SI (truncate:SI (plus:DI (match_dup 1) (match_dup 2)))
+                        (const_int 7))))]
+  "TARGET_VIS"
+  "alignaddr\t%r1, %r2, %0")
+
+(define_insn "alignaddrlsi_vis"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (plus:SI (match_operand:SI 1 "register_or_zero_operand" "rJ")
+                 (match_operand:SI 2 "register_or_zero_operand" "rJ")))
+   (set (reg:SI GSR_REG)
+        (ior:SI (and:SI (reg:SI GSR_REG) (const_int -8))
+                (xor:SI (and:SI (plus:SI (match_dup 1) (match_dup 2))
+                                (const_int 7))
+                        (const_int 7))))]
+  "TARGET_VIS"
+  "alignaddrl\t%r1, %r2, %0")
+
+(define_insn "alignaddrldi_vis"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (plus:DI (match_operand:DI 1 "register_or_zero_operand" "rJ")
+                 (match_operand:DI 2 "register_or_zero_operand" "rJ")))
+   (set (reg:SI GSR_REG)
+        (ior:SI (and:SI (reg:SI GSR_REG) (const_int -8))
+                (xor:SI (and:SI (truncate:SI (plus:DI (match_dup 1)
+                                                      (match_dup 2)))
+                                (const_int 7))
+                        (const_int 7))))]
   "TARGET_VIS"
   "alignaddrl\t%r1, %r2, %0")
 
diff --git a/gcc/config/sparc/visintrin.h b/gcc/config/sparc/visintrin.h
index 4c2fa18..37c1113 100644
--- a/gcc/config/sparc/visintrin.h
+++ b/gcc/config/sparc/visintrin.h
@@ -31,6 +31,20 @@ typedef unsigned char __v8qi __attribute__ ((__vector_size__ (8)));
 typedef unsigned char __v4qi __attribute__ ((__vector_size__ (4)));
 typedef int __i64 __attribute__ ((__mode__ (DI)));
 
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_write_gsr (int __A)
+{
+  __builtin_vis_write_gsr (__A);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_read_gsr (void)
+{
+  return __builtin_vis_read_gsr ();
+}
+
 extern __inline void *
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 __vis_alignaddr (void *__A, long __B)
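As a reading aid for the new alignaddr patterns in sparc.md, the sketch below transcribes the SImode RTL into plain C. It mirrors the patterns exactly as posted in this patch (the destination gets the plain sum, and the low three bits of the 32-bit GSR model are replaced); it has not been checked against the hardware manual, so treat it as a model of the RTL rather than of the instruction. The function names are hypothetical.

```c
#include <stdint.h>

/* alignaddrsi_vis as written in the patch:
     result = op1 + op2
     GSR    = (GSR & -8) | ((op1 + op2) & 7)  */
static uint32_t alignaddr_model(uint32_t *gsr, uint32_t op1, uint32_t op2)
{
    uint32_t sum = op1 + op2;

    *gsr = (*gsr & ~UINT32_C(7)) | (sum & 7);
    return sum;
}

/* alignaddrlsi_vis as written in the patch:
     result = op1 + op2
     GSR    = (GSR & -8) | (((op1 + op2) & 7) ^ 7)  */
static uint32_t alignaddrl_model(uint32_t *gsr, uint32_t op1, uint32_t op2)
{
    uint32_t sum = op1 + op2;

    *gsr = (*gsr & ~UINT32_C(7)) | ((sum & 7) ^ 7);
    return sum;
}
```

Both models leave bits above the align field untouched, which is what lets a scale factor set earlier with `__vis_write_gsr()` survive a later alignaddr.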