Message ID | 20170725033320.17893-4-matthew.brown.dev@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
On Tue, 2017-07-25 at 13:33 +1000, Matt Brown wrote: > This adds emulation for the prtyw and prtyd instructions. > Tested for logical correctness against the prtyw and prtyd instructions > on ppc64le. > > Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com> > --- > v3: > - optimised using the Giles-Miller method of side-ways addition > v2: > - fixed opcodes > - fixed bitshifting and typecast errors > - merged do_prtyw and do_prtyd into single function > --- > arch/powerpc/lib/sstep.c | 27 +++++++++++++++++++++++++++ > 1 file changed, 27 insertions(+) > > diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c > index 6a79618..0bcf631 100644 > --- a/arch/powerpc/lib/sstep.c > +++ b/arch/powerpc/lib/sstep.c > @@ -655,6 +655,25 @@ static nokprobe_inline void do_bpermd(struct pt_regs *regs, unsigned long v1, > regs->gpr[ra] = perm; > } > #endif /* __powerpc64__ */ > +/* > + * The size parameter adjusts the equivalent prty instruction. > + * prtyw = 32, prtyd = 64 > + */ > +static nokprobe_inline void do_prty(struct pt_regs *regs, unsigned long v, > + int size, int ra) > +{ > + unsigned long long res = v; > + > + res = (0x0001000100010001 & res) + (0x0001000100010001 & (res >> 8)); > + res = (0x0000000700000007 & res) + (0x0000000700000007 & (res >> 16)); > + if (size == 32) { /* prtyw */ > + regs->gpr[ra] = (0x0000000100000001 & res); > + return; > + } > + > + res = (0x000000000000000f & res) + (0x000000000000000f & (res >> 32)); > + regs->gpr[ra] = res & 1; /*prtyd */ Looks good, you can also xor instead of adding, but the masks would need to change in that case. Balbir Singh.
On Tue, Jul 25, 2017 at 01:33:19PM +1000, Matt Brown wrote: > +static nokprobe_inline void do_prty(struct pt_regs *regs, unsigned long v, > + int size, int ra) > +{ > + unsigned long long res = v; > + > + res = (0x0001000100010001 & res) + (0x0001000100010001 & (res >> 8)); > + res = (0x0000000700000007 & res) + (0x0000000700000007 & (res >> 16)); > + if (size == 32) { /* prtyw */ > + regs->gpr[ra] = (0x0000000100000001 & res); > + return; > + } > + > + res = (0x000000000000000f & res) + (0x000000000000000f & (res >> 32)); > + regs->gpr[ra] = res & 1; /*prtyd */ > +} Does 7's and 0xf look strange (since the top bit in the values there is always 0). You always "and" the values with 1 later, you could do that immediately (and change + to ^). A general question about these patches: some things are inside #ifdef __powerpc64__, some are not. It seems it is the wrong macro, and it should be used (or not used) consistently? Segher
On Tue, Jul 25, 2017 at 06:08:05PM +1000, Balbir Singh wrote: > On Tue, 2017-07-25 at 13:33 +1000, Matt Brown wrote: > > This adds emulation for the prtyw and prtyd instructions. > > Tested for logical correctness against the prtyw and prtyd instructions > > on ppc64le. > > > > Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com> > > --- > > v3: > > - optimised using the Giles-Miller method of side-ways addition > > v2: > > - fixed opcodes > > - fixed bitshifting and typecast errors > > - merged do_prtyw and do_prtyd into single function > > --- > > arch/powerpc/lib/sstep.c | 27 +++++++++++++++++++++++++++ > > 1 file changed, 27 insertions(+) > > > > diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c > > index 6a79618..0bcf631 100644 > > --- a/arch/powerpc/lib/sstep.c > > +++ b/arch/powerpc/lib/sstep.c > > @@ -655,6 +655,25 @@ static nokprobe_inline void do_bpermd(struct pt_regs *regs, unsigned long v1, > > regs->gpr[ra] = perm; > > } > > #endif /* __powerpc64__ */ > > +/* > > + * The size parameter adjusts the equivalent prty instruction. > > + * prtyw = 32, prtyd = 64 > > + */ > > +static nokprobe_inline void do_prty(struct pt_regs *regs, unsigned long v, > > + int size, int ra) > > +{ > > + unsigned long long res = v; > > + > > + res = (0x0001000100010001 & res) + (0x0001000100010001 & (res >> 8)); > > + res = (0x0000000700000007 & res) + (0x0000000700000007 & (res >> 16)); > > + if (size == 32) { /* prtyw */ > > + regs->gpr[ra] = (0x0000000100000001 & res); > > + return; > > + } > > + > > + res = (0x000000000000000f & res) + (0x000000000000000f & (res >> 32)); > > + regs->gpr[ra] = res & 1; /*prtyd */ > > Looks good, you can also xor instead of adding, but the masks would need > to change in that case. Actually, I'd prefer to use xor for this case, since you hardly ever need any mask, except at the end. Most masks are only needed because of carry propagation, which the xor completely avoid. In other words: unsigned long long res = v ^ (v >> 8); res ^= res >> 16; if (size == 32) { regs->gpr[ra] = res & 0x0000000100000001; return; } res ^= res >> 32; regs-> gpr[ra] = res & 1; should work, but I've not even compiled it. Gabriel
Segher Boessenkool <segher@kernel.crashing.org> writes: > On Tue, Jul 25, 2017 at 01:33:19PM +1000, Matt Brown wrote: >> +static nokprobe_inline void do_prty(struct pt_regs *regs, unsigned long v, >> + int size, int ra) >> +{ >> + unsigned long long res = v; >> + >> + res = (0x0001000100010001 & res) + (0x0001000100010001 & (res >> 8)); >> + res = (0x0000000700000007 & res) + (0x0000000700000007 & (res >> 16)); >> + if (size == 32) { /* prtyw */ >> + regs->gpr[ra] = (0x0000000100000001 & res); >> + return; >> + } >> + >> + res = (0x000000000000000f & res) + (0x000000000000000f & (res >> 32)); >> + regs->gpr[ra] = res & 1; /*prtyd */ >> +} > > Does 7's and 0xf look strange (since the top bit in the values there > is always 0). You always "and" the values with 1 later, you could do > that immediately (and change + to ^). > > A general question about these patches: some things are inside #ifdef > __powerpc64__, some are not. It seems it is the wrong macro, and it > should be used (or not used) consistently? Why is it the wrong macro? Because we tend to use CONFIG_PPC64 you mean? I thought the reason some are #ifdef'ed is that some are 64-bit only. ie. bpermd is 64-bit only ? cheers
On Wed, Jul 26, 2017 at 08:03:30PM +1000, Michael Ellerman wrote: > Segher Boessenkool <segher@kernel.crashing.org> writes: > > A general question about these patches: some things are inside #ifdef > > __powerpc64__, some are not. It seems it is the wrong macro, and it > > should be used (or not used) consistently? > > Why is it the wrong macro? Because we tend to use CONFIG_PPC64 you mean? Yeah. But I see sstep.c already mixes those two at will (or if there is a distinction, I'm not seeing it :-) ) > I thought the reason some are #ifdef'ed is that some are 64-bit only. > ie. bpermd is 64-bit only ? 64-bit only, in what way? It's not clear what the rules are. It's not instructions that can only run in 64-bit mode. It's not instructions that only give a usable result with 64-bit regs implemented. It's not instructions only implemented on 64-bit CPUs. It's not even "all instructions that would not give a correct result in the low 32 bits of GPRs if the high 32 bits are not implemented". Segher
Segher Boessenkool <segher@kernel.crashing.org> writes: > On Wed, Jul 26, 2017 at 08:03:30PM +1000, Michael Ellerman wrote: >> Segher Boessenkool <segher@kernel.crashing.org> writes: >> > A general question about these patches: some things are inside #ifdef >> > __powerpc64__, some are not. It seems it is the wrong macro, and it >> > should be used (or not used) consistently? >> >> Why is it the wrong macro? Because we tend to use CONFIG_PPC64 you mean? > > Yeah. But I see sstep.c already mixes those two at will (or if there > is a distinction, I'm not seeing it :-) ) Yeah OK. In practice they're equivalent, if CONFIG_PPC64=y then the kernel is built 64-bit and therefore __powerpc64__ is defined. But I agree it's a mess, we should use CONFIG_PPC64 exclusively unless there's some reason not to (which I don't think there ever is). >> I thought the reason some are #ifdef'ed is that some are 64-bit only. >> ie. bpermd is 64-bit only ? > > 64-bit only, in what way? It's not clear what the rules are. Instructions that have "d" in the name? :P > It's not instructions that can only run in 64-bit mode. > It's not instructions that only give a usable result with 64-bit regs > implemented. > It's not instructions only implemented on 64-bit CPUs. I think it's trying to be that ^ If you build a 32-bit kernel then instructions that are only defined on 64-bit CPUs should be treated as illegal, so the easiest way to achieve that is to #ifdef off the code for those instructions. cheers
On Thu, Jul 27, 2017 at 11:26 AM, Michael Ellerman <mpe@ellerman.id.au> wrote: > Segher Boessenkool <segher@kernel.crashing.org> writes: > >> On Wed, Jul 26, 2017 at 08:03:30PM +1000, Michael Ellerman wrote: >>> Segher Boessenkool <segher@kernel.crashing.org> writes: >>> > A general question about these patches: some things are inside #ifdef >>> > __powerpc64__, some are not. It seems it is the wrong macro, and it >>> > should be used (or not used) consistently? >>> >>> Why is it the wrong macro? Because we tend to use CONFIG_PPC64 you mean? >> >> Yeah. But I see sstep.c already mixes those two at will (or if there >> is a distinction, I'm not seeing it :-) ) > > Yeah OK. In practice they're equivalent, if CONFIG_PPC64=y then the > kernel is built 64-bit and therefore __powerpc64__ is defined. > > But I agree it's a mess, we should use CONFIG_PPC64 exclusively unless > there's some reason not to (which I don't think there ever is). > >>> I thought the reason some are #ifdef'ed is that some are 64-bit only. >>> ie. bpermd is 64-bit only ? >> >> 64-bit only, in what way? It's not clear what the rules are. > > Instructions that have "d" in the name? :P > >> It's not instructions that can only run in 64-bit mode. >> It's not instructions that only give a usable result with 64-bit regs >> implemented. >> It's not instructions only implemented on 64-bit CPUs. > > I think it's trying to be that ^ > > If you build a 32-bit kernel then instructions that are only defined on > 64-bit CPUs should be treated as illegal, so the easiest way to achieve > that is to #ifdef off the code for those instructions. > I'll fixup this up to use the xor implementation, and change the series to use CONFIG_PPC64 for the ifdef. Thanks, Matt > cheers
Matt Brown <matthew.brown.dev@gmail.com> writes: > On Thu, Jul 27, 2017 at 11:26 AM, Michael Ellerman <mpe@ellerman.id.au> wrote: >> Segher Boessenkool <segher@kernel.crashing.org> writes: >> >>> On Wed, Jul 26, 2017 at 08:03:30PM +1000, Michael Ellerman wrote: >>>> Segher Boessenkool <segher@kernel.crashing.org> writes: >>>> > A general question about these patches: some things are inside #ifdef >>>> > __powerpc64__, some are not. It seems it is the wrong macro, and it >>>> > should be used (or not used) consistently? >>>> >>>> Why is it the wrong macro? Because we tend to use CONFIG_PPC64 you mean? >>> >>> Yeah. But I see sstep.c already mixes those two at will (or if there >>> is a distinction, I'm not seeing it :-) ) >> >> Yeah OK. In practice they're equivalent, if CONFIG_PPC64=y then the >> kernel is built 64-bit and therefore __powerpc64__ is defined. >> >> But I agree it's a mess, we should use CONFIG_PPC64 exclusively unless >> there's some reason not to (which I don't think there ever is). >> >>>> I thought the reason some are #ifdef'ed is that some are 64-bit only. >>>> ie. bpermd is 64-bit only ? >>> >>> 64-bit only, in what way? It's not clear what the rules are. >> >> Instructions that have "d" in the name? :P >> >>> It's not instructions that can only run in 64-bit mode. >>> It's not instructions that only give a usable result with 64-bit regs >>> implemented. >>> It's not instructions only implemented on 64-bit CPUs. >> >> I think it's trying to be that ^ >> >> If you build a 32-bit kernel then instructions that are only defined on >> 64-bit CPUs should be treated as illegal, so the easiest way to achieve >> that is to #ifdef off the code for those instructions. > > I'll fixup this up to use the xor implementation, and change the > series to use CONFIG_PPC64 for the ifdef. Thanks. In terms of the actual algorithms, we want to use something reasonably efficient, but don't sweat too much on the performance. The overhead of the emulation is in the exception we took to get there. So readable and obviously correct source is the main goal. cheers
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index 6a79618..0bcf631 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -655,6 +655,25 @@ static nokprobe_inline void do_bpermd(struct pt_regs *regs, unsigned long v1, regs->gpr[ra] = perm; } #endif /* __powerpc64__ */ +/* + * The size parameter adjusts the equivalent prty instruction. + * prtyw = 32, prtyd = 64 + */ +static nokprobe_inline void do_prty(struct pt_regs *regs, unsigned long v, + int size, int ra) +{ + unsigned long long res = v; + + res = (0x0001000100010001 & res) + (0x0001000100010001 & (res >> 8)); + res = (0x0000000700000007 & res) + (0x0000000700000007 & (res >> 16)); + if (size == 32) { /* prtyw */ + regs->gpr[ra] = (0x0000000100000001 & res); + return; + } + + res = (0x000000000000000f & res) + (0x000000000000000f & (res >> 32)); + regs->gpr[ra] = res & 1; /*prtyd */ +} static nokprobe_inline int trap_compare(long v1, long v2) { @@ -1245,6 +1264,14 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs, case 124: /* nor */ regs->gpr[ra] = ~(regs->gpr[rd] | regs->gpr[rb]); goto logical_done; + + case 154: /* prtyw */ + do_prty(regs, regs->gpr[rd], 32, ra); + goto logical_done; + + case 186: /* prtyd */ + do_prty(regs, regs->gpr[rd], 64, ra); + goto logical_done; #ifdef __powerpc64__ case 252: /* bpermd */ do_bpermd(regs, regs->gpr[rd], regs->gpr[rb], ra);
This adds emulation for the prtyw and prtyd instructions. Tested for logical correctness against the prtyw and prtyd instructions on ppc64le. Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com> --- v3: - optimised using the Giles-Miller method of side-ways addition v2: - fixed opcodes - fixed bitshifting and typecast errors - merged do_prtyw and do_prtyd into single function --- arch/powerpc/lib/sstep.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)