[RFC,0/7] VSX MMA Implementation

Message ID 20220426125028.18844-1-lucas.araujo@eldorado.org.br

Message

Lucas Mateus Martins Araujo e Castro April 26, 2022, 12:50 p.m. UTC
From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

This patch series is an RFC implementing the Matrix-Multiply Assist (MMA)
instructions from PowerISA 3.1.

These and the VDIV/VMOD implementation are the last new PowerISA 3.1
instructions left to be implemented.

Thanks
Lucas Mateus Castro (alqotel) (7):
  target/ppc: Implement xxm[tf]acc and xxsetaccz
  target/ppc: Implemented xvi*ger* instructions
  target/ppc: Implemented pmxvi*ger* instructions
  target/ppc: Implemented xvf*ger*
  target/ppc: Implemented xvf16ger*
  target/ppc: Implemented pmxvf*ger*
  target/ppc: Implemented [pm]xvbf16ger2*

 include/fpu/softfloat.h             |   9 ++
 target/ppc/cpu.h                    |  15 +++
 target/ppc/fpu_helper.c             | 130 ++++++++++++++++++
 target/ppc/helper.h                 |   7 +
 target/ppc/insn32.decode            |  49 +++++++
 target/ppc/insn64.decode            |  80 +++++++++++
 target/ppc/int_helper.c             |  85 ++++++++++++
 target/ppc/internal.h               |  28 ++++
 target/ppc/translate/vsx-impl.c.inc | 200 ++++++++++++++++++++++++++++
 9 files changed, 603 insertions(+)

Comments

Joel Stanley April 27, 2022, 6:21 a.m. UTC | #1
On Tue, 26 Apr 2022 at 12:51, Lucas Mateus Castro(alqotel)
<lucas.araujo@eldorado.org.br> wrote:
>
> From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>
>
> This patch series is an RFC of the Matrix-Multiply Assist (MMA)
> instructions implementation from the PowerISA 3.1
>
> These and the VDIV/VMOD implementation are the last new PowerISA 3.1
> instructions left to be implemented.
>
> Thanks
> Lucas Mateus Castro (alqotel) (7):
>   target/ppc: Implement xxm[tf]acc and xxsetaccz
>   target/ppc: Implemented xvi*ger* instructions
>   target/ppc: Implemented pmxvi*ger* instructions
>   target/ppc: Implemented xvf*ger*
>   target/ppc: Implemented xvf16ger*
>   target/ppc: Implemented pmxvf*ger*
>   target/ppc: Implemented [pm]xvbf16ger2*

I have a small test case for the MMA instructions that Alistair wrote
a while back [1]. It passes when run with these patches applied
(previously it would fail with SIGILL).

$ qemu-ppc64le -cpu power10  -L ~/ppc64le/ ./test -m
Smoke test MMA
MMA[0] = 1 (Correct)
MMA[1] = 2 (Correct)
MMA[2] = 3 (Correct)
MMA[3] = 4 (Correct)
MMA[4] = 2 (Correct)
MMA[5] = 4 (Correct)
MMA[6] = 6 (Correct)
MMA[7] = 8 (Correct)
MMA[8] = 3 (Correct)
MMA[9] = 6 (Correct)
MMA[10] = 9 (Correct)
MMA[11] = 12 (Correct)
MMA[12] = 4 (Correct)
MMA[13] = 8 (Correct)
MMA[14] = 12 (Correct)
MMA[15] = 16 (Correct)

[1] https://github.com/shenki/p10_tests
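
For readers following along, the printed values are consistent with a rank-1 (outer-product) update of the vector (1, 2, 3, 4) with itself. That reading is inferred from the output above only; the test source was not consulted. A minimal scalar sketch of such an update, using a hypothetical function name and a flat row-major accumulator (neither taken from QEMU or from the test), might look like this:

```c
#include <assert.h>
#include <stddef.h>

#define MMA_DIM 4  /* assumed: a 4x4 accumulator of 32-bit elements */

/*
 * Hypothetical scalar reference model, not QEMU code:
 * acc[i][j] = x[i] * y[j], stored row-major in a flat array.
 */
void outer_product_4x4(const float x[MMA_DIM], const float y[MMA_DIM],
                       float acc[MMA_DIM * MMA_DIM])
{
    assert(x != NULL && y != NULL && acc != NULL);
    for (size_t i = 0; i < MMA_DIM; i++) {
        for (size_t j = 0; j < MMA_DIM; j++) {
            acc[i * MMA_DIM + j] = x[i] * y[j];
        }
    }
}
```

With both inputs set to {1, 2, 3, 4}, element 5 is 2 * 2 = 4 and element 11 is 3 * 4 = 12, matching the MMA[5] and MMA[11] lines above.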


Cédric Le Goater April 27, 2022, 7:10 a.m. UTC | #2
Hello,

On 4/27/22 08:21, Joel Stanley wrote:
> On Tue, 26 Apr 2022 at 12:51, Lucas Mateus Castro(alqotel)
> <lucas.araujo@eldorado.org.br> wrote:
>>
>> From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>
>>
>> This patch series is an RFC of the Matrix-Multiply Assist (MMA)
>> instructions implementation from the PowerISA 3.1
>>
>> These and the VDIV/VMOD implementation are the last new PowerISA 3.1
>> instructions left to be implemented.
>>
>> Thanks
>> Lucas Mateus Castro (alqotel) (7):
>>    target/ppc: Implement xxm[tf]acc and xxsetaccz
>>    target/ppc: Implemented xvi*ger* instructions
>>    target/ppc: Implemented pmxvi*ger* instructions
>>    target/ppc: Implemented xvf*ger*
>>    target/ppc: Implemented xvf16ger*
>>    target/ppc: Implemented pmxvf*ger*
>>    target/ppc: Implemented [pm]xvbf16ger2*
> 
> I have a small test case for the MMA instructions that Alistair wrote
> a while back[1]. It passes when run with these patches applied
> (previously it would sigill).

Could we have your Tested-by, then?


> 
> $ qemu-ppc64le -cpu power10  -L ~/ppc64le/ ./test -m
> Smoke test MMA
> MMA[0] = 1 (Correct)
> MMA[1] = 2 (Correct)
> MMA[2] = 3 (Correct)
> MMA[3] = 4 (Correct)
> MMA[4] = 2 (Correct)
> MMA[5] = 4 (Correct)
> MMA[6] = 6 (Correct)
> MMA[7] = 8 (Correct)
> MMA[8] = 3 (Correct)
> MMA[9] = 6 (Correct)
> MMA[10] = 9 (Correct)
> MMA[11] = 12 (Correct)
> MMA[12] = 4 (Correct)
> MMA[13] = 8 (Correct)
> MMA[14] = 12 (Correct)
> MMA[15] = 16 (Correct)
> 
> [1] https://github.com/shenki/p10_tests

Looks like a good candidate for tests/tcg/ppc64le/. Adding Matheus and Leandro.

Thanks,

C.



Lucas Mateus Martins Araujo e Castro April 28, 2022, 2:05 p.m. UTC | #3
Something I forgot to mention in the cover letter: the XVFGER
instructions accumulate the exception status and, at the end, set the
FPSCR and take a Program interrupt on a trap-enabled exception. However,
as the exception functions are currently set up in
target/ppc/fpu_helper.c, a call to set one FPSCR bit could raise an
exception before all the bits have been set.

Victor (CCing him) is working on a patch series to fix the FPSCR.FI bit
that will reorganize do_float_check_status, which would solve the
aforementioned problem, so for now I sent this series without trying to
solve it.

In v2 I'll remember to mention this in the cover letter.

Joel Stanley May 5, 2022, 6:06 a.m. UTC | #4
On Wed, 27 Apr 2022 at 07:10, Cédric Le Goater <clg@kaod.org> wrote:
>
> Hello,
>
> On 4/27/22 08:21, Joel Stanley wrote:
> > On Tue, 26 Apr 2022 at 12:51, Lucas Mateus Castro(alqotel)
> > <lucas.araujo@eldorado.org.br> wrote:
> >>
> >> From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>
> >>
> >> This patch series is an RFC of the Matrix-Multiply Assist (MMA)
> >> instructions implementation from the PowerISA 3.1
> >>
> >> These and the VDIV/VMOD implementation are the last new PowerISA 3.1
> >> instructions left to be implemented.
> >>
> >> Thanks
> >> Lucas Mateus Castro (alqotel) (7):
> >>    target/ppc: Implement xxm[tf]acc and xxsetaccz
> >>    target/ppc: Implemented xvi*ger* instructions
> >>    target/ppc: Implemented pmxvi*ger* instructions
> >>    target/ppc: Implemented xvf*ger*
> >>    target/ppc: Implemented xvf16ger*
> >>    target/ppc: Implemented pmxvf*ger*
> >>    target/ppc: Implemented [pm]xvbf16ger2*
> >
> > I have a small test case for the MMA instructions that Alistair wrote
> > a while back[1]. It passes when run with these patches applied
> > (previously it would sigill).
>
> Could we have your Tested-by then ?

Sure! I was going to re-test v2, but it doesn't hurt to mention it for
this version.

Tested-by: Joel Stanley <joel@jms.id.au>
