[v6,00/35] target/arm SVE patches

Message ID 20180627043328.11531-1-richard.henderson@linaro.org

Message

Richard Henderson June 27, 2018, 4:32 a.m. UTC
This is the remainder of the SVE enablement patches,
with an extra bonus patch to enable ARMv8.2-DotProd.

V6 updates based on review.

Patches with changes:
  0002-target-arm-Implement-SVE-Contiguous-Load-first-fa.patch
  0007-target-arm-Implement-SVE-FP-Multiply-Add-Group.patch
  0009-target-arm-Implement-SVE-load-and-broadcast-eleme.patch
  0010-target-arm-Implement-SVE-store-vector-predicate-r.patch
  0011-target-arm-Implement-SVE-scatter-stores.patch
  0013-target-arm-Implement-SVE-gather-loads.patch
  0023-target-arm-Implement-SVE-floating-point-convert-p.patch
  0027-target-arm-Implement-SVE-MOVPRFX.patch
  0030-target-arm-Pass-index-to-AdvSIMD-FCMLA-indexed.patch
  0033-target-arm-Implement-SVE-dot-product-indexed.patch
  0034-target-arm-Enable-SVE-for-aarch64-linux-user.patch
  0035-target-arm-Implement-ARMv8.2-DotProd.patch

Patches lacking reviews:
  0002-target-arm-Implement-SVE-Contiguous-Load-first-fa.patch
  0007-target-arm-Implement-SVE-FP-Multiply-Add-Group.patch
  0013-target-arm-Implement-SVE-gather-loads.patch
  0030-target-arm-Pass-index-to-AdvSIMD-FCMLA-indexed.patch
  0031-target-arm-Implement-SVE-fp-complex-multiply-add-.patch
  0033-target-arm-Implement-SVE-dot-product-indexed.patch


r~


Richard Henderson (35):
  target/arm: Implement SVE Memory Contiguous Load Group
  target/arm: Implement SVE Contiguous Load, first-fault and no-fault
  target/arm: Implement SVE Memory Contiguous Store Group
  target/arm: Implement SVE load and broadcast quadword
  target/arm: Implement SVE integer convert to floating-point
  target/arm: Implement SVE floating-point arithmetic (predicated)
  target/arm: Implement SVE FP Multiply-Add Group
  target/arm: Implement SVE Floating Point Accumulating Reduction Group
  target/arm: Implement SVE load and broadcast element
  target/arm: Implement SVE store vector/predicate register
  target/arm: Implement SVE scatter stores
  target/arm: Implement SVE prefetches
  target/arm: Implement SVE gather loads
  target/arm: Implement SVE first-fault gather loads
  target/arm: Implement SVE scatter store vector immediate
  target/arm: Implement SVE floating-point compare vectors
  target/arm: Implement SVE floating-point arithmetic with immediate
  target/arm: Implement SVE Floating Point Multiply Indexed Group
  target/arm: Implement SVE FP Fast Reduction Group
  target/arm: Implement SVE Floating Point Unary Operations -
    Unpredicated Group
  target/arm: Implement SVE FP Compare with Zero Group
  target/arm: Implement SVE floating-point trig multiply-add coefficient
  target/arm: Implement SVE floating-point convert precision
  target/arm: Implement SVE floating-point convert to integer
  target/arm: Implement SVE floating-point round to integral value
  target/arm: Implement SVE floating-point unary operations
  target/arm: Implement SVE MOVPRFX
  target/arm: Implement SVE floating-point complex add
  target/arm: Implement SVE fp complex multiply add
  target/arm: Pass index to AdvSIMD FCMLA (indexed)
  target/arm: Implement SVE fp complex multiply add (indexed)
  target/arm: Implement SVE dot product (vectors)
  target/arm: Implement SVE dot product (indexed)
  target/arm: Enable SVE for aarch64-linux-user
  target/arm: Implement ARMv8.2-DotProd

 target/arm/cpu.h           |    1 +
 target/arm/helper-sve.h    |  682 +++++++++++++
 target/arm/helper.h        |   44 +-
 linux-user/elfload.c       |    2 +
 target/arm/cpu.c           |    8 +
 target/arm/cpu64.c         |    2 +
 target/arm/helper.c        |    2 +-
 target/arm/sve_helper.c    | 1855 ++++++++++++++++++++++++++++++++++++
 target/arm/translate-a64.c |   57 +-
 target/arm/translate-sve.c | 1688 +++++++++++++++++++++++++++++++-
 target/arm/translate.c     |  102 +-
 target/arm/vec_helper.c    |  311 +++++-
 target/arm/sve.decode      |  427 +++++++++
 13 files changed, 5116 insertions(+), 65 deletions(-)

Comments

Alex Bennée June 28, 2018, 11:30 a.m. UTC | #1
Richard Henderson <richard.henderson@linaro.org> writes:

> This is the remainder of the SVE enablement patches,
> with an extra bonus patch to enable ARMv8.2-DotProd.
>
> V6 updates based on review.

One failure from the VQ3 test set:

../qemu.git/aarch64-linux-user/qemu-aarch64 \
  ./risu --test-sve=3 \
  sve-all-short-v8.3+sve@vq3/insn_sdiv_z_p_zz___INC.risu.bin \
  --trace=sve-all-short-v8.3+sve@vq3/insn_sdiv_z_p_zz___INC.risu.bin.trace

Gives:

  loading test image
  sve-all-short-v8.3+sve@vq3/insn_sdiv_z_p_zz___INC.risu.bin...
  starting apprentice image at 0x4000801000
  starting image
  fish: “../qemu.git/aarch64-linux-user/…” terminated by signal SIGFPE
  (Floating point exception)

From:

  http://people.linaro.org/~alex.bennee/testcases/arm64.risu/sve-all-short-v8.3+sve@vq3.tar.xz



--
Alex Bennée

Alex Bennée June 28, 2018, 2:05 p.m. UTC | #2
Richard Henderson <richard.henderson@linaro.org> writes:

> This is the remainder of the SVE enablement patches,
> with an extra bonus patch to enable ARMv8.2-DotProd.
>
> V6 updates based on review.
>
<snip>
> Patches lacking reviews:
>   0002-target-arm-Implement-SVE-Contiguous-Load-first-fa.patch
>   0007-target-arm-Implement-SVE-FP-Multiply-Add-Group.patch
>   0013-target-arm-Implement-SVE-gather-loads.patch
>   0030-target-arm-Pass-index-to-AdvSIMD-FCMLA-indexed.patch
>   0031-target-arm-Implement-SVE-fp-complex-multiply-add-.patch
>   0033-target-arm-Implement-SVE-dot-product-indexed.patch

OK, I have finished sweeping through the un-reviewed patches.



--
Alex Bennée

Peter Maydell June 28, 2018, 2:12 p.m. UTC | #3
On 28 June 2018 at 12:30, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> This is the remainder of the SVE enablement patches,
>> with an extra bonus patch to enable ARMv8.2-DotProd.
>>
>> V6 updates based on review.
>
> One failure from the VQ3 test set:
>
> ../qemu.git/aarch64-linux-user/qemu-aarch64 \
>   ./risu --test-sve=3 \
>   sve-all-short-v8.3+sve@vq3/insn_sdiv_z_p_zz___INC.risu.bin \
>   --trace=sve-all-short-v8.3+sve@vq3/insn_sdiv_z_p_zz___INC.risu.bin.trace
>
> Gives:
>
>   loading test image
>   sve-all-short-v8.3+sve@vq3/insn_sdiv_z_p_zz___INC.risu.bin...
>   starting apprentice image at 0x4000801000
>   starting image
>   fish: “../qemu.git/aarch64-linux-user/…” terminated by signal SIGFPE
>   (Floating point exception)

Do you have the insn that it's barfing on? In particular,
I'm guessing from the test name that this is for something
covered by one of the SDIV_zpzz lines in sve.decode, which
is already in master rather than in this test series.
If that's true, then it shouldn't block applying this set...

thanks
-- PMM

Peter Maydell June 28, 2018, 2:55 p.m. UTC | #4
On 28 June 2018 at 15:12, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 28 June 2018 at 12:30, Alex Bennée <alex.bennee@linaro.org> wrote:
>>   loading test image
>>   sve-all-short-v8.3+sve@vq3/insn_sdiv_z_p_zz___INC.risu.bin...
>>   starting apprentice image at 0x4000801000
>>   starting image
>>   fish: “../qemu.git/aarch64-linux-user/…” terminated by signal SIGFPE
>>   (Floating point exception)
>
> Do you have the insn that it's barfing on? In particular,
> I'm guessing from the test name that this is for something
> covered by one of the SDIV_zpzz lines in sve.decode, which
> is already in master rather than in this test series.
> If that's true, then it shouldn't block applying this set...

Further discussion on IRC suggests that this is failing on
INT_MIN / -1, an annoying special case on which the x86 idiv
instruction traps; Linux reports that trap as SIGFPE. Compare
our HELPER(sdiv) code:

int32_t HELPER(sdiv)(int32_t num, int32_t den)
{
    if (den == 0)
      return 0;
    if (num == INT_MIN && den == -1)
      return INT_MIN;
    return num / den;
}

with what we do for SVE:

#define DO_DIV(N, M)  (M ? N / M : 0)

This is OK for unsigned division, but signed division needs to
special case INT_MIN / -1.
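
As a rough illustration of that special case, here is a minimal,
self-contained sketch. The names safe_sdiv32, safe_udiv32 and
sdiv_self_check are hypothetical and do not appear in the series; the
actual fix to DO_DIV may well look different. The signed helper returns
the same results as the HELPER(sdiv) code quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the series): signed 32-bit division
 * that never executes a trapping host division. */
static inline int32_t safe_sdiv32(int32_t n, int32_t m)
{
    if (m == 0) {
        return 0;               /* division by zero yields 0 */
    }
    if (n == INT32_MIN && m == -1) {
        return INT32_MIN;       /* quotient overflows; result wraps */
    }
    return n / m;
}

/* Unsigned division only needs the zero check, as DO_DIV already has. */
static inline uint32_t safe_udiv32(uint32_t n, uint32_t m)
{
    return m ? n / m : 0;
}

/* Small self-check covering the cases discussed in the thread. */
static inline void sdiv_self_check(void)
{
    assert(safe_sdiv32(INT32_MIN, -1) == INT32_MIN);
    assert(safe_sdiv32(42, 0) == 0);
    assert(safe_sdiv32(-9, 2) == -4);
    assert(safe_udiv32(0x80000000u, 0xffffffffu) == 0);
}
```

The 64-bit element size would presumably need the same treatment with
INT64_MIN; that is an assumption, not something shown in this thread.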

In any case, this is in an existing insn, so I'm going to apply
this series to target-arm.next (fixing up the patch 5 comment
typo).

thanks
-- PMM

Alex Bennée June 28, 2018, 2:55 p.m. UTC | #5
Peter Maydell <peter.maydell@linaro.org> writes:

> On 28 June 2018 at 12:30, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
>>> This is the remainder of the SVE enablement patches,
>>> with an extra bonus patch to enable ARMv8.2-DotProd.
>>>
>>> V6 updates based on review.
>>
>> One failure from the VQ3 test set:
>>
>> ../qemu.git/aarch64-linux-user/qemu-aarch64 \
>>   ./risu --test-sve=3 \
>>   sve-all-short-v8.3+sve@vq3/insn_sdiv_z_p_zz___INC.risu.bin \
>>   --trace=sve-all-short-v8.3+sve@vq3/insn_sdiv_z_p_zz___INC.risu.bin.trace
>>
>> Gives:
>>
>>   loading test image
>>   sve-all-short-v8.3+sve@vq3/insn_sdiv_z_p_zz___INC.risu.bin...
>>   starting apprentice image at 0x4000801000
>>   starting image
>>   fish: “../qemu.git/aarch64-linux-user/…” terminated by signal SIGFPE
>>   (Floating point exception)
>
> Do you have the insn that it's barfing on? In particular,
> I'm guessing from the test name that this is for something
> covered by one of the SDIV_zpzz lines in sve.decode, which
> is already in master rather than in this test series.
> If that's true, then it shouldn't block applying this set...

#0  0x000055555569297f in helper_sve_sdiv_zpzz_s (vd=0x555557a522e0, vn=0x555557a522e0, vm=0x555557a51fe0, vg=0x555557a52be0, desc=<optimised out>) at /home/alex/lsrc/qemu/qemu.git/target/arm/sve_helper.c:480
#1  0x0000555555b1283f in static_code_gen_buffer ()
#2  0x00005555555ea0d8 in cpu_tb_exec (itb=<optimised out>, cpu=0x555557a50320) at /home/alex/lsrc/qemu/qemu.git/accel/tcg/cpu-exec.c:171
#3  cpu_loop_exec_tb (tb_exit=<synthetic pointer>, last_tb=<synthetic pointer>, tb=<optimised out>, cpu=0x555557a50320) at /home/alex/lsrc/qemu/qemu.git/accel/tcg/cpu-exec.c:612
#4  cpu_exec (cpu=cpu@entry=0x555557a48070) at /home/alex/lsrc/qemu/qemu.git/accel/tcg/cpu-exec.c:722
#5  0x000055555560ad40 in cpu_loop (env=0x555557a50320) at /home/alex/lsrc/qemu/qemu.git/linux-user/aarch64/cpu_loop.c:82
#6  0x00005555555afb0c in main (argc=<optimised out>, argv=0x7fffffffdea8, envp=<optimised out>) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:813
#0  0x000055555569297f in helper_sve_sdiv_zpzz_s (vd=0x555557a522e0, vn=0x555557a522e0, vm=0x555557a51fe0, vg=0x555557a52be0, desc=<optimised out>) at /home/alex/lsrc/qemu/qemu.git/target/arm/sve_helper.c:480
480	DO_ZPZZ(sve_sdiv_zpzz_s, int32_t, H1_4, DO_DIV)
=> 0x55555569297f <helper_sve_sdiv_zpzz_s+63>:	idiv   %r10d
   0x555555692982 <helper_sve_sdiv_zpzz_s+66>:	mov    %eax,%r11d
   0x555555692985 <helper_sve_sdiv_zpzz_s+69>:	mov    %r11d,(%rdi,%r8,1)
   0x555555692989 <helper_sve_sdiv_zpzz_s+73>:	add    $0x4,%r8
   0x55555569298d <helper_sve_sdiv_zpzz_s+77>:	shr    $0x4,%r9w
A syntax error in expression, near `./ $r10d'.

r10d $6 = 0xffffffff
rax $7 = 0x80000000
rdx $8 = 0xffffffff

Yeah, so this comes from something already merged in.
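
To connect the register dump to the diagnosis: the faulting instruction
is "idiv %r10d", which divides edx:eax by r10d. Read as 32-bit signed
values, the low half of rax (0x80000000) is INT32_MIN and r10d
(0xffffffff) is -1. A small sketch of that reading follows; reg_as_i32
and is_sdiv_overflow are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit register image as int32_t.
 * memcpy avoids relying on implementation-defined integer conversion. */
static int32_t reg_as_i32(uint32_t bits)
{
    int32_t v;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Nonzero when the operand pair is the one signed division overflows on. */
static int is_sdiv_overflow(uint32_t dividend_bits, uint32_t divisor_bits)
{
    return reg_as_i32(dividend_bits) == INT32_MIN
        && reg_as_i32(divisor_bits) == -1;
}
```

Calling is_sdiv_overflow(0x80000000u, 0xffffffffu) with the values from
the dump returns nonzero, which matches the INT_MIN / -1 explanation.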

--
Alex Bennée