mbox series

[RFC,0/4] improve coverage of vector backend

Message ID 20220202191242.652607-1-alex.bennee@linaro.org
Headers show
Series improve coverage of vector backend | expand

Message

Alex Bennée Feb. 2, 2022, 7:12 p.m. UTC
Hi Richard,

While reviewing the TCG vector clean-ups I tried to improve the
range of instructions we tested. I couldn't get the existing hacky
sha1 test to vectorise nicely so I snarfed the sha512 algorithm from
CCAN. The sha512 test is good because it is all purely integer so we
should be able to use native code on the backend. The test also has
the nice property of validating behaviour.

I did toy with the idea of incorporating CCAN as a submodule because
there is quite a lot of nice stuff in there we could use for further
tests. However for now witness the glory of a cut and paste job.

What do you think?


Alex Bennée (4):
  tests/tcg: cleanup sha1 source code
  tests/tcg: build sha1-vector for SVE and compare
  tests/tcg: add sha512 test
  tests/tcg: add vectorised sha512 versions

 tests/tcg/multiarch/sha1.c        |  67 +-
 tests/tcg/multiarch/sha512.c      | 990 ++++++++++++++++++++++++++++++
 tests/tcg/aarch64/Makefile.target |  23 +
 tests/tcg/i386/Makefile.target    |   6 +
 tests/tcg/ppc64le/Makefile.target |   5 +-
 tests/tcg/s390x/Makefile.target   |   9 +
 tests/tcg/x86_64/Makefile.target  |   7 +
 7 files changed, 1056 insertions(+), 51 deletions(-)
 create mode 100644 tests/tcg/multiarch/sha512.c

Comments

Alex Bennée Feb. 2, 2022, 11:16 p.m. UTC | #1
Alex Bennée <alex.bennee@linaro.org> writes:

> Hi Richard,
>
> While reviewing the TCG vector clean-ups I tried to improve the
> range of instructions we tested. I couldn't get the existing hacky
> sha1 test to vectorise nicely so I snarfed the sha512 algorithm from
> CCAN. The sha512 test is good because it is all purely integer so we
> should be able to use native code on the backend. The test also has
> the nice property of validating behaviour.

Hi Taylor,

You might want to check this out:

✗  ./qemu-hexagon ./tests/tcg/hexagon-linux-user/sha512
1..10
not ok 1 - do_test(&tests[i])
#     Failed test (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main() at line 986)
not ok 2 - do_test(&tests[i])
#     Failed test (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main() at line 986)
not ok 3 - do_test(&tests[i])
#     Failed test (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main() at line 986)
not ok 4 - do_test(&tests[i])
#     Failed test (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main() at line 986)
not ok 5 - do_test(&tests[i])
#     Failed test (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main() at line 986)
not ok 6 - do_test(&tests[i])
#     Failed test (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main() at line 986)
not ok 7 - do_test(&tests[i])
#     Failed test (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main() at line 986)
not ok 8 - do_test(&tests[i])
#     Failed test (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main() at line 986)
not ok 9 - do_test(&tests[i])
#     Failed test (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main() at line 986)
not ok 10 - do_test(&tests[i])
#     Failed test (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main() at line 986)
# Looks like you failed 10 tests of 10.

Which makes me think either the translation or compiler are wrong.

>
> I did toy with the idea of incorporating CCAN as a submodule because
> there is quite a lot of nice stuff in there we could use for further
> tests. However for now witness the glory of a cut and paste job.
>
> What do you think?
>
>
> Alex Bennée (4):
>   tests/tcg: cleanup sha1 source code
>   tests/tcg: build sha1-vector for SVE and compare
>   tests/tcg: add sha512 test
>   tests/tcg: add vectorised sha512 versions
>
>  tests/tcg/multiarch/sha1.c        |  67 +-
>  tests/tcg/multiarch/sha512.c      | 990 ++++++++++++++++++++++++++++++
>  tests/tcg/aarch64/Makefile.target |  23 +
>  tests/tcg/i386/Makefile.target    |   6 +
>  tests/tcg/ppc64le/Makefile.target |   5 +-
>  tests/tcg/s390x/Makefile.target   |   9 +
>  tests/tcg/x86_64/Makefile.target  |   7 +
>  7 files changed, 1056 insertions(+), 51 deletions(-)
>  create mode 100644 tests/tcg/multiarch/sha512.c
Taylor Simpson Feb. 3, 2022, 1:45 a.m. UTC | #2
> -----Original Message-----
> From: Alex Bennée <alex.bennee@linaro.org>
> Sent: Wednesday, February 2, 2022 5:16 PM
> To: richard.henderson@linaro.org; qemu-devel@nongnu.org
> Cc: qemu-arm@nongnu.org; fam@euphon.net; berrange@redhat.com;
> f4bug@amsat.org; aurelien@aurel32.net; pbonzini@redhat.com;
> stefanha@redhat.com; crosa@redhat.com; Alex Bennée
> <alex.bennee@linaro.org>; Taylor Simpson <tsimpson@quicinc.com>
> Subject: Re: [RFC PATCH 0/4] improve coverage of vector backend
> 
> Alex Bennée <alex.bennee@linaro.org> writes:
> 
> > Hi Richard,
> >
> > While reviewing the TCG vector clean-ups I tried to improve the range
> > of instructions we tested. I couldn't get the existing hacky
> > sha1 test to vectorise nicely so I snarfed the sha512 algorithm from
> > CCAN. The sha512 test is good because it is all purely integer so we
> > should be able to use native code on the backend. The test also has
> > the nice property of validating behaviour.
> 
> Hi Taylor,
> 
> You might want to check this out:
> 
> ✗  ./qemu-hexagon ./tests/tcg/hexagon-linux-user/sha512
> 1..10
> not ok 1 - do_test(&tests[i])
> #     Failed test
> (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main() at
> line 986)
> not ok 2 - do_test(&tests[i])

Thanks for the heads-up.  I'll take a look

Taylor
Taylor Simpson Feb. 3, 2022, 4:33 p.m. UTC | #3
> -----Original Message-----
> From: Taylor Simpson
> Sent: Wednesday, February 2, 2022 7:45 PM
> To: Alex Bennée <alex.bennee@linaro.org>; richard.henderson@linaro.org;
> qemu-devel@nongnu.org
> Cc: qemu-arm@nongnu.org; fam@euphon.net; berrange@redhat.com;
> f4bug@amsat.org; aurelien@aurel32.net; pbonzini@redhat.com;
> stefanha@redhat.com; crosa@redhat.com
> Subject: RE: [RFC PATCH 0/4] improve coverage of vector backend
> 
> 
> > -----Original Message-----
> > From: Alex Bennée <alex.bennee@linaro.org>
> > Sent: Wednesday, February 2, 2022 5:16 PM
> > To: richard.henderson@linaro.org; qemu-devel@nongnu.org
> > Cc: qemu-arm@nongnu.org; fam@euphon.net; berrange@redhat.com;
> > f4bug@amsat.org; aurelien@aurel32.net; pbonzini@redhat.com;
> > stefanha@redhat.com; crosa@redhat.com; Alex Bennée
> > <alex.bennee@linaro.org>; Taylor Simpson <tsimpson@quicinc.com>
> > Subject: Re: [RFC PATCH 0/4] improve coverage of vector backend
> >
> > Alex Bennée <alex.bennee@linaro.org> writes:
> >
> > > Hi Richard,
> > >
> > > While reviewing the TCG vector clean-ups I tried to improve the
> > > range of instructions we tested. I couldn't get the existing hacky
> > > sha1 test to vectorise nicely so I snarfed the sha512 algorithm from
> > > CCAN. The sha512 test is good because it is all purely integer so we
> > > should be able to use native code on the backend. The test also has
> > > the nice property of validating behaviour.
> >
> > Hi Taylor,
> >
> > You might want to check this out:
> >
> > ✗  ./qemu-hexagon ./tests/tcg/hexagon-linux-user/sha512
> > 1..10
> > not ok 1 - do_test(&tests[i])
> > #     Failed test
> > (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main()
> > at line 986) not ok 2 - do_test(&tests[i])
> 
> Thanks for the heads-up.  I'll take a look

Quick update - I ran the test on the hardware and have the same error messages.

So, it doesn't look like a QEMU problem.  I'll investigate if the problem is due to something in the toolchain.

Taylor
Alex Bennée Feb. 3, 2022, 5:50 p.m. UTC | #4
Taylor Simpson <tsimpson@quicinc.com> writes:

>> -----Original Message-----
>> From: Taylor Simpson
>> Sent: Wednesday, February 2, 2022 7:45 PM
>> To: Alex Bennée <alex.bennee@linaro.org>; richard.henderson@linaro.org;
>> qemu-devel@nongnu.org
>> Cc: qemu-arm@nongnu.org; fam@euphon.net; berrange@redhat.com;
>> f4bug@amsat.org; aurelien@aurel32.net; pbonzini@redhat.com;
>> stefanha@redhat.com; crosa@redhat.com
>> Subject: RE: [RFC PATCH 0/4] improve coverage of vector backend
>> 
>> 
>> > -----Original Message-----
>> > From: Alex Bennée <alex.bennee@linaro.org>
>> > Sent: Wednesday, February 2, 2022 5:16 PM
>> > To: richard.henderson@linaro.org; qemu-devel@nongnu.org
>> > Cc: qemu-arm@nongnu.org; fam@euphon.net; berrange@redhat.com;
>> > f4bug@amsat.org; aurelien@aurel32.net; pbonzini@redhat.com;
>> > stefanha@redhat.com; crosa@redhat.com; Alex Bennée
>> > <alex.bennee@linaro.org>; Taylor Simpson <tsimpson@quicinc.com>
>> > Subject: Re: [RFC PATCH 0/4] improve coverage of vector backend
>> >
>> > Alex Bennée <alex.bennee@linaro.org> writes:
>> >
>> > > Hi Richard,
>> > >
>> > > While reviewing the TCG vector clean-ups I tried to improve the
>> > > range of instructions we tested. I couldn't get the existing hacky
>> > > sha1 test to vectorise nicely so I snarfed the sha512 algorithm from
>> > > CCAN. The sha512 test is good because it is all purely integer so we
>> > > should be able to use native code on the backend. The test also has
>> > > the nice property of validating behaviour.
>> >
>> > Hi Taylor,
>> >
>> > You might want to check this out:
>> >
>> > ✗  ./qemu-hexagon ./tests/tcg/hexagon-linux-user/sha512
>> > 1..10
>> > not ok 1 - do_test(&tests[i])
>> > #     Failed test
>> > (/home/alex.bennee/lsrc/qemu.git/tests/tcg/multiarch/sha512.c:main()
>> > at line 986) not ok 2 - do_test(&tests[i])
>> 
>> Thanks for the heads-up.  I'll take a look
>
> Quick update - I ran the test on the hardware and have the same error messages.
>
> So, it doesn't look like a QEMU problem.  I'll investigate if the
> problem is due to something in the toolchain.

That reminds me what is the status of the binary toolchain. The last
attempt had some issues so we are still using the hand-built one
upstream.


>
> Taylor
Taylor Simpson Feb. 3, 2022, 5:57 p.m. UTC | #5
> -----Original Message-----
> From: Alex Bennée <alex.bennee@linaro.org>
> Sent: Thursday, February 3, 2022 11:50 AM
> To: Taylor Simpson <tsimpson@quicinc.com>
> Cc: richard.henderson@linaro.org; qemu-devel@nongnu.org; qemu-
> arm@nongnu.org; fam@euphon.net; berrange@redhat.com;
> f4bug@amsat.org; aurelien@aurel32.net; pbonzini@redhat.com;
> stefanha@redhat.com; crosa@redhat.com
> Subject: Re: [RFC PATCH 0/4] improve coverage of vector backend
> 
> Taylor Simpson <tsimpson@quicinc.com> writes:
> 
> > Quick update - I ran the test on the hardware and have the same error
> messages.
> >
> > So, it doesn't look like a QEMU problem.  I'll investigate if the
> > problem is due to something in the toolchain.
> 
> That reminds me what is the status of the binary toolchain. The last attempt
> had some issues so we are still using the hand-built one upstream.

No progress on that.  The team hasn't had the bandwidth to work on it.

However, I'm less suspicious of the toolchain now.  I built with a couple of different compiler options and a couple of different versions of the toolchain, including the C library that runs in production.  In all cases, I see the same result.

Any chance the problem is in the test itself (e.g., some sort of undefined behavior or a 64-bit vs 32-bit difference)?

Thanks,
Taylor
Alex Bennée Feb. 3, 2022, 6:26 p.m. UTC | #6
Taylor Simpson <tsimpson@quicinc.com> writes:

>> -----Original Message-----
>> From: Alex Bennée <alex.bennee@linaro.org>
>> Sent: Thursday, February 3, 2022 11:50 AM
>> To: Taylor Simpson <tsimpson@quicinc.com>
>> Cc: richard.henderson@linaro.org; qemu-devel@nongnu.org; qemu-
>> arm@nongnu.org; fam@euphon.net; berrange@redhat.com;
>> f4bug@amsat.org; aurelien@aurel32.net; pbonzini@redhat.com;
>> stefanha@redhat.com; crosa@redhat.com
>> Subject: Re: [RFC PATCH 0/4] improve coverage of vector backend
>> 
>> Taylor Simpson <tsimpson@quicinc.com> writes:
>> 
>> > Quick update - I ran the test on the hardware and have the same error
>> messages.
>> >
>> > So, it doesn't look like a QEMU problem.  I'll investigate if the
>> > problem is due to something in the toolchain.
>> 
>> That reminds me what is the status of the binary toolchain. The last attempt
>> had some issues so we are still using the hand-built one upstream.
>
> No progress on that.  The team hasn't had the bandwidth to work on it.
>
> However, I'm less suspicious of the toolchain now. I built with a
> couple of different compiler options and a couple of different
> versions of the toolchain, including the C library that runs in
> production. In all cases, I see the same result.
>
> Any chance the problem is in the test itself (e.g., some sort of
> undefined behavior or a 64-bit vs 32-bit difference)?

It does have a 64 bit byteswap in - it's possible I broke it copying
from the upstream:

  https://ccodearchive.net/info/crypto/sha512.html

but it does pass on *all* the other architectures which is a mix of 32
and 64 bit code. I did have to hack the endian detection code though.
Does:

  #if BYTE_ORDER == BIG_ENDIAN

work for your compiler?


>
> Thanks,
> Taylor
Taylor Simpson Feb. 3, 2022, 7:01 p.m. UTC | #7
> -----Original Message-----
> From: Alex Bennée <alex.bennee@linaro.org>
> Sent: Thursday, February 3, 2022 12:26 PM
> To: Taylor Simpson <tsimpson@quicinc.com>
> Cc: richard.henderson@linaro.org; qemu-devel@nongnu.org; qemu-
> arm@nongnu.org; fam@euphon.net; berrange@redhat.com;
> f4bug@amsat.org; aurelien@aurel32.net; pbonzini@redhat.com;
> stefanha@redhat.com; crosa@redhat.com
> Subject: Re: [RFC PATCH 0/4] improve coverage of vector backend
> 
> > Any chance the problem is in the test itself (e.g., some sort of
> > undefined behavior or a 64-bit vs 32-bit difference)?
> 
> It does have a 64 bit byteswap in - it's possible I broke it copying from the
> upstream:
> 
>   https://ccodearchive.net/info/crypto/sha512.html
> 
> but it does pass on *all* the other architectures which is a mix of 32 and 64
> bit code. I did have to hack the endian detection code though.
> Does:
> 
>   #if BYTE_ORDER == BIG_ENDIAN
> 
> work for your compiler?

No, but this does
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__

With that change in the source, the tests passes.  Will that work for other targets?

Taylor
Alex Bennée Feb. 3, 2022, 8 p.m. UTC | #8
Taylor Simpson <tsimpson@quicinc.com> writes:

>> -----Original Message-----
>> From: Alex Bennée <alex.bennee@linaro.org>
>> Sent: Thursday, February 3, 2022 12:26 PM
>> To: Taylor Simpson <tsimpson@quicinc.com>
>> Cc: richard.henderson@linaro.org; qemu-devel@nongnu.org; qemu-
>> arm@nongnu.org; fam@euphon.net; berrange@redhat.com;
>> f4bug@amsat.org; aurelien@aurel32.net; pbonzini@redhat.com;
>> stefanha@redhat.com; crosa@redhat.com
>> Subject: Re: [RFC PATCH 0/4] improve coverage of vector backend
>> 
>> > Any chance the problem is in the test itself (e.g., some sort of
>> > undefined behavior or a 64-bit vs 32-bit difference)?
>> 
>> It does have a 64 bit byteswap in - it's possible I broke it copying from the
>> upstream:
>> 
>>   https://ccodearchive.net/info/crypto/sha512.html
>> 
>> but it does pass on *all* the other architectures which is a mix of 32 and 64
>> bit code. I did have to hack the endian detection code though.
>> Does:
>> 
>>   #if BYTE_ORDER == BIG_ENDIAN
>> 
>> work for your compiler?
>
> No, but this does
> #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
>
> With that change in the source, the tests passes.  Will that work for
> other targets?

At least not hppa-linux-user. The joy of having no standard compile time
way to report byte order in the C standard despite most things needing
to know one way or another.

Richard,

Any ideas?

>
> Taylor
Taylor Simpson Feb. 3, 2022, 9:05 p.m. UTC | #9
> -----Original Message-----
> From: Alex Bennée <alex.bennee@linaro.org>
> Sent: Thursday, February 3, 2022 2:00 PM
> To: Taylor Simpson <tsimpson@quicinc.com>
> Cc: richard.henderson@linaro.org; qemu-devel@nongnu.org; qemu-
> arm@nongnu.org; fam@euphon.net; berrange@redhat.com;
> f4bug@amsat.org; aurelien@aurel32.net; pbonzini@redhat.com;
> stefanha@redhat.com; crosa@redhat.com
> Subject: Re: [RFC PATCH 0/4] improve coverage of vector backend
> 
> Taylor Simpson <tsimpson@quicinc.com> writes:
> 
> >> -----Original Message-----
> >> From: Alex Bennée <alex.bennee@linaro.org>
> >> Sent: Thursday, February 3, 2022 12:26 PM
> >> To: Taylor Simpson <tsimpson@quicinc.com>
> >> Cc: richard.henderson@linaro.org; qemu-devel@nongnu.org; qemu-
> >> arm@nongnu.org; fam@euphon.net; berrange@redhat.com;
> f4bug@amsat.org;
> >> aurelien@aurel32.net; pbonzini@redhat.com; stefanha@redhat.com;
> >> crosa@redhat.com
> >> Subject: Re: [RFC PATCH 0/4] improve coverage of vector backend
> >>
> >> > Any chance the problem is in the test itself (e.g., some sort of
> >> > undefined behavior or a 64-bit vs 32-bit difference)?
> >>
> >> It does have a 64 bit byteswap in - it's possible I broke it copying
> >> from the
> >> upstream:
> >>
> >>   https://ccodearchive.net/info/crypto/sha512.html
> >>
> >> but it does pass on *all* the other architectures which is a mix of
> >> 32 and 64 bit code. I did have to hack the endian detection code though.
> >> Does:
> >>
> >>   #if BYTE_ORDER == BIG_ENDIAN
> >>
> >> work for your compiler?
> >
> > No, but this does
> > #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> >
> > With that change in the source, the tests passes.  Will that work for
> > other targets?
> 
> At least not hppa-linux-user. The joy of having no standard compile time way
> to report byte order in the C standard despite most things needing to know
> one way or another.

How about this?
#if (defined (BYTE_ORDER) && BYTE_ORDER == BIG_ENDIAN) || (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
Richard Henderson Feb. 3, 2022, 9:31 p.m. UTC | #10
On 2/4/22 07:00, Alex Bennée wrote:
>>> Does:
>>>
>>>    #if BYTE_ORDER == BIG_ENDIAN
>>>
>>> work for your compiler?
>>
>> No, but this does
>> #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
>>
>> With that change in the source, the tests passes.  Will that work for
>> other targets?
> 
> At least not hppa-linux-user. The joy of having no standard compile time
> way to report byte order in the C standard despite most things needing
> to know one way or another.
> 
> Richard,
> 
> Any ideas?

I see you're not explicitly including <endian.h>.
I would expect that to be of some use.


r~