mbox series

[v4,0/6] Add Loongson SX/ASX instruction support to LoongArch target.

Message ID 20230815010537.1817292-1-panchenghui@loongson.cn
Headers show
Series Add Loongson SX/ASX instruction support to LoongArch target. | expand

Message

Chenghui Pan Aug. 15, 2023, 1:05 a.m. UTC
This is an update of:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626194.html

This version of patch set only introduces some small simplications of
implementation. Because I missed the size limitation of mail size, the
huge testsuite patches of v2 and v3 are not shown in the mail list. So,
testsuite patches are splited from this patch set again and will be submitted 
independently in the future.

Binutils-gdb introduced LSX/LASX support since 2.41 release:
https://lists.gnu.org/archive/html/info-gnu/2023-07/msg00009.html

Brief history of patch set version:
v1 -> v2:
- Reduce usage of "unspec" in RTL template.
- Append Support of ADDR_REG_REG in LSX and LASX.
- Constraint docs are appended in gcc/doc/md.texi and ccomment block.
- Codes related to vecarg are removed.
- Testsuite of LSX and LASX is added in v2. (Because of the size limitation of
  mail list, these patches are not shown)
- Adjust the loongarch_expand_vector_init() function to reduce instruction 
  output amount.
- Some minor implementation changes of RTL templates.

v2 -> v3:
- Revert vabsd/xvabsd RTL templates to unspec impl.
- Resolve warning in gcc/config/loongarch/loongarch.cc when bootstrapping 
  with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx".
- Remove redundant definitions in lasxintrin.h.
- Refine commit info.

Lulu Cheng (6):
  LoongArch: Add Loongson SX vector directive compilation framework.
  LoongArch: Add Loongson SX base instruction support.
  LoongArch: Add Loongson SX directive builtin function support.
  LoongArch: Add Loongson ASX vector directive compilation framework.
  LoongArch: Add Loongson ASX base instruction support.
  LoongArch: Add Loongson ASX directive builtin function support.

 gcc/config.gcc                                |    2 +-
 gcc/config/loongarch/constraints.md           |  131 +-
 .../loongarch/genopts/loongarch-strings       |    4 +
 gcc/config/loongarch/genopts/loongarch.opt.in |   12 +-
 gcc/config/loongarch/lasx.md                  | 5122 ++++++++++++++++
 gcc/config/loongarch/lasxintrin.h             | 5338 +++++++++++++++++
 gcc/config/loongarch/loongarch-builtins.cc    | 2686 ++++++++-
 gcc/config/loongarch/loongarch-c.cc           |   18 +
 gcc/config/loongarch/loongarch-def.c          |    6 +
 gcc/config/loongarch/loongarch-def.h          |    9 +-
 gcc/config/loongarch/loongarch-driver.cc      |   10 +
 gcc/config/loongarch/loongarch-driver.h       |    2 +
 gcc/config/loongarch/loongarch-ftypes.def     |  666 +-
 gcc/config/loongarch/loongarch-modes.def      |   39 +
 gcc/config/loongarch/loongarch-opts.cc        |   89 +-
 gcc/config/loongarch/loongarch-opts.h         |    3 +
 gcc/config/loongarch/loongarch-protos.h       |   35 +
 gcc/config/loongarch/loongarch-str.h          |    3 +
 gcc/config/loongarch/loongarch.cc             | 4586 +++++++++++++-
 gcc/config/loongarch/loongarch.h              |  117 +-
 gcc/config/loongarch/loongarch.md             |   56 +-
 gcc/config/loongarch/loongarch.opt            |   12 +-
 gcc/config/loongarch/lsx.md                   | 4481 ++++++++++++++
 gcc/config/loongarch/lsxintrin.h              | 5181 ++++++++++++++++
 gcc/config/loongarch/predicates.md            |  333 +-
 gcc/doc/md.texi                               |   11 +
 26 files changed, 28668 insertions(+), 284 deletions(-)
 create mode 100644 gcc/config/loongarch/lasx.md
 create mode 100644 gcc/config/loongarch/lasxintrin.h
 create mode 100644 gcc/config/loongarch/lsx.md
 create mode 100644 gcc/config/loongarch/lsxintrin.h

Comments

Xi Ruoyao Aug. 16, 2023, 3:27 a.m. UTC | #1
The implementation fails to handle this test case properly:

typedef double __attribute__((vector_size(32))) v4df;

void use1(double);

__attribute__((noipa)) double use(double)
{
	register double x asm("f24") = 114.514;
	__asm__("" : "+f" (x));
	return x;
}

void test(void)
{
	register v4df x asm("f24") = {1, 2, 3, 4};
	__asm__("" : "+f" (x));
	use(x[1]);
	use1(x[3]);
}

Here use() attempts to save and restore f24, but it uses fst.d/fld.d,
clobbering the high 192 bits of xr24.  Now test() passes a wrong value
of x[3] to use1().

Note that saving and restoring f24 with xvst/xvld in use() won't really
fix the issue because in real life use() can be in another translation
unit (or even a shared library) compiled with -mno-lsx.  So it seems we
need to tell the compiler "a function call may clobber the high bits of
a vector register even if the corresponding floating-point register is
saved".  I'm not sure how to accomplish this...

On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote:
> This is an update of:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626194.html
> 
> This version of patch set only introduces some small simplications of
> implementation. Because I missed the size limitation of mail size, the
> huge testsuite patches of v2 and v3 are not shown in the mail list.
> So,
> testsuite patches are splited from this patch set again and will be
> submitted 
> independently in the future.
> 
> Binutils-gdb introduced LSX/LASX support since 2.41 release:
> https://lists.gnu.org/archive/html/info-gnu/2023-07/msg00009.html
> 
> Brief history of patch set version:
> v1 -> v2:
> - Reduce usage of "unspec" in RTL template.
> - Append Support of ADDR_REG_REG in LSX and LASX.
> - Constraint docs are appended in gcc/doc/md.texi and ccomment block.
> - Codes related to vecarg are removed.
> - Testsuite of LSX and LASX is added in v2. (Because of the size
> limitation of
>   mail list, these patches are not shown)
> - Adjust the loongarch_expand_vector_init() function to reduce
> instruction 
>   output amount.
> - Some minor implementation changes of RTL templates.
> 
> v2 -> v3:
> - Revert vabsd/xvabsd RTL templates to unspec impl.
> - Resolve warning in gcc/config/loongarch/loongarch.cc when
> bootstrapping 
>   with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx".
> - Remove redundant definitions in lasxintrin.h.
> - Refine commit info.
> 
> Lulu Cheng (6):
>   LoongArch: Add Loongson SX vector directive compilation framework.
>   LoongArch: Add Loongson SX base instruction support.
>   LoongArch: Add Loongson SX directive builtin function support.
>   LoongArch: Add Loongson ASX vector directive compilation framework.
>   LoongArch: Add Loongson ASX base instruction support.
>   LoongArch: Add Loongson ASX directive builtin function support.
> 
>  gcc/config.gcc                                |    2 +-
>  gcc/config/loongarch/constraints.md           |  131 +-
>  .../loongarch/genopts/loongarch-strings       |    4 +
>  gcc/config/loongarch/genopts/loongarch.opt.in |   12 +-
>  gcc/config/loongarch/lasx.md                  | 5122 ++++++++++++++++
>  gcc/config/loongarch/lasxintrin.h             | 5338
> +++++++++++++++++
>  gcc/config/loongarch/loongarch-builtins.cc    | 2686 ++++++++-
>  gcc/config/loongarch/loongarch-c.cc           |   18 +
>  gcc/config/loongarch/loongarch-def.c          |    6 +
>  gcc/config/loongarch/loongarch-def.h          |    9 +-
>  gcc/config/loongarch/loongarch-driver.cc      |   10 +
>  gcc/config/loongarch/loongarch-driver.h       |    2 +
>  gcc/config/loongarch/loongarch-ftypes.def     |  666 +-
>  gcc/config/loongarch/loongarch-modes.def      |   39 +
>  gcc/config/loongarch/loongarch-opts.cc        |   89 +-
>  gcc/config/loongarch/loongarch-opts.h         |    3 +
>  gcc/config/loongarch/loongarch-protos.h       |   35 +
>  gcc/config/loongarch/loongarch-str.h          |    3 +
>  gcc/config/loongarch/loongarch.cc             | 4586 +++++++++++++-
>  gcc/config/loongarch/loongarch.h              |  117 +-
>  gcc/config/loongarch/loongarch.md             |   56 +-
>  gcc/config/loongarch/loongarch.opt            |   12 +-
>  gcc/config/loongarch/lsx.md                   | 4481 ++++++++++++++
>  gcc/config/loongarch/lsxintrin.h              | 5181 ++++++++++++++++
>  gcc/config/loongarch/predicates.md            |  333 +-
>  gcc/doc/md.texi                               |   11 +
>  26 files changed, 28668 insertions(+), 284 deletions(-)
>  create mode 100644 gcc/config/loongarch/lasx.md
>  create mode 100644 gcc/config/loongarch/lasxintrin.h
>  create mode 100644 gcc/config/loongarch/lsx.md
>  create mode 100644 gcc/config/loongarch/lsxintrin.h
>
Chenghui Pan Aug. 17, 2023, 7:20 a.m. UTC | #2
Hi! I try to investigate on this problem, and modify the testcase to
compile and run on aarch64 for reference, but I get some strange result
(comment shows the info that I see by stepping through by using gdb):

typedef double __attribute__((vector_size(16))) v2df;

void use1(double d) {}

__attribute__((noipa)) v2df use(double d)
{
  //reg v8's value: {1, 2}
  register v2df x asm("v8") = {5, 9};
  //reg v8's value: {5, 9}
  __asm__("" : "+w" (x));
  return x;
}

void test(void)
{
  register v2df x asm("v8") = {1, 2};
  __asm__("" : "+w" (x));
  //reg v8's value: {1, 2}
  use(x[0]);
  //reg v8's value: {1, 0}
  use1(x[1]);
}

int main(int argc, char **argv)
{
  test();
  return 0;
}

The compile command is: gcc -march=armv8-a -Og -g 1.c (gcc
8.3.0+binutils 2.31)

Disassembly of test() and use():
0000000000400558 <use>:                                               
  400558:       fc1f0fe8        str     d8, [sp, #-16]!               
  40055c:       90000000        adrp    x0, 400000 <_init-0x3e0>      
  400560:       3dc19c08        ldr     q8, [x0, #1648]               
  400564:       4ea81d00        mov     v0.16b, v8.16b                
  400568:       fc4107e8        ldr     d8, [sp], #16                 
  40056c:       d65f03c0        ret    

0000000000400570 <test>:                                              
  400570:       a9be7bfd        stp     x29, x30, [sp, #-32]!         
  400574:       910003fd        mov     x29, sp                       
  400578:       fd000be8        str     d8, [sp, #16]                 
  40057c:       90000000        adrp    x0, 400000 <_init-0x3e0>      
  400580:       3dc1a008        ldr     q8, [x0, #1664]               
  400584:       5e080500        mov     d0, v8.d[0]                   
  400588:       97fffff4        bl      400558 <use>                  
  40058c:       fd400be8        ldr     d8, [sp, #16]                 
  400590:       a8c27bfd        ldp     x29, x30, [sp], #32           
  400594:       d65f03c0        ret     

As the register value in the comments, The compiling output on aarch64
also clobbers the high parts of vector register. I googled for some
documents and I find this:
https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64
-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard
/Parameters-in-NEON-and-floating-point-registers 

Seems ARMv8-A only guarantees to preserve low 64-bit value of
NEON/floating-point register value. I'm not sure that I modify the
testcase in the right way and maybe we need more investigations. Any
ideas or suggestion?

On Wed, 2023-08-16 at 11:27 +0800, Xi Ruoyao wrote:
> The implementation fails to handle this test case properly:
> 
> typedef double __attribute__((vector_size(32))) v4df;
> 
> void use1(double);
> 
> __attribute__((noipa)) double use(double)
> {
> 	register double x asm("f24") = 114.514;
> 	__asm__("" : "+f" (x));
> 	return x;
> }
> 
> void test(void)
> {
> 	register v4df x asm("f24") = {1, 2, 3, 4};
> 	__asm__("" : "+f" (x));
> 	use(x[1]);
> 	use1(x[3]);
> }
> 
> Here use() attempts to save and restore f24, but it uses fst.d/fld.d,
> clobbering the high 192 bits of xr24.  Now test() passes a wrong
> value
> of x[3] to use1().
> 
> Note that saving and restoring f24 with xvst/xvld in use() won't
> really
> fix the issue because in real life use() can be in another
> translation
> unit (or even a shared library) compiled with -mno-lsx.  So it seems
> we
> need to tell the compiler "a function call may clobber the high bits
> of
> a vector register even if the corresponding floating-point register
> is
> saved".  I'm not sure how to accomplish this...
> 
> On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote:
> > This is an update of:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626194.html
> > 
> > This version of patch set only introduces some small simplications
> > of
> > implementation. Because I missed the size limitation of mail size,
> > the
> > huge testsuite patches of v2 and v3 are not shown in the mail list.
> > So,
> > testsuite patches are splited from this patch set again and will be
> > submitted 
> > independently in the future.
> > 
> > Binutils-gdb introduced LSX/LASX support since 2.41 release:
> > https://lists.gnu.org/archive/html/info-gnu/2023-07/msg00009.html
> > 
> > Brief history of patch set version:
> > v1 -> v2:
> > - Reduce usage of "unspec" in RTL template.
> > - Append Support of ADDR_REG_REG in LSX and LASX.
> > - Constraint docs are appended in gcc/doc/md.texi and ccomment
> > block.
> > - Codes related to vecarg are removed.
> > - Testsuite of LSX and LASX is added in v2. (Because of the size
> > limitation of
> >   mail list, these patches are not shown)
> > - Adjust the loongarch_expand_vector_init() function to reduce
> > instruction 
> >   output amount.
> > - Some minor implementation changes of RTL templates.
> > 
> > v2 -> v3:
> > - Revert vabsd/xvabsd RTL templates to unspec impl.
> > - Resolve warning in gcc/config/loongarch/loongarch.cc when
> > bootstrapping 
> >   with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -
> > mlasx".
> > - Remove redundant definitions in lasxintrin.h.
> > - Refine commit info.
> > 
> > Lulu Cheng (6):
> >   LoongArch: Add Loongson SX vector directive compilation
> > framework.
> >   LoongArch: Add Loongson SX base instruction support.
> >   LoongArch: Add Loongson SX directive builtin function support.
> >   LoongArch: Add Loongson ASX vector directive compilation
> > framework.
> >   LoongArch: Add Loongson ASX base instruction support.
> >   LoongArch: Add Loongson ASX directive builtin function support.
> > 
> >  gcc/config.gcc                                |    2 +-
> >  gcc/config/loongarch/constraints.md           |  131 +-
> >  .../loongarch/genopts/loongarch-strings       |    4 +
> >  gcc/config/loongarch/genopts/loongarch.opt.in |   12 +-
> >  gcc/config/loongarch/lasx.md                  | 5122
> > ++++++++++++++++
> >  gcc/config/loongarch/lasxintrin.h             | 5338
> > +++++++++++++++++
> >  gcc/config/loongarch/loongarch-builtins.cc    | 2686 ++++++++-
> >  gcc/config/loongarch/loongarch-c.cc           |   18 +
> >  gcc/config/loongarch/loongarch-def.c          |    6 +
> >  gcc/config/loongarch/loongarch-def.h          |    9 +-
> >  gcc/config/loongarch/loongarch-driver.cc      |   10 +
> >  gcc/config/loongarch/loongarch-driver.h       |    2 +
> >  gcc/config/loongarch/loongarch-ftypes.def     |  666 +-
> >  gcc/config/loongarch/loongarch-modes.def      |   39 +
> >  gcc/config/loongarch/loongarch-opts.cc        |   89 +-
> >  gcc/config/loongarch/loongarch-opts.h         |    3 +
> >  gcc/config/loongarch/loongarch-protos.h       |   35 +
> >  gcc/config/loongarch/loongarch-str.h          |    3 +
> >  gcc/config/loongarch/loongarch.cc             | 4586
> > +++++++++++++-
> >  gcc/config/loongarch/loongarch.h              |  117 +-
> >  gcc/config/loongarch/loongarch.md             |   56 +-
> >  gcc/config/loongarch/loongarch.opt            |   12 +-
> >  gcc/config/loongarch/lsx.md                   | 4481
> > ++++++++++++++
> >  gcc/config/loongarch/lsxintrin.h              | 5181
> > ++++++++++++++++
> >  gcc/config/loongarch/predicates.md            |  333 +-
> >  gcc/doc/md.texi                               |   11 +
> >  26 files changed, 28668 insertions(+), 284 deletions(-)
> >  create mode 100644 gcc/config/loongarch/lasx.md
> >  create mode 100644 gcc/config/loongarch/lasxintrin.h
> >  create mode 100644 gcc/config/loongarch/lsx.md
> >  create mode 100644 gcc/config/loongarch/lsxintrin.h
> > 
>
Xi Ruoyao Aug. 20, 2023, 8:25 a.m. UTC | #3
On Thu, 2023-08-17 at 15:20 +0800, Chenghui Pan wrote:
> Seems ARMv8-A only guarantees to preserve low 64-bit value of
> NEON/floating-point register value. I'm not sure that I modify the
> testcase in the right way and maybe we need more investigations. Any
> ideas or suggestion?

Sorry, the following sentence in GCC manual section 6.47.5.2 suggests my
test case is not valid:

"As with global register variables, it is recommended that you choose a
register that is normally saved and restored by function calls on your
machine, so that calls to library routines will not clobber it."

So when I use asm(name), the compiler has no obligation to guarantee
that it will ever work like a normal variable after a function call.

But I still need to verify that the compiler correctly understands only
the low 64 bits of the vector register is saved.  I'll try to make
another test case...
Lulu Cheng Oct. 19, 2023, 8 a.m. UTC | #4
在 2023/8/20 下午4:25, Xi Ruoyao 写道:
> On Thu, 2023-08-17 at 15:20 +0800, Chenghui Pan wrote:
>> Seems ARMv8-A only guarantees to preserve low 64-bit value of
>> NEON/floating-point register value. I'm not sure that I modify the
>> testcase in the right way and maybe we need more investigations. Any
>> ideas or suggestion?

Hi, Ruoyao:

The implementation of hook loongarch_hard_regno_call_part_clobbered 
results in all vector registers being caller saved registers.

So no data will be lost during the function call.

> Sorry, the following sentence in GCC manual section 6.47.5.2 suggests my
> test case is not valid:
>
> "As with global register variables, it is recommended that you choose a
> register that is normally saved and restored by function calls on your
> machine, so that calls to library routines will not clobber it."
>
> So when I use asm(name), the compiler has no obligation to guarantee
> that it will ever work like a normal variable after a function call.
>
> But I still need to verify that the compiler correctly understands only
> the low 64 bits of the vector register is saved.  I'll try to make
> another test case...
>