RFA: Add Epiphany port

Message ID 20111103140816.trwz063nj4ks4gos-nzlynne@webmail.spamcop.net
State New

Commit Message

Joern Rennecke Nov. 3, 2011, 6:08 p.m. UTC
There are still a few score failures, mostly because the testsuite assumes
that structure padding does not happen, and due to aliasing between stack
and frame pointer based addresses.  I have patches for these issues, however,
I think they will be easier to discuss once the entire Epiphany toolchain is
submitted.
The binutils port is already in the sourceware.org CVS repository,
newlib/libgloss and gdb/simulator ports are to follow.
2011-11-03  Joern Rennecke <joern.rennecke@embecosm.com>

gcc:
	* config.gcc (epiphany-*-*): New architecture.
	(epiphany-*-elf): New configuration.
	* config/epiphany, common/config/epiphany: New directories.
	* doc/extend.texi (disinterrupt attribute): Add Epiphany.
	(interrupt attribute): Add Epiphany.
	(long_call, short_call attribute): Add Epiphany.
	(Epiphany Built-in Functions): New node.
	* doc/invoke.texi (Options): Add Epiphany options.
gcc/testsuite:
	* gcc.c-torture/execute/ieee/mul-subnormal-single-1.x:
	Disable test on Epiphany.
	* gcc.c-torture/execute/20101011-1.c: Disable test on Epiphany.
	* gcc.dg/stack-usage-1.c [__epiphany__] (SIZE): Define.
	* gcc.dg/pragma-pack-3.c: Disable test on Epiphany.
	* g++.dg/parse/pragma3.C: Likewise.
	* gcc.dg/torture/stackalign/builtin-apply-2.c (STACK_ARGUMENTS_SIZE): Define.
	(bar): Use it.
	* gcc.dg/weak/typeof-2.c [epiphany-*-*]: Add option -mshort-calls.
	* gcc.dg/tls/thr-cse-1.c: Likewise.
	* g++.dg/opt/devirt2.C: Likewise.
	* gcc.dg/20020312-2.c [epiphany-*-*] (PIC_REG): Define.
	* gcc.dg/builtin-apply2.c [__epiphany__] (STACK_ARGUMENTS_SIZE): Define to 20.
libgcc:
	* config.host (epiphany-*-elf*): New configuration.
	* config/epiphany: New directory.

Comments

Andrew Pinski Nov. 3, 2011, 6:30 p.m. UTC | #1
On Thu, Nov 3, 2011 at 11:08 AM, Joern Rennecke <amylaar@spamcop.net> wrote:
> +@smallexample
> +float __builtin_epiphany_fmadd (float a, float b, float c) /* a + b * c */
> +float __builtin_epiphany_fmsub (float a, float b, float c) /* a - b * c */
> +@end smallexample

I don't think you need target specific builtins for these any more.
Also all your fma patterns are now incorrect.  You should use fma for the RTL.

Thanks,
Andrew Pinski

Joern Rennecke Nov. 3, 2011, 7:04 p.m. UTC | #2
Quoting Andrew Pinski <pinskia@gmail.com>:

> On Thu, Nov 3, 2011 at 11:08 AM, Joern Rennecke <amylaar@spamcop.net> wrote:
>> +@smallexample
>> +float __builtin_epiphany_fmadd (float a, float b, float c) /* a + b * c */
>> +float __builtin_epiphany_fmsub (float a, float b, float c) /* a - b * c */
>> +@end smallexample

> I don't think you need target specific builtins for these any more.

I could conceivably implement __builtin_epiphany_fmadd in a header file
using fma, reordering the operands, but that would only make the port
messier.  The semantics of fma are not documented in extend.texi.

What's more, __builtin_epiphany_fmsub is a different operation than fmssf4 -
it subtracts the product from the scalar, while fmssf4 subtracts the scalar
from the product.  Besides, there is no builtin for fmssf4 anyway.
So I need the builtin for __builtin_epiphany_fmsub.  Keeping the
existing __builtin_epiphany_fmadd is then simpler than converting that
one intrinsic from a builtin into a macro.

> Also all your fma patterns are now incorrect.  You should use fma
> for the RTL.

Why is that incorrect?  They still describe the operation.  And fms is the
wrong operation, anyway.  Changing only fmadd -> fma would break the symmetry
between the add and sub pattern inside the port, so I don't see that there
would be any net gain.

There is an fmasf4 expander, so the high-level optimizers can do their thing.

Richard Henderson Nov. 3, 2011, 7:58 p.m. UTC | #3
On 11/03/2011 12:04 PM, Joern Rennecke wrote:
> I could conceivably implement __builtin_epiphany_fmadd in a header file
> using fma, reordering the operands, but that would only make the port
> messier.  The semantics of fma are not documented in extend.texi.

Well, we managed to get the docs into rtl.texi and md.texi.
The builtin has the exact same semantics as the C99 function.

> What's more, __builtin_epiphany_fmsub is a different operation than fmssf4 -
> it subtracts the product from the scalar, while fmssf4 subtracts the scalar
> from the product.

See fnma for the named pattern.

> Besides, there is no builtin for fmssf4 anyway.

We look for all of the variants with extra negations.  Try 

  __builtin_fma (-a, b, c)
or
  __builtin_fma (a, -b, c)

either should generate fnma.

There are different examples of these sorts of combinations in the i386
and powerpc backends, since Intel and IBM picked a different set of 
variations.


r~

Patch

Index: libgcc/config.host
===================================================================
--- libgcc/config.host	(revision 180805)
+++ libgcc/config.host	(working copy)
@@ -441,6 +441,10 @@ 
 cris-*-linux* | crisv32-*-linux*)
 	tmake_file="$tmake_file cris/t-cris t-fdpbit cris/t-linux"
 	;;
+epiphany-*-elf*)
+	tmake_file="epiphany/t-epiphany t-fdpbit epiphany/t-custom-eqsf"
+	extra_parts="crti.o crtint.o crtrunc.o crtm1reg-r43.o crtm1reg-r63.o crtbegin.o crtend.o crtn.o"
+	;;
 fr30-*-elf)
 	tmake_file="$tmake_file fr30/t-fr30 t-fdpbit"
 	extra_parts="$extra_parts crti.o crtn.o"
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 180805)
+++ gcc/doc/extend.texi	(working copy)
@@ -2192,7 +2192,7 @@  types (@pxref{Variable Attributes}, @pxr
 
 @item disinterrupt
 @cindex @code{disinterrupt} attribute
-On MeP targets, this attribute causes the compiler to emit
+On Epiphany and MeP targets, this attribute causes the compiler to emit
 instructions to disable interrupts for the duration of the given
 function.
 
@@ -2551,7 +2551,7 @@  void bar (void)
 
 @item interrupt
 @cindex interrupt handler functions
-Use this attribute on the ARM, AVR, M32C, M32R/D, m68k, MeP, MIPS,
+Use this attribute on the ARM, AVR, Epiphany, M32C, M32R/D, m68k, MeP, MIPS,
 RX and Xstormy16 ports to indicate that the specified function is an
 interrupt handler.  The compiler will generate function entry and exit
 sequences suitable for use in an interrupt handler when this attribute
@@ -2723,7 +2723,8 @@  least version 2.20.1), and GNU C library
 @item long_call/short_call
 @cindex indirect calls on ARM
 This attribute specifies how a particular function is called on
-ARM@.  Both attributes override the @option{-mlong-calls} (@pxref{ARM Options})
+ARM and Epiphany@.  Both attributes override the
+@option{-mlong-calls} (@pxref{ARM Options})
 command-line switch and @code{#pragma long_calls} settings.  The
 @code{long_call} attribute indicates that the function might be far
 away from the call site and require a different (more expensive)
@@ -8061,6 +8062,7 @@  @deftypefn {Built-in Function} int64_t _
 * ARM NEON Intrinsics::
 * AVR Built-in Functions::
 * Blackfin Built-in Functions::
+* Epiphany Built-in Functions::
 * FR-V Built-in Functions::
 * X86 Built-in Functions::
 * MIPS DSP Built-in Functions::
@@ -8364,6 +8366,16 @@  void __builtin_bfin_csync (void)
 void __builtin_bfin_ssync (void)
 @end smallexample
 
+@node  Epiphany Built-in Functions
+@subsection Epiphany Built-in Functions
+There are two Epiphany-specific built-in functions, which map onto the
+single precision floating point multiply-and-accumulate instructions:
+
+@smallexample
+float __builtin_epiphany_fmadd (float a, float b, float c) /* a + b * c */
+float __builtin_epiphany_fmsub (float a, float b, float c) /* a - b * c */
+@end smallexample
+
 @node FR-V Built-in Functions
 @subsection FR-V Built-in Functions
 
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 180805)
+++ gcc/doc/invoke.texi	(working copy)
@@ -458,6 +458,14 @@  cpp(1), gcov(1), as(1), ld(1), gdb(1), a
 @c Try and put the significant identifier (CPU or system) first,
 @c so users have a clue at guessing where the ones they want will be.
 
+@emph{Adapteva Epiphany Options}
+@gccoptlist{-mhalf-reg-file -mprefer-short-insn-regs @gol
+-mbranch-cost=@var{num} -mcmove -mnops=@var{num} -msoft-cmpsf @gol
+-msplit-lohi -mpost-inc -mpost-modify -mstack-offset=@var{num} @gol
+-mround-nearest -mlong-calls -mshort-calls -msmall16 @gol
+-mfp-mode=@var{mode} -mvect-double -max-vect-align=@var{num} @gol
+-msplit-vecmove-early -m1reg-@var{reg}}
+
 @emph{ARM Options}
 @gccoptlist{-mapcs-frame  -mno-apcs-frame @gol
 -mabi=@var{name} @gol
@@ -10226,6 +10234,7 @@  finds any @option{-l} options and any no
 @c in Machine Dependent Options
 
 @menu
+* Adapteva Epiphany Options::
 * ARM Options::
 * AVR Options::
 * Blackfin Options::
@@ -10274,6 +10283,161 @@  finds any @option{-l} options and any no
 * zSeries Options::
 @end menu
 
+@node Adapteva Epiphany Options
+@subsection Adapteva Epiphany Options
+
+These @samp{-m} options are defined for Adapteva Epiphany:
+
+@table @gcctabopt
+@item -mhalf-reg-file
+@opindex mhalf-reg-file
+Don't allocate any register in the range @code{r32}..@code{r63}.
+That allows code to run on hardware variants that lack these registers.
+
+@item -mprefer-short-insn-regs
+@opindex mprefer-short-insn-regs
+Preferentially allocate registers that allow short instruction generation.
+This can result in increased instruction count, so whether this reduces or
+increases code size might vary from case to case.
+
+@item -mbranch-cost=@var{num}
+@opindex mbranch-cost
+Set the cost of branches to roughly @var{num} ``simple'' instructions.
+This cost is only a heuristic and is not guaranteed to produce
+consistent results across releases.
+
+@item -mcmove
+@opindex mcmove
+Enable the generation of conditional moves.
+
+@item -mnops=@var{num}
+@opindex mnops
+Emit @var{num} nops before every other generated instruction.
+
+@item -mno-soft-cmpsf
+@opindex mno-soft-cmpsf
+For single-precision floating point comparisons, emit an fsub instruction
+and test the flags.  This is faster than a software comparison, but can
+get incorrect results in the presence of NaNs, or when two different small
+numbers are compared such that their difference is calculated as zero.
+The default is @option{-msoft-cmpsf}, which uses slower, but IEEE-compliant,
+software comparisons.
+
+@item -mstack-offset=@var{num}
+@opindex mstack-offset
+Set the offset between the top of the stack and the stack pointer.
+E.g. a value of 8 means that the eight bytes in the range sp+0.. sp+7
+can be used by leaf functions without stack allocation.
+Values other than @samp{8} or @samp{16} are untested and unlikely to work.
+Note also that this option changes the ABI; compiling a program with a
+different stack offset than the libraries have been compiled with
+will generally not work.
+This option can be useful if you want to evaluate if a different stack
+offset would give you better code, but to actually use a different stack
+offset to build working programs, it is recommended to configure the
+toolchain with the appropriate @samp{--with-stack-offset=@var{num}} option.
+
+@item -mno-round-nearest
+@opindex mno-round-nearest
+Make the scheduler assume that the rounding mode has been set to
+truncating.  The default is @option{-mround-nearest}.
+
+@item -mlong-calls
+@opindex mlong-calls
+If not otherwise specified by an attribute, assume all calls might be beyond
+the offset range of the b / bl instructions, and therefore load the
+function address into a register before performing an (otherwise direct) call.
+This is the default.
+
+@item -mshort-calls
+@opindex mshort-calls
+If not otherwise specified by an attribute, assume all direct calls are
+in the range of the b / bl instructions, so use these instructions
+for direct calls.  The default is @option{-mlong-calls}.
+
+@item -msmall16
+@opindex msmall16
+Assume addresses can be loaded as 16 bit unsigned values.  This does not
+apply to function addresses for which @option{-mlong-calls} semantics
+are in effect.
+
+@item -mfp-mode=@var{mode}
+@opindex mfp-mode
+Set the prevailing mode of the floating point unit.
+This determines the floating point mode that is provided and expected
+at function call and return time.  Making this mode match the mode you
+predominantly need at function start can make your programs smaller and
+faster by avoiding unnecessary mode switches.
+
+@var{mode} can be set to one of the following values:
+
+@table @samp
+@item caller
+Any mode at function entry is valid, and retained or restored when
+the function returns, and when it calls other functions.
+This mode is useful for compiling libraries or other compilation units
+you might want to incorporate into different programs with different
+prevailing FPU modes, and the convenience of being able to use a single
+object file outweighs the size and speed overhead for any extra
+mode switching that might be needed, compared with what would be needed
+with a more specific choice of prevailing FPU mode.
+
+@item truncate
+This is the mode used for floating point calculations with
+truncating (i.e. round towards zero) rounding mode.  That includes
+conversion from floating point to integer.
+
+@item round-nearest
+This is the mode used for floating point calculations with
+round-to-nearest-or-even rounding mode.
+
+@item int
+This is the mode used to perform integer calculations in the FPU, e.g.
+integer multiply, or integer multiply-and-accumulate.
+@end table
+
+The default is @option{-mfp-mode=caller}.
+
+@item -mno-split-lohi
+@opindex mno-split-lohi
+@item -mno-post-inc
+@opindex mno-post-inc
+@item -mno-post-modify
+@opindex mno-post-modify
+Code generation tweaks that disable, respectively, splitting of 32
+bit loads, generation of post-increment addresses, and generation of
+post-modify addresses.  The defaults are @option{-msplit-lohi},
+@option{-mpost-inc}, and @option{-mpost-modify}.
+
+@item -mno-vect-double
+@opindex mno-vect-double
+Change the preferred SIMD mode to SImode.  The default is
+@option{-mvect-double}, which uses DImode as preferred SIMD mode.
+
+@item -max-vect-align=@var{num}
+@opindex max-vect-align
+The maximum alignment for SIMD vector mode types.
+@var{num} may be 4 or 8.  The default is 8.
+Note that this is an ABI change, even though many library function
+interfaces will be unaffected, if they don't use SIMD vector modes
+in places where they affect size and/or alignment of relevant types.
+
+@item -msplit-vecmove-early
+@opindex msplit-vecmove-early
+Split vector moves into single word moves before reload.  In theory this
+could give better register allocation, but so far the reverse seems to be
+generally the case.
+
+@item -m1reg-@var{reg}
+@opindex m1reg-
+Specify a register to hold the constant -1, which makes loading small negative
+constants and certain bitmasks faster.
+Allowable values for @var{reg} are r43 and r63, which specify to use that register
+as a fixed register, and none, which means that no register is used for this
+purpose.  The default is @option{-m1reg-none}.
+
+@end table
+
 @node ARM Options
 @subsection ARM Options
 @cindex ARM options
Index: gcc/testsuite/gcc.c-torture/execute/ieee/mul-subnormal-single-1.x
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/ieee/mul-subnormal-single-1.x	(revision 180805)
+++ gcc/testsuite/gcc.c-torture/execute/ieee/mul-subnormal-single-1.x	(working copy)
@@ -1,3 +1,8 @@ 
+if [istarget "epiphany-*-*"] {
+    # The Epiphany single-precision floating point format does not
+    # support subnormals.
+    return 1
+}
 if [istarget "mips-sgi-irix6*"] {
     # IRIX 6 sets the MIPS IV flush to zero bit by default, so this test
     # isn't expected to work for n32 and n64 on MIPS IV targets.
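
The interaction between the missing subnormal support noted in this hunk and the subtract-and-test comparison described for -mno-soft-cmpsf can be sketched on an ordinary host. This is a rough model only, not the port's code: flushing to zero is simulated in software, NaN handling is not modelled, and the names flush_subnormal and equal_by_subtraction are hypothetical.

```c
#include <float.h>

/* Assumption for illustration: model an FPU without subnormals by
   replacing any value smaller in magnitude than FLT_MIN with zero. */
static float flush_subnormal(float x)
{
    return (x < FLT_MIN && x > -FLT_MIN) ? 0.0f : x;
}

/* Hypothetical model of an fsub-based equality test: subtract the
   operands and report "equal" when the flushed difference is zero. */
static int equal_by_subtraction(float a, float b)
{
    return flush_subnormal(a - b) == 0.0f;
}
```

Taking a = 1.5f * FLT_MIN and b = FLT_MIN, both operands are normal numbers and a != b, yet their difference is subnormal, so the modelled comparison reports them as equal. That is the kind of case the option description warns about.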
Index: gcc/testsuite/gcc.c-torture/execute/20101011-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/20101011-1.c	(revision 180805)
+++ gcc/testsuite/gcc.c-torture/execute/20101011-1.c	(working copy)
@@ -28,6 +28,10 @@ 
   /* Not all Linux kernels deal correctly the breakpoints generated by
      MIPS16 divisions by zero.  They show up as a SIGTRAP instead.  */
 # define DO_TEST 0
+#elif defined (__epiphany__)
+  /* Epiphany does not have hardware division, and the software implementation
+     has truly undefined behaviour for division by 0.  */
+# define DO_TEST 0
 #else
 # define DO_TEST 1
 #endif
Index: gcc/testsuite/gcc.dg/stack-usage-1.c
===================================================================
--- gcc/testsuite/gcc.dg/stack-usage-1.c	(revision 180805)
+++ gcc/testsuite/gcc.dg/stack-usage-1.c	(working copy)
@@ -52,6 +52,8 @@ 
 #  define SIZE 160 /* 256 -  96 bytes for register save area */
 #elif defined (__SPU__)
 #  define SIZE 224
+#elif defined (__epiphany__)
+#  define SIZE (256 - __EPIPHANY_STACK_OFFSET__)
 #else
 #  define SIZE 256
 #endif
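
The arithmetic behind the new SIZE definition can be checked with a small host-side sketch. It assumes, as the macro suggests, that the buffer is shrunk by the stack offset from a 256-byte target; the fallback value 8 mirrors the ${with_stack_offset:-8} default in config.gcc, and the helper name stack_usage_buffer_size is hypothetical.

```c
/* Assumption: when not compiling for Epiphany the macro is absent, so
   fall back to 8, the default that config.gcc passes to the port. */
#ifndef __EPIPHANY_STACK_OFFSET__
#define __EPIPHANY_STACK_OFFSET__ 8
#endif

/* Same expression as the testsuite change. */
#define SIZE (256 - __EPIPHANY_STACK_OFFSET__)

/* Hypothetical helper: buffer size for an arbitrary stack offset. */
static int stack_usage_buffer_size(int stack_offset)
{
    return 256 - stack_offset;
}
```

For the two offsets that the option documentation describes as tested, 8 and 16, this gives buffer sizes of 248 and 240 bytes respectively.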
Index: gcc/testsuite/gcc.dg/pragma-pack-3.c
===================================================================
--- gcc/testsuite/gcc.dg/pragma-pack-3.c	(revision 180805)
+++ gcc/testsuite/gcc.dg/pragma-pack-3.c	(working copy)
@@ -1,6 +1,6 @@ 
 /* PR c++/25294 */
 /* { dg-options "-std=gnu99" } */
-/* { dg-do run } */
+/* { dg-do run { target { ! epiphany-*-* } } } */
 
 extern void abort (void);
 
Index: gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c	(revision 180805)
+++ gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c	(working copy)
@@ -9,6 +9,15 @@ 
 
 #define INTEGER_ARG  5
 
+#if defined(__ARM_PCS) || defined(__epiphany__)
+/* For Base AAPCS, NAME is passed in r0.  D is passed in r2 and r3.
+   E, F and G are passed on stack.  So the size of the stack argument
+   data is 20.  */
+#define STACK_ARGUMENTS_SIZE  20
+#else
+#define STACK_ARGUMENTS_SIZE  64
+#endif
+
 extern void abort(void);
 
 void foo(char *name, double d, double e, double f, int g)
@@ -19,7 +28,7 @@  void foo(char *name, double d, double e,
 
 void bar(char *name, ...)
 {
-  __builtin_apply(foo, __builtin_apply_args(), 64);
+  __builtin_apply(foo, __builtin_apply_args(), STACK_ARGUMENTS_SIZE);
 }
 
 int main(void)
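
The value 20 follows from the argument layout that the comment describes: name and d travel in registers, while e, f and g go on the stack. A minimal sketch of that sum, assuming 8-byte double and 4-byte int and using the hypothetical helper name stack_arguments_size:

```c
#include <stddef.h>

/* Bytes of stack argument data for
     foo(char *name, double d, double e, double f, int g)
   when only e, f and g are passed on the stack. */
static size_t stack_arguments_size(void)
{
    return sizeof(double)   /* e */
         + sizeof(double)   /* f */
         + sizeof(int);     /* g */
}
```

On a host where those size assumptions hold, the function returns 20, matching STACK_ARGUMENTS_SIZE for the AAPCS and Epiphany case.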
Index: gcc/testsuite/gcc.dg/weak/typeof-2.c
===================================================================
--- gcc/testsuite/gcc.dg/weak/typeof-2.c	(revision 180805)
+++ gcc/testsuite/gcc.dg/weak/typeof-2.c	(working copy)
@@ -5,6 +5,7 @@ 
 /* { dg-require-weak "" } */
 /* { dg-require-alias "" } */
 /* { dg-options "-O2" } */
+/* { dg-options "-O2 -mshort-calls" { target epiphany-*-* } } */
 
 extern int foo1 (int x) __asm ("baz1");
 int bar1 (int x) { return x; }
Index: gcc/testsuite/gcc.dg/tls/thr-cse-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tls/thr-cse-1.c	(revision 180805)
+++ gcc/testsuite/gcc.dg/tls/thr-cse-1.c	(working copy)
@@ -1,5 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-O1" } */
+/* { dg-options "-O1 -mshort-calls" { target epiphany-*-* } } */
 /* { dg-require-effective-target tls_emulated } */
 
 /* Test that we only get one call to emutls_get_address when CSE is
Index: gcc/testsuite/gcc.dg/20020312-2.c
===================================================================
--- gcc/testsuite/gcc.dg/20020312-2.c	(revision 180805)
+++ gcc/testsuite/gcc.dg/20020312-2.c	(working copy)
@@ -20,6 +20,8 @@  extern void abort (void);
 /* No pic register.  */
 #elif defined(__cris__)
 # define PIC_REG  "0"
+#elif defined(__epiphany__)
+#define PIC_REG "r28"
 #elif defined(__fr30__)
 /* No pic register.  */
 #elif defined(__H8300__) || defined(__H8300H__) || defined(__H8300S__)
Index: gcc/testsuite/gcc.dg/builtin-apply2.c
===================================================================
--- gcc/testsuite/gcc.dg/builtin-apply2.c	(revision 180805)
+++ gcc/testsuite/gcc.dg/builtin-apply2.c	(working copy)
@@ -12,7 +12,7 @@ 
 
 #define INTEGER_ARG  5
 
-#ifdef __ARM_PCS
+#if defined(__ARM_PCS) || defined(__epiphany__)
 /* For Base AAPCS, NAME is passed in r0.  D is passed in r2 and r3.
    E, F and G are passed on stack.  So the size of the stack argument
    data is 20.  */
Index: gcc/testsuite/g++.dg/opt/devirt2.C
===================================================================
--- gcc/testsuite/g++.dg/opt/devirt2.C	(revision 180805)
+++ gcc/testsuite/g++.dg/opt/devirt2.C	(working copy)
@@ -1,5 +1,6 @@ 
 // { dg-do compile }
 // { dg-options "-O2" }
+// { dg-options "-O2 -mshort-calls" { target epiphany-*-* } }
 // { dg-final { scan-assembler-times "xyzzy" 2 { target { ! { alpha*-*-* hppa*-*-* ia64*-*-hpux* sparc*-*-* } } } } }
 // The IA64 and HPPA compilers generate external declarations in addition
 // to the call so those scans need to be more specific.
Index: gcc/testsuite/g++.dg/parse/pragma3.C
===================================================================
--- gcc/testsuite/g++.dg/parse/pragma3.C	(revision 180805)
+++ gcc/testsuite/g++.dg/parse/pragma3.C	(working copy)
@@ -1,5 +1,5 @@ 
 // PR c++/25294
-// { dg-do run }
+// { dg-do run { target { ! epiphany-*-* } } }
 
 extern "C" void abort (void);
 
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 180805)
+++ gcc/config.gcc	(working copy)
@@ -327,6 +327,9 @@ 
 crisv32-*)
 	cpu_type=cris
 	;;
+epiphany-*-*)
+	cpu_type=epiphany
+	;;
 frv*)	cpu_type=frv
 	extra_options="${extra_options} g.opt"
 	;;
@@ -965,6 +968,13 @@ 
 		;;
 	esac
 	;;
+epiphany-*-elf )
+	tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
+	tmake_file="epiphany/t-epiphany"
+	extra_options="${extra_options} fused-madd.opt"
+	extra_objs="$extra_objs mode-switch-use.o resolve-sw-modes.o"
+	tm_defines="${tm_defines} EPIPHANY_STACK_OFFSET=${with_stack_offset:-8}"
+	;;
 fr30-*-elf)
 	tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
 	;;