diff mbox

[ARC] Handle FPX NaN within optimized floating point library.

Message ID 1461924998-9190-1-git-send-email-claziss@synopsys.com
State New
Headers show

Commit Message

Claudiu Zissulescu April 29, 2016, 10:16 a.m. UTC
This is the updated patch on handling FPX NaNs.

Ok to apply?
Claudiu


gcc/
2016-04-18  Claudiu Zissulescu  <claziss@synopsys.com>

	* testsuite/gcc.target/arc/ieee_eq.c: New test.

libgcc/
2016-04-18  Claudiu Zissulescu  <claziss@synopsys.com>

	* config/arc/ieee-754/eqdf2.S: Handle FPX NaN.
---
 gcc/testsuite/gcc.target/arc/ieee_eq.c | 47 ++++++++++++++++++++++++++++++++++
 libgcc/config/arc/ieee-754/eqdf2.S     | 15 +++++++----
 2 files changed, 57 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/ieee_eq.c

Comments

Joern Wolfgang Rennecke April 29, 2016, 10:22 a.m. UTC | #1
On 29/04/16 11:16, Claudiu Zissulescu wrote:
> This is the updated patch on handling FPX NaNs.
>
> Ok to apply?
> Claudiu
>
>
  OK.
Joern Wolfgang Rennecke April 29, 2016, 10:27 a.m. UTC | #2
P.S.: the .d suffix on the branch was there just for scheduling purposes -
not sure if that actually helped any chip's pipeline, or if it was just 
a bug
in the documentation.
Claudiu Zissulescu April 29, 2016, 10:31 a.m. UTC | #3
It should do the job, at least for EM where the jump takes 2 cycle, and by means of using delay slots we can make all the cycles count. HS has a branch prediction mechanism, hence, filling up the delay slot doesn't have such a big impact like in EM or even earlier cpus.

//Claudiu

> -----Original Message-----
> From: Joern Wolfgang Rennecke [mailto:gnu@amylaar.uk]
> Sent: Friday, April 29, 2016 12:27 PM
> To: Claudiu Zissulescu; gcc-patches@gcc.gnu.org
> Cc: Francois.Bedard@synopsys.com; jeremy.bennett@embecosm.com
> Subject: Re: [PATCH] [ARC] Handle FPX NaN within optimized floating point
> library.
> 
> P.S.: the .d suffix on the branch was there just for scheduling purposes -
> not sure if that actually helped any chip's pipeline, or if it was just
> a bug
> in the documentation.
Joern Wolfgang Rennecke April 29, 2016, 10:37 a.m. UTC | #4
On 29/04/16 11:31, Claudiu Zissulescu wrote:
> It should do the job, at least for EM where the jump takes 2 cycle, and by means of using delay slots we can make all the cycles count. HS has a branch prediction mechanism, hence, filling up the delay slot doesn't have such a big impact like in EM or even earlier cpus.
No, the alternative is to hide the delay slot, so if the branch is 
predicted properly, the case with
different high words should be faster without the .d suffix.

I.e. , eagerly filling the delay slot like this has a bigger - negative 
- impact on performance.
Claudiu Zissulescu April 29, 2016, 10:47 a.m. UTC | #5
> > It should do the job, at least for EM where the jump takes 2 cycle, and by
> means of using delay slots we can make all the cycles count. HS has a branch
> prediction mechanism, hence, filling up the delay slot doesn't have such a big
> impact like in EM or even earlier cpus.
> No, the alternative is to hide the delay slot, so if the branch is
> predicted properly, the case with
> different high words should be faster without the .d suffix.
> 
> I.e. , eagerly filling the delay slot like this has a bigger - negative
> - impact on performance.


If we talking about HS, then we can add another flag 'T' which should instruct the branch prediction that we expect this branch to be taken. However, I haven't seen any impact of this flag on the code, and the compiler generates this. In general, the HS branch prediction has some particularities. Although what you say makes perfect sense, I am almost sure it doesn't apply in the case of HS because of the way how it is implemented. But this is a good point, I will try to keep it in mind and ask the hw guys what is best.

//Claudiu
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.target/arc/ieee_eq.c b/gcc/testsuite/gcc.target/arc/ieee_eq.c
new file mode 100644
index 0000000..70aebad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/ieee_eq.c
@@ -0,0 +1,47 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include <stdio.h>
+#include <float.h>
+
+#define TEST_EQ(TYPE,X,Y,RES)				\
+  do {							\
+    volatile TYPE a, b;					\
+    a = (TYPE) X;					\
+    b = (TYPE) Y;					\
+    if ((a == b) != RES)				\
+      {							\
+	printf ("Runtime computation error @%d. %g "	\
+		"!= %g\n", __LINE__, a, b);		\
+	error = 1;					\
+      }							\
+  } while (0)
+
+#ifndef __HS__
+/* Special type of NaN found when using double FPX instructions.  */
+static const unsigned long long __nan = 0x7FF0000080000000ULL;
+# define W (*(double *) &__nan)
+#else
+# define W __builtin_nan ("")
+#endif
+
+#define Q __builtin_nan ("")
+#define H __builtin_inf ()
+
+int main (void)
+{
+  int error = 0;
+
+  TEST_EQ (double, 1, 1, 1);
+  TEST_EQ (double, 1, 2, 0);
+  TEST_EQ (double, W, W, 0);
+  TEST_EQ (double, Q, Q, 0);
+  TEST_EQ (double, __DBL_MAX__, __DBL_MAX__, 1);
+  TEST_EQ (double, __DBL_MIN__, __DBL_MIN__, 1);
+  TEST_EQ (double, H, H, 1);
+
+  if (error)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/libgcc/config/arc/ieee-754/eqdf2.S b/libgcc/config/arc/ieee-754/eqdf2.S
index bc7d88e..7e80ef5 100644
--- a/libgcc/config/arc/ieee-754/eqdf2.S
+++ b/libgcc/config/arc/ieee-754/eqdf2.S
@@ -58,11 +58,16 @@  __eqdf2:
 	   well predictable (as seen from the branch predictor).  */
 __eqdf2:
 	brne.d DBL0H,DBL1H,.Lhighdiff
-	bmsk r12,DBL0H,20
-#ifdef DPFP_COMPAT
-	or.f 0,DBL0L,DBL1L
-	bset.ne r12,r12,21
-#endif /* DPFP_COMPAT */
+#ifndef __HS__
+	/* The next two instructions are required to recognize the FPX
+	NaN, which has a pattern like this: 0x7ff0_0000_8000_0000, as
+	oposite to 0x7ff8_0000_0000_0000.  */
+	or.f    0,DBL0L,DBL1L
+	mov_s	r12,0x00200000
+	bset.ne r12,r12,0
+#else
+	bmsk    r12,DBL0H,20
+#endif /* __HS__ */
 	add1.f	r12,r12,DBL0H /* set c iff NaN; also, clear z if NaN.  */
 	j_s.d	[blink]
 	cmp.cc	DBL0L,DBL1L