diff mbox series

Fix up norm2 simplification (PR middle-end/88074)

Message ID 20190212230942.GQ2135@tucnak
State New
Headers show
Series Fix up norm2 simplification (PR middle-end/88074) | expand

Commit Message

Jakub Jelinek Feb. 12, 2019, 11:09 p.m. UTC
Hi!

As discussed recently on the mailing list, the norm2 simplification doesn't
work if we limit mpfr emin/emax to some values derived from maximum floating
exponents (and precision for denormals).

The following patch adjusts the computation, so that it is scaled down if
needed.  In particular, if any value in the array is so large that **2 will
overflow on it or will be very close to it, the scale down is set to
2**(max_exponent/2+4), and if the result during computation gets close to
overflowing, it is scaled down a little bit too.  The scaling is always done
using powers of two, operands by that and the sum by **2 of that, and at the
end it multiplies the sqrt back.  I had to change
simplify_transformation_to_array, so that post_op is done immediately after
finishing ops corresponding to that, so that there can be just one global
variable for the scale.  From my understanding of e.g. the libgfortran norm2
code where sqrt is done basically in this spot I hope it isn't possible that
the same *dest is updated multiple times with dest increments/decrements in
between.

Bootstrapped/regtested on x86_64-linux and i686-linux (together with
Richard's patch), ok for trunk?

2019-02-12  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/88074
	* simplify.c (simplify_transformation_to_array): Run post_op
	immediately after processing corresponding row, rather than at the
	end.
	(norm2_scale): New variable.
	(add_squared): Rename to ...
	(norm2_add_squared): ... this.  Scale down operand and/or result
	if needed.
	(do_sqrt): Rename to ...
	(norm2_do_sqrt): ... this.  Handle the result == e case.  Scale up
	result and clear norm2_scale.
	(gfc_simplify_norm2): Clear norm2_scale.  Change add_squared to
	norm2_add_squared and &do_sqrt to norm2_do_sqrt.  Scale up result
	and clear norm2_scale again.


	Jakub

Comments

Thomas Koenig Feb. 13, 2019, 6:17 p.m. UTC | #1
Hi Jakub,

> Bootstrapped/regtested on x86_64-linux and i686-linux (together with
> Richard's patch), ok for trunk?

A couple of points:

with this patch, we will have an algorithm that will evaluate NORM2
in a different way than before, possibly leading to regressions
in some cases where we previously generated good results and
would return +/- Inf now.

Also, with this patch, we would use a very different algorithm
at run-time to compile-time, potentially leading to differences
far exceeding a few ulps. I am not sure I like that idea.

I'll look at the literature and generate some test cases, and
come back to you.

 From curiosity, did you chose this algorithm from literature?

Regards

	Thomas
Jakub Jelinek Feb. 13, 2019, 6:30 p.m. UTC | #2
On Wed, Feb 13, 2019 at 07:17:53PM +0100, Thomas Koenig wrote:
> > Bootstrapped/regtested on x86_64-linux and i686-linux (together with
> > Richard's patch), ok for trunk?
> 
> A couple of points:
> 
> with this patch, we will have an algorithm that will evaluate NORM2
> in a different way than before, possibly leading to regressions
> in some cases where we previously generated good results and
> would return +/- Inf now.

I'm not aware of such a case, do you have examples what do you have in mind?
The adjustments are only done by powers of two, the only reason I'm not
using just mpfr_set_exp is to deal with the case where there are extreme
differences in exponents, say 1.0e-8100L*1.0e-8100L+1.0e+16382L*1.0e+16382L,
where because 1.0e+16382L*1.0e+16382L would overflow the very tiny temporary
result needs to be divided by a very large power of 2 and so I want to let
mpfr do the rounding.

> Also, with this patch, we would use a very different algorithm
> at run-time to compile-time, potentially leading to differences
> far exceeding a few ulps. I am not sure I like that idea.

The runtime code is already so much different that I don't see how this
holds.  And it is actually much worse, exactly because it doesn't scale just
by powers of two and thus every scaling has a rounding error.

> I'll look at the literature and generate some test cases, and
> come back to you.
> 
> From curiosity, did you chose this algorithm from literature?

No.  But, first of all, the patch should change nothing at all unless
either at least one extremely large number (whose **2 is not representable
in the kind or close to that) appears, or the array is so huge that the
result is attacking the maximum.  And, for the cases where it does change
something, unless the differences in exponents between numbers are extreme,
there should be again no rounding difference, the exponent of the numbers
will simply change and nothing else.

	Jakub
Steve Kargl Feb. 13, 2019, 6:51 p.m. UTC | #3
On Wed, Feb 13, 2019 at 07:30:53PM +0100, Jakub Jelinek wrote:
> On Wed, Feb 13, 2019 at 07:17:53PM +0100, Thomas Koenig wrote:
> > > Bootstrapped/regtested on x86_64-linux and i686-linux (together with
> > > Richard's patch), ok for trunk?
> > 
> > A couple of points:
> > 
> > with this patch, we will have an algorithm that will evaluate NORM2
> > in a different way than before, possibly leading to regressions
> > in some cases where we previously generated good results and
> > would return +/- Inf now.
> 
> I'm not aware of such a case, do you have examples what do you have in mind?
> The adjustments are only done by powers of two, the only reason I'm not
> using just mpfr_set_exp is to deal with the case where there are extreme
> differences in exponents, say 1.0e-8100L*1.0e-8100L+1.0e+16382L*1.0e+16382L,
> where because 1.0e+16382L*1.0e+16382L would overflow the very tiny temporary
> result needs to be divided by a very large power of 2 and so I want to let
> mpfr do the rounding.
> 
> > Also, with this patch, we would use a very different algorithm
> > at run-time to compile-time, potentially leading to differences
> > far exceeding a few ulps. I am not sure I like that idea.
> 
> The runtime code is already so much different that I don't see how this
> holds.  And it is actually much worse, exactly because it doesn't scale just
> by powers of two and thus every scaling has a rounding error.
> 
> > I'll look at the literature and generate some test cases, and
> > come back to you.
> > 
> > From curiosity, did you chose this algorithm from literature?
> 
> No.  But, first of all, the patch should change nothing at all unless
> either at least one extremely large number (whose **2 is not representable
> in the kind or close to that) appears, or the array is so huge that the
> result is attacking the maximum.  And, for the cases where it does change
> something, unless the differences in exponents between numbers are extreme,
> there should be again no rounding difference, the exponent of the numbers
> will simply change and nothing else.
> 

I think we can agree that gfortran's norm2 needs some work.

To get Richard's emin/emax patch into the tree so adequate
testing can be done prior to the 9.1 release, I'll suggest
that we XFAIL norm2_3.f90 while we work out the details for
norm2.

Yesteraday, I looked at Jakub's patch, and thought the approach
was fine.  I haven't had time to look at the library side.
Perhaps, we can adopt Jakub's approach for the library as well.
Jakub Jelinek Feb. 13, 2019, 6:58 p.m. UTC | #4
On Wed, Feb 13, 2019 at 10:51:07AM -0800, Steve Kargl wrote:
> To get Richard's emin/emax patch into the tree so adequate
> testing can be done prior to the 9.1 release, I'll suggest
> that we XFAIL norm2_3.f90 while we work out the details for
> norm2.
> 
> Yesteraday, I looked at Jakub's patch, and thought the approach
> was fine.  I haven't had time to look at the library side.
> Perhaps, we can adopt Jakub's approach for the library as well.

Yeah.  I guess fastest would be to precompute as constants the
two boundaries when we might want to scale something (of course different
for each kind), then just compare the operand and/or result against
those and only if they are above that do frexp* to determine exponent,
(re)compute the scale variable (scalbn* ?) and then just multiply or divide
by that (if division is much slower, we could have two scale variables, one
for scaling down and one for scaling up).

	Jakub
Thomas Koenig Feb. 16, 2019, 4:23 p.m. UTC | #5
Hi Jakub,

I checked the patch together with Richard's (by which I assume you
mean https://gcc.gnu.org/bugzilla/attachment.cgi?id=45052 ), and
thinks looked good.

So, the Fortran part of this is OK.

However, we should really also do power-of-two scaling
for the runtime method.

Also, we seem to have a lot of issues with IEEE flags
when calculating NORM2, this would also need to be
addressed.

Regards

	Thomas
Steve Kargl Feb. 16, 2019, 6:01 p.m. UTC | #6
On Sat, Feb 16, 2019 at 05:23:58PM +0100, Thomas Koenig wrote:
> 
> Also, we seem to have a lot of issues with IEEE flags
> when calculating NORM2, this would also need to be
> addressed.
> 

Which IEEE flags and are you referring using the
Fortran modules or -ffpe-trap?
Thomas Koenig Feb. 17, 2019, 5:30 p.m. UTC | #7
Hi Steve,

> On Sat, Feb 16, 2019 at 05:23:58PM +0100, Thomas Koenig wrote:
>>
>> Also, we seem to have a lot of issues with IEEE flags
>> when calculating NORM2, this would also need to be
>> addressed.
>>
> 
> Which IEEE flags and are you referring using the
> Fortran modules or -ffpe-trap?

I checked out the software from that paper (thanks for the link), and
gfortran ran up quite a lot of errors running the test suite even with
the author's preferred version.  That's what I meant.

Regards

	Thomas
diff mbox series

Patch

--- gcc/fortran/simplify.c.jj	2019-01-10 11:43:12.452409482 +0100
+++ gcc/fortran/simplify.c	2019-02-12 19:54:03.726526824 +0100
@@ -636,6 +636,9 @@  simplify_transformation_to_array (gfc_ex
 	if (*src)
 	  *dest = op (*dest, gfc_copy_expr (*src));
 
+      if (post_op)
+	*dest = post_op (*dest, *dest);
+
       count[0]++;
       base += sstride[0];
       dest += dstride[0];
@@ -671,10 +674,7 @@  simplify_transformation_to_array (gfc_ex
   result_ctor = gfc_constructor_first (result->value.constructor);
   for (i = 0; i < resultsize; ++i)
     {
-      if (post_op)
-	result_ctor->expr = post_op (result_ctor->expr, resultvec[i]);
-      else
-	result_ctor->expr = resultvec[i];
+      result_ctor->expr = resultvec[i];
       result_ctor = gfc_constructor_next (result_ctor);
     }
 
@@ -6048,9 +6048,10 @@  gfc_simplify_idnint (gfc_expr *e)
   return simplify_nint ("IDNINT", e, NULL);
 }
 
+static int norm2_scale;
 
 static gfc_expr *
-add_squared (gfc_expr *result, gfc_expr *e)
+norm2_add_squared (gfc_expr *result, gfc_expr *e)
 {
   mpfr_t tmp;
 
@@ -6059,8 +6060,45 @@  add_squared (gfc_expr *result, gfc_expr
 	      && result->expr_type == EXPR_CONSTANT);
 
   gfc_set_model_kind (result->ts.kind);
+  int index = gfc_validate_kind (BT_REAL, result->ts.kind, false);
+  mpfr_exp_t exp;
+  if (mpfr_regular_p (result->value.real))
+    {
+      exp = mpfr_get_exp (result->value.real);
+      /* If result is getting close to overflowing, scale down.  */
+      if (exp >= gfc_real_kinds[index].max_exponent - 4
+	  && norm2_scale <= gfc_real_kinds[index].max_exponent - 2)
+	{
+	  norm2_scale += 2;
+	  mpfr_div_ui (result->value.real, result->value.real, 16,
+		       GFC_RND_MODE);
+	}
+    }
+
   mpfr_init (tmp);
-  mpfr_pow_ui (tmp, e->value.real, 2, GFC_RND_MODE);
+  if (mpfr_regular_p (e->value.real))
+    {
+      exp = mpfr_get_exp (e->value.real);
+      /* If e**2 would overflow or close to overflowing, scale down.  */
+      if (exp - norm2_scale >= gfc_real_kinds[index].max_exponent / 2 - 2)
+	{
+	  int new_scale = gfc_real_kinds[index].max_exponent / 2 + 4;
+	  mpfr_set_ui (tmp, 1, GFC_RND_MODE);
+	  mpfr_set_exp (tmp, new_scale - norm2_scale);
+	  mpfr_div (result->value.real, result->value.real, tmp, GFC_RND_MODE);
+	  mpfr_div (result->value.real, result->value.real, tmp, GFC_RND_MODE);
+	  norm2_scale = new_scale;
+	}
+    }
+  if (norm2_scale)
+    {
+      mpfr_set_ui (tmp, 1, GFC_RND_MODE);
+      mpfr_set_exp (tmp, norm2_scale);
+      mpfr_div (tmp, e->value.real, tmp, GFC_RND_MODE);
+    }
+  else
+    mpfr_set (tmp, e->value.real, GFC_RND_MODE);
+  mpfr_pow_ui (tmp, tmp, 2, GFC_RND_MODE);
   mpfr_add (result->value.real, result->value.real, tmp,
 	    GFC_RND_MODE);
   mpfr_clear (tmp);
@@ -6070,14 +6108,26 @@  add_squared (gfc_expr *result, gfc_expr
 
 
 static gfc_expr *
-do_sqrt (gfc_expr *result, gfc_expr *e)
+norm2_do_sqrt (gfc_expr *result, gfc_expr *e)
 {
   gcc_assert (e->ts.type == BT_REAL && e->expr_type == EXPR_CONSTANT);
   gcc_assert (result->ts.type == BT_REAL
 	      && result->expr_type == EXPR_CONSTANT);
 
-  mpfr_set (result->value.real, e->value.real, GFC_RND_MODE);
+  if (result != e)
+    mpfr_set (result->value.real, e->value.real, GFC_RND_MODE);
   mpfr_sqrt (result->value.real, result->value.real, GFC_RND_MODE);
+  if (norm2_scale && mpfr_regular_p (result->value.real))
+    {
+      mpfr_t tmp;
+      mpfr_init (tmp);
+      mpfr_set_ui (tmp, 1, GFC_RND_MODE);
+      mpfr_set_exp (tmp, norm2_scale);
+      mpfr_mul (result->value.real, result->value.real, tmp, GFC_RND_MODE);
+      mpfr_clear (tmp);
+    }
+  norm2_scale = 0;
+
   return result;
 }
 
@@ -6100,15 +6150,27 @@  gfc_simplify_norm2 (gfc_expr *e, gfc_exp
   if (size_zero)
     return result;
 
+  norm2_scale = 0;
   if (!dim || e->rank == 1)
     {
       result = simplify_transformation_to_scalar (result, e, NULL,
-						  add_squared);
+						  norm2_add_squared);
       mpfr_sqrt (result->value.real, result->value.real, GFC_RND_MODE);
+      if (norm2_scale && mpfr_regular_p (result->value.real))
+	{
+	  mpfr_t tmp;
+	  mpfr_init (tmp);
+	  mpfr_set_ui (tmp, 1, GFC_RND_MODE);
+	  mpfr_set_exp (tmp, norm2_scale);
+	  mpfr_mul (result->value.real, result->value.real, tmp, GFC_RND_MODE);
+	  mpfr_clear (tmp);
+	}
+      norm2_scale = 0;
     }
   else
     result = simplify_transformation_to_array (result, e, dim, NULL,
-					       add_squared, &do_sqrt);
+					       norm2_add_squared,
+					       norm2_do_sqrt);
 
   return result;
 }