Message ID | 20190212230942.GQ2135@tucnak |
---|---|
State | New |
Headers | show |
Series | Fix up norm2 simplification (PR middle-end/88074) | expand |
Hi Jakub, > Bootstrapped/regtested on x86_64-linux and i686-linux (together with > Richard's patch), ok for trunk? A couple of points: with this patch, we will have an algorithm that will evaluate NORM2 in a different way than before, possibly leading to regressions in some cases where we previously generated good results and would return +/- Inf now. Also, with this patch, we would use a very different algorithm at run-time to compile-time, potentially leading to differences far exceeding a few ulps. I am not sure I like that idea. I'll look at the literature and generate some test cases, and come back to you. From curiosity, did you chose this algorithm from literature? Regards Thomas
On Wed, Feb 13, 2019 at 07:17:53PM +0100, Thomas Koenig wrote: > > Bootstrapped/regtested on x86_64-linux and i686-linux (together with > > Richard's patch), ok for trunk? > > A couple of points: > > with this patch, we will have an algorithm that will evaluate NORM2 > in a different way than before, possibly leading to regressions > in some cases where we previously generated good results and > would return +/- Inf now. I'm not aware of such a case, do you have examples what do you have in mind? The adjustments are only done by powers of two, the only reason I'm not using just mpfr_set_exp is to deal with the case where there are extreme differences in exponents, say 1.0e-8100L*1.0e-8100L+1.0e+16382L*1.0e+16382L, where because 1.0e+16382L*1.0e+16382L would overflow the very tiny temporary result needs to be divided by a very large power of 2 and so I want to let mpfr do the rounding. > Also, with this patch, we would use a very different algorithm > at run-time to compile-time, potentially leading to differences > far exceeding a few ulps. I am not sure I like that idea. The runtime code is already so much different that I don't see how this holds. And it is actually much worse, exactly because it doesn't scale just by powers of two and thus every scaling has a rounding error. > I'll look at the literature and generate some test cases, and > come back to you. > > From curiosity, did you chose this algorithm from literature? No. But, first of all, the patch should change nothing at all unless either at least one extremely large number (whose **2 is not representable in the kind or close to that) appears, or the array is so huge that the result is attacking the maximum. And, for the cases where it does change something, unless the differences in exponents between numbers are extreme, there should be again no rounding difference, the exponent of the numbers will simply change and nothing else. Jakub
On Wed, Feb 13, 2019 at 07:30:53PM +0100, Jakub Jelinek wrote: > On Wed, Feb 13, 2019 at 07:17:53PM +0100, Thomas Koenig wrote: > > > Bootstrapped/regtested on x86_64-linux and i686-linux (together with > > > Richard's patch), ok for trunk? > > > > A couple of points: > > > > with this patch, we will have an algorithm that will evaluate NORM2 > > in a different way than before, possibly leading to regressions > > in some cases where we previously generated good results and > > would return +/- Inf now. > > I'm not aware of such a case, do you have examples what do you have in mind? > The adjustments are only done by powers of two, the only reason I'm not > using just mpfr_set_exp is to deal with the case where there are extreme > differences in exponents, say 1.0e-8100L*1.0e-8100L+1.0e+16382L*1.0e+16382L, > where because 1.0e+16382L*1.0e+16382L would overflow the very tiny temporary > result needs to be divided by a very large power of 2 and so I want to let > mpfr do the rounding. > > > Also, with this patch, we would use a very different algorithm > > at run-time to compile-time, potentially leading to differences > > far exceeding a few ulps. I am not sure I like that idea. > > The runtime code is already so much different that I don't see how this > holds. And it is actually much worse, exactly because it doesn't scale just > by powers of two and thus every scaling has a rounding error. > > > I'll look at the literature and generate some test cases, and > > come back to you. > > > > From curiosity, did you chose this algorithm from literature? > > No. But, first of all, the patch should change nothing at all unless > either at least one extremely large number (whose **2 is not representable > in the kind or close to that) appears, or the array is so huge that the > result is attacking the maximum. And, for the cases where it does change > something, unless the differences in exponents between numbers are extreme, > there should be again no rounding difference, the exponent of the numbers > will simply change and nothing else. > I think we can agree that gfortran's norm2 needs some work. To get Richard's emin/emax patch into the tree so adequate testing can be done prior to the 9.1 release, I'll suggest that we XFAIL norm2_3.f90 while we work out the details for norm2. Yesteraday, I looked at Jakub's patch, and thought the approach was fine. I haven't had time to look at the library side. Perhaps, we can adopt Jakub's approach for the library as well.
On Wed, Feb 13, 2019 at 10:51:07AM -0800, Steve Kargl wrote: > To get Richard's emin/emax patch into the tree so adequate > testing can be done prior to the 9.1 release, I'll suggest > that we XFAIL norm2_3.f90 while we work out the details for > norm2. > > Yesteraday, I looked at Jakub's patch, and thought the approach > was fine. I haven't had time to look at the library side. > Perhaps, we can adopt Jakub's approach for the library as well. Yeah. I guess fastest would be to precompute as constants the two boundaries when we might want to scale something (of course different for each kind), then just compare the operand and/or result against those and only if they are above that do frexp* to determine exponent, (re)compute the scale variable (scalbn* ?) and then just multiply or divide by that (if division is much slower, we could have two scale variables, one for scaling down and one for scaling up). Jakub
Hi Jakub, I checked the patch together with Richard's (by which I assume you mean https://gcc.gnu.org/bugzilla/attachment.cgi?id=45052 ), and thinks looked good. So, the Fortran part of this is OK. However, we should really also do power-of-two scaling for the runtime method. Also, we seem to have a lot of issues with IEEE flags when calculating NORM2, this would also need to be addressed. Regards Thomas
On Sat, Feb 16, 2019 at 05:23:58PM +0100, Thomas Koenig wrote: > > Also, we seem to have a lot of issues with IEEE flags > when calculating NORM2, this would also need to be > addressed. > Which IEEE flags and are you referring using the Fortran modules or -ffpe-trap?
Hi Steve, > On Sat, Feb 16, 2019 at 05:23:58PM +0100, Thomas Koenig wrote: >> >> Also, we seem to have a lot of issues with IEEE flags >> when calculating NORM2, this would also need to be >> addressed. >> > > Which IEEE flags and are you referring using the > Fortran modules or -ffpe-trap? I checked out the software from that paper (thanks for the link), and gfortran ran up quite a lot of errors running the test suite even with the author's preferred version. That's what I meant. Regards Thomas
--- gcc/fortran/simplify.c.jj 2019-01-10 11:43:12.452409482 +0100 +++ gcc/fortran/simplify.c 2019-02-12 19:54:03.726526824 +0100 @@ -636,6 +636,9 @@ simplify_transformation_to_array (gfc_ex if (*src) *dest = op (*dest, gfc_copy_expr (*src)); + if (post_op) + *dest = post_op (*dest, *dest); + count[0]++; base += sstride[0]; dest += dstride[0]; @@ -671,10 +674,7 @@ simplify_transformation_to_array (gfc_ex result_ctor = gfc_constructor_first (result->value.constructor); for (i = 0; i < resultsize; ++i) { - if (post_op) - result_ctor->expr = post_op (result_ctor->expr, resultvec[i]); - else - result_ctor->expr = resultvec[i]; + result_ctor->expr = resultvec[i]; result_ctor = gfc_constructor_next (result_ctor); } @@ -6048,9 +6048,10 @@ gfc_simplify_idnint (gfc_expr *e) return simplify_nint ("IDNINT", e, NULL); } +static int norm2_scale; static gfc_expr * -add_squared (gfc_expr *result, gfc_expr *e) +norm2_add_squared (gfc_expr *result, gfc_expr *e) { mpfr_t tmp; @@ -6059,8 +6060,45 @@ add_squared (gfc_expr *result, gfc_expr && result->expr_type == EXPR_CONSTANT); gfc_set_model_kind (result->ts.kind); + int index = gfc_validate_kind (BT_REAL, result->ts.kind, false); + mpfr_exp_t exp; + if (mpfr_regular_p (result->value.real)) + { + exp = mpfr_get_exp (result->value.real); + /* If result is getting close to overflowing, scale down. */ + if (exp >= gfc_real_kinds[index].max_exponent - 4 + && norm2_scale <= gfc_real_kinds[index].max_exponent - 2) + { + norm2_scale += 2; + mpfr_div_ui (result->value.real, result->value.real, 16, + GFC_RND_MODE); + } + } + mpfr_init (tmp); - mpfr_pow_ui (tmp, e->value.real, 2, GFC_RND_MODE); + if (mpfr_regular_p (e->value.real)) + { + exp = mpfr_get_exp (e->value.real); + /* If e**2 would overflow or close to overflowing, scale down. */ + if (exp - norm2_scale >= gfc_real_kinds[index].max_exponent / 2 - 2) + { + int new_scale = gfc_real_kinds[index].max_exponent / 2 + 4; + mpfr_set_ui (tmp, 1, GFC_RND_MODE); + mpfr_set_exp (tmp, new_scale - norm2_scale); + mpfr_div (result->value.real, result->value.real, tmp, GFC_RND_MODE); + mpfr_div (result->value.real, result->value.real, tmp, GFC_RND_MODE); + norm2_scale = new_scale; + } + } + if (norm2_scale) + { + mpfr_set_ui (tmp, 1, GFC_RND_MODE); + mpfr_set_exp (tmp, norm2_scale); + mpfr_div (tmp, e->value.real, tmp, GFC_RND_MODE); + } + else + mpfr_set (tmp, e->value.real, GFC_RND_MODE); + mpfr_pow_ui (tmp, tmp, 2, GFC_RND_MODE); mpfr_add (result->value.real, result->value.real, tmp, GFC_RND_MODE); mpfr_clear (tmp); @@ -6070,14 +6108,26 @@ add_squared (gfc_expr *result, gfc_expr static gfc_expr * -do_sqrt (gfc_expr *result, gfc_expr *e) +norm2_do_sqrt (gfc_expr *result, gfc_expr *e) { gcc_assert (e->ts.type == BT_REAL && e->expr_type == EXPR_CONSTANT); gcc_assert (result->ts.type == BT_REAL && result->expr_type == EXPR_CONSTANT); - mpfr_set (result->value.real, e->value.real, GFC_RND_MODE); + if (result != e) + mpfr_set (result->value.real, e->value.real, GFC_RND_MODE); mpfr_sqrt (result->value.real, result->value.real, GFC_RND_MODE); + if (norm2_scale && mpfr_regular_p (result->value.real)) + { + mpfr_t tmp; + mpfr_init (tmp); + mpfr_set_ui (tmp, 1, GFC_RND_MODE); + mpfr_set_exp (tmp, norm2_scale); + mpfr_mul (result->value.real, result->value.real, tmp, GFC_RND_MODE); + mpfr_clear (tmp); + } + norm2_scale = 0; + return result; } @@ -6100,15 +6150,27 @@ gfc_simplify_norm2 (gfc_expr *e, gfc_exp if (size_zero) return result; + norm2_scale = 0; if (!dim || e->rank == 1) { result = simplify_transformation_to_scalar (result, e, NULL, - add_squared); + norm2_add_squared); mpfr_sqrt (result->value.real, result->value.real, GFC_RND_MODE); + if (norm2_scale && mpfr_regular_p (result->value.real)) + { + mpfr_t tmp; + mpfr_init (tmp); + mpfr_set_ui (tmp, 1, GFC_RND_MODE); + mpfr_set_exp (tmp, norm2_scale); + mpfr_mul (result->value.real, result->value.real, tmp, GFC_RND_MODE); + mpfr_clear (tmp); + } + norm2_scale = 0; } else result = simplify_transformation_to_array (result, e, dim, NULL, - add_squared, &do_sqrt); + norm2_add_squared, + norm2_do_sqrt); return result; }