Patchwork [libfortran] PR 52434/48878/38199 Improve floating point formatted writes

login
register
mail settings
Submitter Janne Blomqvist
Date March 8, 2012, 10:40 p.m.
Message ID <CAO9iq9FAJWxk3AEuBJt1CEjWEwP+Dw5=COjZ1+n8yX17g04Rhw@mail.gmail.com>
Download mbox | patch
Permalink /patch/145634/
State New
Headers show

Comments

Janne Blomqvist - March 8, 2012, 10:40 p.m.
Hi,

the attached patch implements some improvements for formatted writes
of floating point values. Currently libgfortran uses snprintf() to
fill a buffer with a fixed amount of digits, regardless of the digits
required, and then rounding or zero extending as needed. The patch
changes this to first figure out the number of digits needed, then
using snprintf() to fill the buffer with exactly the needed amount of
digits (except when a ROUND= specifier is used, in that case more
digits are generated). This has a few advantages:

- As the performance cost of snprintf() increases roughly linearly
with more digits (see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38199#c13 ), we needlessly
waste cycles when we don't need all the digits. The attached
write-many.f90 benchmark is almost twice as fast; best of five timings
with current trunk:

real    0m2.055s

with patch:

real    0m1.268s


- Although we generate enough digits to be able to get back the
original value exactly when reading, there are some (valid?) usecases
where more digits are needed. Exhibit 1 is dyadic.f90 which prints a
couple dyadic fractions which are exact both in binary and character
representations. With current trunk:

$ ./dyadic_f
.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004940656458412465441765687928682213723650600000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
.99999999999999988897769753748434595763683000000000000

With patch:

$ ./dyadic_f
.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000494065645841246544176568792868221372365059802614324
76442558568250067550727020875186529983636163599237979656469544571773092665671035593979639877479601078187812630071319031140452
78458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566
86768187039560310624931945271591492455329305456544401127480129709999541931989409080416563324524757147869014726780159355238611
55013480352649347201937902681071074917033322268447533357208324319360923828934583680601060115061698097530783422773183292479049
82524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993
664809941164205702637090279242767544565229087538682506419718265533447265625
.99999999999999988897769753748434595763683319091796875

For another example, Bob Corbett recently posted boz_corbett.f90 on
c.l.f. With trunk:

$ ./boz_corbett
.67242062862241870125253556346435045576786000...E-4931

where "..." is a very long string of zeros. With patch:

$ ./boz_corbett
.672420628622418701252535563464350455767864674589043138777375863685283014882452413061808444374167...E-4931

where "..." is a very long string of digits.

- With the patch, for the vast majority of Fortran programs which
don't use the F2003 ROUND= specifier, we let snprintf() do the
rounding (as we generate the correct number of digits to begin with).
Thus isolating the user from potential bugs in the rounding logic in
libgfortran, as well as the performance impacts of that same code.

Note that GFortran previously did roundTiesToAway, whereas at least
glibc snprintf() does roundTiesToEven, so some change in behavior is
expected. Anyway, this is IMHO for the better, as it avoids biasing
the magnitude upwards in some common cases. Also, roundTiesToAway is
the default rounding mode for arithmetic.

Regtested on x86_64-unknown-linux-gnu, Ok for trunk?

2012-03-09  Janne Blomqvist  <jb@gcc.gnu.org>

	PR libfortran/52434
	PR libfortran/48878
	PR libfortran/38199
	* io/unit.c (get_internal_unit): Default to ROUND_UNSPECIFIED.
	(init_units): Likewise.
	* io/write_float.def (determine_precision): New function.
	(output_float): Take into account buffer with %f format, no need
	for our own rounding if unspecified or processor specified
	rounding.
	(DTOA): Simplify format string, add parameters.
	(FDTOA): New macros similar to DTOA, but using %f format.
	(OUTPUT_FLOAT_FMT_G): Stack allocate newf, determine correct
	precision and fill buffer.
	(EN_PREC): New macro.
	(determine_en_precision): New function.
	(WRITE_FLOAT): For G format, move buffer filling into
	output_float_FMT_G, use FDTOA for F format.
	(write_float): Increase buffer due to F format.


testsuite ChangeLog:

2012-03-09  Janne Blomqvist  <jb@gcc.gnu.org>

	PR libfortran/52434
	PR libfortran/48878
	PR libfortran/38199
	* gfortran.dg/edit_real_1.f90: Don't assume roundTiesToAway.
	* gfortran.dg/round_1.f03: Likewise.
Janne Blomqvist - March 9, 2012, 10:01 a.m.
On Fri, Mar 9, 2012 at 00:40, Janne Blomqvist <blomqvist.janne@gmail.com> wrote:
> Note that GFortran previously did roundTiesToAway, whereas at least
> glibc snprintf() does roundTiesToEven, so some change in behavior is
> expected. Anyway, this is IMHO for the better, as it avoids biasing
> the magnitude upwards in some common cases. Also, roundTiesToAway is
> the default rounding mode for arithmetic.

Err, roundTiestToEven is the default, that is.

Also, the patch contains a bunch of commented out printf's for
debugging. I will obviously remove those before committing, so please
disregard those when reviewing. Thanks!
jerry DeLisle - March 14, 2012, 11:52 p.m.
On 03/08/2012 05:40 PM, Janne Blomqvist wrote:
> Hi,
>
> the attached patch implements some improvements for formatted writes
> of floating point values. Currently libgfortran uses snprintf() to
> fill a buffer with a fixed amount of digits, regardless of the digits
> required, and then rounding or zero extending as needed. The patch
> changes this to first figure out the number of digits needed, then
> using snprintf() to fill the buffer with exactly the needed amount of
> digits (except when a ROUND= specifier is used, in that case more
> digits are generated). This has a few advantages:
>
> - As the performance cost of snprintf() increases roughly linearly
> with more digits (see
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38199#c13 ), we needlessly
> waste cycles when we don't need all the digits. The attached
> write-many.f90 benchmark is almost twice as fast; best of five timings
> with current trunk:
>
> real    0m2.055s
>
> with patch:
>
> real    0m1.268s
>
>
> - Although we generate enough digits to be able to get back the
> original value exactly when reading, there are some (valid?) usecases
> where more digits are needed. Exhibit 1 is dyadic.f90 which prints a
> couple dyadic fractions which are exact both in binary and character
> representations. With current trunk:
>
> $ ./dyadic_f
> .000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004940656458412465441765687928682213723650600000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000
> .99999999999999988897769753748434595763683000000000000
>
> With patch:
>
> $ ./dyadic_f
> .0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> 00000000000000000000000000000000000000000000000000000000000000000000000000494065645841246544176568792868221372365059802614324
> 76442558568250067550727020875186529983636163599237979656469544571773092665671035593979639877479601078187812630071319031140452
> 78458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566
> 86768187039560310624931945271591492455329305456544401127480129709999541931989409080416563324524757147869014726780159355238611
> 55013480352649347201937902681071074917033322268447533357208324319360923828934583680601060115061698097530783422773183292479049
> 82524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993
> 664809941164205702637090279242767544565229087538682506419718265533447265625
> .99999999999999988897769753748434595763683319091796875
>
> For another example, Bob Corbett recently posted boz_corbett.f90 on
> c.l.f. With trunk:
>
> $ ./boz_corbett
> .67242062862241870125253556346435045576786000...E-4931
>
> where "..." is a very long string of zeros. With patch:
>
> $ ./boz_corbett
> .672420628622418701252535563464350455767864674589043138777375863685283014882452413061808444374167...E-4931
>
> where "..." is a very long string of digits.
>
> - With the patch, for the vast majority of Fortran programs which
> don't use the F2003 ROUND= specifier, we let snprintf() do the
> rounding (as we generate the correct number of digits to begin with).
> Thus isolating the user from potential bugs in the rounding logic in
> libgfortran, as well as the performance impacts of that same code.
>
> Note that GFortran previously did roundTiesToAway, whereas at least
> glibc snprintf() does roundTiesToEven, so some change in behavior is
> expected. Anyway, this is IMHO for the better, as it avoids biasing
> the magnitude upwards in some common cases. Also, roundTiesToAway is
> the default rounding mode for arithmetic.
>
> Regtested on x86_64-unknown-linux-gnu, Ok for trunk?

I like the idea behind this patch.  I confess, I have not studied the two test 
cases that you are modifying, but the changes seem to stick out with too many 
digits there.  Is this really correct?
When I get another moment, I will look closer. Maybe you can explain the need 
for this change a little more.  Sorry if I am being to careful. Its been many 
months since I delved into the formatting code.

Best regards,

Jerry
Janne Blomqvist - March 15, 2012, 6:55 a.m.
On Thu, Mar 15, 2012 at 01:52, Jerry DeLisle <jvdelisle@charter.net> wrote:
> I like the idea behind this patch.  I confess, I have not studied the two
> test cases that you are modifying, but the changes seem to stick out with
> too many digits there.  Is this really correct?
> When I get another moment, I will look closer. Maybe you can explain the
> need for this change a little more.  Sorry if I am being to careful. Its
> been many months since I delved into the formatting code.

The reason is that with the patch, the default rounding is whatever
snprintf() gives us. At least with glibc, snprintf() rounds ties to
even, but I'm not sure all implementations do this rather than
rounding ties to away. So the testcase changes just make sure that
we're not rounding a tie. E.g. "1.25" rounded to two significant
digits is "1.2" if one rounds ties to even, "1.3" if ties are rounded
away. So using "1.250001" ensures that the rounded value is "1.3" with
both rounding modes.

I suppose another option would be to assume that rounding is ties to
even, and keep an eye out for regressions on other targets.
jerry DeLisle - March 15, 2012, 10:33 a.m.
On 03/15/2012 02:55 AM, Janne Blomqvist wrote:
> On Thu, Mar 15, 2012 at 01:52, Jerry DeLisle<jvdelisle@charter.net>  wrote:
>> I like the idea behind this patch.  I confess, I have not studied the two
>> test cases that you are modifying, but the changes seem to stick out with
>> too many digits there.  Is this really correct?
>> When I get another moment, I will look closer. Maybe you can explain the
>> need for this change a little more.  Sorry if I am being to careful. Its
>> been many months since I delved into the formatting code.
>
> The reason is that with the patch, the default rounding is whatever
> snprintf() gives us. At least with glibc, snprintf() rounds ties to
> even, but I'm not sure all implementations do this rather than
> rounding ties to away. So the testcase changes just make sure that
> we're not rounding a tie. E.g. "1.25" rounded to two significant
> digits is "1.2" if one rounds ties to even, "1.3" if ties are rounded
> away. So using "1.250001" ensures that the rounded value is "1.3" with
> both rounding modes.
>
> I suppose another option would be to assume that rounding is ties to
> even, and keep an eye out for regressions on other targets.
>
>

Understood, thanks.  patch OK for trunk.


Jerry

Patch

diff --git a/gcc/testsuite/gfortran.dg/edit_real_1.f90 b/gcc/testsuite/gfortran.dg/edit_real_1.f90
index 3ac7cb4..594b2f1 100644
--- a/gcc/testsuite/gfortran.dg/edit_real_1.f90
+++ b/gcc/testsuite/gfortran.dg/edit_real_1.f90
@@ -68,7 +68,7 @@  program edit_real_1
   if (s .ne. '12.345E-01z') call abort
   ! E format, negative scale factor
   s = x
-  write (s, '(-2PE10.4,A)') 1.25, "z"
+  write (s, '(-2PE10.4,A)') 1.250001, "z"
   if (s .ne. '0.0013E+03z') call abort
   ! E format, single digit precision
   s = x
diff --git a/gcc/testsuite/gfortran.dg/round_1.f03 b/gcc/testsuite/gfortran.dg/round_1.f03
index db5d6ec..f74b137 100644
--- a/gcc/testsuite/gfortran.dg/round_1.f03
+++ b/gcc/testsuite/gfortran.dg/round_1.f03
@@ -20,9 +20,9 @@  write(line, fmt(4)) 1.20, 1.22, 1.25, 1.27, 1.30, 1.125
 if (line.ne."      1.20      1.22      1.25      1.27      1.30      1.12") call abort
 write(line, fmt(5)) 1.20, 1.22, 1.25, 1.27, 1.30, 1.125
 if (line.ne."      1.20      1.22      1.25      1.27      1.30      1.13") call abort
-write(line, fmt(6)) 1.20, 1.22, 1.25, 1.27, 1.30, 1.125
+write(line, fmt(6)) 1.20, 1.22, 1.250001, 1.27, 1.30, 1.125
 if (line.ne."       1.2       1.2       1.3       1.3       1.3       1.1") call abort
-write(line, fmt(7)) 1.20, 1.22, 1.25, 1.27, 1.30, 1.125
+write(line, fmt(7)) 1.20, 1.22, 1.250001, 1.27, 1.30, 1.125
 if (line.ne."      +1.2      +1.2      +1.3      +1.3      +1.3      +1.1") call abort
 
 end
diff --git a/libgcc/configure b/libgcc/configure
old mode 100644
new mode 100755
diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
index 7c71b09..819d0e9 100644
--- a/libgfortran/io/unit.c
+++ b/libgfortran/io/unit.c
@@ -453,7 +453,7 @@  get_internal_unit (st_parameter_dt *dtp)
   iunit->flags.decimal = DECIMAL_POINT;
   iunit->flags.encoding = ENCODING_DEFAULT;
   iunit->flags.async = ASYNC_NO;
-  iunit->flags.round = ROUND_COMPATIBLE;
+  iunit->flags.round = ROUND_UNSPECIFIED;
 
   /* Initialize the data transfer parameters.  */
 
@@ -543,7 +543,7 @@  init_units (void)
       u->flags.decimal = DECIMAL_POINT;
       u->flags.encoding = ENCODING_DEFAULT;
       u->flags.async = ASYNC_NO;
-      u->flags.round = ROUND_COMPATIBLE;
+      u->flags.round = ROUND_UNSPECIFIED;
      
       u->recl = options.default_recl;
       u->endfile = NO_ENDFILE;
@@ -573,7 +573,7 @@  init_units (void)
       u->flags.decimal = DECIMAL_POINT;
       u->flags.encoding = ENCODING_DEFAULT;
       u->flags.async = ASYNC_NO;
-      u->flags.round = ROUND_COMPATIBLE;
+      u->flags.round = ROUND_UNSPECIFIED;
 
       u->recl = options.default_recl;
       u->endfile = AT_ENDFILE;
@@ -603,7 +603,7 @@  init_units (void)
       u->flags.decimal = DECIMAL_POINT;
       u->flags.encoding = ENCODING_DEFAULT;
       u->flags.async = ASYNC_NO;
-      u->flags.round = ROUND_COMPATIBLE;
+      u->flags.round = ROUND_UNSPECIFIED;
 
       u->recl = options.default_recl;
       u->endfile = AT_ENDFILE;
diff --git a/libgfortran/io/write_float.def b/libgfortran/io/write_float.def
index 78f09b2..7882c90 100644
--- a/libgfortran/io/write_float.def
+++ b/libgfortran/io/write_float.def
@@ -1,4 +1,5 @@ 
-/* Copyright (C) 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+/* Copyright (C) 2007, 2008, 2009, 2010, 2011, 2012 
+   Free Software Foundation, Inc.
    Contributed by Andy Vaught
    Write float code factoring to this file by Jerry DeLisle   
    F2003 I/O support contributed by Jerry DeLisle
@@ -59,11 +60,60 @@  calculate_sign (st_parameter_dt *dtp, int negative_flag)
 }
 
 
+/* Determine the precision except for EN format. For G format,
+   determines an upper bound to be used for sizing the buffer. */
+
+static int
+determine_precision (st_parameter_dt * dtp, const fnode * f, int len)
+{
+  int precision = f->u.real.d;
+
+  switch (f->format)
+    {
+    case FMT_F:
+    case FMT_G:
+      precision += dtp->u.p.scale_factor;
+      break;
+    case FMT_ES:
+      /* Scale factor has no effect on output.  */
+      break;
+    case FMT_E:
+    case FMT_D:
+      /* See F2008 10.7.2.3.3.6 */
+      if (dtp->u.p.scale_factor <= 0)
+	precision += dtp->u.p.scale_factor - 1;
+      break;
+    default:
+      return -1;
+    }
+
+  /* If the scale factor has a large negative value, we must do our
+     own rounding? Use ROUND='NEAREST', which should be what snprintf
+     is using as well.  */
+  if (precision < 0 && 
+      (dtp->u.p.current_unit->round_status == ROUND_UNSPECIFIED 
+       || dtp->u.p.current_unit->round_status == ROUND_PROCDEFINED))
+    dtp->u.p.current_unit->round_status = ROUND_NEAREST;
+
+  /* Add extra guard digits up to at least full precision when we do
+     our own rounding.  */
+  if (dtp->u.p.current_unit->round_status != ROUND_UNSPECIFIED
+      && dtp->u.p.current_unit->round_status != ROUND_PROCDEFINED)
+    {
+      precision += 2 * len + 4;
+      if (precision < 0)
+	precision = 0;
+    }
+
+  return precision;
+}
+
+
 /* Output a real number according to its format which is FMT_G free.  */
 
 static try
-output_float (st_parameter_dt *dtp, const fnode *f, char *buffer, size_t size, 
-	      int sign_bit, bool zero_flag, int ndigits, int edigits)
+output_float (st_parameter_dt *dtp, const fnode *f, char *buffer, size_t size,
+	      int nprinted, int precision, int sign_bit, bool zero_flag)
 {
   char *out;
   char *digits;
@@ -80,6 +130,7 @@  output_float (st_parameter_dt *dtp, const fnode *f, char *buffer, size_t size,
   int nzero_real;
   int leadzero;
   int nblanks;
+  int ndigits, edigits;
   sign_t sign;
 
   ft = f->format;
@@ -96,50 +147,97 @@  output_float (st_parameter_dt *dtp, const fnode *f, char *buffer, size_t size,
 
   sign = calculate_sign (dtp, sign_bit);
   
-  /* The following code checks the given string has punctuation in the correct
-     places.  Uncomment if needed for debugging.
-     if (d != 0 && ((buffer[2] != '.' && buffer[2] != ',')
-		    || buffer[ndigits + 2] != 'e'))
-       internal_error (&dtp->common, "printf is broken");  */
+  /* Calculate total number of digits.  */
+  if (ft == FMT_F)
+    ndigits = nprinted - 2;
+  else
+    ndigits = precision + 1;
 
   /* Read the exponent back in.  */
-  e = atoi (&buffer[ndigits + 3]) + 1;
+  if (ft != FMT_F)
+    e = atoi (&buffer[ndigits + 3]) + 1;
+  else
+    e = 0;
 
   /* Make sure zero comes out as 0.0e0.   */
   if (zero_flag)
     e = 0;
 
   /* Normalize the fractional component.  */
-  buffer[2] = buffer[1];
-  digits = &buffer[2];
+  if (ft != FMT_F)
+    {
+      buffer[2] = buffer[1];
+      digits = &buffer[2];
+    }
+  else
+    digits = &buffer[1];
 
   /* Figure out where to place the decimal point.  */
   switch (ft)
     {
     case FMT_F:
-      if (d == 0 && e <= 0 && p == 0)
+      nbefore = ndigits - precision;
+      /* Make sure the decimal point is a '.'; depending on the
+	 locale, this might not be the case otherwise.  */
+      digits[nbefore] = '.';
+      if (digits[0] == '0' && nbefore == 1)
 	{
-	  memmove (digits + 1, digits, ndigits - 1);
-	  digits[0] = '0';
-	  e++;
+	  digits++;
+	  nbefore--;
+	  ndigits--;
 	}
-
-      nbefore = e + p;
-      if (nbefore < 0)
+      //printf("nbefore: %d, digits: %s\n", nbefore, digits);
+      if (p != 0)
 	{
-	  nzero = -nbefore;
-          nzero_real = nzero;
-	  if (nzero > d)
-	    nzero = d;
-	  nafter = d - nzero;
-	  nbefore = 0;
+	  if (p > 0)
+	    {
+	      
+	      memmove (digits + nbefore, digits + nbefore + 1, p);
+	      digits[nbefore + p] = '.';
+	      nbefore += p;
+	      nafter = d - p;
+	      if (nafter < 0)
+		nafter = 0;
+	      nafter = d;
+	      nzero = nzero_real = 0;
+	      //printf("digits: %s\n", digits);
+	    }
+	  else /* p < 0  */
+	    {
+	      if (nbefore + p >= 0)
+		{
+		  nzero = 0;
+		  memmove (digits + nbefore + p + 1, digits + nbefore + p, -p);
+		  nbefore += p;
+		  digits[nbefore] = '.';
+		  nafter = d;
+		}
+	      else
+		{
+		  nzero = -(nbefore + p);
+		  memmove (digits + 1, digits, nbefore);
+		  digits++;
+		  nafter = d + nbefore;
+		  nbefore = 0;
+		}
+	      nzero_real = nzero;
+	      if (nzero > d)
+		nzero = d;
+	    }
 	}
       else
 	{
-	  nzero = 0;
+	  nzero = nzero_real = 0;
 	  nafter = d;
 	}
+
       expchar = 0;
+      /* If we need to do rounding ourselves, get rid of the dot by
+	 moving the fractional part.  */
+      if (dtp->u.p.current_unit->round_status != ROUND_UNSPECIFIED
+	  && dtp->u.p.current_unit->round_status != ROUND_PROCDEFINED)
+	memmove (digits + nbefore, digits + nbefore + 1, ndigits - nbefore);
+      //printf("nbefore after p handling: %d, digits: %s\n", nbefore, digits);
       break;
 
     case FMT_E:
@@ -222,10 +320,16 @@  output_float (st_parameter_dt *dtp, const fnode *f, char *buffer, size_t size,
   if (zero_flag)
     goto skip;
 
-  /* Round the value.  The value being rounded is an unsigned magnitude.
-     The ROUND_COMPATIBLE is rounding away from zero when there is a tie.  */
+  /* Round the value.  The value being rounded is an unsigned magnitude.  */
   switch (dtp->u.p.current_unit->round_status)
     {
+      /* For processor defined and unspecified rounding we use
+	 snprintf to print the exact number of digits needed, and thus
+	 let snprintf handle the rounding.  On system claiming support
+	 for IEEE 754, this ought to be round to nearest, ties to
+	 even, corresponding to the Fortran ROUND='NEAREST'.  */
+      case ROUND_PROCDEFINED: 
+      case ROUND_UNSPECIFIED:
       case ROUND_ZERO: /* Do nothing and truncation occurs.  */
 	goto skip;
       case ROUND_UP:
@@ -240,6 +344,7 @@  output_float (st_parameter_dt *dtp, const fnode *f, char *buffer, size_t size,
 	/* Round compatible unless there is a tie. A tie is a 5 with
 	   all trailing zero's.  */
 	i = nafter + nbefore;
+	//printf("I = %d, digits = %s, nbefore = %d\n", i, digits, nbefore);
 	if (digits[i] == '5')
 	  {
 	    for(i++ ; i < ndigits; i++)
@@ -262,9 +367,8 @@  output_float (st_parameter_dt *dtp, const fnode *f, char *buffer, size_t size,
 		  goto skip;
 	      }
 	  }
-	 /* Fall through.  */ 
-      case ROUND_PROCDEFINED:
-      case ROUND_UNSPECIFIED:
+	/* Fall through.  */
+	/* The ROUND_COMPATIBLE is rounding away from zero when there is a tie.  */
       case ROUND_COMPATIBLE:
 	rchar = '5';
 	goto do_rnd;
@@ -300,6 +404,7 @@  output_float (st_parameter_dt *dtp, const fnode *f, char *buffer, size_t size,
   else if (nbefore + nafter < ndigits)
     {
       i = ndigits = nbefore + nafter;
+      //printf("i: %d, digits: %s, nbefore: %d, nafter: %d\n", i, digits, nbefore, nafter);
       if (digits[i] >= rchar)
 	{
 	  /* Propagate the carry.  */
@@ -384,7 +489,7 @@  output_float (st_parameter_dt *dtp, const fnode *f, char *buffer, size_t size,
      rounding completed above.  */
   for (i = 0; i < ndigits; i++)
     {
-      if (digits[i] != '0')
+      if (digits[i] != '0' && digits[i] != '.')
 	break;
     }
 
@@ -502,6 +607,10 @@  output_float (st_parameter_dt *dtp, const fnode *f, char *buffer, size_t size,
       /* Output the decimal point.  */
       *(out4++) = dtp->u.p.current_unit->decimal_status
 		    == DECIMAL_POINT ? '.' : ',';
+      if (ft == FMT_F 
+	  && (dtp->u.p.current_unit->round_status == ROUND_UNSPECIFIED 
+	      || dtp->u.p.current_unit->round_status == ROUND_PROCDEFINED))
+	digits++;
 
       /* Output leading zeros after the decimal point.  */
       if (nzero > 0)
@@ -590,6 +699,10 @@  output_float (st_parameter_dt *dtp, const fnode *f, char *buffer, size_t size,
 
   /* Output the decimal point.  */
   *(out++) = dtp->u.p.current_unit->decimal_status == DECIMAL_POINT ? '.' : ',';
+  if (ft == FMT_F
+	  && (dtp->u.p.current_unit->round_status == ROUND_UNSPECIFIED 
+	      || dtp->u.p.current_unit->round_status == ROUND_PROCDEFINED))
+    digits++;
 
   /* Output leading zeros after the decimal point.  */
   if (nzero > 0)
@@ -634,9 +747,6 @@  output_float (st_parameter_dt *dtp, const fnode *f, char *buffer, size_t size,
       dtp->u.p.no_leading_blank = 0;
     }
 
-#undef STR
-#undef STR1
-#undef MIN_FIELD_WIDTH
   return SUCCESS;
 }
 
@@ -798,6 +908,61 @@  CALCULATE_EXP(16)
 #endif
 #undef CALCULATE_EXP
 
+
+/* Define a macro to build code for write_float.  */
+
+  /* Note: Before output_float is called, snprintf is used to print to buffer the
+     number in the format +D.DDDDe+ddd. 
+
+     #   The result will always contain a decimal point, even if no
+	 digits follow it
+
+     -   The converted value is to be left adjusted on the field boundary
+
+     +   A sign (+ or -) always be placed before a number
+
+     *   prec is used as the precision
+
+     e format: [-]d.ddde±dd where there is one digit before the
+       decimal-point character and the number of digits after it is
+       equal to the precision. The exponent always contains at least two
+       digits; if the value is zero, the exponent is 00.  */
+
+
+#define TOKENPASTE(x, y) TOKENPASTE2(x, y)
+#define TOKENPASTE2(x, y) x ## y
+
+#define DTOA(suff,prec,val) TOKENPASTE(DTOA2,suff)(prec,val)
+
+#define DTOA2(prec,val)					\
+snprintf (buffer, size, "%+-#.*e", (prec), (val))
+
+#define DTOA2L(prec,val)				\
+snprintf (buffer, size, "%+-#.*Le", (prec), (val))
+
+
+#if defined(GFC_REAL_16_IS_FLOAT128)
+#define DTOA2Q(prec,val)							\
+__qmath_(quadmath_snprintf) (buffer, size, "%+-#.*Qe", (prec), (val))
+#endif
+
+#define FDTOA(suff,prec,val) TOKENPASTE(FDTOA2,suff)(prec,val)
+
+/* For F format, we print to the buffer with f format.  */
+#define FDTOA2(prec,val)							\
+snprintf (buffer, size, "%+-#.*f", (prec), (val))
+
+#define FDTOA2L(prec,val)						\
+snprintf (buffer, size, "%+-#.*Lf", (prec), (val))
+
+
+#if defined(GFC_REAL_16_IS_FLOAT128)
+#define FDTOA2Q(prec,val)			       \
+__qmath_(quadmath_snprintf) (buffer, size, "%+-#.*Qf", \
+			     (prec), (val))
+#endif
+
+
 /* Generate corresponding I/O format for FMT_G and output.
    The rules to translate FMT_G to FMT_E or FMT_F from DEC fortran
    LRM (table 11-2, Chapter 11, "I/O Formatting", P11-25) is:
@@ -817,26 +982,25 @@  CALCULATE_EXP(16)
 	  for rounding modes adjustment, r, See Fortran F2008 10.7.5.2.2
 	  the asm volatile is required for 32-bit x86 platforms.  */
 
-#define OUTPUT_FLOAT_FMT_G(x) \
+#define OUTPUT_FLOAT_FMT_G(x,y)			\
 static void \
 output_float_FMT_G_ ## x (st_parameter_dt *dtp, const fnode *f, \
 		      GFC_REAL_ ## x m, char *buffer, size_t size, \
-		      int sign_bit, bool zero_flag, int ndigits, \
-                      int edigits, int comp_d) \
+			  int sign_bit, bool zero_flag, int comp_d) \
 { \
   int e = f->u.real.e;\
   int d = f->u.real.d;\
   int w = f->u.real.w;\
-  fnode *newf;\
+  fnode newf;\
   GFC_REAL_ ## x rexp_d, r = 0.5;\
   int low, high, mid;\
   int ubound, lbound;\
   char *p, pad = ' ';\
   int save_scale_factor, nb = 0;\
   try result;\
+  int nprinted, precision;\
 \
   save_scale_factor = dtp->u.p.scale_factor;\
-  newf = (fnode *) get_mem (sizeof (fnode));\
 \
   switch (dtp->u.p.current_unit->round_status)\
     {\
@@ -858,11 +1022,13 @@  output_float_FMT_G_ ## x (st_parameter_dt *dtp, const fnode *f, \
       || ((m == 0.0) && !(compile_options.allow_std\
 			  & (GFC_STD_F2003 | GFC_STD_F2008))))\
     { \
-      newf->format = FMT_E;\
-      newf->u.real.w = w;\
-      newf->u.real.d = d - comp_d;\
-      newf->u.real.e = e;\
+      newf.format = FMT_E;\
+      newf.u.real.w = w;\
+      newf.u.real.d = d - comp_d;\
+      newf.u.real.e = e;\
       nb = 0;\
+      precision = determine_precision (dtp, &newf, x);\
+      nprinted = DTOA(y,precision,m);			     \
       goto finish;\
     }\
 \
@@ -905,17 +1071,18 @@  output_float_FMT_G_ ## x (st_parameter_dt *dtp, const fnode *f, \
 \
   nb = e <= 0 ? 4 : e + 2;\
   nb = nb >= w ? w - 1 : nb;\
-  newf->format = FMT_F;\
-  newf->u.real.w = w - nb;\
-  newf->u.real.d = m == 0.0 ? d - 1 : -(mid - d - 1) ;\
+  newf.format = FMT_F;\
+  newf.u.real.w = w - nb;\
+  newf.u.real.d = m == 0.0 ? d - 1 : -(mid - d - 1) ;\
   dtp->u.p.scale_factor = 0;\
+  precision = determine_precision (dtp, &newf, x);	\
+  nprinted = FDTOA(y,precision,m);					\
 \
  finish:\
-  result = output_float (dtp, newf, buffer, size, sign_bit, zero_flag, \
-			 ndigits, edigits);\
+    result = output_float (dtp, &newf, buffer, size, nprinted, precision,\
+			   sign_bit, zero_flag);\
   dtp->u.p.scale_factor = save_scale_factor;\
 \
-  free (newf);\
 \
   if (nb > 0 && !dtp->u.p.g0_no_blanks)\
     {\
@@ -934,59 +1101,96 @@  output_float_FMT_G_ ## x (st_parameter_dt *dtp, const fnode *f, \
     }\
 }\
 
-OUTPUT_FLOAT_FMT_G(4)
+OUTPUT_FLOAT_FMT_G(4,)
 
-OUTPUT_FLOAT_FMT_G(8)
+OUTPUT_FLOAT_FMT_G(8,)
 
 #ifdef HAVE_GFC_REAL_10
-OUTPUT_FLOAT_FMT_G(10)
+OUTPUT_FLOAT_FMT_G(10,L)
 #endif
 
 #ifdef HAVE_GFC_REAL_16
-OUTPUT_FLOAT_FMT_G(16)
+# ifdef GFC_REAL_16_IS_FLOAT128
+OUTPUT_FLOAT_FMT_G(16,Q)
+#else
+OUTPUT_FLOAT_FMT_G(16,L)
+#endif
 #endif
 
 #undef OUTPUT_FLOAT_FMT_G
 
 
-/* Define a macro to build code for write_float.  */
-
-  /* Note: Before output_float is called, snprintf is used to print to buffer the
-     number in the format +D.DDDDe+ddd. For an N digit exponent, this gives us
-     (MIN_FIELD_WIDTH-5)-N digits after the decimal point, plus another one
-     before the decimal point.
-
-     #   The result will always contain a decimal point, even if no
-	 digits follow it
-
-     -   The converted value is to be left adjusted on the field boundary
+/* EN format is tricky since the number of significant digits depends
+   on the magnitude.  Solve it by first printing a temporary value and
+   figure out the number of significant digits from the printed
+   exponent.  */
 
-     +   A sign (+ or -) always be placed before a number
-
-     MIN_FIELD_WIDTH  minimum field width
+#define EN_PREC(x,y)\
+{\
+    GFC_REAL_ ## x tmp;				\
+    tmp = * (GFC_REAL_ ## x *)source;				\
+    if (isfinite (tmp))						\
+      nprinted = DTOA(y,0,tmp);					\
+    else\
+      nprinted = -1;\
+}\
 
-     *   (ndigits-1) is used as the precision
+static int
+determine_en_precision (st_parameter_dt *dtp, const fnode *f, 
+			const char *source, int len)
+{
+  int nprinted;
+  char buffer[10];
+  const size_t size = 10;
 
-     e format: [-]d.ddde±dd where there is one digit before the
-       decimal-point character and the number of digits after it is
-       equal to the precision. The exponent always contains at least two
-       digits; if the value is zero, the exponent is 00.  */
+  switch (len)
+    {
+    case 4:
+      EN_PREC(4,)
+      break;
 
-#define DTOA \
-snprintf (buffer, size, "%+-#" STR(MIN_FIELD_WIDTH) ".*" \
-	  "e", ndigits - 1, tmp);
+    case 8:
+      EN_PREC(8,)
+      break;
 
-#define DTOAL \
-snprintf (buffer, size, "%+-#" STR(MIN_FIELD_WIDTH) ".*" \
-	  "Le", ndigits - 1, tmp);
+#ifdef HAVE_GFC_REAL_10
+    case 10:
+      EN_PREC(10,L)
+      break;
+#endif
+#ifdef HAVE_GFC_REAL_16
+    case 16:
+# ifdef GFC_REAL_16_IS_FLOAT128
+      EN_PREC(16,Q)
+# else
+      EN_PREC(16,L)
+# endif
+      break;
+#endif
+    default:
+      internal_error (NULL, "bad real kind");
+    }
 
+  if (nprinted == -1)
+    return -1;
 
-#if defined(GFC_REAL_16_IS_FLOAT128)
-#define DTOAQ \
-__qmath_(quadmath_snprintf) (buffer, sizeof buffer, \
-			     "%+-#" STR(MIN_FIELD_WIDTH) ".*" \
-			     "Qe", ndigits - 1, tmp);
-#endif
+  int e = atoi (&buffer[5]);
+  int nbefore; /* digits before decimal point - 1.  */
+  if (e >= 0)
+    nbefore = e % 3;
+  else
+    {
+      nbefore = (-e) % 3;
+      if (nbefore != 0)
+	nbefore = 3 - nbefore;
+    }
+  int prec = f->u.real.d + nbefore;
+  if (dtp->u.p.current_unit->round_status != ROUND_UNSPECIFIED
+      && dtp->u.p.current_unit->round_status != ROUND_PROCDEFINED)
+    prec += 2 * len + 4;
+  return prec;
+}
+  
 
 #define WRITE_FLOAT(x,y)\
 {\
@@ -1000,15 +1204,18 @@  __qmath_(quadmath_snprintf) (buffer, sizeof buffer, \
 	  }\
 	tmp = sign_bit ? -tmp : tmp;\
 	zero_flag = (tmp == 0.0);\
-\
-	DTOA ## y\
-\
-	if (f->format != FMT_G)\
-	  output_float (dtp, f, buffer, size, sign_bit, zero_flag, ndigits, \
-			edigits);\
-	else \
+	if (f->format == FMT_G)\
 	  output_float_FMT_G_ ## x (dtp, f, tmp, buffer, size, sign_bit, \
-				    zero_flag, ndigits, edigits, comp_d);\
+				    zero_flag, comp_d);\
+	else\
+	  {\
+	    if (f->format == FMT_F)\
+	      nprinted = FDTOA(y,precision,tmp);		\
+	    else\
+	      nprinted = DTOA(y,precision,tmp);					\
+	    output_float (dtp, f, buffer, size, nprinted, precision,\
+			  sign_bit, zero_flag);\
+	  }\
 }\
 
 /* Output a real number according to its format.  */
@@ -1017,29 +1224,21 @@  static void
 write_float (st_parameter_dt *dtp, const fnode *f, const char *source, \
             int len, int comp_d)
 {
-
-#if defined(HAVE_GFC_REAL_16) || __LDBL_DIG__ > 18
-# define MIN_FIELD_WIDTH 49
-#else
-# define MIN_FIELD_WIDTH 32
-#endif
-#define STR(x) STR1(x)
-#define STR1(x) #x
-
-  /* This must be large enough to accurately hold any value.  */
-  char buffer[MIN_FIELD_WIDTH+1];
-  int sign_bit, ndigits, edigits;
+  int sign_bit, nprinted;
+  int precision;  /* Precision for snprintf call.  */
   bool zero_flag;
-  size_t size;
-
-  size = MIN_FIELD_WIDTH+1;
 
-  /* printf pads blanks for us on the exponent so we just need it big enough
-     to handle the largest number of exponent digits expected.  */
-  edigits=4;
-
-  /* Always convert at full precision to avoid double rounding.  */
-    ndigits = MIN_FIELD_WIDTH - 4 - edigits;
+  if (f->format != FMT_EN)
+    precision = determine_precision (dtp, f, len);
+  else
+    precision = determine_en_precision (dtp, f, source, len);
+
+  /* 4932 is the maximum exponent of long double and quad precision, 3
+     extra characters for the sign, the decimal point, and the
+     trailing null, and finally some extra digits depending on the
+     requested precision.  */
+  const size_t size = 4932 + 3 + precision;
+  char buffer[size];
 
   switch (len)
     {