diff mbox

[libfortran,4.7/4.8/4.9,Regression] PR38199 missed optimization: I/O performance

Message ID 531ABAF8.2070604@charter.net
State New
Headers show

Commit Message

Jerry DeLisle March 8, 2014, 6:38 a.m. UTC
The attached patch addresses the problem identified in comment #22 of the PR.
For character array internal unit reads, eat_spaces must call next_char to
advance every single character until the end of the string is reached.  In the
case sited which is very contrived, this amounts to about 100000 calls to next_char.

For clarity, this test case:

      character buffer(1)*100000
      integer i,j

      j = 1234
      write(buffer(1),'(i4)') j

      DO j=1,9999
!        write(*,*) buffer(1)(1:4)
        read(buffer,*) i
!        write(*,*) i
      ENDDO
      end

Without the patch takes about 25 seconds to run.

With the patch this takes about 2.8 seconds.

The speedup is accomplished by simply skipping over spaces without calling
next_read, then backing up one character and letting the existing execution path
proceed, preserving all the end of record code needed in next_char.

I also remove some unneeded error checks.

Regression tested on X86_64 gnu.  No need for a new test case since no new
functionality is added.

OK for trunk? The PR is marked as a regression, so I think this could be the
last piece and call it done.

Regards,

Jerry

2014-03-08  Jerry DeLisle  <jvdelisle@gcc.gnu>

	PR libfortran/38199
	* io/list_read.c (next_char): Delete unuseful error checks.
	(eat_spaces): For character array reading, skip ahead over
	spaces rather than call next_char multiple times.

Comments

Tobias Burnus March 8, 2014, 10:45 a.m. UTC | #1
Jerry DeLisle wrote:
> The attached patch addresses the problem identified in comment #22 of the PR.
> For character array internal unit reads, eat_spaces must call next_char to
> advance every single character until the end of the string is reached.  In the
> case sited which is very contrived, this amounts to about 100000 calls to next_char.

> Without the patch takes about 25 seconds to run.
> With the patch this takes about 2.8 seconds.

Nice!

> The speedup is accomplished by simply skipping over spaces without calling
> next_read, then backing up one character and letting the existing execution path
> proceed, preserving all the end of record code needed in next_char.

I think that code is okay.

I wonder whether it can happen that we read one character too far: i.e. 
stell() == end of string; offset++ -> one behind. As one then 
immediately steps back (due to "offset < limit" plus offset--), the code 
should be handled just fine - except for the out of bounds read - which 
might be detected by, e.g., -fsanitizer=address (if libgfortran and the 
program use it). However, I am not sure whether eat_spaces would be 
called in that case or whether other conditions error out before.

> I also remove some unneeded error checks.

I have to admit that I do not really understand why the conditions are 
unreachable. Is it because "dtp->u.p.current_unit->bytes_left == 0" is 
already checked for "is_array_io (dtp)"? If so, can it happen for scalar 
internal I/O? I assume, it is obvious, but I don't quickly see it.

Tobias


> 2014-03-08  Jerry DeLisle  <jvdelisle@gcc.gnu>
>
> 	PR libfortran/38199
> 	* io/list_read.c (next_char): Delete unuseful error checks.
> 	(eat_spaces): For character array reading, skip ahead over
> 	spaces rather than call next_char multiple times.
Steven Bosscher March 8, 2014, 12:58 p.m. UTC | #2
On Sat, Mar 8, 2014 at 7:38 AM, Jerry DeLisle wrote:
> The speedup is accomplished by simply skipping over spaces without calling
> next_read, then backing up one character and letting the existing execution path
> proceed, preserving all the end of record code needed in next_char.
>
> I also remove some unneeded error checks.

Would it be enough to make them "unlikely" instead?

-      if (length < 0)
+      if (unlikely(length < 0))

Ciao!
Steven
Jerry DeLisle March 8, 2014, 3:39 p.m. UTC | #3
On 03/08/2014 02:45 AM, Tobias Burnus wrote:
--- snip ---
> 
> I think that code is okay.
> 
> I wonder whether it can happen that we read one character too far: i.e. stell()
> == end of string; offset++ -> one behind. As one then immediately steps back
> (due to "offset < limit" plus offset--), the code should be handled just fine -
> except for the out of bounds read - which might be detected by, e.g.,
> -fsanitizer=address (if libgfortran and the program use it). However, I am not
> sure whether eat_spaces would be called in that case or whether other conditions
> error out before.
>

The strings are internally NULL terminated, which helps. I will do some tests to
see.

>> I also remove some unneeded error checks.
> 
> I have to admit that I do not really understand why the conditions are
> unreachable. Is it because "dtp->u.p.current_unit->bytes_left == 0" is already
> checked for "is_array_io (dtp)"? If so, can it happen for scalar internal I/O? I
> assume, it is obvious, but I don't quickly see it.

The is_array_io (dtp) is checking for an array descriptor. For scalar internal
I/O the descriptor is a NULL, so the new code is not executed in that case.  I
ran a test using a scalar string and the new code is not needed.

The errors given are not normal error conditions.  If we hit these errors there
is a internal library bug and the program would fail out. Since we have had none
reported ever I don't think we need it.  As Steven suggested I can make them
unlikely.

Also I just realized I did not address the kind=4 case, so I will do that and
resubmit.


Thanks for review.

Jerry
Manfred Schwarb March 8, 2014, 10:37 p.m. UTC | #4
Am 08.03.2014 07:38, schrieb Jerry DeLisle:
> The attached patch addresses the problem identified in comment #22 of the PR.
> For character array internal unit reads, eat_spaces must call next_char to
> advance every single character until the end of the string is reached.  In the
> case sited which is very contrived, this amounts to about 100000 calls to next_char.
>
> For clarity, this test case:
>
>        character buffer(1)*100000
>        integer i,j
>
>        j = 1234
>        write(buffer(1),'(i4)') j
>
>        DO j=1,9999
> !        write(*,*) buffer(1)(1:4)
>          read(buffer,*) i
> !        write(*,*) i
>        ENDDO
>        end
>
> Without the patch takes about 25 seconds to run.
>
> With the patch this takes about 2.8 seconds.

This is great.

However, this is still 10 times slower than the LEN_TRIM variant:
         character buffer(1)*100000
         integer i,j
  
         j = 1234
         write(buffer(1),'(i4)') j
  
         DO j=1,9999
  !        write(*,*) buffer(1)(1:4)
           read(buffer(1)(1:LEN_TRIM(buffer(1))),*) i
  !        write(*,*) i
         ENDDO
         end

which takes 0.23s on my box. So on the on hand the improvement is great,
on the other hand it is really sad, because the user will still
need to do manual LEN_TRIM's when reading larger strings to get
optimal performance...

Thanks,
Manfred



>
> The speedup is accomplished by simply skipping over spaces without calling
> next_read, then backing up one character and letting the existing execution path
> proceed, preserving all the end of record code needed in next_char.
>
> I also remove some unneeded error checks.
>
> Regression tested on X86_64 gnu.  No need for a new test case since no new
> functionality is added.
>
> OK for trunk? The PR is marked as a regression, so I think this could be the
> last piece and call it done.
>
> Regards,
>
> Jerry
>
> 2014-03-08  Jerry DeLisle  <jvdelisle@gcc.gnu>
>
> 	PR libfortran/38199
> 	* io/list_read.c (next_char): Delete unuseful error checks.
> 	(eat_spaces): For character array reading, skip ahead over
> 	spaces rather than call next_char multiple times.
>
Manfred Schwarb March 8, 2014, 10:55 p.m. UTC | #5
Am 08.03.2014 23:37, schrieb Manfred Schwarb:
> Am 08.03.2014 07:38, schrieb Jerry DeLisle:
>> The attached patch addresses the problem identified in comment #22 of the PR.
>> For character array internal unit reads, eat_spaces must call next_char to
>> advance every single character until the end of the string is reached.  In the
>> case sited which is very contrived, this amounts to about 100000 calls to next_char.
>>
>> For clarity, this test case:
>>
>>        character buffer(1)*100000
>>        integer i,j
>>
>>        j = 1234
>>        write(buffer(1),'(i4)') j
>>
>>        DO j=1,9999
>> !        write(*,*) buffer(1)(1:4)
>>          read(buffer,*) i
>> !        write(*,*) i
>>        ENDDO
>>        end
>>
>> Without the patch takes about 25 seconds to run.
>>
>> With the patch this takes about 2.8 seconds.
>
> This is great.
>
> However, this is still 10 times slower than the LEN_TRIM variant:
>          character buffer(1)*100000
>          integer i,j
>
>          j = 1234
>          write(buffer(1),'(i4)') j
>
>          DO j=1,9999
>   !        write(*,*) buffer(1)(1:4)
>            read(buffer(1)(1:LEN_TRIM(buffer(1))),*) i
>   !        write(*,*) i
>          ENDDO
>          end
>

and also 10 times slower than the scalar variant (which was fixed by Thomas):
       character buffer*100000
       integer i,j

       j = 1234
       write(buffer,'(i4)') j

       DO j=1,9999
!        write(*,*) buffer(1:4)
         read(buffer,*) i
!        write(*,*) i
       ENDDO
       end



> which takes 0.23s on my box. So on the on hand the improvement is great,
> on the other hand it is really sad, because the user will still
> need to do manual LEN_TRIM's when reading larger strings to get
> optimal performance...
>
> Thanks,
> Manfred
>
>
>
>>
>> The speedup is accomplished by simply skipping over spaces without calling
>> next_read, then backing up one character and letting the existing execution path
>> proceed, preserving all the end of record code needed in next_char.
>>
>> I also remove some unneeded error checks.
>>
>> Regression tested on X86_64 gnu.  No need for a new test case since no new
>> functionality is added.
>>
>> OK for trunk? The PR is marked as a regression, so I think this could be the
>> last piece and call it done.
>>
>> Regards,
>>
>> Jerry
>>
>> 2014-03-08  Jerry DeLisle  <jvdelisle@gcc.gnu>
>>
>>     PR libfortran/38199
>>     * io/list_read.c (next_char): Delete unuseful error checks.
>>     (eat_spaces): For character array reading, skip ahead over
>>     spaces rather than call next_char multiple times.
>>
>
>
diff mbox

Patch

Index: list_read.c
===================================================================
--- list_read.c	(revision 208303)
+++ list_read.c	(working copy)
@@ -160,7 +160,7 @@  next_char (st_parameter_dt *dtp)
 
       dtp->u.p.line_buffer_pos = 0;
       dtp->u.p.line_buffer_enabled = 0;
-    }    
+    }
 
   /* Handle the end-of-record and end-of-file conditions for
      internal array unit.  */
@@ -208,20 +208,8 @@  next_char (st_parameter_dt *dtp)
          c = cc;
        }
 
-      if (length < 0)
-	{
-	  generate_error (&dtp->common, LIBERROR_OS, NULL);
-	  return '\0';
-	}
-  
       if (is_array_io (dtp))
 	{
-	  /* Check whether we hit EOF.  */ 
-	  if (length == 0)
-	    {
-	      generate_error (&dtp->common, LIBERROR_INTERNAL_UNIT, NULL);
-	      return '\0';
-	    } 
 	  dtp->u.p.current_unit->bytes_left--;
 	}
       else
@@ -264,6 +252,28 @@  eat_spaces (st_parameter_dt *dtp)
 {
   int c;
 
+  /* If internal character array IO, peak ahead and seek past spaces.
+     This is an optimazation to eliminate numerous calls to
+     next character unique to character arrays with large character
+     lengths (PR38199). */
+  if (is_array_io (dtp))
+    {
+      gfc_offset offset = stell (dtp->u.p.current_unit->s);
+      gfc_offset limit = dtp->u.p.current_unit->bytes_left;
+
+      do
+	{
+	  c = dtp->internal_unit[offset++];
+	  dtp->u.p.current_unit->bytes_left--;
+	}
+      while (offset < limit && (c == ' ' || c == '\t'));
+      /* Back up, seek ahead, and fall through to complete the process
+	 so that END conditions are handled correctly.  */
+      dtp->u.p.current_unit->bytes_left++;
+      sseek (dtp->u.p.current_unit->s, offset-1, SEEK_SET);
+    }
+
+  /* Now skip spaces, EOF and EOL are handled in next_char.  */
   do
     c = next_char (dtp);
   while (c != EOF && (c == ' ' || c == '\t'));