diff mbox

[libgfortran] PR47694 [4.3/4.4/4.5/4.6 Regression] Fortran read from named pipe fails

Message ID 4D5F287E.7070401@frontier.com
State New
Headers show

Commit Message

Jerry DeLisle Feb. 19, 2011, 2:18 a.m. UTC
Hi folks,

The problem here is that when a read from a named pipe reads more than one line 
worth, the formatted IO buffer holds onto the data.  When fbuf_read is 
subsequently invoked, a sread is dutifully called and it waits for the next read 
completion.  In the meantime, the next line of data is already in the buffer so 
the read waits for another "enter".  The reads get out of sync.  The problem 
does not occur with regular files because there is either always data available 
from the file to read or an EOF occurs and in both those cases sread returns.

The patch fixes this by scanning the buffer under certain conditions looking for 
an End-of-Line (or EOR). If EOR is found, there is no need to sread again from 
the pipe.  This patch does add a little overhead from the tight scan loop, but 
will save some system calls.

Regression tested on x86-64.  I would appreciate others testing please. I do not 
know how to create a specific named pipe test in the test suite. See the PR for 
the original test.

OK for trunk?

Jerry

2011-02-18  Jerry DeLisle  <jvdelisle@gcc.gnu.org>

	PR libgfortran/47694
	* io/fbuf.c (fbuf_read): Scan the buffer for end-of-record and if
	found, do not call sread unnecessarily.

Comments

Tobias Burnus Feb. 19, 2011, 10:44 a.m. UTC | #1
Jerry DeLisle wrote:
> Regression tested on x86-64.  I would appreciate others testing 
> please. I do not know how to create a specific named pipe test in the 
> test suite. See the PR for the original test.
> OK for trunk?

 From my side, it looks OK, though I would like if Janne could also look 
at the patch. I think one cannot add a (portable) test case to the test 
suite. I tried to get the program fail, but if failed :-)

Thanks for the patch! I think one should at least backport it to 4.5.

Tobias

> 2011-02-18  Jerry DeLisle <jvdelisle@gcc.gnu.org>
>
>     PR libgfortran/47694
>     * io/fbuf.c (fbuf_read): Scan the buffer for end-of-record and if
>     found, do not call sread unnecessarily.
Janne Blomqvist Feb. 19, 2011, 5:15 p.m. UTC | #2
On Sat, Feb 19, 2011 at 04:18, Jerry DeLisle <jvdelisle@frontier.com> wrote:
> Hi folks,
>
> The problem here is that when a read from a named pipe reads more than one
> line worth, the formatted IO buffer holds onto the data.  When fbuf_read is
> subsequently invoked, a sread is dutifully called and it waits for the next
> read completion.  In the meantime, the next line of data is already in the
> buffer so the read waits for another "enter".  The reads get out of sync.
>  The problem does not occur with regular files because there is either
> always data available from the file to read or an EOF occurs and in both
> those cases sread returns.

Thanks for the explanation; So I think the fundamental issue is that
in order to handle pipes (and perhaps other stuff like terminals too?)
correctly, we need to consume all the bytes in the buffer before
asking for more via sread.

> The patch fixes this by scanning the buffer under certain conditions looking
> for an End-of-Line (or EOR). If EOR is found, there is no need to sread
> again from the pipe.  This patch does add a little overhead from the tight
> scan loop, but will save some system calls.
>
> Regression tested on x86-64.  I would appreciate others testing please. I do
> not know how to create a specific named pipe test in the test suite. See the
> PR for the original test.
>
> OK for trunk?

   if (oldpos + *len > oldact)
     {
+      /* See if we have a pending EOR in here.  */
+      if (*len > oldact)

This second conditional makes no sense. What matters is if the current
position in the buffer + len goes past the active bytes, which is
exactly the previous conditional. Thus, does the patch work if you
remove that conditional and check for EOR unconditionally if oldpos +
*len > oldact?

Secondly, you check for "\n" or "\r", but I think you need to check
that a "\r" is followed by a "\n" before you can say that you're at
EOR.

I wonder, would it perhaps be cleaner to create a new function, say
something like

static inline uchar *
fbuf_getptr (gfc_unit * u)
{
  return (uchar*) u->fbuf->buf + u->fbuf->pos;
}

use this in read_sf() to get a pointer to eventually return, then loop
forwards one character at a time using fbuf_getc() checking for
EOR/EOF and other conditions/errors along the way.

What do you think?
Jerry DeLisle Feb. 19, 2011, 6:27 p.m. UTC | #3
On 02/19/2011 09:15 AM, Janne Blomqvist wrote:
> On Sat, Feb 19, 2011 at 04:18, Jerry DeLisle<jvdelisle@frontier.com>  wrote:
>> Hi folks,
>>
>> The problem here is that when a read from a named pipe reads more than one
>> line worth, the formatted IO buffer holds onto the data.  When fbuf_read is
>> subsequently invoked, a sread is dutifully called and it waits for the next
>> read completion.  In the meantime, the next line of data is already in the
>> buffer so the read waits for another "enter".  The reads get out of sync.
>>   The problem does not occur with regular files because there is either
>> always data available from the file to read or an EOF occurs and in both
>> those cases sread returns.
>
> Thanks for the explanation; So I think the fundamental issue is that
> in order to handle pipes (and perhaps other stuff like terminals too?)
> correctly, we need to consume all the bytes in the buffer before
> asking for more via sread.
>
>> The patch fixes this by scanning the buffer under certain conditions looking
>> for an End-of-Line (or EOR). If EOR is found, there is no need to sread
>> again from the pipe.  This patch does add a little overhead from the tight
>> scan loop, but will save some system calls.
>>
>> Regression tested on x86-64.  I would appreciate others testing please. I do
>> not know how to create a specific named pipe test in the test suite. See the
>> PR for the original test.
>>
>> OK for trunk?
>
>     if (oldpos + *len>  oldact)
>       {
> +      /* See if we have a pending EOR in here.  */
> +      if (*len>  oldact)
>
> This second conditional makes no sense. What matters is if the current
> position in the buffer + len goes past the active bytes, which is
> exactly the previous conditional. Thus, does the patch work if you
> remove that conditional and check for EOR unconditionally if oldpos +
> *len>  oldact?

I thought it made no sense either. But without it, fmt_t_2.f90 fails.  It is the 
only regression I found when testing this patch.  I determined that condition by 
experiment.  I looked at the values of oldpos, oldact, *len for various test 
cases to see what was unique about that one in particular. I then tried 
different combinations of conditions until it worked for the piped case and this 
case and no regressions. I suppose we could study that case more explicitly, but 
it is unique. Maybe a bug elsewhere?

>
> Secondly, you check for "\n" or "\r", but I think you need to check
> that a "\r" is followed by a "\n" before you can say that you're at
> EOR.
>

It does not matter because either way the result is we read what is in the 
buffer and sf_read looks for the \r followed by \n. All we are doing here is 
avoiding an sread and I wanted to address the case where there is a \r without a 
\n. Of course I just realized I can artificially test this on my machine here 
and I will do so.

> I wonder, would it perhaps be cleaner to create a new function, say
> something like
>
> static inline uchar *
> fbuf_getptr (gfc_unit * u)
> {
>    return (uchar*) u->fbuf->buf + u->fbuf->pos;
> }
>
> use this in read_sf() to get a pointer to eventually return, then loop
> forwards one character at a time using fbuf_getc() checking for
> EOR/EOF and other conditions/errors along the way.
>
> What do you think?
>

This alternate approach could work, but I think the numerous calls to fbuf_getc 
even if inlined will be less efficient then the approach I have which is a 
fairly tight loop.

Do you want to test and compare? It will take me a little while to modify and 
test it.  Also, how exactly should we test for performance? cat bigfile 
 >tmpfifo? or just read a big file? and time it.

Jerry
Jerry DeLisle Feb. 19, 2011, 6:46 p.m. UTC | #4
On 02/19/2011 10:27 AM, Jerry DeLisle wrote:
> On 02/19/2011 09:15 AM, Janne Blomqvist wrote:
>> On Sat, Feb 19, 2011 at 04:18, Jerry DeLisle<jvdelisle@frontier.com> wrote:
>>> Hi folks,
>>>
>>> The problem here is that when a read from a named pipe reads more than one
>>> line worth, the formatted IO buffer holds onto the data. When fbuf_read is
>>> subsequently invoked, a sread is dutifully called and it waits for the next
>>> read completion. In the meantime, the next line of data is already in the
>>> buffer so the read waits for another "enter". The reads get out of sync.
>>> The problem does not occur with regular files because there is either
>>> always data available from the file to read or an EOF occurs and in both
>>> those cases sread returns.
>>
>> Thanks for the explanation; So I think the fundamental issue is that
>> in order to handle pipes (and perhaps other stuff like terminals too?)
>> correctly, we need to consume all the bytes in the buffer before
>> asking for more via sread.
>>
>>> The patch fixes this by scanning the buffer under certain conditions looking
>>> for an End-of-Line (or EOR). If EOR is found, there is no need to sread
>>> again from the pipe. This patch does add a little overhead from the tight
>>> scan loop, but will save some system calls.
>>>
>>> Regression tested on x86-64. I would appreciate others testing please. I do
>>> not know how to create a specific named pipe test in the test suite. See the
>>> PR for the original test.
>>>
>>> OK for trunk?
>>
>> if (oldpos + *len> oldact)
>> {
>> + /* See if we have a pending EOR in here. */
>> + if (*len> oldact)
>>
>> This second conditional makes no sense. What matters is if the current
>> position in the buffer + len goes past the active bytes, which is
>> exactly the previous conditional. Thus, does the patch work if you
>> remove that conditional and check for EOR unconditionally if oldpos +
>> *len> oldact?
>
> I thought it made no sense either. But without it, fmt_t_2.f90 fails. It is the
> only regression I found when testing this patch. I determined that condition by
> experiment. I looked at the values of oldpos, oldact, *len for various test
> cases to see what was unique about that one in particular. I then tried
> different combinations of conditions until it worked for the piped case and this
> case and no regressions. I suppose we could study that case more explicitly, but
> it is unique. Maybe a bug elsewhere?
>
>>
>> Secondly, you check for "\n" or "\r", but I think you need to check
>> that a "\r" is followed by a "\n" before you can say that you're at
>> EOR.
>>
>
> It does not matter because either way the result is we read what is in the
> buffer and sf_read looks for the \r followed by \n. All we are doing here is
> avoiding an sread and I wanted to address the case where there is a \r without a
> \n. Of course I just realized I can artificially test this on my machine here
> and I will do so.
>

Well, I tested.  On my system, \r by itself works, \n by itself works, \r\n 
works, but my system sees it as two lines and not one.  That may be normal.  I 
will have to get it tested on a cr-lf based system.

What do you think?

Jerry
Tobias Burnus Feb. 19, 2011, 7:19 p.m. UTC | #5
Jerry DeLisle wrote:
> Well, I tested.  On my system, \r by itself works, \n by itself works, 
> \r\n works, but my system sees it as two lines and not one.  That may 
> be normal.  I will have to get it tested on a cr-lf based system.

I have not tested it, but at least for normal files - and possibly also 
for sockets*, I would like under Unix to have "\r\n" and "\n" be read as 
single EOR marker. I do not want to have "\r\n" as two new lines. 
(Similarly, I also would like to read Unix "\n" files correctly on 
Winows.)  I think the current code (without patch) and the submitted 
patch handle this correctly, though I have not extensively tested it.

(I do not care about "\r" as end of line marker as it effectively plays 
no role [1].)

Tobias

* e.g. internet protocols often require to use \r\n [1]

[1] http://en.wikipedia.org/wiki/Newline#Representations
Janne Blomqvist Feb. 19, 2011, 7:38 p.m. UTC | #6
On Sat, Feb 19, 2011 at 20:27, Jerry DeLisle <jvdelisle@frontier.com> wrote:
> On 02/19/2011 09:15 AM, Janne Blomqvist wrote:
>>    if (oldpos + *len>  oldact)
>>      {
>> +      /* See if we have a pending EOR in here.  */
>> +      if (*len>  oldact)
>>
>> This second conditional makes no sense. What matters is if the current
>> position in the buffer + len goes past the active bytes, which is
>> exactly the previous conditional. Thus, does the patch work if you
>> remove that conditional and check for EOR unconditionally if oldpos +
>> *len>  oldact?
>
> I thought it made no sense either. But without it, fmt_t_2.f90 fails.  It is
> the only regression I found when testing this patch.  I determined that
> condition by experiment.  I looked at the values of oldpos, oldact, *len for
> various test cases to see what was unique about that one in particular. I
> then tried different combinations of conditions until it worked for the
> piped case and this case and no regressions. I suppose we could study that
> case more explicitly, but it is unique. Maybe a bug elsewhere?

I'm very suspicious about this; I see no logical reason how it could
make sense. Perhaps a bug elsewhere, yes.

>> Secondly, you check for "\n" or "\r", but I think you need to check
>> that a "\r" is followed by a "\n" before you can say that you're at
>> EOR.
>>
>
> It does not matter because either way the result is we read what is in the
> buffer and sf_read looks for the \r followed by \n. All we are doing here is
> avoiding an sread and I wanted to address the case where there is a \r
> without a \n. Of course I just realized I can artificially test this on my
> machine here and I will do so.

Good point.

>
>> I wonder, would it perhaps be cleaner to create a new function, say
>> something like
>>
>> static inline uchar *
>> fbuf_getptr (gfc_unit * u)
>> {
>>   return (uchar*) u->fbuf->buf + u->fbuf->pos;
>> }
>>
>> use this in read_sf() to get a pointer to eventually return, then loop
>> forwards one character at a time using fbuf_getc() checking for
>> EOR/EOF and other conditions/errors along the way.
>>
>> What do you think?
>>
>
> This alternate approach could work, but I think the numerous calls to
> fbuf_getc even if inlined will be less efficient then the approach I have
> which is a fairly tight loop.
>
> Do you want to test and compare? It will take me a little while to modify
> and test it.  Also, how exactly should we test for performance? cat bigfile
>>tmpfifo? or just read a big file? and time it.

I suspect the fbuf_getc() approach might be faster, as in that case we
don't need to scan through the string twice (once in fbuf_read(), once
in read_sf()), but in any case I'm not that concerned about the
performance. I'd just like something that is straightforward to
understand and which makes sense, so we reduce the possibility of
further regressions.
Jerry DeLisle Feb. 19, 2011, 11:23 p.m. UTC | #7
On 02/19/2011 11:38 AM, Janne Blomqvist wrote:
> On Sat, Feb 19, 2011 at 20:27, Jerry DeLisle<jvdelisle@frontier.com>  wrote:
>> On 02/19/2011 09:15 AM, Janne Blomqvist wrote:
>>>     if (oldpos + *len>    oldact)
>>>       {
>>> +      /* See if we have a pending EOR in here.  */
>>> +      if (*len>    oldact)
>>>
>>> This second conditional makes no sense. What matters is if the current
>>> position in the buffer + len goes past the active bytes, which is
>>> exactly the previous conditional. Thus, does the patch work if you
>>> remove that conditional and check for EOR unconditionally if oldpos +
>>> *len>    oldact?
>>
>> I thought it made no sense either. But without it, fmt_t_2.f90 fails.  It is
>> the only regression I found when testing this patch.  I determined that
>> condition by experiment.  I looked at the values of oldpos, oldact, *len for
>> various test cases to see what was unique about that one in particular. I
>> then tried different combinations of conditions until it worked for the
>> piped case and this case and no regressions. I suppose we could study that
>> case more explicitly, but it is unique. Maybe a bug elsewhere?
>
> I'm very suspicious about this; I see no logical reason how it could
> make sense. Perhaps a bug elsewhere, yes.

Lets see what happens with the fbuf_getc approach and we can discuss this case 
further if needed.

>
>>> Secondly, you check for "\n" or "\r", but I think you need to check
>>> that a "\r" is followed by a "\n" before you can say that you're at
>>> EOR.
>>>
>>
>> It does not matter because either way the result is we read what is in the
>> buffer and sf_read looks for the \r followed by \n. All we are doing here is
>> avoiding an sread and I wanted to address the case where there is a \r
>> without a \n. Of course I just realized I can artificially test this on my
>> machine here and I will do so.
>
> Good point.
>
>>
>>> I wonder, would it perhaps be cleaner to create a new function, say
>>> something like
>>>
>>> static inline uchar *
>>> fbuf_getptr (gfc_unit * u)
>>> {
>>>    return (uchar*) u->fbuf->buf + u->fbuf->pos;
>>> }
>>>
>>> use this in read_sf() to get a pointer to eventually return, then loop
>>> forwards one character at a time using fbuf_getc() checking for
>>> EOR/EOF and other conditions/errors along the way.
>>>
>>> What do you think?
>>>
>>
>> This alternate approach could work, but I think the numerous calls to
>> fbuf_getc even if inlined will be less efficient then the approach I have
>> which is a fairly tight loop.
>>
>> Do you want to test and compare? It will take me a little while to modify
>> and test it.  Also, how exactly should we test for performance? cat bigfile
>>> tmpfifo? or just read a big file? and time it.
>
> I suspect the fbuf_getc() approach might be faster, as in that case we
> don't need to scan through the string twice (once in fbuf_read(), once
> in read_sf()), but in any case I'm not that concerned about the
> performance. I'd just like something that is straightforward to
> understand and which makes sense, so we reduce the possibility of
> further regressions.
>

Ah yes, I see your point, we scan it over again in read_sf.  I will work up a 
revised patch and see how it does.

Thanks for comments.

Jerry
diff mbox

Patch

Index: fbuf.c
===================================================================
--- fbuf.c	(revision 170273)
+++ fbuf.c	(working copy)
@@ -219,7 +219,7 @@  fbuf_seek (gfc_unit * u, int off, int whence)
 char *
 fbuf_read (gfc_unit * u, int * len)
 {
-  char *ptr;
+  char *ptr, *ch;
   int oldact, oldpos;
   int readlen = 0;
 
@@ -230,6 +230,15 @@  fbuf_read (gfc_unit * u, int * len)
   u->fbuf->pos = oldpos;
   if (oldpos + *len > oldact)
     {
+      /* See if we have a pending EOR in here.  */
+      if (*len > oldact)
+	for (ch = u->fbuf->buf + oldpos; ch < u->fbuf->buf + oldact; ch++)
+	  if (*ch == '\n' || *ch == '\r')
+	    {
+	      u->fbuf->act = oldact - oldpos;
+	      *len = u->fbuf->act;
+	      return ptr;
+	    }
       fbuf_debug (u, "reading %d bytes starting at %d ", 
                   oldpos + *len - oldact, oldact);
       readlen = sread (u->s, u->fbuf->buf + oldact, oldpos + *len - oldact);