diff mbox series

[libgfortran] PR114304 - [13/14 Regression] libgfortran I/O – bogus "Semicolon not allowed as separator with DECIMAL='point'"

Message ID 9a848b89-ee95-44f7-8491-cbe22804edf4@gmail.com
State New
Headers show
Series [libgfortran] PR114304 - [13/14 Regression] libgfortran I/O – bogus "Semicolon not allowed as separator with DECIMAL='point'" | expand

Commit Message

Jerry D April 4, 2024, 1:33 a.m. UTC
Hi all,

The attached log entry and patch (git show) fixes this issue by adding 
logic to handle spaces in eat_separators. One or more spaces by 
themselves are a valid separator. So in this case we look at the 
character following the spaces to see if it is a comma or semicolon.

If so, I change it to the valid separator for the given decimal mode, 
point or comma. This allows the comma or semicolon to be interpreted as 
a null read on the next effective item in the formatted read.

I chose a permissive approach here that allows reads to proceed when the
input line is mal-formed with an incorrect separator as long as there is 
at least one space in front of it.

New test case included. Regression tested on X86-64.

OK for trunk?  Backport to 13 after some time.

Regards,

Jerry

Comments

Paul Richard Thomas April 4, 2024, 8:17 a.m. UTC | #1
Hi Jerry,

It looks good to me. Noting that this is not a regression, OK for mainline
on condition that you keep a sharp eye out for any associated problems.
Likewise with backporting to 13-branch.

Thanks

Paul


On Thu, 4 Apr 2024 at 02:34, Jerry D <jvdelisle2@gmail.com> wrote:

> Hi all,
>
> The attached log entry and patch (git show) fixes this issue by adding
> logic to handle spaces in eat_separators. One or more spaces by
> themselves are a valid separator. So in this case we look at the
> character following the spaces to see if it is a comma or semicolon.
>
> If so, I change it to the valid separator for the given decimal mode,
> point or comma. This allows the comma or semicolon to be interpreted as
> a null read on the next effective item in the formatted read.
>
> I chose a permissive approach here that allows reads to proceed when the
> input line is mal-formed with an incorrect separator as long as there is
> at least one space in front of it.
>
> New test case included. Regression tested on X86-64.
>
> OK for trunk?  Backport to 13 after some time.
>
> Regards,
>
> Jerry
Tobias Burnus April 4, 2024, 9:31 a.m. UTC | #2
Hi Jerry,

Jerry D wrote:
> The attached log entry and patch (git show) fixes this issue by adding 
> logic to handle spaces in eat_separators. One or more spaces by 
> themselves are a valid separator. So in this case we look at the 
> character following the spaces to see if it is a comma or semicolon.
> 
> If so, I change it to the valid separator for the given decimal mode, 
> point or comma. This allows the comma or semicolon to be interpreted as 
> a null read on the next effective item in the formatted read.
> 
> I chose a permissive approach here that allows reads to proceed when the
> input line is mal-formed with an incorrect separator as long as there is 
> at least one space in front of it.

First: Consider also adding 'PR fortran/105473' to the commit log
as the PRs are closely related, albeit this PR is different-

The patch looks mostly like I would expect, except for decimal='point' 
and a ';' which is *not* preceded by a space.

Thanks for working on it.

Regarding the 'except' case:

* * *

If I try your patch with the testcase of at comment 19,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114304#c19
→ https://gcc.gnu.org/bugzilla/attachment.cgi?id=57695 ,

I do note that with 'decimal=point', a tailing semicolon is silently
accepted – even if not proceeded by a space.

I think such code is invalid – and you could consider to reject it.
Otherwise, the handling all seems to be in line with the Fortran spec.

i.e. for the following string, I had *expected an error*:

  point, isreal =  F , testinput = ";"n=          42  ios=           0
  point, isreal =  F , testinput = "5;"n=           5  ios=           0
  point, isreal =  T , testinput = "8;"r=   8.00000000      ios= 0
  point, isreal =  T , testinput = "3.3;"r=   3.29999995      ios= 0
  point, isreal =  T , testinput = "3,3;"r=   3.00000000      ios= 0

while I think the following is OK (i.e. no error is what I expect) due 
to the the space before the ';'.

  point, isreal =  F , testinput = "7 ;"n=           7  ios= 0
  point, isreal =  T , testinput = "9 ;"r=   9.00000000      ios= 0
  point, isreal =  T , testinput = "4.4 ;"r=   4.40000010      ios=0
  point, isreal =  T , testinput = "9 ;"r=   9.00000000      ios= 0
  point, isreal =  T , testinput = "4,4 ;"r=   4.00000000      ios= 0

* * *

Looking at the other compilers, ifort, ifx and Flang do issue an error 
here. Likewise, g95 seems to yield an error in this case (see below).

I do note that the Lapack testcase that triggered this PR did have such 
a code - but it was then changed because g95 did not like it:

https://github.com/Reference-LAPACK/lapack/commit/64e8a7500d817869e5fcde35afd39af8bc7a8086

In terms of gfortran: until recently did accept it (all versions, 
including 13+14); it then rejected it due to the change in PR105473 (GCC 
14/mainline, backported to 13)– but I now think it rightly did so. With 
the current patch, it is accepted again.

* * *

I have attached the modified testcase linked above; consider adding it 
as well. - Changes to the one of the attachment:
- I added a few additional (albeit boring) tests
- I added an expected output + error diagnostic.

The testcase assumes an error for ';' as separator (with 'point'), 
unless there is a space before it.

[If we want to not diagnose this as vendor extension, we really need to 
add a comment to that testcase besides changing valid = .false. to .true.]

Tobias
Jerry D April 4, 2024, 8:05 p.m. UTC | #3
On 4/4/24 2:31 AM, Tobias Burnus wrote:
> Hi Jerry,
> 
> Jerry D wrote:
>> The attached log entry and patch (git show) fixes this issue by adding 
>> logic to handle spaces in eat_separators. One or more spaces by 
>> themselves are a valid separator. So in this case we look at the 
>> character following the spaces to see if it is a comma or semicolon.
>>
>> If so, I change it to the valid separator for the given decimal mode, 
>> point or comma. This allows the comma or semicolon to be interpreted 
>> as a null read on the next effective item in the formatted read.
>>
>> I chose a permissive approach here that allows reads to proceed when the
>> input line is mal-formed with an incorrect separator as long as there 
>> is at least one space in front of it.
> 
> First: Consider also adding 'PR fortran/105473' to the commit log
> as the PRs are closely related, albeit this PR is different-
> 
> The patch looks mostly like I would expect, except for decimal='point' 
> and a ';' which is *not* preceded by a space.
> 
> Thanks for working on it.
> 
> Regarding the 'except' case:
> 
> * * *
> 
> If I try your patch with the testcase of at comment 19,
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114304#c19
> → https://gcc.gnu.org/bugzilla/attachment.cgi?id=57695 ,
> 
> I do note that with 'decimal=point', a tailing semicolon is silently
> accepted – even if not proceeded by a space.

I did that on purpose out of leniency and fear of breaking something.  I 
will add that in and do some testing.

> 
> I think such code is invalid – and you could consider to reject it.
> Otherwise, the handling all seems to be in line with the Fortran spec.
> 
> i.e. for the following string, I had *expected an error*:

I will see what it does. I agree in principle.

> 
>   point, isreal =  F , testinput = ";"n=          42  ios=           0
>   point, isreal =  F , testinput = "5;"n=           5  ios=           0
>   point, isreal =  T , testinput = "8;"r=   8.00000000      ios= 0
>   point, isreal =  T , testinput = "3.3;"r=   3.29999995      ios= 0
>   point, isreal =  T , testinput = "3,3;"r=   3.00000000      ios= 0
> 
> while I think the following is OK (i.e. no error is what I expect) due 
> to the the space before the ';'.

Agree and this is what I was attempting.

> 
>   point, isreal =  F , testinput = "7 ;"n=           7  ios= 0
>   point, isreal =  T , testinput = "9 ;"r=   9.00000000      ios= 0
>   point, isreal =  T , testinput = "4.4 ;"r=   4.40000010      ios=0
>   point, isreal =  T , testinput = "9 ;"r=   9.00000000      ios= 0
>   point, isreal =  T , testinput = "4,4 ;"r=   4.00000000      ios= 0
> 
> * * *
> 
> Looking at the other compilers, ifort, ifx and Flang do issue an error 
> here. Likewise, g95 seems to yield an error in this case (see below).
> 
> I do note that the Lapack testcase that triggered this PR did have such 
> a code - but it was then changed because g95 did not like it:
> 
> https://github.com/Reference-LAPACK/lapack/commit/64e8a7500d817869e5fcde35afd39af8bc7a8086
> 

I am glad they fixed that, it triggered a whole lot of cycles here.

> In terms of gfortran: until recently did accept it (all versions, 
> including 13+14); it then rejected it due to the change in PR105473 (GCC 
> 14/mainline, backported to 13)– but I now think it rightly did so. With 
> the current patch, it is accepted again.
> 
> * * *
> 
> I have attached the modified testcase linked above; consider adding it 
> as well. - Changes to the one of the attachment:
> - I added a few additional (albeit boring) tests
> - I added an expected output + error diagnostic.
> 
> The testcase assumes an error for ';' as separator (with 'point'), 
> unless there is a space before it.
> 
> [If we want to not diagnose this as vendor extension, we really need to 
> add a comment to that testcase besides changing valid = .false. to .true.]

I have gone back and forth on this issue many times trying to find the 
middle ground. The standard is fairly clear. To me it is on the user to 
not use malformed input so allowing with the intervening space we are 
simply ignoring the trailing ',' or ';' and calling the spaces 
sufficient as a valid separator.

Regardless I now have your modified test case passing. Much appreciated.

Thanks for the reviews.  I will resubmit when I can, hopefully today.

Cheers,

Jerry
Jerry D April 4, 2024, 9:04 p.m. UTC | #4
On 4/4/24 2:31 AM, Tobias Burnus wrote:
> Hi Jerry,
> 
--- snip ---
> The patch looks mostly like I would expect, except for decimal='point' 
> and a ';' which is *not* preceded by a space.
> 
> Thanks for working on it.
> 
> Regarding the 'except' case:
> 
--- snip ---
> i.e. for the following string, I had *expected an error*:
> 
>   point, isreal =  F , testinput = ";"n=          42  ios=           0
>   point, isreal =  F , testinput = "5;"n=           5  ios=           0
>   point, isreal =  T , testinput = "8;"r=   8.00000000      ios= 0
>   point, isreal =  T , testinput = "3.3;"r=   3.29999995      ios= 0
>   point, isreal =  T , testinput = "3,3;"r=   3.00000000      ios= 0
> 
--- snip ---
> I have attached the modified testcase linked above; consider adding it 
> as well. - Changes to the one of the attachment:
> - I added a few additional (albeit boring) tests
> - I added an expected output + error diagnostic.
> 
> The testcase assumes an error for ';' as separator (with 'point'), 
> unless there is a space before it.
> 
--- snip ---

Here attached is the revised patch.  I replaced the new test case I had 
with the one you provided modified and it now passes.

I modified the test case pr105473.f90 to expect the error on semicolon.

Regression tested on X86_64.  OK for trunk and a bit later back port to 13?

Cheers,

Jerry
Tobias Burnus April 4, 2024, 9:41 p.m. UTC | #5
Hi Jerry,

I think for the current testcases, I like the patch – the question is 
only what's about:

   ',3' as input for 'comma'   (or '.3' as input for 'point')

For 'point' – 0.3 is read and ios = 0 (as expected)
But for 'comma':
* GCC 12 reads nothing and has ios = 0.
* GCC 13/mainline has an error (ios != 0 – and reads nothing)
* GCC with your patch: Same result: ios != 0 and nothing read.

Expected: Same as with ','/'comma' – namely: read-in value is 0.3.
→ https://godbolt.org/z/4rc8fz4sT for the full example, which works with 
ifort, ifx and flang

* * *

Can you check and fix this? It looks perfectly valid to me to have 
remove the '0' in the floating point numbers '0.3' or '0,3' seems to be 
permitted – and it works for '.' (with 'point') but not for ',' (with 
'comma').

F2023's "13.10.3.1 List-directed input forms" refers to "13.7.2.3.2 F 
editing", which states:

"The standard form of the input field [...] The form of the mantissa is 
an optional sign, followed by a string of one or more digits optionally 
containing a decimal symbol."

The latter does not require that the digit has to be before the decimal 
sign and as for output, it is optional, it is surely intended that ",3" 
is a valid floating-point number for decimal='comma'.

* * *

I extended the testcase to check for this – see attached diff. All 
'point' work, all 'comma' fail.

Thanks for working on this!

Tobias
Jerry D April 5, 2024, 5:47 p.m. UTC | #6
On 4/4/24 2:41 PM, Tobias Burnus wrote:
> Hi Jerry,
> 
> I think for the current testcases, I like the patch – the question is 
> only what's about:
> 
>    ',3' as input for 'comma'   (or '.3' as input for 'point')
> 
> For 'point' – 0.3 is read and ios = 0 (as expected)
> But for 'comma':
> * GCC 12 reads nothing and has ios = 0.
> * GCC 13/mainline has an error (ios != 0 – and reads nothing)
> * GCC with your patch: Same result: ios != 0 and nothing read.
> 
> Expected: Same as with ','/'comma' – namely: read-in value is 0.3.
> → https://godbolt.org/z/4rc8fz4sT for the full example, which works with 
> ifort, ifx and flang
> 
> * * *
> 
> Can you check and fix this? It looks perfectly valid to me to have 
> remove the '0' in the floating point numbers '0.3' or '0,3' seems to be 
> permitted – and it works for '.' (with 'point') but not for ',' (with 
> 'comma').
> 
Yes, I found the spot to fix this.

> F2023's "13.10.3.1 List-directed input forms" refers to "13.7.2.3.2 F 
> editing", which states:
> 
> "The standard form of the input field [...] The form of the mantissa is 
> an optional sign, followed by a string of one or more digits optionally 
> containing a decimal symbol."
> 
> The latter does not require that the digit has to be before the decimal 
> sign and as for output, it is optional, it is surely intended that ",3" 
> is a valid floating-point number for decimal='comma'.
> 

Agree

> * * *
> 
> I extended the testcase to check for this – see attached diff. All 
> 'point' work, all 'comma' fail.
> 
> Thanks for working on this!
> 
> Tobias

Thanks much. After I fix it, I will use your extended test case in the 
test suite.

Jerry -
Jerry D April 6, 2024, 2:38 a.m. UTC | #7
On 4/5/24 10:47 AM, Jerry D wrote:
> On 4/4/24 2:41 PM, Tobias Burnus wrote:
>> Hi Jerry,
>>
>> I think for the current testcases, I like the patch – the question is 
>> only what's about:
>>
>>    ',3' as input for 'comma'   (or '.3' as input for 'point')
>>
>> For 'point' – 0.3 is read and ios = 0 (as expected)
>> But for 'comma':
>> * GCC 12 reads nothing and has ios = 0.
>> * GCC 13/mainline has an error (ios != 0 – and reads nothing)
>> * GCC with your patch: Same result: ios != 0 and nothing read.
>>
>> Expected: Same as with ','/'comma' – namely: read-in value is 0.3.
>> → https://godbolt.org/z/4rc8fz4sT for the full example, which works 
>> with ifort, ifx and flang
>>
>> * * *
>>
>> Can you check and fix this? It looks perfectly valid to me to have 
>> remove the '0' in the floating point numbers '0.3' or '0,3' seems to 
>> be permitted – and it works for '.' (with 'point') but not for ',' 
>> (with 'comma').
>>
> Yes, I found the spot to fix this.
> 
>> F2023's "13.10.3.1 List-directed input forms" refers to "13.7.2.3.2 F 
>> editing", which states:
>>
>> "The standard form of the input field [...] The form of the mantissa 
>> is an optional sign, followed by a string of one or more digits 
>> optionally containing a decimal symbol."
>>
>> The latter does not require that the digit has to be before the 
>> decimal sign and as for output, it is optional, it is surely intended 
>> that ",3" is a valid floating-point number for decimal='comma'.
>>
> 
> Agree
> 
>> * * *
>>
>> I extended the testcase to check for this – see attached diff. All 
>> 'point' work, all 'comma' fail.
>>
>> Thanks for working on this!
>>
>> Tobias
> 
> Thanks much. After I fix it, I will use your extended test case in the 
> test suite.
> 
> Jerry -


See attached updated patch.

Regressions tested on x86-64. OK for trunk and 13 after a bit.

Jerry -
Tobias Burnus April 6, 2024, 5:17 a.m. UTC | #8
Hi Jerry, hello world,

Jerry D wrote:
> On 4/5/24 10:47 AM, Jerry D wrote:
>> On 4/4/24 2:41 PM, Tobias Burnus wrote:
>>> I think for the current testcases, I like the patch – the question 
>>> is only what's about:
>>>    ',3' as input for 'comma'   (or '.3' as input for 'point')
>>> [...]
>>> But for 'comma': [...]
>>> * GCC with your patch: Same result: ios != 0 and nothing read.
>>>
>>> Expected: [...] read-in value is 0.3. [...]

> See attached updated patch.
> Regressions tested on x86-64. OK for trunk and 13 after a bit.

OK. Thanks for the patch!

Tobias
diff mbox series

Patch

commit 7d1a958d6b099ea88b6c51649baf5dbd5e598909
Author: Jerry DeLisle <jvdelisle@gcc.gnu.org>
Date:   Wed Apr 3 18:07:30 2024 -0700

    libfortran: Fix handling of formatted separators.
    
            PR libfortran/114304
    
    libgfortran/ChangeLog:
    
            * io/list_read.c (eat_separator): Add logic to handle spaces
            preceding a comma or semicolon such that that a 'null' read
            occurs without error at the end of comma or semicolon
            terminated input lines.
    
    gcc/testsuite/ChangeLog:
    
            * gfortran.dg/pr114304.f90: New test.

diff --git a/gcc/testsuite/gfortran.dg/pr114304.f90 b/gcc/testsuite/gfortran.dg/pr114304.f90
new file mode 100644
index 00000000000..57af619246b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr114304.f90
@@ -0,0 +1,49 @@ 
+! { dg-do run }
+! pr114304
+real :: x(3)
+character(len=1) :: s
+integer :: ios
+
+s = 'x'
+
+open(99, decimal="comma", status='scratch')
+write(99, '(a)') '1,23435 1243,24 13,24 a'
+rewind(99)
+read(99, *, iostat=ios) x, s
+if (ios /= 0 .or. s /= 'a') stop 1
+
+rewind(99)
+write(99, '(a)') '1,23435;1243,24;13,24;a'
+rewind(99)
+read(99, *, iostat=ios) x, s
+if (ios /= 0 .or. s /= 'a') stop 2
+
+! Note: not reading 's'
+rewind(99)
+write(99, '(a)') '1,23435 1243,24 13,24 ,'
+rewind(99)
+read(99, *) x
+if (ios /= 0) stop 3
+
+rewind(99)
+write(99, '(a)') '1,23435;1243,24;13,24 ,'
+rewind(99)
+read(99, *, iostat=ios) x
+if (ios /= 0) stop 4
+
+! Now reading 's'
+s = 'w'
+rewind(99)
+write(99, '(a)') '1,23435 1243,24 13,24 ,'
+rewind(99)
+read(99, *, iostat=ios) x, s
+if (ios /= 0 .or. s /= 'w') stop 5
+
+s = 'w'
+rewind(99)
+write(99, '(a)') '1,23435;1243,24;13,24 ,'
+rewind(99)
+read(99, *, iostat=ios) x, s
+if (ios /= 0 .or. s /= 'w') stop 6
+close(99)
+end
diff --git a/libgfortran/io/list_read.c b/libgfortran/io/list_read.c
index fb3f7dbc34d..f6f169043bf 100644
--- a/libgfortran/io/list_read.c
+++ b/libgfortran/io/list_read.c
@@ -461,11 +461,30 @@  eat_separator (st_parameter_dt *dtp)
   int c, n;
   int err = 0;
 
-  eat_spaces (dtp);
   dtp->u.p.comma_flag = 0;
+  c = next_char (dtp);
+  if (c == ' ')
+    {
+      eat_spaces (dtp);
+      c = next_char (dtp);
+      if (c == ',')
+	{
+	  if (dtp->u.p.current_unit->decimal_status == DECIMAL_COMMA)
+	    unget_char (dtp, ';');
+	  dtp->u.p.comma_flag = 1;
+	  eat_spaces (dtp);
+	  return err;
+	}
+      if (c == ';')
+	{
+	  if (dtp->u.p.current_unit->decimal_status == DECIMAL_POINT)
+	    unget_char (dtp, ',');
+	  dtp->u.p.comma_flag = 1;
+	  eat_spaces (dtp);
+	  return err;
+	}
+    }
 
-  if ((c = next_char (dtp)) == EOF)
-    return LIBERROR_END;
   switch (c)
     {
     case ',':
@@ -476,7 +495,10 @@  eat_separator (st_parameter_dt *dtp)
 	  unget_char (dtp, c);
 	  break;
 	}
-    /* Fall through. */
+      dtp->u.p.comma_flag = 1;
+      eat_spaces (dtp);
+      break;
+
     case ';':
       dtp->u.p.comma_flag = 1;
       eat_spaces (dtp);