[fortran/linemap] Add enough column hint to fit any possible offset
diff mbox

Message ID CAESRpQCNKPOsYZRg87nC2zdKtWxVaGbNw-XVC6A7gVzccWzoGQ@mail.gmail.com
State New
Headers show

Commit Message

Manuel López-Ibáñez Dec. 2, 2014, 12:12 a.m. UTC
This patch actually does not touch linemap but I will appreciate
Dodji's comments about the approach. The problem is that in case of
long lines, the column hint of 120 might be too small, thus we do not
have enough locations within one line to point to a higher column
(like 132 in the testcase). Giving as column hint the length of the
line seems the right fix.

The other issue that triggers with this testcase is that, even though
the column hint allows having line 6 + column 132, since we never
generate a location except for 6:0, if a new map is created for the
new line, then it will consider that the last location was 6:0 and the
new map for line 7 will start at location+1, hence, no offset within
line 6 can be represented. My fix here is to create a dummy location
for line_len -1 (the last column in the line). This forces the next
map to start after this last column's location.

I'm not sure if this latter fix is the approach we want to take. If
so, then we may want to change linemap.c itself to force new maps to
reserve enough locations for any offset in the line instead of doing
last_location+1. If not, perhaps the alternative implementation that
encodes offsets explicitly throughout the diagnostics machinery would
be better. In fact, diagnostic_info already contains an
override_column member:

struct diagnostic_info
{
  text_info message;
  location_t location;
  unsigned int override_column;
  ....

but I don't see a way to change that from within the function that
formats %L and %C:

static bool
gfc_format_decoder (pretty_printer *pp,
            text_info *text, const char *spec,
            int precision ATTRIBUTE_UNUSED, bool wide ATTRIBUTE_UNUSED,
            bool plus ATTRIBUTE_UNUSED, bool hash ATTRIBUTE_UNUSED)

unless we move override_column to text_info. In fact,
diagnostic_info.text_info already contains a locus, which is a pointer
to diagnostic_info.location.

In any case, the current patch fixes these ICEs and I couldn't trigger
new ones. Bootstrapped and regression tested.

OK for Fortran?

gcc/fortran/ChangeLog:

2014-12-02  Manuel López-Ibáñez  <manu@gcc.gnu.org>

    * scanner.c (gfc_next_char_literal): Use gfc_warning_now.
    (load_file): Use the line length as the column hint for
    linemap_line_start. Reserve a location for the highest column of
    the line.

gcc/testsuite/ChangeLog:

2014-12-02  Manuel López-Ibáñez  <manu@gcc.gnu.org>

    * gfortran.dg/line_length_5.f90: New test.

Comments

Tobias Burnus Dec. 2, 2014, 6:50 a.m. UTC | #1
Hi Manuel,

Manuel López-Ibáñez wrote:
> This patch actually does not touch linemap but I will appreciate
> Dodji's comments about the approach. The problem is that in case of
> long lines, the column hint of 120 might be too small, thus we do not
> have enough locations within one line to point to a higher column
> (like 132 in the testcase). Giving as column hint the length of the
> line seems the right fix.
> [...]
> My fix here is to create a dummy location for line_len -1 (the last column in the line). This forces the next map to start after this last column's location.

My feeling is that that doesn't work for 
-ffree-line-length-none/-ffixed-line-length-none. In that case, 
gfc_option.free_line_length == 0 and there is no limit to the number of 
characters per line. Thus, I fear that

        b->location
-	= linemap_line_start (line_table, current_file->line++, 120);
+	= linemap_line_start (line_table, current_file->line++, line_len);


might even set the line length to 0 – but I haven't checked. For all 
other values, it should work.

Tobias
Manuel López-Ibáñez Dec. 2, 2014, 1:04 p.m. UTC | #2
On 2 December 2014 at 07:50, Tobias Burnus <burnus@net-b.de> wrote:
> Hi Manuel,
>
> Manuel López-Ibáñez wrote:
>>
>> This patch actually does not touch linemap but I will appreciate
>> Dodji's comments about the approach. The problem is that in case of
>> long lines, the column hint of 120 might be too small, thus we do not
>> have enough locations within one line to point to a higher column
>> (like 132 in the testcase). Giving as column hint the length of the
>> line seems the right fix.
>> [...]
>> My fix here is to create a dummy location for line_len -1 (the last column
>> in the line). This forces the next map to start after this last column's
>> location.
>
>
> My feeling is that that doesn't work for
> -ffree-line-length-none/-ffixed-line-length-none. In that case,
> gfc_option.free_line_length == 0 and there is no limit to the number of
> characters per line. Thus, I fear that
>
>        b->location
> -       = linemap_line_start (line_table, current_file->line++, 120);
> +       = linemap_line_start (line_table, current_file->line++, line_len);
>
>
> might even set the line length to 0 - but I haven't checked. For all other
> values, it should work.

My understanding of this piece of code is that Fortran reads the whole
line into a buffer, and that line_len is the length of the line just
read. Am I wrong? If so, is there any way to compute the length of the
current line?

A small optimization would be to use all that information about
truncation limits to give a smaller column hint when possible, but
someone more knowledgeable in Fortran can do that as a follow-up. I
think setting it to the actual length line will fix the ICEs and only
waste a bit of memory if the lines are quite long.

Cheers,

Manuel.
Dodji Seketeli Dec. 2, 2014, 2:15 p.m. UTC | #3
Hello Manuel, Tobias,

Manuel López-Ibáñez <lopezibanez@gmail.com> writes:

> This patch actually does not touch linemap but I will appreciate
> Dodji's comments about the approach.

Thanks :-)

> The problem is that in case of long lines, the column hint of 120
> might be too small, thus we do not have enough locations within one
> line to point to a higher column (like 132 in the testcase). Giving as
> column hint the length of the line seems the right fix.

I see.

> The other issue that triggers with this testcase is that, even though
> the column hint allows having line 6 + column 132, since we never
> generate a location except for 6:0, if a new map is created for the
> new line, then it will consider that the last location was 6:0 and the
> new map for line 7 will start at location+1, hence, no offset within
> line 6 can be represented.

Right.  This is done by design to allow for a kind of 'compression' of
the theoretical source location space.  That is, the number of source
locations known to the line map sub-system roughly becomes (roughly) the
number of tokens issued by the tokenizer, rather than the total number
of columns multiplied by the number of lines of the input program.

And of course, we are we today willing something in-between, I guess.
:-)

> My fix here is to create a dummy location for line_len -1 (the last
> column in the line). This forces the next map to start after this last
> column's location.

I think this makes sense.  I'll get to the theoretically unlimited line
length case that Tobias alludes to later below.

> I'm not sure if this latter fix is the approach we want to take.

I think this approach is fine.

> If so, then we may want to change linemap.c itself to force new maps
> to reserve enough locations for any offset in the line instead of
> doing last_location+1.

Hmmh, we must keep in mind that this whole line map thing should keep
working for cases where the primary way to interact with the source
location management sub-system is basically cpp_get_token().  So if that
condition is satisfied, then, why not.  I guess I'd need to look at
specifics of what you are proposing here before talking :-)

[...]

Tobias Burnus <burnus@net-b.de> writes:

[...]

>> My fix here is to create a dummy location for line_len -1 (the last
>> column in the line). This forces the next map to start after this
>> last column's location.
>
> My feeling is that that doesn't work for
> -ffree-line-length-none/-ffixed-line-length-none. In that case,
> gfc_option.free_line_length == 0 and there is no limit to the number
> of characters per line. Thus, I fear that
>
>        b->location
> -	= linemap_line_start (line_table, current_file->line++, 120);
> +	= linemap_line_start (line_table, current_file->line++, line_len);
>
>
> might even set the line length to 0 – but I haven't checked. For all
> other values, it should work.

I am not sure what the value of line_len is when
gfc_option.free_line_length == 0.  If it's not zero, then we are good.
If it's zero (with the semantics of that zero meaning that the line
might be very big), then we might just get the actual length of the line
in terms of column count and pass that to linemap_line_start.

We must just keep in mind that the line subsystem has a hard limit on
column numbers (I think it's 100 000 columns at the moment).  Passed
that limit, we give up tracking column numbers in source locations; that
means that all columns numbers are set to zero.


I hope this is useful.

Cheers,
Tobias Burnus Dec. 3, 2014, 11:25 p.m. UTC | #4
Manuel López-Ibáñez wrote:
> In any case, the current patch fixes these ICEs and I couldn't trigger
> new ones. Bootstrapped and regression tested.
> OK for Fortran?

OK. Thanks.

  * * *

BTW: The output of the code changes with the common diagnostic. For a 
different test case, I get (old)

-------------------------
  print *, "rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr 
", "jgk
1
Warning: Nonconforming tab character at (1)
-------------------------

while the new code gives:

-------------------------

print *, "rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr 
", "jgkg"
1
Warning: Nonconforming tab character at (1) [-Wtabs]
-------------------------

Namely, the previous code trims the output to show only the code around 
the error location while the common-diagnostics code shows the whole line.

(That's normally not observable as the lines are short. That makes the ^ 
or rather (1) look rather misplaced. On the other hand, if the error 
location is misplaced, it is actually better; e.g. if a comment is at 
the end like the long "! {dg-error ...}", the location might point at 
the end of the comment, which is rather misplaced. Still, only showing 
part of the line probably makes sense in general.)

Tobias

> gcc/fortran/ChangeLog:
>
> 2014-12-02  Manuel López-Ibáñez  <manu@gcc.gnu.org>
>
>      * scanner.c (gfc_next_char_literal): Use gfc_warning_now.
>      (load_file): Use the line length as the column hint for
>      linemap_line_start. Reserve a location for the highest column of
>      the line.
>
> gcc/testsuite/ChangeLog:
>
> 2014-12-02  Manuel López-Ibáñez  <manu@gcc.gnu.org>
>
>      * gfortran.dg/line_length_5.f90: New test.
Manuel López-Ibáñez Dec. 4, 2014, 12:38 a.m. UTC | #5
On 4 December 2014 at 00:25, Tobias Burnus <burnus@net-b.de> wrote:
> Manuel López-Ibáñez wrote:
>>
>> In any case, the current patch fixes these ICEs and I couldn't trigger
>> new ones. Bootstrapped and regression tested.
>> OK for Fortran?
>
>
> OK. Thanks.

It is still not clear to me if line_len is the length of the line read
or not, is it? If not, is there any way to actually get the length of
the line?

> Namely, the previous code trims the output to show only the code around the
> error location while the common-diagnostics code shows the whole line.

The code in the common diagnostics will trim the caret line on both
sides as needed depending on your COLUMNS, to show at least 10
characters after the '1'. In particular, MAX_WIDTH will be COLUMNS-1,
if COLUMN is the column pointed by the '1', then:

/* If LINE is longer than MAX_WIDTH, and COLUMN is not smaller than
   MAX_WIDTH by some margin, then adjust the start of the line such
   that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
   margin is either 10 characters or the difference between the column
   and the length of the line, whatever is smaller.  The length of
   LINE is given by LINE_WIDTH.  */

But perhaps the defaults for Fortran are different if COLUMNS is not
set. The common diagnostics defaults to no limit in that case
(MAX_WIDTH == INT_MAX).

Moreover, the caret is always printed indented by one whitespace (so
one can easily grep -v out the caret info), thus it is very strange
that you get the output that you pasted.

By forcing the maximum MAX_WIDTH to 80 columns I get:

$ f951 -Wtabs test.f90 -fmessage-length=80
test.f90:1:1:

  print *, "rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr ", "jgk
 1
Warning: Nonconforming tab character at (1) [-Wtabs]

which is exactly what you pasted for (old) except for the indentation
of the caret line. (The indentation sounded like a good idea at the
beginning of using the caret, but now I'm not so sure if it is that
useful).

Cheers,

Manuel.
Tobias Burnus Dec. 4, 2014, 10:21 a.m. UTC | #6
Hi Manuel,

thanks for rightfully nagging - I shouldn't do late at night reviews. Regarding:

Manuel López-Ibáñez wrote:
> It is still not clear to me if line_len is the length of the line read
> or not, is it? If not, is there any way to actually get the length of
> the line?

Looking at the code in load_line, the line_len in

   int trunc = load_line (input, &line, &line_len, NULL);

is the size of the buffer. That's not too bad but usually to large. However,
the actual line length is determined one line later:

   len = gfc_wide_strlen (line);

which does a strlen on gfc_char_t which is uint32_t to accomodate unicode
characters.


Hence, I think

       b->location
-	= linemap_line_start (line_table, current_file->line++, 120);
+	= linemap_line_start (line_table, current_file->line++, line_len);

should use "len" instead of "line_len".


 * * *

> > Namely, the previous code trims the output to show only the code around the
> > error location while the common-diagnostics code shows the whole line.

I just realized that all the space indentation got lost with Thunderbird - while
it still showed it in the edit window.

See attachment for a new attempt.

Using -fmessage-length= or COLUMNS explicitly works.

Fortran uses gcc/fortran/error.c's get_terminal_width() to determine the
terminal width. While the common code uses diagnostic_set_caret_max_width,
which does not seem to work reliably.

Playing around with it, the common approach seems to work with a normal
Xterm and gnome-terminal, but it fails with MobaXterm (Windows ssh client;
shows "echo $COLUMNS" in the shell correctly, but somehow that doesn't reach
GCC) - but using "COLUMNS=80 ~/gcc/gcc-trunk/bin/gfortran" works also with
MobaXterm. Yesterday, I tried KDE's konsole and it didn't seem to work there,
either. But I cannot check as the RHEL 6 here doesn't seem to have KDE.

Thus, it seems to be enough to transfer checks from get_terminal_width
to diagnostic_set_caret_max_width to fix this issue.

Tobias
Input file - all spaces, but at tab between "aaa',"  and "'bg":
-------<cut test.f90>--------------------------------
                                                                 print *, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',	'bgfhkg'
                                                                 end
-------</cut test.f90>--------------------------------



Using:
   $ ~/gcc/gcc-4.9/bin/gfortran -ffree-line-length-none -pedantic test.f90

One gets with a pipe or wide terminal:
------------------------------------------------------------
test.f90:1.145:

                                                                 print *, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'bgfhkg'
                                                                                                                                                 1
Warning: Nonconforming tab character at (1)
------------------------------------------------------------



And with a small terminal:
------------------------------------------------------------
test.f90:1.145:

t *, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'bgf
                                                                            1
Warning: Nonconforming tab character at (1)
------------------------------------------------------------




And with the new code, one gets for a small window:
$ ~/gcc/gcc-trunk/bin/gfortran -ffree-line-length-none -pedantic test.f90
test.f90:1:145:

------------------------------------------------------------
                                                                  print *, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'bgfhkg'
                                                                                                                                                 1
Warning: Nonconforming tab character at (1) [-Wtabs]
------------------------------------------------------------

Patch
diff mbox

Index: gcc/testsuite/gfortran.dg/line_length_5.f90
===================================================================
--- gcc/testsuite/gfortran.dg/line_length_5.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/line_length_5.f90	(revision 0)
@@ -0,0 +1,7 @@ 
+! { dg-do compile }
+! { dg-options "-Wline-truncation" }
+print *, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+end 
+
+      print *, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+! { dg-warning "Line truncated" " " { target *-*-* } 3 }
Index: gcc/fortran/scanner.c
===================================================================
--- gcc/fortran/scanner.c	(revision 218192)
+++ gcc/fortran/scanner.c	(working copy)
@@ -1054,11 +1054,12 @@  restart:
 	  int maxlen = gfc_option.free_line_length;
 	  gfc_char_t *current_nextc = gfc_current_locus.nextc;
 
 	  gfc_current_locus.lb->truncated = 0;
 	  gfc_current_locus.nextc =  gfc_current_locus.lb->line + maxlen;
-	  gfc_warning_now_1 ("Line truncated at %L", &gfc_current_locus);
+	  gfc_warning_now (OPT_Wline_truncation,
+			   "Line truncated at %L", &gfc_current_locus);
 	  gfc_current_locus.nextc = current_nextc;
 	}
 
       if (c != '&')
 	goto done;
@@ -1192,11 +1193,12 @@  restart:
       /* Check to see if the continuation line was truncated.  */
       if (warn_line_truncation && gfc_current_locus.lb != NULL
 	  && gfc_current_locus.lb->truncated)
 	{
 	  gfc_current_locus.lb->truncated = 0;
-	  gfc_warning_now_1 ("Line truncated at %L", &gfc_current_locus);
+	  gfc_warning_now (OPT_Wline_truncation,
+			   "Line truncated at %L", &gfc_current_locus);
 	}
 
       prev_openmp_flag = openmp_flag;
       continue_flag = 1;
       old_loc = gfc_current_locus;
@@ -2041,11 +2043,17 @@  load_file (const char *realfilename, con
 
       b = XCNEWVAR (gfc_linebuf, gfc_linebuf_header_size
 		    + (len + 1) * sizeof (gfc_char_t));
 
       b->location
-	= linemap_line_start (line_table, current_file->line++, 120);
+	= linemap_line_start (line_table, current_file->line++, line_len);
+      /* ??? We add the location for the maximum column possible here,
+	 because otherwise if the next call creates a new line-map, it
+	 will not reserve space for any offset.  */
+      gcc_assert (line_len > 0);
+      linemap_position_for_column (line_table, line_len - 1);
+
       b->file = current_file;
       b->truncated = trunc;
       wide_strcpy (b->line, line);
 
       if (line_head == NULL)