diff mbox

[libcpp] maybe canonicalize system paths in line-map

Message ID m38vhikdia.fsf@seketeli.org
State New
Headers show

Commit Message

Dodji Seketeli April 26, 2012, 10:12 a.m. UTC
Jason Merrill <jason@redhat.com> a écrit:

> It seems like we'll do this for every line in the header, which could
> lead to a lot of leaked memory.  Instead, we should canonicalize when
> setting ORDINARY_MAP_FILE_NAME.

[...]

Manuel López-Ibáñez <lopezibanez@gmail.com> a écrit:

> On 21 April 2012 14:56, Jason Merrill <jason@redhat.com> wrote:
>> It seems like we'll do this for every line in the header, which could lead
>> to a lot of leaked memory.  Instead, we should canonicalize when setting
>> ORDINARY_MAP_FILE_NAME.
>
> Hum, my understanding of the code is that this is exactly what I
> implemented. That is, at most we leak every time
> ORDINARY_MAP_FILE_NAME is set to a system header file (if the realpath
> is shorter than the original path). This is a bit inefficient because
> this happens two times per #include (when entering and when leaving).
> But I honestly don't know how to avoid this.

My understanding is that by the time we are about to deal with the
ORDINARY_MAP_FILE_NAME property of a given map, it's too late; we have
lost the "memory leak game" because that property just points to the
path carried by the instance _cpp_file::path that is current when the
map is built.  So the lifetime of that property is tied to the lifetime
of the relevant instance of _cpp_file.

So maybe it'd be better to canonicalize the _cpp_file::path when it's
first build?  One drawback of that approach would be that
_cpp_file::path will then permanently loose the information about the
current directory, that is indirectly encoded into the way the file path
is originally built.  But do we really need that information?

If you agree with that approach, the patch below might do the trick.  I
have only lightly tested it and am about to bootstrap it to double
check.

On the test case below:

     1	#include <algorithm>
     2	
     3	void
     4	f ()
     5	{
     6	    int a = 0;
     7	    std::sort (a);
     8	}

It yields an output that looks like:

---------->8<----------------
test.cc: In function ‘void f()’:
test.cc:7:17: erreur: no matching function for call to ‘sort(int&)’
     std::sort (a);
                 ^
test.cc:7:17: note: candidates are:
     std::sort (a);
                 ^
In file included from /home/dodji/devel/git/gcc/tmp/libstdc++-v3/include/std/algorithm:63:0,
                 from test.cc:1:
/home/dodji/devel/git/gcc/tmp/libstdc++-v3/include/bits/stl_algo.h:5463:5: note: template<class _RAIter> void std::sort(_RAIter, _RAIter)
     sort(_RandomAccessIterator __first, _RandomAccessIterator __last)
     ^
/home/dodji/devel/git/gcc/tmp/libstdc++-v3/include/bits/stl_algo.h:5463:5: note:   template argument deduction/substitution failed:
     sort(_RandomAccessIterator __first, _RandomAccessIterator __last)
     ^
test.cc:7:17: note:   candidate expects 2 arguments, 1 provided
     std::sort (a);
                 ^
In file included from /home/dodji/devel/git/gcc/tmp/libstdc++-v3/include/std/algorithm:63:0,
                 from test.cc:1:
/home/dodji/devel/git/gcc/tmp/libstdc++-v3/include/bits/stl_algo.h:5499:5: note: template<class _RAIter, class _Compare> void std::sort(_RAIter, _RAIter, _Compare)
     sort(_RandomAccessIterator __first, _RandomAccessIterator __last,
     ^
/home/dodji/devel/git/gcc/tmp/libstdc++-v3/include/bits/stl_algo.h:5499:5: note:   template argument deduction/substitution failed:
     sort(_RandomAccessIterator __first, _RandomAccessIterator __last,
     ^
test.cc:7:17: note:   candidate expects 3 arguments, 1 provided
     std::sort (a);
                 ^
---------->8<----------------



>
> If any one has suggestions on a better implementation, I am happy to
> hear about it.

So here is the patch; it just borrows your implementation of
maybe_shorter_path and uses it in file_file_in_dir.  There should not be
any leak, IIUC.

Comments

Jonathan Wakely April 26, 2012, 10:20 a.m. UTC | #1
On 26 April 2012 11:12, Dodji Seketeli wrote:
>
> So maybe it'd be better to canonicalize the _cpp_file::path when it's
> first build?  One drawback of that approach would be that
> _cpp_file::path will then permanently loose the information about the
> current directory, that is indirectly encoded into the way the file path
> is originally built.  But do we really need that information?

I don't know, but I certainly don't need it in diagnostic output.  It
occurred to me last night (while debugging some deeply nested template
instantiations in libstdc++ headers) that the same
target/lib/../../../.. junk appears in file paths shown by GDB.  Does
this change also affect the paths in the debug info?  I hope it does,
as it will make stack traces involving libstdc++ headers easier to
read.
Manuel López-Ibáñez April 26, 2012, 10:26 a.m. UTC | #2
On 26 April 2012 12:12, Dodji Seketeli <dodji@seketeli.org> wrote:
>
> So maybe it'd be better to canonicalize the _cpp_file::path when it's
> first build?  One drawback of that approach would be that
> _cpp_file::path will then permanently loose the information about the
> current directory, that is indirectly encoded into the way the file path
> is originally built.  But do we really need that information?

That seems better to me.

>
> +/* Canonicalize the path to FILE. Return the canonical form if it is
> +   shorter, otherwise return the original.  This function may free the
> +   memory pointed by FILE.  */
> +
> +static char *
> +maybe_shorter_path (const char * file)
> +{
> +  const char * file2 = lrealpath (file);
> +  if (file2 && strlen (file2) < strlen (file))
> +    {
> +      /* Unfortunately, it is not safe to delete file, so we may leak
> +        some memory.  */

Why not remove this comment and free file here with XDELETEVEC (file) ?

> +  canonical_path = maybe_shorter_path (path);
> +  if (canonical_path != NULL && canonical_path != path)
> +    {
> +      /* The canonical path was newly allocated.  Let's free the
> +        non-canonical one.  */
> +      free (path);
> +      path = canonical_path;
> +    }
> +

This way you avoid doing all this extra work here.

Cheers,

Manuel.
Dodji Seketeli April 26, 2012, 6:06 p.m. UTC | #3
Jonathan Wakely <jwakely.gcc@gmail.com> a écrit:

> On 26 April 2012 11:12, Dodji Seketeli wrote:
>>
>> So maybe it'd be better to canonicalize the _cpp_file::path when it's
>> first build?  One drawback of that approach would be that
>> _cpp_file::path will then permanently loose the information about the
>> current directory, that is indirectly encoded into the way the file path
>> is originally built.  But do we really need that information?
>
> I don't know, but I certainly don't need it in diagnostic output.  It
> occurred to me last night (while debugging some deeply nested template
> instantiations in libstdc++ headers) that the same
> target/lib/../../../.. junk appears in file paths shown by GDB.  Does
> this change also affect the paths in the debug info?

I believe so.

>  I hope it does, as it will make stack traces involving libstdc++
> headers easier to read.

Agreed.
Dodji Seketeli April 26, 2012, 6:11 p.m. UTC | #4
Manuel López-Ibáñez <lopezibanez@gmail.com> a écrit:

> Why not remove this comment and free file here with XDELETEVEC (file) ?
>
>> +  canonical_path = maybe_shorter_path (path);
>> +  if (canonical_path != NULL && canonical_path != path)
>> +    {
>> +      /* The canonical path was newly allocated.  Let's free the
>> +        non-canonical one.  */
>> +      free (path);
>> +      path = canonical_path;
>> +    }
>> +
>
> This way you avoid doing all this extra work here.

If I follow my personal style, I'd prefer not having a function delete
what it receives in argument, unless the name of that function makes it
really obvious.  Furthermore, that function could be later re-used on a
string that is not necessarily meant to be deleted.

That being said, I don't feel like arguing strongly about this because
ultimately I think this is a matter of style.

I'll let those who have the powers to decide.  :-)
Manuel López-Ibáñez April 26, 2012, 6:28 p.m. UTC | #5
On 26 April 2012 20:11, Dodji Seketeli <dodji@seketeli.org> wrote:
> Manuel López-Ibáñez <lopezibanez@gmail.com> a écrit:
>
>> Why not remove this comment and free file here with XDELETEVEC (file) ?
>>
>>> +  canonical_path = maybe_shorter_path (path);
>>> +  if (canonical_path != NULL && canonical_path != path)
>>> +    {
>>> +      /* The canonical path was newly allocated.  Let's free the
>>> +        non-canonical one.  */
>>> +      free (path);
>>> +      path = canonical_path;
>>> +    }
>>> +
>>
>> This way you avoid doing all this extra work here.
>
> If I follow my personal style, I'd prefer not having a function delete
> what it receives in argument, unless the name of that function makes it
> really obvious.  Furthermore, that function could be later re-used on a
> string that is not necessarily meant to be deleted.

Fair enough. Still the comment at the top of the function needs to be changed:

+/* Canonicalize the path to FILE. Return the canonical form if it is
+   shorter, otherwise return the original.  This function may free the
+   memory pointed by FILE.  */

and then the function could return NULL when it fails to find a shorter path:

+  else
+    {
+      XDELETEVEC (file2);
+      return file;
+    }
+}

here. This way you can still simplify the caller by just checking if
(canonical_path)

> That being said, I don't feel like arguing strongly about this because
> ultimately I think this is a matter of style.
>
> I'll let those who have the powers to decide.  :-)

Always a good strategy ;-)

Cheers,

Manuel.
Dodji Seketeli April 26, 2012, 6:56 p.m. UTC | #6
Manuel López-Ibáñez <lopezibanez@gmail.com> a écrit:

> On 26 April 2012 20:11, Dodji Seketeli <dodji@seketeli.org> wrote:
>> Manuel López-Ibáñez <lopezibanez@gmail.com> a écrit:
>>
>>> Why not remove this comment and free file here with XDELETEVEC (file) ?
>>>
>>>> +  canonical_path = maybe_shorter_path (path);
>>>> +  if (canonical_path != NULL && canonical_path != path)
>>>> +    {
>>>> +      /* The canonical path was newly allocated.  Let's free the
>>>> +        non-canonical one.  */
>>>> +      free (path);
>>>> +      path = canonical_path;
>>>> +    }
>>>> +
>>>
>>> This way you avoid doing all this extra work here.
>>
>> If I follow my personal style, I'd prefer not having a function delete
>> what it receives in argument, unless the name of that function makes it
>> really obvious.  Furthermore, that function could be later re-used on a
>> string that is not necessarily meant to be deleted.
>
> Fair enough. Still the comment at the top of the function needs to be changed:
>
> +/* Canonicalize the path to FILE. Return the canonical form if it is
> +   shorter, otherwise return the original.  This function may free the
> +   memory pointed by FILE.  */
>
> and then the function could return NULL when it fails to find a shorter path:
>
> +  else
> +    {
> +      XDELETEVEC (file2);
> +      return file;
> +    }
> +}
>
> here. This way you can still simplify the caller by just checking if
> (canonical_path)

OK by me.  Thank you for caring about this.

Would you mind just taking it over again and submit a proper patch +
ChangeLog?  I just chimed in to help; I didn't mean to step on your
toes.  :-)

Cheerio.
Manuel López-Ibáñez April 26, 2012, 7:53 p.m. UTC | #7
On 26 April 2012 20:56, Dodji Seketeli <dodji@seketeli.org> wrote:
> Manuel López-Ibáñez <lopezibanez@gmail.com> a écrit:
>
>> On 26 April 2012 20:11, Dodji Seketeli <dodji@seketeli.org> wrote:
>>> Manuel López-Ibáñez <lopezibanez@gmail.com> a écrit:
>>>
>>>> Why not remove this comment and free file here with XDELETEVEC (file) ?
>>>>
>>>>> +  canonical_path = maybe_shorter_path (path);
>>>>> +  if (canonical_path != NULL && canonical_path != path)
>>>>> +    {
>>>>> +      /* The canonical path was newly allocated.  Let's free the
>>>>> +        non-canonical one.  */
>>>>> +      free (path);
>>>>> +      path = canonical_path;
>>>>> +    }
>>>>> +
>>>>
>>>> This way you avoid doing all this extra work here.
>>>
>>> If I follow my personal style, I'd prefer not having a function delete
>>> what it receives in argument, unless the name of that function makes it
>>> really obvious.  Furthermore, that function could be later re-used on a
>>> string that is not necessarily meant to be deleted.
>>
>> Fair enough. Still the comment at the top of the function needs to be changed:
>>
>> +/* Canonicalize the path to FILE. Return the canonical form if it is
>> +   shorter, otherwise return the original.  This function may free the
>> +   memory pointed by FILE.  */
>>
>> and then the function could return NULL when it fails to find a shorter path:
>>
>> +  else
>> +    {
>> +      XDELETEVEC (file2);
>> +      return file;
>> +    }
>> +}
>>
>> here. This way you can still simplify the caller by just checking if
>> (canonical_path)
>
> OK by me.  Thank you for caring about this.
>
> Would you mind just taking it over again and submit a proper patch +
> ChangeLog?  I just chimed in to help; I didn't mean to step on your
> toes.  :-)

Oh, no apologies needed. Thanks you chimed in! I wish more people were
working on this stuff! Anyway, let''s wait to see what the decision
makers say.

Cheers,

Manuel.
Manuel López-Ibáñez April 27, 2012, 1:46 p.m. UTC | #8
On 26 April 2012 12:12, Dodji Seketeli <dodji@seketeli.org> wrote:
> Manuel López-Ibáñez <lopezibanez@gmail.com> a écrit:
>
>> On 21 April 2012 14:56, Jason Merrill <jason@redhat.com> wrote:
>>> It seems like we'll do this for every line in the header, which could lead
>>> to a lot of leaked memory.  Instead, we should canonicalize when setting
>>> ORDINARY_MAP_FILE_NAME.
>>
>> Hum, my understanding of the code is that this is exactly what I
>> implemented. That is, at most we leak every time
>> ORDINARY_MAP_FILE_NAME is set to a system header file (if the realpath
>> is shorter than the original path). This is a bit inefficient because
>> this happens two times per #include (when entering and when leaving).
>> But I honestly don't know how to avoid this.
>
> My understanding is that by the time we are about to deal with the
> ORDINARY_MAP_FILE_NAME property of a given map, it's too late; we have
> lost the "memory leak game" because that property just points to the
> path carried by the instance _cpp_file::path that is current when the
> map is built.  So the lifetime of that property is tied to the lifetime
> of the relevant instance of _cpp_file.
>
> So maybe it'd be better to canonicalize the _cpp_file::path when it's
> first build?  One drawback of that approach would be that
> _cpp_file::path will then permanently loose the information about the
> current directory, that is indirectly encoded into the way the file path
> is originally built.  But do we really need that information?

Another drawback I didn't realize until now is that in this way the
canonicalize every path, instead of only touching those that belong to
system headers.

I am not sure if it is important or not.

Comments?

Cheers,

Manuel.
Dodji Seketeli April 27, 2012, 3:30 p.m. UTC | #9
Manuel López-Ibáñez <lopezibanez@gmail.com> a écrit:

> Another drawback I didn't realize until now is that in this way the
> canonicalize every path, instead of only touching those that belong to
> system headers.

Ah.  Good catch.

I guess file->dir->sysp should tell us if we are in a system directory,
so that we can avoid canonicalizing in that case.

> I am not sure if it is important or not.

I don't know either.  But I guess we could err on the safe side by
limiting the change to system just headers like you were initially
proposing.
Manuel López-Ibáñez April 29, 2012, 10 a.m. UTC | #10
New version of the patch. Bootstrapped and regression tested.

Is this version OK?


2012-04-29  Manuel López-Ibáñez  <manu@gcc.gnu.org>
	    Dodji Seketeli  <dodji@seketeli.org>

	PR 5297
	* libcpp/files.c (maybe_shorter_path): New.
	(find_file_in_dir): Use it.
Dodji Seketeli April 29, 2012, 5:24 p.m. UTC | #11
Manuel López-Ibáñez <lopezibanez@gmail.com> a écrit:

> 	PR 5297
> 	* libcpp/files.c (maybe_shorter_path): New.
> 	(find_file_in_dir): Use it.

I can't approve or deny this patch, but for what it's worth, I find it
fine.

Thanks.
Jason Merrill April 30, 2012, 4:48 p.m. UTC | #12
OK.

Jason
diff mbox

Patch

diff --git a/libcpp/files.c b/libcpp/files.c
index 29ccf3b..147963f 100644
--- a/libcpp/files.c
+++ b/libcpp/files.c
@@ -341,6 +341,27 @@  pch_open_file (cpp_reader *pfile, _cpp_file *file, bool *invalid_pch)
   return valid;
 }
 
+/* Canonicalize the path to FILE. Return the canonical form if it is
+   shorter, otherwise return the original.  This function may free the
+   memory pointed by FILE.  */
+
+static char *
+maybe_shorter_path (const char * file)
+{
+  const char * file2 = lrealpath (file);
+  if (file2 && strlen (file2) < strlen (file))
+    {
+      /* Unfortunately, it is not safe to delete file, so we may leak
+	 some memory.  */
+      return file2;
+    }
+  else 
+    {
+      XDELETEVEC (file2);
+      return file;
+    }
+}
+
 /* Try to open the path FILE->name appended to FILE->dir.  This is
    where remap and PCH intercept the file lookup process.  Return true
    if the file was found, whether or not the open was successful.
@@ -349,7 +370,7 @@  pch_open_file (cpp_reader *pfile, _cpp_file *file, bool *invalid_pch)
 static bool
 find_file_in_dir (cpp_reader *pfile, _cpp_file *file, bool *invalid_pch)
 {
-  char *path;
+  char *path, *canonical_path;
 
   if (CPP_OPTION (pfile, remap) && (path = remap_filename (pfile, file)))
     ;
@@ -359,6 +380,15 @@  find_file_in_dir (cpp_reader *pfile, _cpp_file *file, bool *invalid_pch)
     else
       path = append_file_to_dir (file->name, file->dir);
 
+  canonical_path = maybe_shorter_path (path);
+  if (canonical_path != NULL && canonical_path != path)
+    {
+      /* The canonical path was newly allocated.  Let's free the
+	 non-canonical one.  */
+      free (path);
+      path = canonical_path;
+    }
+
   if (path)
     {
       hashval_t hv = htab_hash_string (path);