Patchwork [libiberty,include] : Add additional helper functions for directory-separator searching

login
register
mail settings
Submitter Kai Tietz
Date March 8, 2011, 10:56 a.m.
Message ID <AANLkTimyJb-3jEaWbbmUj2ZAOYPzhGT_GX9QFwR4DBCh@mail.gmail.com>
Download mbox | patch
Permalink /patch/85964/
State New
Headers show

Comments

Kai Tietz - March 8, 2011, 10:56 a.m.
Hello,

This patch introduce directory-separator search routines to libiberty.
IMHO filename_cmp.c is suiteable for those functions, but if there are
objections about that I can move it into a separate source-file. It
helps to avoid a commonly used pattern about dir-separator searching
in code, which requires #if-conditions to check if DOS paths are used
and introduces additional internal variables.

the pattern

         const char *filename = strrchr (xloc.file, '/');
#ifdef HAVE_DOS_BASED_FILE_SYSTEM
         const char *filename2 = strrchr (xloc.file, '\\');

         if (!filename || (filename2 && filename2 > filename))
           filename = filename2;

can be written by this patch as
        const char *filename = filename_dirrchr (xloc.file);


ChangeLog include/

2011-03-08  Kai Tietz

	* filenames.h (filename_dirchr): New prototype.
	(filename_dirrchr): Likewise.

ChangeLog libiberty/

2011-03-08  Kai Tietz

	* filename_cmp.c (filename_dirchr): New function.
	(filename_dirrchr): Likewise.
	* functions.texi: Regenerated.


Tested for x86_64-pc-linux-gnu and x86_64-w64-mingw32. Ok for apply?

Regards,
Kai
Eli Zaretskii - March 8, 2011, 11:11 a.m.
> Date: Tue, 8 Mar 2011 11:56:45 +0100
> From: Kai Tietz <ktietz70@googlemail.com>
> 
> +@deftypefn Extension int filename_dirchr (const char *@var{p})
> +
> +The returned value is similar to what @code{strchr} would return for
> +searching for a directory separator.
> +
> +This function does not normalize file name.  However, it does handle
> +the fact that on DOS-like file systems, forward and backward slashes
> +are directory separators.

This is very mysterious.  The documentation should explain how this is
"handled", or else the user will have no choice but to look in the
sources.  And description "by similarity" doesn't help, because this
function is obviously different from strchr in _some_ ways, but you
don't say how.

While at that, explain the problem this solves, or else the raison
d'etre of this function will not be understood.  We do want this
function to be used instead of just strchr, don't we?  For it to be
used, its purpose and advantages should be well understood.

Btw, why do we need filename_dirchr?  The use case for
filename_dirrchr is clear, but in what situations will we need the
other one?

> +  if (!r || (r2 && r2 < r))

Why do you test for r2 being non-NULL?  You are not going to
dereference it in the next comparison, and NULL is comparable as any
other value.

> +@deftypefn Extension int filename_dirrchr (const char *@var{p})
> +
> +The returned value is similar to what @code{strrchr} would return for
> +searching for a directory separator.
> +
> +This function does not normalize file name.  However, it does handle
> +the fact that on DOS-like file systems, forward and backward slashes
> +are directory separators.

Same comments about this doc.

> +  if (!r || (r2 && r2 > r))

And same comment here about testing r2 for non-NULL value.

Please also wait for others to review, as I'm not authorized to
approve the changes.

Thanks for working on this.
Kai Tietz - March 8, 2011, 11:25 a.m.
2011/3/8 Eli Zaretskii <eliz@gnu.org>:
>> Date: Tue, 8 Mar 2011 11:56:45 +0100
>> From: Kai Tietz <ktietz70@googlemail.com>
>>
>> +@deftypefn Extension int filename_dirchr (const char *@var{p})
>> +
>> +The returned value is similar to what @code{strchr} would return for
>> +searching for a directory separator.
>> +
>> +This function does not normalize file name.  However, it does handle
>> +the fact that on DOS-like file systems, forward and backward slashes
>> +are directory separators.
>
> This is very mysterious.  The documentation should explain how this is
> "handled", or else the user will have no choice but to look in the
> sources.  And description "by similarity" doesn't help, because this
> function is obviously different from strchr in _some_ ways, but you
> don't say how.
>
> While at that, explain the problem this solves, or else the raison
> d'etre of this function will not be understood.  We do want this
> function to be used instead of just strchr, don't we?  For it to be
> used, its purpose and advantages should be well understood.
>
> Btw, why do we need filename_dirchr?  The use case for
> filename_dirrchr is clear, but in what situations will we need the
> other one?

As the comment notes. strchr/strrchr searches for one character. This
is for unix-file-system normally slash. On DOS based file-systems
there are two characters representing a directory-separator. Slash and
Backslash. Therefore this routine takes care that both are handled
similiar to a single character searching.

>> +  if (!r || (r2 && r2 < r))
>
> Why do you test for r2 being non-NULL?  You are not going to
> dereference it in the next comparison, and NULL is comparable as any
> other value.

As if we found slash, we don't want to override function's result by
backslash not found. If the null-check wouldn't be present condition
would be always true for r2 == NULL as, NULL is always less then a
pointer. But r shall be modified only if r2 (backslash) was found
before r (slash).
(same logic but here from right to left for the strrchr-case)

Regards,
Kai
Eli Zaretskii - March 8, 2011, 11:53 a.m.
> Date: Tue, 8 Mar 2011 12:25:37 +0100
> From: Kai Tietz <ktietz70@googlemail.com>
> Cc: binutils@sourceware.org, gdb-patches@sourceware.org, 
> 	gcc-patches@gcc.gnu.org
> 
> > Btw, why do we need filename_dirchr?  The use case for
> > filename_dirrchr is clear, but in what situations will we need the
> > other one?
> 
> As the comment notes. strchr/strrchr searches for one character. This
> is for unix-file-system normally slash. On DOS based file-systems
> there are two characters representing a directory-separator. Slash and
> Backslash. Therefore this routine takes care that both are handled
> similiar to a single character searching.

We are miscommunicating.  I was asking when would a program want to
find the _first_ directory separator character in a file name.
Searching for the last one is a frequent use case: when you want to
create a file in the same directory as another, or when you are
looking for a basename of a file.  But when do you need the first
slash?

> >> +  if (!r || (r2 && r2 < r))
> >
> > Why do you test for r2 being non-NULL?  You are not going to
> > dereference it in the next comparison, and NULL is comparable as any
> > other value.
> 
> As if we found slash, we don't want to override function's result by
> backslash not found. If the null-check wouldn't be present condition
> would be always true for r2 == NULL as, NULL is always less then a
> pointer. But r shall be modified only if r2 (backslash) was found
> before r (slash).
> (same logic but here from right to left for the strrchr-case)

But in strrchr-case, r2 cannot be greater than r1 if it is NULL,
right?
Andreas Schwab - March 8, 2011, 11:55 a.m.
Eli Zaretskii <eliz@gnu.org> writes:

> NULL is comparable as any other value.

Only for equality.  For relational operators the operands must point to
the same object.

Andreas.
Kai Tietz - March 8, 2011, 12:01 p.m.
2011/3/8 Eli Zaretskii <eliz@gnu.org>:
>> Date: Tue, 8 Mar 2011 12:25:37 +0100
>> From: Kai Tietz <ktietz70@googlemail.com>
>> Cc: binutils@sourceware.org, gdb-patches@sourceware.org,
>>       gcc-patches@gcc.gnu.org
>>
>> > Btw, why do we need filename_dirchr?  The use case for
>> > filename_dirrchr is clear, but in what situations will we need the
>> > other one?
>>
>> As the comment notes. strchr/strrchr searches for one character. This
>> is for unix-file-system normally slash. On DOS based file-systems
>> there are two characters representing a directory-separator. Slash and
>> Backslash. Therefore this routine takes care that both are handled
>> similiar to a single character searching.
>
> We are miscommunicating.  I was asking when would a program want to
> find the _first_ directory separator character in a file name.
> Searching for the last one is a frequent use case: when you want to
> create a file in the same directory as another, or when you are
> looking for a basename of a file.  But when do you need the first
> slash?

See for example remote-fileio.c in remote_fileio_extract_ptr_w_len()
as an example. There is more then one use-case.

>> >> +  if (!r || (r2 && r2 < r))
>> >
>> > Why do you test for r2 being non-NULL?  You are not going to
>> > dereference it in the next comparison, and NULL is comparable as any
>> > other value.
>>
>> As if we found slash, we don't want to override function's result by
>> backslash not found. If the null-check wouldn't be present condition
>> would be always true for r2 == NULL as, NULL is always less then a
>> pointer. But r shall be modified only if r2 (backslash) was found
>> before r (slash).
>> (same logic but here from right to left for the strrchr-case)
>
> But in strrchr-case, r2 cannot be greater than r1 if it is NULL,
> right?
It can. It is a matter of signness of pointer comparision.

Regards,
Kai
Pedro Alves - March 8, 2011, 12:43 p.m.
On Tuesday 08 March 2011 12:01:24, Kai Tietz wrote:
> See for example remote-fileio.c in remote_fileio_extract_ptr_w_len()
> as an example. There is more then one use-case.

That '/' has nothing to do with path separators.  It's simply
a separator between a pointer and a length fields.  E.g.,

 @item Request:
 @samp{Fopen,@var{pathptr}/@var{len},@var{flags},@var{mode}}

@var{pathptr} is a pointer that points to the real path
in the inferior's memory, not a path string.
Kai Tietz - March 8, 2011, 12:48 p.m.
2011/3/8 Pedro Alves <pedro@codesourcery.com>:
> On Tuesday 08 March 2011 12:01:24, Kai Tietz wrote:
>> See for example remote-fileio.c in remote_fileio_extract_ptr_w_len()
>> as an example. There is more then one use-case.
>
> That '/' has nothing to do with path separators.  It's simply
> a separator between a pointer and a length fields.  E.g.,
>
>  @item Request:
>  @samp{Fopen,@var{pathptr}/@var{len},@var{flags},@var{mode}}
>
> @var{pathptr} is a pointer that points to the real path
> in the inferior's memory, not a path string.
>
> --
> Pedro Alves
>

Well, a better example is elfstab_offset_sections() in elfread.c.
Another is in find_file_and_directory() in dwarf2read.c file.

Regards,
Kai
Pedro Alves - March 8, 2011, 1:33 p.m.
On Tuesday 08 March 2011 12:48:11, Kai Tietz wrote:

> Well, a better example is elfstab_offset_sections() in elfread.c.

  /* The ELF symbol info doesn't include path names, so strip the path
     (if any) from the psymtab filename.  */
  while (0 != (p = strchr (filename, '/')))
    filename = p + 1;

Looks like its looking for the last path separator, so
it might as well use filename_dirrchr instead.

> Another is in find_file_and_directory() in dwarf2read.c file.

Workaround for Irix.  Certainly that '/' should not depend
on the host gdb is running on.
Kai Tietz - March 8, 2011, 1:37 p.m.
2011/3/8 Pedro Alves <pedro@codesourcery.com>:
> On Tuesday 08 March 2011 12:48:11, Kai Tietz wrote:
>
>> Well, a better example is elfstab_offset_sections() in elfread.c.
>
>  /* The ELF symbol info doesn't include path names, so strip the path
>     (if any) from the psymtab filename.  */
>  while (0 != (p = strchr (filename, '/')))
>    filename = p + 1;
>
> Looks like its looking for the last path separator, so
> it might as well use filename_dirrchr instead.

True, see patch I've posted about filename_cmp. I replaced it there by
a strrchr search.

>> Another is in find_file_and_directory() in dwarf2read.c file.
>
> Workaround for Irix.  Certainly that '/' should not depend
> on the host gdb is running on.

Right. But well, I was asked if strchr is used in combination with
paths. And so I've shown. If those uses could be rewritten is a
different story and might be true.

Kai
Eli Zaretskii - March 8, 2011, 6:37 p.m.
> From: Pedro Alves <pedro@codesourcery.com>
> Date: Tue, 8 Mar 2011 13:33:02 +0000
> Cc: Kai Tietz <ktietz70@googlemail.com>,
>  gcc-patches@gcc.gnu.org,
>  Eli Zaretskii <eliz@gnu.org>,
>  binutils@sourceware.org
> 
> On Tuesday 08 March 2011 12:48:11, Kai Tietz wrote:
> 
> > Well, a better example is elfstab_offset_sections() in elfread.c.
> 
>   /* The ELF symbol info doesn't include path names, so strip the path
>      (if any) from the psymtab filename.  */
>   while (0 != (p = strchr (filename, '/')))
>     filename = p + 1;
> 
> Looks like its looking for the last path separator, so
> it might as well use filename_dirrchr instead.

Exactly.

> > Another is in find_file_and_directory() in dwarf2read.c file.
> 
> Workaround for Irix.  Certainly that '/' should not depend
> on the host gdb is running on.

It actually should use IS_ABSOLUTE_FILE_NAME, if any portability
enhancement is needed here.

In my experience, the strchr analog is not needed, only the strrchr
one (which could be used quite a lot).  The few places that use strchr
now should actually be rewritten to search from the end, because
that's what they need.
Pedro Alves - March 8, 2011, 6:50 p.m.
On Tuesday 08 March 2011 18:37:49, Eli Zaretskii wrote:
> > > Another is in find_file_and_directory() in dwarf2read.c file.
> > 
> > Workaround for Irix.  Certainly that '/' should not depend
> > on the host gdb is running on.
> 
> It actually should use IS_ABSOLUTE_FILE_NAME, if any portability
> enhancement is needed here.

The point of the code, according to its comment,
is to workaround an issue with the debug info output by the
native Irix compiler.  You wouldn't want a cross-Irix,
Windows-hosted gdb looking for '\' or a drive prefix in order
to decide whether to apply the workaround.  In other words,
we _always_ want to check for literal '/' here:

  if (*comp_dir != NULL)
    {
      /* Irix 6.2 native cc prepends <machine>.: to the compilation
	 directory, get rid of it.  */
      char *cp = strchr (*comp_dir, ':');

      if (cp && cp != *comp_dir && cp[-1] == '.' && cp[1] == '/')
	*comp_dir = cp + 1;
    }
Kai Tietz - March 8, 2011, 6:51 p.m.
2011/3/8 Eli Zaretskii <eliz@gnu.org>:
>> From: Pedro Alves <pedro@codesourcery.com>
>> Date: Tue, 8 Mar 2011 13:33:02 +0000
>> Cc: Kai Tietz <ktietz70@googlemail.com>,
>>  gcc-patches@gcc.gnu.org,
>>  Eli Zaretskii <eliz@gnu.org>,
>>  binutils@sourceware.org
>>
>> On Tuesday 08 March 2011 12:48:11, Kai Tietz wrote:
>>
>> > Well, a better example is elfstab_offset_sections() in elfread.c.
>>
>>   /* The ELF symbol info doesn't include path names, so strip the path
>>      (if any) from the psymtab filename.  */
>>   while (0 != (p = strchr (filename, '/')))
>>     filename = p + 1;
>>
>> Looks like its looking for the last path separator, so
>> it might as well use filename_dirrchr instead.
>
> Exactly.
>
>> > Another is in find_file_and_directory() in dwarf2read.c file.
>>
>> Workaround for Irix.  Certainly that '/' should not depend
>> on the host gdb is running on.
>
> It actually should use IS_ABSOLUTE_FILE_NAME, if any portability
> enhancement is needed here.
>
> In my experience, the strchr analog is not needed, only the strrchr
> one (which could be used quite a lot).  The few places that use strchr
> now should actually be rewritten to search from the end, because
> that's what they need.
>

Here I am not that sure. For example in gcc's gengtype.c
(read_input_list) is a use-case for strchr on filenames, which can't
be expressed by strrchr.
Please be aware that libiberty is shared between different ventures.

I admit that filenames/paths are searched normal from right to left.
But there might be cases a left to right search is suitable.

Regards,
Kai
Eli Zaretskii - March 8, 2011, 7:54 p.m.
> Date: Tue, 8 Mar 2011 19:51:14 +0100
> From: Kai Tietz <ktietz70@googlemail.com>
> Cc: Pedro Alves <pedro@codesourcery.com>, gdb-patches@sourceware.org, 
> 	gcc-patches@gcc.gnu.org, binutils@sourceware.org
> 
> > In my experience, the strchr analog is not needed, only the strrchr
> > one (which could be used quite a lot).  The few places that use strchr
> > now should actually be rewritten to search from the end, because
> > that's what they need.
> >
> 
> Here I am not that sure. For example in gcc's gengtype.c
> (read_input_list) is a use-case for strchr on filenames, which can't
> be expressed by strrchr.

I don't see any reason to have in libiberty a function that has a
single use.

Patch

Index: gcc/include/filenames.h
===================================================================
--- gcc.orig/include/filenames.h	2011-02-28 19:16:35.000000000 +0100
+++ gcc/include/filenames.h	2011-03-08 11:11:10.909109700 +0100
@@ -75,6 +75,8 @@  extern int filename_cmp (const char *s1,
 
 extern int filename_ncmp (const char *s1, const char *s2,
 			  size_t n);
+extern char *filename_dirchr (const char *p);
+extern char *filename_dirrchr (const char *p);
 
 #ifdef __cplusplus
 }
Index: gcc/libiberty/filename_cmp.c
===================================================================
--- gcc.orig/libiberty/filename_cmp.c	2011-02-28 19:16:35.000000000 +0100
+++ gcc/libiberty/filename_cmp.c	2011-03-08 11:43:32.797198100 +0100
@@ -125,3 +125,70 @@  filename_ncmp (const char *s1, const cha
   return 0;
 #endif
 }
+
+/*
+
+@deftypefn Extension int filename_dirchr (const char *@var{p})
+
+The returned value is similar to what @code{strchr} would return for
+searching for a directory separator.
+
+This function does not normalize file name.  However, it does handle
+the fact that on DOS-like file systems, forward and backward slashes
+are directory separators.
+
+@end deftypefn
+
+*/
+
+char *
+filename_dirchr (const char *p)
+{
+  char *r;
+#ifdef HAVE_DOS_BASED_FILE_SYSTEM
+  char *r2;
+#endif
+  if (!p)
+    return NULL;
+  r = strchr (p, '/');
+#ifdef HAVE_DOS_BASED_FILE_SYSTEM
+  r2 = strchr (p, '\\');
+  if (!r || (r2 && r2 < r))
+    r = r2;
+#endif
+  return r;
+}
+
+/*
+
+@deftypefn Extension int filename_dirrchr (const char *@var{p})
+
+The returned value is similar to what @code{strrchr} would return for
+searching for a directory separator.
+
+This function does not normalize file name.  However, it does handle
+the fact that on DOS-like file systems, forward and backward slashes
+are directory separators.
+
+@end deftypefn
+
+*/
+
+char *
+filename_dirrchr (const char *p)
+{
+  char *r;
+#ifdef HAVE_DOS_BASED_FILE_SYSTEM
+  char *r2;
+#endif
+
+  if (!p)
+    return NULL;
+  r = strrchr (p, '/');
+#ifdef HAVE_DOS_BASED_FILE_SYSTEM
+  r2 = strrchr (p, '\\');
+  if (!r || (r2 && r2 > r))
+    r = r2;
+#endif
+  return r;
+}
Index: gcc/libiberty/functions.texi
===================================================================
--- gcc.orig/libiberty/functions.texi	2011-02-28 19:16:35.000000000 +0100
+++ gcc/libiberty/functions.texi	2011-03-08 11:43:42.314406700 +0100
@@ -296,6 +296,30 @@  and backward slashes are equal.
 
 @end deftypefn
 
+@c filename_cmp.c:131
+@deftypefn Extension int filename_dirchr (const char *@var{p})
+
+The returned value is similar to what @code{strchr} would return for
+searching for a directory separator.
+
+This function does not normalize file name.  However, it does handle
+the fact that on DOS-like file systems, forward and backward slashes
+are directory separators.
+
+@end deftypefn
+
+@c filename_cmp.c:164
+@deftypefn Extension int filename_dirrchr (const char *@var{p})
+
+The returned value is similar to what @code{strrchr} would return for
+searching for a directory separator.
+
+This function does not normalize file name.  However, it does handle
+the fact that on DOS-like file systems, forward and backward slashes
+are directory separators.
+
+@end deftypefn
+
 @c filename_cmp.c:81
 @deftypefn Extension int filename_ncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})