Patchwork [RFA,libiberty,gdb] Add hashtab support to filename_ncmp.c and use it in gdb.

login
register
mail settings
Submitter Doug Evans
Date July 9, 2012, 6:10 p.m.
Message ID <20120709181015.2FCA41E13A1@ruffy2.mtv.corp.google.com>
Download mbox | patch
Permalink /patch/169910/
State New
Headers show

Comments

Doug Evans - July 9, 2012, 6:10 p.m.
Hi.

filename_seen in gdb does a linear search, this patch changes it
to use a hash table.

Ok to check in?

I couldn't think of a good reason to put filename_hash,filename_eq in gdb,
and I like placing them close to where hashtab.c and filename_cmp are defined.
I also couldn't think of a sufficient reason to put them in a file by
themselves.  Ergo adding them to filename_cmp.c, filenames.h.
[It's possible there's a program that uses filename_cmp and already
defines functions with the same names (thus this will introduce a build
failure), but that's always a risk.  I couldn't find any in gdb,binutils,gcc.
Technically speaking, it's also possible that adding the #include "hashtab.h"
to filenames.h could introduce a build failure (e.g., some file has a static
symbol that collides with one used in hashtab.h).  I'm hoping that's more of
a theoretical concern.]

2012-07-09  Doug Evans  <dje@google.com>

	include/
	* filenames.h: #include "hashtab.h".
	(filename_hash, filename_eq): Declare.

	libiberty/
	* filename_cmp.c (filename_hash, filename_eq): New functions.

	gdb/
	* symtab.c (filename_seen): Rewrite to use a hash table.
Steven Bosscher - July 9, 2012, 6:51 p.m.
On Mon, Jul 9, 2012 at 8:10 PM, Doug Evans <dje@google.com> wrote:
> Hi.
>
> filename_seen in gdb does a linear search, this patch changes it
> to use a hash table.
>
> Ok to check in?

Why not just use htab_hash_string? The file name canonicalization can
be put in gdb itself.

Ciao!
Steven
Doug Evans - July 9, 2012, 7:49 p.m.
[- gdb, + gdb-patches]

On Mon, Jul 9, 2012 at 11:51 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Mon, Jul 9, 2012 at 8:10 PM, Doug Evans <dje@google.com> wrote:
>> Hi.
>>
>> filename_seen in gdb does a linear search, this patch changes it
>> to use a hash table.
>>
>> Ok to check in?
>
> Why not just use htab_hash_string? The file name canonicalization can
> be put in gdb itself.
>
> Ciao!
> Steven

[blech, my first reply got turned into rich html]

Hi.
Given that filename_cmp.c lives in libiberty, putting related
functions together makes it easier to maintain them.
If you're suggesting writing a function that takes a file name and
returns a canonicalized name that will then work with
htab_hash_string, that's more expensive and I don't see the benefit.
Doug Evans - July 13, 2012, 6:52 p.m.
Hi.  "ping" [for the libiberty part]

[The gdb part needs to be updated due to recent changes there, but I'm
going to wait until the libiberty part is approved.]

On Mon, Jul 9, 2012 at 11:10 AM, Doug Evans <dje@google.com> wrote:
> Hi.
>
> filename_seen in gdb does a linear search, this patch changes it
> to use a hash table.
>
> Ok to check in?
>
> I couldn't think of a good reason to put filename_hash,filename_eq in gdb,
> and I like placing them close to where hashtab.c and filename_cmp are defined.
> I also couldn't think of a sufficient reason to put them in a file by
> themselves.  Ergo adding them to filename_cmp.c, filenames.h.
> [It's possible there's a program that uses filename_cmp and already
> defines functions with the same names (thus this will introduce a build
> failure), but that's always a risk.  I couldn't find any in gdb,binutils,gcc.
> Technically speaking, it's also possible that adding the #include "hashtab.h"
> to filenames.h could introduce a build failure (e.g., some file has a static
> symbol that collides with one used in hashtab.h).  I'm hoping that's more of
> a theoretical concern.]
>
> 2012-07-09  Doug Evans  <dje@google.com>
>
>         include/
>         * filenames.h: #include "hashtab.h".
>         (filename_hash, filename_eq): Declare.
>
>         libiberty/
>         * filename_cmp.c (filename_hash, filename_eq): New functions.
>
>         gdb/
>         * symtab.c (filename_seen): Rewrite to use a hash table.
>
> Index: include/filenames.h
> ===================================================================
> RCS file: /cvs/src/src/include/filenames.h,v
> retrieving revision 1.10
> diff -u -p -r1.10 filenames.h
> --- include/filenames.h 1 Jul 2011 18:24:38 -0000       1.10
> +++ include/filenames.h 9 Jul 2012 17:24:53 -0000
> @@ -26,6 +26,8 @@ Foundation, Inc., 51 Franklin Street - F
>  #ifndef FILENAMES_H
>  #define FILENAMES_H
>
> +#include "hashtab.h" /* for hashval_t */
> +
>  #ifdef __cplusplus
>  extern "C" {
>  #endif
> @@ -84,6 +86,10 @@ extern int filename_cmp (const char *s1,
>  extern int filename_ncmp (const char *s1, const char *s2,
>                           size_t n);
>
> +extern hashval_t filename_hash (const void *s);
> +
> +extern int filename_eq (const void *s1, const void *s2);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> Index: libiberty/filename_cmp.c
> ===================================================================
> RCS file: /cvs/src/src/libiberty/filename_cmp.c,v
> retrieving revision 1.6
> diff -u -p -r1.6 filename_cmp.c
> --- libiberty/filename_cmp.c    1 Jul 2011 18:24:39 -0000       1.6
> +++ libiberty/filename_cmp.c    9 Jul 2012 17:24:53 -0000
> @@ -141,3 +141,52 @@ filename_ncmp (const char *s1, const cha
>    return 0;
>  #endif
>  }
> +
> +/*
> +
> +@deftypefn Extension hashval_t filename_hash (const void *@var{s})
> +
> +Return the hash value for file name @var{s} that will be compared
> +using filename_cmp.
> +This function is for use with hashtab.c hash tables.
> +
> +@end deftypefn
> +
> +*/
> +
> +hashval_t
> +filename_hash (const void *s)
> +{
> +  /* The cast is for -Wc++-compat.  */
> +  const unsigned char *str = (const unsigned char *) s;
> +  hashval_t r = 0;
> +  unsigned char c;
> +
> +  while ((c = *str++) != 0)
> +    {
> +      if (c == '\\')
> +       c = '/';
> +      c = TOLOWER (c);
> +      r = r * 67 + c - 113;
> +    }
> +
> +  return r;
> +}
> +
> +/*
> +
> +@deftypefn Extension int filename_eq (const void *@var{s1}, const void *@var{s2})
> +
> +Return non-zero if file names @var{s1} and @var{s2} are equivalent.
> +This function is for use with hashtab.c hash tables.
> +
> +@end deftypefn
> +
> +*/
> +
> +int
> +filename_eq (const void *s1, const void *s2)
> +{
> +  /* The casts are for -Wc++-compat.  */
> +  return filename_cmp ((const char *) s1, (const char *) s2) == 0;
> +}
> Index: gdb/symtab.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/symtab.c,v
> retrieving revision 1.316
> diff -u -p -r1.316 symtab.c
> --- gdb/symtab.c        29 Jun 2012 22:46:45 -0000      1.316
> +++ gdb/symtab.c        9 Jul 2012 17:24:54 -0000
> @@ -3108,44 +3108,34 @@ operator_chars (char *p, char **end)
>  /* If FILE is not already in the table of files, return zero;
>     otherwise return non-zero.  Optionally add FILE to the table if ADD
>     is non-zero.  If *FIRST is non-zero, forget the old table
> -   contents.  */
> +   contents.
> +
> +   NOTE: We don't manage space for FILE, we assume FILE lives as long
> +   as the caller needs.  */
>
>  static int
>  filename_seen (const char *file, int add, int *first)
>  {
>    /* Table of files seen so far.  */
> -  static const char **tab = NULL;
> -  /* Allocated size of tab in elements.
> -     Start with one 256-byte block (when using GNU malloc.c).
> -     24 is the malloc overhead when range checking is in effect.  */
> -  static int tab_alloc_size = (256 - 24) / sizeof (char *);
> -  /* Current size of tab in elements.  */
> -  static int tab_cur_size;
> -  const char **p;
> +  static htab_t files_seen;
> +  void **slot;
>
>    if (*first)
>      {
> -      if (tab == NULL)
> -       tab = (const char **) xmalloc (tab_alloc_size * sizeof (*tab));
> -      tab_cur_size = 0;
> +      if (files_seen != NULL)
> +       htab_delete (files_seen);
> +      files_seen = htab_create_alloc (10, filename_hash, filename_eq,
> +                                     NULL, xcalloc, xfree);
>      }
>
>    /* Is FILE in tab?  */
> -  for (p = tab; p < tab + tab_cur_size; p++)
> -    if (filename_cmp (*p, file) == 0)
> -      return 1;
> +  slot = htab_find_slot (files_seen, file, add ? INSERT : NO_INSERT);
> +  if (*slot != NULL)
> +    return 1;
>
>    /* No; maybe add it to tab.  */
>    if (add)
> -    {
> -      if (tab_cur_size == tab_alloc_size)
> -       {
> -         tab_alloc_size *= 2;
> -         tab = (const char **) xrealloc ((char *) tab,
> -                                         tab_alloc_size * sizeof (*tab));
> -       }
> -      tab[tab_cur_size++] = file;
> -    }
> +    *slot = (char *) file;
>
>    return 0;
>  }
DJ Delorie - July 13, 2012, 7:21 p.m.
I think it's confusing to have filename_cmp and filename_eq that do
basically the same thing.  Perhaps filename_eq should be
filename_cmp_v or filename_cmp_hash or something, to indicate that
it's *supposed* to be the same functionality as filename_cmp but with
a different signature?
Doug Evans - July 13, 2012, 7:36 p.m.
On Fri, Jul 13, 2012 at 12:21 PM, DJ Delorie <dj@redhat.com> wrote:
>
> I think it's confusing to have filename_cmp and filename_eq that do
> basically the same thing.  Perhaps filename_eq should be
> filename_cmp_v or filename_cmp_hash or something, to indicate that
> it's *supposed* to be the same functionality as filename_cmp but with
> a different signature?

To be clear, filename_cmp is to strcmp as filename_eq is to streq.

ref: STREQ in libiberty/regex.c:
# define STREQ(s1, s2) ((strcmp (s1, s2) == 0))

Given that, I think the names are fine as is, but I'm happy to change them.
I like "eq", it's what hashtab uses (e.g. htab_eq_pointer).
How about filename_eq_hash?
DJ Delorie - July 13, 2012, 7:46 p.m.
If there's precedent, I'm not worried about it.

Ok to check in.
Eli Zaretskii - July 13, 2012, 7:47 p.m.
> Date: Fri, 13 Jul 2012 12:36:44 -0700
> From: Doug Evans <dje@google.com>
> Cc: gcc-patches@gcc.gnu.org, gdb-patches@sourceware.org
> 
> On Fri, Jul 13, 2012 at 12:21 PM, DJ Delorie <dj@redhat.com> wrote:
> >
> > I think it's confusing to have filename_cmp and filename_eq that do
> > basically the same thing.  Perhaps filename_eq should be
> > filename_cmp_v or filename_cmp_hash or something, to indicate that
> > it's *supposed* to be the same functionality as filename_cmp but with
> > a different signature?
> 
> To be clear, filename_cmp is to strcmp as filename_eq is to streq.
> 
> ref: STREQ in libiberty/regex.c:
> # define STREQ(s1, s2) ((strcmp (s1, s2) == 0))
> 
> Given that, I think the names are fine as is, but I'm happy to change them.

Sorry if I'm missing something, but why do we need to advertise such a
function at all?  Given that libiberty already provides filename_cmp,
isn't it trivial to write something like filename_eq whenever someone
needs to use hashes of file names?
Doug Evans - July 13, 2012, 9:26 p.m.
On Fri, Jul 13, 2012 at 12:47 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> Date: Fri, 13 Jul 2012 12:36:44 -0700
>> From: Doug Evans <dje@google.com>
>> Cc: gcc-patches@gcc.gnu.org, gdb-patches@sourceware.org
>>
>> On Fri, Jul 13, 2012 at 12:21 PM, DJ Delorie <dj@redhat.com> wrote:
>> >
>> > I think it's confusing to have filename_cmp and filename_eq that do
>> > basically the same thing.  Perhaps filename_eq should be
>> > filename_cmp_v or filename_cmp_hash or something, to indicate that
>> > it's *supposed* to be the same functionality as filename_cmp but with
>> > a different signature?
>>
>> To be clear, filename_cmp is to strcmp as filename_eq is to streq.
>>
>> ref: STREQ in libiberty/regex.c:
>> # define STREQ(s1, s2) ((strcmp (s1, s2) == 0))
>>
>> Given that, I think the names are fine as is, but I'm happy to change them.
>
> Sorry if I'm missing something, but why do we need to advertise such a
> function at all?  Given that libiberty already provides filename_cmp,
> isn't it trivial to write something like filename_eq whenever someone
> needs to use hashes of file names?

It's a "matched set" with filename_hash.
The hashtab.c constructors take a hash_f function pointer and an eq_f
function pointer.
Eli Zaretskii - July 14, 2012, 6:34 a.m.
> Date: Fri, 13 Jul 2012 14:26:15 -0700
> From: Doug Evans <dje@google.com>
> Cc: dj@redhat.com, gcc-patches@gcc.gnu.org, gdb-patches@sourceware.org
> 
> > Sorry if I'm missing something, but why do we need to advertise such a
> > function at all?  Given that libiberty already provides filename_cmp,
> > isn't it trivial to write something like filename_eq whenever someone
> > needs to use hashes of file names?
> 
> It's a "matched set" with filename_hash.
> The hashtab.c constructors take a hash_f function pointer and an eq_f
> function pointer.

I understand all that, but why would the eq_f function need to be an
external function on its own?

E.g., if we were to write a qsort replacement, would we have a
suitable string comparison function declared extern, when it is a
trivial wrapper around strcmp?
Tom Tromey - July 15, 2012, 2:17 a.m.
>>>>> "Eli" == Eli Zaretskii <eliz@gnu.org> writes:

Eli> I understand all that, but why would the eq_f function need to be an
Eli> external function on its own?

It is just to avoid other users having to write their own.

Eli> E.g., if we were to write a qsort replacement, would we have a
Eli> suitable string comparison function declared extern, when it is a
Eli> trivial wrapper around strcmp?

Yes.

We have streq for this, for use in hash tables and elsewhere.  It would
have been better if this were in libiberty, since currently there is a
copy of this function in gdb and like 3 or 4 in gcc.

Tom

Patch

Index: include/filenames.h
===================================================================
RCS file: /cvs/src/src/include/filenames.h,v
retrieving revision 1.10
diff -u -p -r1.10 filenames.h
--- include/filenames.h	1 Jul 2011 18:24:38 -0000	1.10
+++ include/filenames.h	9 Jul 2012 17:24:53 -0000
@@ -26,6 +26,8 @@  Foundation, Inc., 51 Franklin Street - F
 #ifndef FILENAMES_H
 #define FILENAMES_H
 
+#include "hashtab.h" /* for hashval_t */
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -84,6 +86,10 @@  extern int filename_cmp (const char *s1,
 extern int filename_ncmp (const char *s1, const char *s2,
 			  size_t n);
 
+extern hashval_t filename_hash (const void *s);
+
+extern int filename_eq (const void *s1, const void *s2);
+
 #ifdef __cplusplus
 }
 #endif
Index: libiberty/filename_cmp.c
===================================================================
RCS file: /cvs/src/src/libiberty/filename_cmp.c,v
retrieving revision 1.6
diff -u -p -r1.6 filename_cmp.c
--- libiberty/filename_cmp.c	1 Jul 2011 18:24:39 -0000	1.6
+++ libiberty/filename_cmp.c	9 Jul 2012 17:24:53 -0000
@@ -141,3 +141,52 @@  filename_ncmp (const char *s1, const cha
   return 0;
 #endif
 }
+
+/*
+
+@deftypefn Extension hashval_t filename_hash (const void *@var{s})
+
+Return the hash value for file name @var{s} that will be compared
+using filename_cmp.
+This function is for use with hashtab.c hash tables.
+
+@end deftypefn
+
+*/
+
+hashval_t
+filename_hash (const void *s)
+{
+  /* The cast is for -Wc++-compat.  */
+  const unsigned char *str = (const unsigned char *) s;
+  hashval_t r = 0;
+  unsigned char c;
+
+  while ((c = *str++) != 0)
+    {
+      if (c == '\\')
+	c = '/';
+      c = TOLOWER (c);
+      r = r * 67 + c - 113;
+    }
+
+  return r;
+}
+
+/*
+
+@deftypefn Extension int filename_eq (const void *@var{s1}, const void *@var{s2})
+
+Return non-zero if file names @var{s1} and @var{s2} are equivalent.
+This function is for use with hashtab.c hash tables.
+
+@end deftypefn
+
+*/
+
+int
+filename_eq (const void *s1, const void *s2)
+{
+  /* The casts are for -Wc++-compat.  */
+  return filename_cmp ((const char *) s1, (const char *) s2) == 0;
+}
Index: gdb/symtab.c
===================================================================
RCS file: /cvs/src/src/gdb/symtab.c,v
retrieving revision 1.316
diff -u -p -r1.316 symtab.c
--- gdb/symtab.c	29 Jun 2012 22:46:45 -0000	1.316
+++ gdb/symtab.c	9 Jul 2012 17:24:54 -0000
@@ -3108,44 +3108,34 @@  operator_chars (char *p, char **end)
 /* If FILE is not already in the table of files, return zero;
    otherwise return non-zero.  Optionally add FILE to the table if ADD
    is non-zero.  If *FIRST is non-zero, forget the old table
-   contents.  */
+   contents.
+
+   NOTE: We don't manage space for FILE, we assume FILE lives as long
+   as the caller needs.  */
 
 static int
 filename_seen (const char *file, int add, int *first)
 {
   /* Table of files seen so far.  */
-  static const char **tab = NULL;
-  /* Allocated size of tab in elements.
-     Start with one 256-byte block (when using GNU malloc.c).
-     24 is the malloc overhead when range checking is in effect.  */
-  static int tab_alloc_size = (256 - 24) / sizeof (char *);
-  /* Current size of tab in elements.  */
-  static int tab_cur_size;
-  const char **p;
+  static htab_t files_seen;
+  void **slot;
 
   if (*first)
     {
-      if (tab == NULL)
-	tab = (const char **) xmalloc (tab_alloc_size * sizeof (*tab));
-      tab_cur_size = 0;
+      if (files_seen != NULL)
+	htab_delete (files_seen);
+      files_seen = htab_create_alloc (10, filename_hash, filename_eq,
+				      NULL, xcalloc, xfree);
     }
 
   /* Is FILE in tab?  */
-  for (p = tab; p < tab + tab_cur_size; p++)
-    if (filename_cmp (*p, file) == 0)
-      return 1;
+  slot = htab_find_slot (files_seen, file, add ? INSERT : NO_INSERT);
+  if (*slot != NULL)
+    return 1;
 
   /* No; maybe add it to tab.  */
   if (add)
-    {
-      if (tab_cur_size == tab_alloc_size)
-	{
-	  tab_alloc_size *= 2;
-	  tab = (const char **) xrealloc ((char *) tab,
-					  tab_alloc_size * sizeof (*tab));
-	}
-      tab[tab_cur_size++] = file;
-    }
+    *slot = (char *) file;
 
   return 0;
 }