Patchwork gengtype improvements for plugins. patch 4/N [files_rules]

Submitter Basile Starynkevitch
Date Aug. 29, 2010, 11:34 a.m.
Message ID <1283081687.3067.86.camel@glinka>
Permalink /patch/62942/
State New

Comments

Basile Starynkevitch - Aug. 29, 2010, 11:34 a.m.
See http://gcc.gnu.org/ml/gcc-patches/2010-08/msg02058.html &
http://gcc.gnu.org/ml/gcc-patches/2010-08/msg02060.html &
http://gcc.gnu.org/ml/gcc-patches/2010-08/msg02063.html for the previous
pieces of the patch.



The fourth piece of our patch substantially reworks the core
get_output_file_with_visibility function. A file rule machinery makes it
more modular and less ad hoc: we now use regular expressions to match
the input file name and to compute the associated output name and for
name.  We feel this approach is cleaner, and easier to understand and
to extend.
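
As a rough illustration of the idea (not the code from the patch), an ordered rule table can be scanned until the first regular expression matches. The names toy_rule, toy_rules and toy_output_name are hypothetical, the output names are fixed strings rather than transform strings, and the sketch uses the system <regex.h> instead of libiberty's xregex.h.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical, simplified rule: a regexp and a fixed output name.  */
struct toy_rule
{
  const char *srcexpr;
  const char *out_name;
};

/* Particular cases first, general case last, NULL entry as terminator.  */
static const struct toy_rule toy_rules[] = {
  { "^(([^/]*/)*)c-tree\\.h$", "gt-c-decl.h" },
  { "^(([^/]*/)*)([[:alnum:]_-]*)\\.h$", "general-header-rule" },
  { NULL, NULL }
};

/* Return the output name of the first rule matching PATH, or NULL when
   no rule matches (or a regexp fails to compile).  */
static const char *
toy_output_name (const char *path)
{
  size_t i;

  assert (path != NULL);
  for (i = 0; toy_rules[i].srcexpr != NULL; i++)
    {
      regex_t re;
      int notmatched;

      if (regcomp (&re, toy_rules[i].srcexpr, REG_EXTENDED | REG_NOSUB) != 0)
	return NULL;
      notmatched = regexec (&re, path, 0, NULL, 0);
      regfree (&re);
      if (!notmatched)
	return toy_rules[i].out_name;
    }
  return NULL;
}
```

Here gcc/c-tree.h matches both entries, so it only gets gt-c-decl.h because the particular rule comes before the general one. This sketch recompiles each regexp on every call for brevity.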


Here is a comment (contained in the patch) explaining the details.

/**
 Regexp machinery to compute the output_name and for_name of each
 input_file. We have a sequence of file rules, each giving a POSIX
 extended regular expression to match an input file path, and two
 transform strings for the corresponding output_name and the
 corresponding for_name.  A transform string may contain dollars: $0
 is replaced by the entire match, $1 is replaced by the substring
 matching the first parenthesis in the regexp, etc. And $$ is replaced
 by a single verbatim dollar.  The rule order is important.  The
 general case is last, and the particular cases should be first.

 An action routine can, when needed, update the out_name & for_name
 and return the appropriate output file.
*/
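
The dollar substitution described in that comment can be sketched in standalone C as follows. This is only an illustration under stated assumptions: substitute_path is a hypothetical helper, it uses the system <regex.h> and plain malloc where the patch uses libiberty's xregex.h and an obstack, and it returns NULL on a malformed transform string where the patch calls fatal.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: match PATH against the
   POSIX extended regexp SRCEXPR and expand the transform string TRS.
   Returns a malloc-ed string, or NULL on mismatch or on any error.  */
static char *
substitute_path (const char *srcexpr, const char *trs, const char *path)
{
  regex_t re;
  regmatch_t pm[10];
  const char *pt;
  char *out;
  size_t n = 0;
  int notmatched;

  assert (srcexpr != NULL && trs != NULL && path != NULL);
  if (regcomp (&re, srcexpr, REG_EXTENDED) != 0)
    return NULL;
  notmatched = regexec (&re, path, 10, pm, 0);
  regfree (&re);
  if (notmatched)
    return NULL;

  /* Each character of TRS expands to at most strlen (PATH) characters,
     so this size is a safe upper bound for the result.  */
  out = malloc (strlen (trs) * (strlen (path) + 1) + 1);
  if (out == NULL)
    return NULL;

  for (pt = trs; *pt; pt++)
    {
      if (*pt != '$')
	out[n++] = *pt;
      else if (pt[1] == '$')
	{
	  /* $$ gives a single verbatim dollar.  */
	  out[n++] = '$';
	  pt++;
	}
      else if (pt[1] >= '0' && pt[1] <= '9')
	{
	  /* $0 .. $9: copy the matched substring, if that group matched.  */
	  const regmatch_t *m = &pm[pt[1] - '0'];
	  if (m->rm_so >= 0 && m->rm_eo >= m->rm_so)
	    {
	      size_t mlen = (size_t) (m->rm_eo - m->rm_so);
	      memcpy (out + n, path + m->rm_so, mlen);
	      n += mlen;
	    }
	  pt++;
	}
      else
	{
	  /* Malformed transform string.  */
	  free (out);
	  return NULL;
	}
    }
  out[n] = '\0';
  return out;
}
```

With the c-family rule from the patch, substitute_path ("^(([^/]*/)*)c-family/([[:alnum:]_-]*)\\.c$", "gt-c-family-$3.h", "gcc/c-family/c-common.c") yields "gt-c-family-c-common.h", since $3 is the file name without its directory and suffix.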

Attached are relpatch04to03-filerules.diff (the patch relative to the
previous patches), relpatch04to03-filerules.ChangeLog (its gcc/ChangeLog
entry), and, for convenience, the cumulated patches against trunk,
all-patches-r163612-up-to-04.diff.gz.

Ok for trunk?

Laurynas Biveinis - Sept. 6, 2010, 2:56 a.m.
> 	* gengtype.c:
> 	include "xregex.h" & "obstack.h" from libiberty's include/.

* gengtype.c: Include xregex.h and obstack.h.

> 	(frul_actionrout_t): new type - a signature for file rule actions.

(frul_actionrout_t): New.

+/***

/*

+/* Action handling *.c files */
+static outf_p implem_frul (input_file*, char**, char**);

source_frul ?

+static outf_p
+header_frul(input_file* inpf, char**poutname, char**pforname)
+{
+    const char *basename = 0;
+    int lang_index = 0;
+    const char* inpname = input_file_name (inpf);
+    dbgprintf ("inpf %p inpname %s outname %s forname %s", (void*) inpf, inpname, *poutname, *pforname);
+    basename = get_file_basename (inpf);
+    lang_index = get_prefix_langdir_index (basename);
+    dbgprintf ("basename %s lang_index %d", basename, lang_index);
+
+    if (lang_index >= 0)
+	return base_files[lang_index];
+    else {
+	/* TODO: free the old outname */

Is there anything stopping you from just doing that? Same for the next TODO.
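
One possible way to resolve the TODO, sketched under the assumption that the action routine always receives malloc-ed strings (in the patch they come from input_file_substitute, which returns an xstrdup-ed copy). The helper name replace_owned_string is hypothetical, and in gengtype.c itself xstrdup would replace the malloc and memcpy pair.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: *SLOT owns a malloc-ed string (or is NULL).
   Free it and store a fresh malloc-ed copy of NEWVAL, or NULL when
   NEWVAL is NULL.  The copy is made before the free so that NEWVAL
   may point into the old string.  */
static void
replace_owned_string (char **slot, const char *newval)
{
  char *copy = NULL;

  assert (slot != NULL);
  if (newval != NULL)
    {
      size_t n = strlen (newval) + 1;
      copy = malloc (n);
      if (copy == NULL)
	abort ();
      memcpy (copy, newval, n);
    }
  free (*slot);
  *slot = copy;
}
```

With such a helper the slot always owns heap memory, so storing "gtype-desc.c" would no longer need a CONST_CAST of a string literal, and the caller can free the result uniformly.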

+      *poutname =  CONST_CAST (char*, "gtype-desc.c");

Watch spaces before CONST_CAST.

+    /* In the future, we need to add a case for C++ sources.  */
+

Just drop it.

+static char*
+input_file_substitute (const input_file *fil, const regex_t* rex,
+		       const char* trs, int rflags)
+{
+  regmatch_t pmatch[10];
+  int notmatched = 0;
+  struct obstack str_obstack;

Push the str_obstack down to the scope that exclusively uses it, same
for other variables.

Overall, I like this patch very much. I cannot approve it, but it looks
good to me with the comments addressed.


Patch

--- ../gengtype-gcc-03/gengtype.c	2010-08-29 11:25:37.000000000 +0200
+++ gcc/gengtype.c	2010-08-29 13:20:46.000000000 +0200
@@ -25,6 +25,8 @@ 
 #include "double-int.h"
 #include "hashtab.h"
 #include "version.h"    /* for version_string & pkgversion_string */
+#include "xregex.h" 
+#include "obstack.h"
 #include "gengtype.h"
 
 /* Data types, macros, etc. used only in this file.  */
@@ -1725,6 +1727,214 @@  get_file_gtfilename (const input_file *i
   return result;
 }
 
+/***
+ Regexp machinery to compute the output_name and for_name of each
+ input_file. We have a sequence of file rules, each giving a POSIX
+ extended regular expression to match an input file path, and two
+ transform strings for the corresponding output_name and the
+ corresponding for_name.  A transform string may contain dollars: $0
+ is replaced by the entire match, $1 is replaced by the substring
+ matching the first parenthesis in the regexp, etc. And $$ is replaced
+ by a single verbatim dollar.  The rule order is important.  The
+ general case is last, and the particular cases should be first.
+
+ An action routine can, when needed, update the out_name & for_name
+ and return the appropriate output file.
+ */
+
+typedef outf_p (frul_actionrout_t)(input_file*, char**poutname, char**pforname);
+
+struct file_rule_st {
+    const char* frul_srcexpr;	/* source string for regular expression */
+    int frul_rflags;	/* flags for regcomp(3), usually
+			 * REG_EXTENDED */
+    regex_t* frul_re;		/* compiled regular expression */
+    const char* frul_tr_out;	/* transform string for making the
+				 * output_name, with $1 ... $9 for
+				 * subpatterns and $0 for the whole
+				 * matched filename */
+    const char* frul_tr_for;	/* transform string for for_name */
+    /* the action, if non null, is called once the rule matches, on
+     * the transformed out_name & for_name.  It could change them and
+     * give the output file. */
+    frul_actionrout_t* frul_action;
+};
+
+/* Action handling *.h files */
+static outf_p header_frul (input_file*, char**, char**);
+
+/* Action handling *.c files */ 
+static outf_p implem_frul (input_file*, char**, char**);
+
+
+#define NULL_REGEX (regex_t*)0
+#define NULL_FRULACT (frul_actionrout_t*)0
+
+/* The array of our rules governing file name generation. Order
+   matters!  Change it with care! */
+
+struct file_rule_st files_rules[] = {
+
+    /* the c-family/ source directory is special */
+    { "^(([^/]*/)*)c-family/([[:alnum:]_-]*)\\.c$",  
+      REG_EXTENDED, NULL_REGEX, 
+      "gt-c-family-$3.h", "c-family/$3.c", NULL_FRULACT},
+
+    { "^(([^/]*/)*)c-family/([[:alnum:]_-]*)\\.h$",  
+      REG_EXTENDED, NULL_REGEX, 
+      "gt-c-family-$3.h", "c-family/$3.h", NULL_FRULACT},
+    
+    /* Both c-lang.h & c-tree.h give gt-c-decl.h for c-decl.c ! */
+    { "^(([^/]*/)*)c-lang\\.h$",
+      REG_EXTENDED, NULL_REGEX, "gt-c-decl.h", "c-decl.c", NULL_FRULACT},
+
+    { "^(([^/]*/)*)c-tree\\.h$", 
+      REG_EXTENDED, NULL_REGEX, "gt-c-decl.h", "c-decl.c", NULL_FRULACT},
+
+    /* cp/cp-tree.h gives gt-cp-tree.h for cp/tree.c ! */
+    { "^(([^/]*/)*)cp/cp-tree\\.h$", 
+      REG_EXTENDED, NULL_REGEX, 
+      "gt-cp-tree.h", "cp/tree.c", NULL_FRULACT },
+
+    /* cp/decl.h & cp/decl.c give gt-cp-decl.h for cp/decl.c ! */
+    { "^(([^/]*/)*)cp/decl\\.[ch]$", 
+      REG_EXTENDED, NULL_REGEX, 
+      "gt-cp-decl.h", "cp/decl.c", NULL_FRULACT },
+
+    /* cp/name-lookup.h gives gt-cp-name-lookup.h for cp/name-lookup.c ! */
+    { "^(([^/]*/)*)cp/name-lookup\\.h$", 
+      REG_EXTENDED, NULL_REGEX, 
+      "gt-cp-name-lookup.h", "cp/name-lookup.c", NULL_FRULACT },
+
+    /* objc/objc-act.h gives gt-objc-objc-act.h for objc/objc-act.c ! */
+    { "^(([^/]*/)*)objc/objc-act\\.h$", 
+      REG_EXTENDED, NULL_REGEX,
+      "gt-objc-objc-act.h", "objc/objc-act.c", NULL_FRULACT },
+
+    /* General cases.  For header & implementation files, we need a
+     * special action to handle the language. */
+    { "^(([^/]*/)*)([[:alnum:]_-]*)\\.c$", 
+      REG_EXTENDED, NULL_REGEX, "gt-$3.h", "$3.c", implem_frul},
+    { "^(([^/]*/)*)([[:alnum:]_-]*)\\.h$", 
+      REG_EXTENDED, NULL_REGEX, "gt-$3.h", "$3.h", header_frul},
+    { "^(([^/]*/)*)([[:alnum:]_-]*)\\.in$", 
+      REG_EXTENDED, NULL_REGEX, "gt-$3.h", "$3.in", NULL_FRULACT},
+
+    /* In the future, we need to add a case for C++ sources.  */
+
+    /* null for end of rules */
+    {NULL, 0, NULL_REGEX, NULL, NULL, NULL_FRULACT}
+};
+
+
+/* Special file rules action for handling header files. */
+static outf_p 
+header_frul(input_file* inpf, char**poutname, char**pforname)
+{
+    const char *basename = 0;
+    int lang_index = 0;
+    const char* inpname = input_file_name (inpf);
+    dbgprintf ("inpf %p inpname %s outname %s forname %s", (void*) inpf, inpname, *poutname, *pforname);
+    basename = get_file_basename (inpf);
+    lang_index = get_prefix_langdir_index (basename);
+    dbgprintf ("basename %s lang_index %d", basename, lang_index);
+
+    if (lang_index >= 0) 
+	return base_files[lang_index];
+    else {
+	/* TODO: free the old outname */
+      *poutname =  CONST_CAST (char*, "gtype-desc.c");
+      *pforname = NULL;
+      dbgprintf("special gtype-desc.c for inpname %s", inpname);
+      return NULL;
+    }
+}
+
+/* Special file rules action for handling implementation files,
+ * notably taking care of the language. */
+
+static outf_p 
+implem_frul (input_file* inpf, char**poutname, char**pforname)
+{
+    char *newbasename = NULL;
+    char* newoutname = NULL;
+    const char* inpname = input_file_name (inpf);
+    dbgprintf ("inpf %p inpname %s original outname %s forname %s",
+	       (void*) inpf, inpname, *poutname, *pforname);
+    newoutname = CONST_CAST (char*, get_file_gtfilename (inpf));
+    dbgprintf ("newoutname %s", newoutname);
+    newbasename = CONST_CAST (char*, get_file_basename (inpf));
+    dbgprintf ("newbasename %s", newbasename);
+    /* TODO: free the old outname & forname */
+    *poutname = newoutname;
+    *pforname = newbasename;
+    return NULL;
+}
+
+
+/* utility function which returns NULL on regexpr mismatch, or the
+ * malloc-ed substituted string using TRS on matching of the FIL input
+ * file against the REX regexp. */
+static char* 
+input_file_substitute (const input_file *fil, const regex_t* rex, 
+		       const char* trs, int rflags)
+{
+  regmatch_t pmatch[10];
+  int notmatched = 0;
+  struct obstack str_obstack;
+  char* str = NULL;
+  const char* filnam = input_file_name (fil);
+  memset (&pmatch, 0, sizeof(pmatch));
+  notmatched = regexec (rex, filnam, 10, pmatch, rflags);
+  dbgprintf ("filnam %s", filnam);
+  if (!notmatched) 
+    {
+      char* rawstr = NULL;
+      const char* pt = NULL;
+      obstack_init (&str_obstack);
+      for (pt = trs; *pt; pt++) {
+	char c = *pt;
+	if (c == '$') {
+	  if (pt[1] == '$') 
+	    {
+	    /* A double dollar $$ is substituted by a single verbatim
+	       dollar, but who really uses dollar signs in file
+	       paths? */
+	    obstack_1grow (&str_obstack, '$');
+	    }
+	  else if (ISDIGIT(pt[1])) 
+	    {
+	      /* Handle $0 $1 .. $9 by appropriate substitution. */
+	      int dolnum = pt[1] - '0';
+	      int so = pmatch[dolnum].rm_so;
+	      int eo = pmatch[dolnum].rm_eo;
+	      dbgprintf ("so=%d eo=%d dolnum=%d", so, eo, dolnum);
+	      if (so>=0 && eo>=so) 
+		obstack_grow (&str_obstack, filnam + so, eo - so);
+	    }
+	  else
+	    /* This can happen only when files_rules is buggy! */
+	    fatal ("invalid dollar in transform string %s", trs);
+	  /* Always skip the character after the dollar.  */
+	  pt++;
+	}
+	else
+	  obstack_1grow (&str_obstack, c);
+      }
+      /* add the terminating null */
+      obstack_1grow (&str_obstack, (char) 0);
+      rawstr = XOBFINISH (&str_obstack, char *);
+      str = xstrdup (rawstr);
+      obstack_free (&str_obstack, rawstr);
+      dbgprintf ("matched replacement %s", str);
+      rawstr = NULL;
+      return str;
+    }
+  else 
+    dbgprintf ("non-matched filename %s", filnam);
+  return NULL;
+}
+
 /* An output file, suitable for definitions, that can see declarations
    made in INPF and is linked into every language that uses
    INPF.  */
@@ -1733,10 +1943,8 @@  outf_p
 get_output_file_with_visibility (input_file *inpf)
 {
   outf_p r;
-  size_t len;
-  const char *basename;
-  const char *for_name;
-  const char *output_name;
+  const char *for_name = NULL;
+  const char *output_name = NULL;
 
   /* This can happen when we need a file with visibility on a
      structure that we've never seen.  We have to just hope that it's
@@ -1763,64 +1971,93 @@  get_output_file_with_visibility (input_f
   if (inpf->inpoutf != NULL)
     return inpf->inpoutf;
 
-  /* Determine the output file name.  */
-  basename = get_file_basename (inpf);
 
-  len = strlen (basename);
-  if ((len > 2 && memcmp (basename+len-2, ".c", 2) == 0)
-      || (len > 2 && memcmp (basename+len-2, ".y", 2) == 0)
-      || (len > 3 && memcmp (basename+len-3, ".in", 3) == 0))
-    {
-      output_name = get_file_gtfilename (inpf);
-      for_name = basename;
-    }
-  /* Some headers get used by more than one front-end; hence, it
-     would be inappropriate to spew them out to a single gtype-<lang>.h
-     (and gengtype doesn't know how to direct spewage into multiple
-     gtype-<lang>.h headers at this time).  Instead, we pair up these
-     headers with source files (and their special purpose gt-*.h headers).  */
-  else if (strncmp (basename, "c-family", 8) == 0
-	   && IS_DIR_SEPARATOR (basename[8])
-	   && strcmp (basename + 9, "c-common.h") == 0)
-    output_name = "gt-c-family-c-common.h", for_name = "c-family/c-common.c";
-  else if (strcmp (basename, "c-lang.h") == 0)
-    output_name = "gt-c-decl.h", for_name = "c-decl.c";
-  else if (strcmp (basename, "c-tree.h") == 0)
-    output_name = "gt-c-decl.h", for_name = "c-decl.c";
-  else if (strncmp (basename, "cp", 2) == 0 && IS_DIR_SEPARATOR (basename[2])
-	   && strcmp (basename + 3, "cp-tree.h") == 0)
-    output_name = "gt-cp-tree.h", for_name = "cp/tree.c";
-  else if (strncmp (basename, "cp", 2) == 0 && IS_DIR_SEPARATOR (basename[2])
-	   && strcmp (basename + 3, "decl.h") == 0)
-    output_name = "gt-cp-decl.h", for_name = "cp/decl.c";
-  else if (strncmp (basename, "cp", 2) == 0 && IS_DIR_SEPARATOR (basename[2])
-	   && strcmp (basename + 3, "name-lookup.h") == 0)
-    output_name = "gt-cp-name-lookup.h", for_name = "cp/name-lookup.c";
-  else if (strncmp (basename, "objc", 4) == 0 && IS_DIR_SEPARATOR (basename[4])
-	   && strcmp (basename + 5, "objc-act.h") == 0)
-    output_name = "gt-objc-objc-act.h", for_name = "objc/objc-act.c";
-  else
+  /* Use our file_rules machinery! */
     {
-      int lang_index = get_prefix_langdir_index (basename);
-
-      if (lang_index >= 0) {
-	inpf->inpoutf = base_files[lang_index];
-	return base_files[lang_index];
+    int rulix = 0;
+    for (; files_rules[rulix].frul_srcexpr != NULL; rulix++)
+      {
+	char* outs = NULL;
+	char* fors = NULL;
+	dbgprintf("rulix#%d srcexpr %s",
+		  rulix, files_rules[rulix].frul_srcexpr);
+	if (!files_rules[rulix].frul_re)
+	  {
+	    /* We lazily compile the regexpr only once. */
+	    int err = 0;
+	    files_rules[rulix].frul_re = XCNEW(regex_t);
+	    err = regcomp (files_rules[rulix].frul_re,
+			   files_rules[rulix].frul_srcexpr, 
+			   files_rules[rulix].frul_rflags);
+	    if (err) {
+	      /* The regular expression compilation fails only when
+		 file_rules is buggy.  We give a possibly truncated
+		 error message in this impossible case. */
+	      char errbuf[80];
+	      memset(errbuf, 0, sizeof(errbuf));
+	      regerror (err, files_rules[rulix].frul_re,
+			errbuf, sizeof(errbuf)-1);
+	      fatal("file rule regexpr error %s", errbuf);
+	    }
+	  };
+	outs = input_file_substitute (inpf, files_rules[rulix].frul_re, 
+				      files_rules[rulix].frul_tr_out, 0);
+	if (!outs)
+	  continue;
+
+	fors = input_file_substitute (inpf, files_rules[rulix].frul_re, 
+				      files_rules[rulix].frul_tr_for, 0);
+	dbgprintf("rulix#%d outs %s fors %s", 
+		  rulix, outs, fors);
+	if (outs && fors) {
+	  dbgprintf ("raw outs %s fors %s", outs, fors);
+	  output_name = outs;
+	  for_name = fors;
+	  if (files_rules[rulix].frul_action) {
+	    /* Invoke our action routine. */
+	    outf_p of = NULL;
+	    dbgprintf("before action rulix %d outs %s fors %s", 
+		      rulix, outs, fors);
+	    of = 
+	      (files_rules[rulix].frul_action) (inpf,
+						&outs, &fors);
+	    output_name = outs;
+	    for_name = fors;
+	    dbgprintf("after action rulix %d of=%p output_name %s for_name %s",
+		      rulix, (void*)of, output_name, for_name);
+	    /* If the action routine returned something, give it back
+	       immediately. */
+	    if (of) {
+	      inpf->inpoutf = of;
+	      return of;
+	    }
+	  };
+	  /* The rule matched, and had no action, or that action did
+	     not return any output file but could have changed the
+	     output_name or for_name.  We break out of the loop. */
+	  break;
+	}
       }
-
-      output_name = "gtype-desc.c";
-      for_name = NULL;
     }
 
-  /* Look through to see if we've ever seen this output filename before.  */
+  dbgprintf ("usual case output_name %s for_name %s", output_name, for_name);
+
+  /* Look through to see if we've ever seen this output filename
+     before.  */
   for (r = output_files; r; r = r->next)
     if (strcmp (r->name, output_name) == 0)
+      {
+	dbgprintf("found r @ %p %s", (void*)r, r->name);
+	inpf->inpoutf = r;
       return r;
+      }
 
-  /* If not, create it.  */
+  /* If not, create it, and cache it in the input file.  */
   r = create_file (for_name, output_name);
 
   gcc_assert (r && r->name);
+  dbgprintf("created r %s", r->name);
+  inpf->inpoutf = r;
   return r;
 }