From patchwork Tue Sep 21 19:41:40 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Basile Starynkevitch X-Patchwork-Id: 65376 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id E7E2FB70D2 for ; Wed, 22 Sep 2010 05:41:57 +1000 (EST) Received: (qmail 19138 invoked by alias); 21 Sep 2010 19:41:53 -0000 Received: (qmail 19107 invoked by uid 22791); 21 Sep 2010 19:41:48 -0000 X-SWARE-Spam-Status: No, hits=-0.4 required=5.0 tests=AWL,BAYES_50 X-Spam-Check-By: sourceware.org Received: from smtp-152-tuesday.nerim.net (HELO maiev.nerim.net) (194.79.134.152) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 21 Sep 2010 19:41:35 +0000 Received: from hector.lesours (ours.starynkevitch.net [213.41.244.95]) by maiev.nerim.net (Postfix) with ESMTPS id 784B22E021; Tue, 21 Sep 2010 21:41:31 +0200 (CEST) Received: from glinka.lesours ([192.168.0.1] helo=glinka) by hector.lesours with smtp (Exim 4.72) (envelope-from ) id 1Oy8iY-00005X-VO; Tue, 21 Sep 2010 21:41:30 +0200 Date: Tue, 21 Sep 2010 21:41:40 +0200 From: Basile Starynkevitch To: Basile Starynkevitch Cc: gcc-patches@gcc.gnu.org Subject: gengtype improvements for plugins, thirdround! patch 7/7 [doc] Message-Id: <20100921214140.3867b748.basile@starynkevitch.net> In-Reply-To: <20100921213151.0332d8b3.basile@starynkevitch.net> References: <20100921210301.d92889be.basile@starynkevitch.net> <20100921210829.5f2ea8b0.basile@starynkevitch.net> <20100921211217.df1ef337.basile@starynkevitch.net> <20100921211628.969ed469.basile@starynkevitch.net> <20100921212431.f07dad12.basile@starynkevitch.net> <20100921212926.f7688cfd.basile@starynkevitch.net> <20100921213151.0332d8b3.basile@starynkevitch.net> Mime-Version: 1.0 X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hello All Please notice that I made a mail mistake in the previous patch chunk. The 6th patch [wstate] is in http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01716.html and not in http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01714.html which got the wrong attachments (no patches there!). Sorry for the noise. References http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01716.html http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01032.html http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01081.html The last patch takes into account Lauryanas remarks in http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01081.html and is a documentation patch. I added some more explanations so please check my poor english language. I still hope that my patch serie "thirdround" will be Ok, with perhaps minor changes required. #################### gcc/ChangeLog entry for documentation 2010-09-21 Basile Starynkevitch * gcc/doc/gty.texi: (Generating GGC code with gengtype): New node. (GTY Options): Documented that tag should be unique. Added ptr_alias documentation. (How to run the gengtype generator): New section. ################################################################ The attached patch is relative to http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01716.html Ok for trunk? With what changes? BTW, the dates in the patch series (and notably their ChangeLog entry) might be wrong. If Ok-ed, I will put the date (as seen from Paris, France) of the commit. Cheers --- ../thirdround_06_wstate//gty.texi 2010-09-21 20:21:19.000000000 +0200 +++ gcc/doc/gty.texi 2010-09-21 20:20:20.000000000 +0200 @@ -0,0 +1,552 @@ +@c Copyright (C) 2002, 2003, 2004, 2007, 2008, 2009 +@c Free Software Foundation, Inc. +@c This is part of the GCC manual. +@c For copying conditions, see the file gcc.texi. + +@node Type Information +@chapter Memory Management and Type Information +@cindex GGC +@findex GTY + +GCC uses some fairly sophisticated memory management techniques, which +involve determining information about GCC's data structures from GCC's +source code and using this information to perform garbage collection and +implement precompiled headers. + +A full C parser would be too complicated for this task, so a limited +subset of C is interpreted and special markers are used to determine +what parts of the source to look at. All @code{struct} and +@code{union} declarations that define data structures that are +allocated under control of the garbage collector must be marked. All +global variables that hold pointers to garbage-collected memory must +also be marked. Finally, all global variables that need to be saved +and restored by a precompiled header must be marked. (The precompiled +header mechanism can only save static variables if they're scalar. +Complex data structures must be allocated in garbage-collected memory +to be saved in a precompiled header.) + +The full format of a marker is +@smallexample +GTY (([@var{option}] [(@var{param})], [@var{option}] [(@var{param})] @dots{})) +@end smallexample +@noindent +but in most cases no options are needed. The outer double parentheses +are still necessary, though: @code{GTY(())}. Markers can appear: + +@itemize @bullet +@item +In a structure definition, before the open brace; +@item +In a global variable declaration, after the keyword @code{static} or +@code{extern}; and +@item +In a structure field definition, before the name of the field. +@end itemize + +Here are some examples of marking simple data structures and globals. + +@smallexample +struct GTY(()) @var{tag} +@{ + @var{fields}@dots{} +@}; + +typedef struct GTY(()) @var{tag} +@{ + @var{fields}@dots{} +@} *@var{typename}; + +static GTY(()) struct @var{tag} *@var{list}; /* @r{points to GC memory} */ +static GTY(()) int @var{counter}; /* @r{save counter in a PCH} */ +@end smallexample + +The parser understands simple typedefs such as +@code{typedef struct @var{tag} *@var{name};} and +@code{typedef int @var{name};}. +These don't need to be marked. + +@menu +* GTY Options:: What goes inside a @code{GTY(())}. +* GGC Roots:: Making global variables GGC roots. +* Files:: How the generated files work. +* Generating GGC code with gengtype:: How to run the gengtype generator. +* Invoking the garbage collector:: How to invoke the garbage collector. +@end menu + +@node GTY Options +@section The Inside of a @code{GTY(())} + +Sometimes the C code is not enough to fully describe the type +structure. Extra information can be provided with @code{GTY} options +and additional markers. Some options take a parameter, which may be +either a string or a type name, depending on the parameter. If an +option takes no parameter, it is acceptable either to omit the +parameter entirely, or to provide an empty string as a parameter. For +example, @code{@w{GTY ((skip))}} and @code{@w{GTY ((skip ("")))}} are +equivalent. + +When the parameter is a string, often it is a fragment of C code. Four +special escapes may be used in these strings, to refer to pieces of +the data structure being marked: + +@cindex % in GTY option +@table @code +@item %h +The current structure. +@item %1 +The structure that immediately contains the current structure. +@item %0 +The outermost structure that contains the current structure. +@item %a +A partial expression of the form @code{[i1][i2]@dots{}} that indexes +the array item currently being marked. +@end table + +For instance, suppose that you have a structure of the form +@smallexample +struct A @{ + @dots{} +@}; +struct B @{ + struct A foo[12]; +@}; +@end smallexample +@noindent +and @code{b} is a variable of type @code{struct B}. When marking +@samp{b.foo[11]}, @code{%h} would expand to @samp{b.foo[11]}, +@code{%0} and @code{%1} would both expand to @samp{b}, and @code{%a} +would expand to @samp{[11]}. + +As in ordinary C, adjacent strings will be concatenated; this is +helpful when you have a complicated expression. +@smallexample +@group +GTY ((chain_next ("TREE_CODE (&%h.generic) == INTEGER_TYPE" + " ? TYPE_NEXT_VARIANT (&%h.generic)" + " : TREE_CHAIN (&%h.generic)"))) +@end group +@end smallexample + +The available options are: + +@table @code +@findex length +@item length ("@var{expression}") + +There are two places the type machinery will need to be explicitly told +the length of an array. The first case is when a structure ends in a +variable-length array, like this: +@smallexample +struct GTY(()) rtvec_def @{ + int num_elem; /* @r{number of elements} */ + rtx GTY ((length ("%h.num_elem"))) elem[1]; +@}; +@end smallexample + +In this case, the @code{length} option is used to override the specified +array length (which should usually be @code{1}). The parameter of the +option is a fragment of C code that calculates the length. + +The second case is when a structure or a global variable contains a +pointer to an array, like this: +@smallexample +struct gimple_omp_for_iter * GTY((length ("%h.collapse"))) iter; +@end smallexample +In this case, @code{iter} has been allocated by writing something like +@smallexample + x->iter = ggc_alloc_cleared_vec_gimple_omp_for_iter (collapse); +@end smallexample +and the @code{collapse} provides the length of the field. + +This second use of @code{length} also works on global variables, like: +@verbatim +static GTY((length("reg_known_value_size"))) rtx *reg_known_value; +@end verbatim + +@findex skip +@item skip + +If @code{skip} is applied to a field, the type machinery will ignore it. +This is somewhat dangerous; the only safe use is in a union when one +field really isn't ever used. + +@findex desc +@findex tag +@findex default +@item desc ("@var{expression}") +@itemx tag ("@var{constant}") +@itemx default + +The type machinery needs to be told which field of a @code{union} is +currently active. This is done by giving each field a constant +@code{tag} value, and then specifying a discriminator using @code{desc}. +The value of the expression given by @code{desc} is compared against +each @code{tag} value, each of which should be different. If no +@code{tag} is matched, the field marked with @code{default} is used if +there is one, otherwise no field in the union will be marked. +A union field can have at most one @code{tag}. + +In the @code{desc} option, the ``current structure'' is the union that +it discriminates. Use @code{%1} to mean the structure containing it. +There are no escapes available to the @code{tag} option, since it is a +constant. + +For example, +@smallexample +struct GTY(()) tree_binding +@{ + struct tree_common common; + union tree_binding_u @{ + tree GTY ((tag ("0"))) scope; + struct cp_binding_level * GTY ((tag ("1"))) level; + @} GTY ((desc ("BINDING_HAS_LEVEL_P ((tree)&%0)"))) xscope; + tree value; +@}; +@end smallexample + +In this example, the value of BINDING_HAS_LEVEL_P when applied to a +@code{struct tree_binding *} is presumed to be 0 or 1. If 1, the type +mechanism will treat the field @code{level} as being present and if 0, +will treat the field @code{scope} as being present. + +@findex param_is +@findex use_param +@item param_is (@var{type}) +@itemx use_param + +Sometimes it's convenient to define some data structure to work on +generic pointers (that is, @code{PTR}) and then use it with a specific +type. @code{param_is} specifies the real type pointed to, and +@code{use_param} says where in the generic data structure that type +should be put. + +For instance, to have a @code{htab_t} that points to trees, one would +write the definition of @code{htab_t} like this: +@smallexample +typedef struct GTY(()) @{ + @dots{} + void ** GTY ((use_param, @dots{})) entries; + @dots{} +@} htab_t; +@end smallexample +and then declare variables like this: +@smallexample + static htab_t GTY ((param_is (union tree_node))) ict; +@end smallexample + +@findex param@var{n}_is +@findex use_param@var{n} +@item param@var{n}_is (@var{type}) +@itemx use_param@var{n} + +In more complicated cases, the data structure might need to work on +several different types, which might not necessarily all be pointers. +For this, @code{param1_is} through @code{param9_is} may be used to +specify the real type of a field identified by @code{use_param1} through +@code{use_param9}. + +@findex use_params +@item use_params + +When a structure contains another structure that is parameterized, +there's no need to do anything special, the inner structure inherits the +parameters of the outer one. When a structure contains a pointer to a +parameterized structure, the type machinery won't automatically detect +this (it could, it just doesn't yet), so it's necessary to tell it that +the pointed-to structure should use the same parameters as the outer +structure. This is done by marking the pointer with the +@code{use_params} option. + +@findex deletable +@item deletable + +@code{deletable}, when applied to a global variable, indicates that when +garbage collection runs, there's no need to mark anything pointed to +by this variable, it can just be set to @code{NULL} instead. This is used +to keep a list of free structures around for re-use. + +@findex if_marked +@item if_marked ("@var{expression}") + +Suppose you want some kinds of object to be unique, and so you put them +in a hash table. If garbage collection marks the hash table, these +objects will never be freed, even if the last other reference to them +goes away. GGC has special handling to deal with this: if you use the +@code{if_marked} option on a global hash table, GGC will call the +routine whose name is the parameter to the option on each hash table +entry. If the routine returns nonzero, the hash table entry will +be marked as usual. If the routine returns zero, the hash table entry +will be deleted. + +The routine @code{ggc_marked_p} can be used to determine if an element +has been marked already; in fact, the usual case is to use +@code{if_marked ("ggc_marked_p")}. + +@findex mark_hook +@item mark_hook ("@var{hook-routine-name}") + +If provided for a structure or union type, the given +@var{hook-routine-name} (between double-quotes) is the name of a +routine called when the garbage collector has just marked the data as +reachable. This routine should not change the data, or call any ggc +routine. Its only argument is a pointer to the just marked (const) +structure or union. + +@findex maybe_undef +@item maybe_undef + +When applied to a field, @code{maybe_undef} indicates that it's OK if +the structure that this fields points to is never defined, so long as +this field is always @code{NULL}. This is used to avoid requiring +backends to define certain optional structures. It doesn't work with +language frontends. + +@findex nested_ptr +@item nested_ptr (@var{type}, "@var{to expression}", "@var{from expression}") + +The type machinery expects all pointers to point to the start of an +object. Sometimes for abstraction purposes it's convenient to have +a pointer which points inside an object. So long as it's possible to +convert the original object to and from the pointer, such pointers +can still be used. @var{type} is the type of the original object, +the @var{to expression} returns the pointer given the original object, +and the @var{from expression} returns the original object given +the pointer. The pointer will be available using the @code{%h} +escape. + +@findex ptr_alias +@item ptr_alias (@var{type}) + +When applied to a type, @code{ptr_alias} indicates that the GTY-ed +type is an alias or synonym of the given @var{type}. + +@findex chain_next +@findex chain_prev +@findex chain_circular +@item chain_next ("@var{expression}") +@itemx chain_prev ("@var{expression}") +@itemx chain_circular ("@var{expression}") + +It's helpful for the type machinery to know if objects are often +chained together in long lists; this lets it generate code that uses +less stack space by iterating along the list instead of recursing down +it. @code{chain_next} is an expression for the next item in the list, +@code{chain_prev} is an expression for the previous item. For singly +linked lists, use only @code{chain_next}; for doubly linked lists, use +both. The machinery requires that taking the next item of the +previous item gives the original item. @code{chain_circular} is similar +to @code{chain_next}, but can be used for circular single linked lists. + +@findex reorder +@item reorder ("@var{function name}") + +Some data structures depend on the relative ordering of pointers. If +the precompiled header machinery needs to change that ordering, it +will call the function referenced by the @code{reorder} option, before +changing the pointers in the object that's pointed to by the field the +option applies to. The function must take four arguments, with the +signature @samp{@w{void *, void *, gt_pointer_operator, void *}}. +The first parameter is a pointer to the structure that contains the +object being updated, or the object itself if there is no containing +structure. The second parameter is a cookie that should be ignored. +The third parameter is a routine that, given a pointer, will update it +to its correct new value. The fourth parameter is a cookie that must +be passed to the second parameter. + +PCH cannot handle data structures that depend on the absolute values +of pointers. @code{reorder} functions can be expensive. When +possible, it is better to depend on properties of the data, like an ID +number or the hash of a string instead. + +@findex variable_size +@item variable_size + +The type machinery expects the types to be of constant size. When this +is not true, for example, with structs that have array fields or unions, +the type machinery cannot tell how many bytes need to be allocated at +each allocation. The @code{variable_size} is used to mark such types. +The type machinery then provides allocators that take a parameter +indicating an exact size of object being allocated. + +For example, +@smallexample +struct GTY((variable_size)) sorted_fields_type @{ + int len; + tree GTY((length ("%h.len"))) elts[1]; +@}; +@end smallexample + +Then the objects of @code{struct sorted_fields_type} are allocated in GC +memory as follows: +@smallexample + field_vec = ggc_alloc_sorted_fields_type (size); +@end smallexample + +@findex special +@item special ("@var{name}") + +The @code{special} option is used to mark types that have to be dealt +with by special case machinery. The parameter is the name of the +special case. See @file{gengtype.c} for further details. Avoid +adding new special cases unless there is no other alternative. +@end table + +@node GGC Roots +@section Marking Roots for the Garbage Collector +@cindex roots, marking +@cindex marking roots + +In addition to keeping track of types, the type machinery also locates +the global variables (@dfn{roots}) that the garbage collector starts +at. Roots must be declared using one of the following syntaxes: + +@itemize @bullet +@item +@code{extern GTY(([@var{options}])) @var{type} @var{name};} +@item +@code{static GTY(([@var{options}])) @var{type} @var{name};} +@end itemize +@noindent +The syntax +@itemize @bullet +@item +@code{GTY(([@var{options}])) @var{type} @var{name};} +@end itemize +@noindent +is @emph{not} accepted. There should be an @code{extern} declaration +of such a variable in a header somewhere---mark that, not the +definition. Or, if the variable is only used in one file, make it +@code{static}. + +@node Files +@section Source Files Containing Type Information +@cindex generated files +@cindex files, generated + +Whenever you add @code{GTY} markers to a source file that previously +had none, or create a new source file containing @code{GTY} markers, +there are three things you need to do: + +@enumerate +@item +You need to add the file to the list of source files the type +machinery scans. There are four cases: + +@enumerate a +@item +For a back-end file, this is usually done +automatically; if not, you should add it to @code{target_gtfiles} in +the appropriate port's entries in @file{config.gcc}. + +@item +For files shared by all front ends, add the filename to the +@code{GTFILES} variable in @file{Makefile.in}. + +@item +For files that are part of one front end, add the filename to the +@code{gtfiles} variable defined in the appropriate +@file{config-lang.in}. For C, the file is @file{c-config-lang.in}. +Headers should appear before non-headers in this list. + +@item +For files that are part of some but not all front ends, add the +filename to the @code{gtfiles} variable of @emph{all} the front ends +that use it. +@end enumerate + +@item +If the file was a header file, you'll need to check that it's included +in the right place to be visible to the generated files. For a back-end +header file, this should be done automatically. For a front-end header +file, it needs to be included by the same file that includes +@file{gtype-@var{lang}.h}. For other header files, it needs to be +included in @file{gtype-desc.c}, which is a generated file, so add it to +@code{ifiles} in @code{open_base_file} in @file{gengtype.c}. + +For source files that aren't header files, the machinery will generate a +header file that should be included in the source file you just changed. +The file will be called @file{gt-@var{path}.h} where @var{path} is the +pathname relative to the @file{gcc} directory with slashes replaced by +@verb{|-|}, so for example the header file to be included in +@file{cp/parser.c} is called @file{gt-cp-parser.c}. The +generated header file should be included after everything else in the +source file. Don't forget to mention this file as a dependency in the +@file{Makefile}! + +@end enumerate + +For language frontends, there is another file that needs to be included +somewhere. It will be called @file{gtype-@var{lang}.h}, where +@var{lang} is the name of the subdirectory the language is contained in. + +Plugins can add additional root tables. Run the @code{gengtype} +utility in plugin mode as @code{gengtype -P @var{gt-pluginout.h} -r +@var{gtype.state} @var{plugin*.c}} with your plugin files +@var{plugin*.c} using @code{GTY} to generate the @var{gt-pluginout.h} +file. The @var{gtype.state} is usually in the plugin directory. + + +@node Generating GGC code with gengtype +@section How to run the gengtype generator. +@cindex garbage collector, generation, gengtype +@findex gengtype + +The @code{gengtype} utility (which may be installed as +@code{gcc-gengtype} or some other name on your system) is a C code +generator program. It generates suitable marking and type-walking C +routines for GGC. Generated GC marking routines ensure that every +live data is ultimately marked, so that GGC can delete, at the end of +its @code{ggc_collect} function, every non-marked data (which is +unreachable and dead). Generated PCH type-walking routines are used +for ``compilation'' of header files @var{*.h} into a @var{*.gch} +file. The @code{gengtype} utility is used when building GCC itself +from its source tree, and also for plugins dealing with their own +GTY-ed data. + +The @code{gengtype} utility is run once specially when compiling GCC +itself. It parses many GCC source files (mostly internal GCC headers +declaring @code{GTY}-ed types and internal GCC files declaring +@code{GTY}-ed variables). It saves its state in a persistent textual +file @file{gtype.state}. This file has a format tied to a particular +GCC release and configuration. This @file{gtype.state} file should be +parsed only by @code{gengtype} (of the same version which has +generated it). Following GNU usage, @code{gengtype} accepts the usual +@code{--version} or @code{-V} and @code{--help} arguments. It also +accepts @code{--verbose} or @code{-v} to show what it is doing, and +@code{--debug} or @code{-D} for debugging purposes. It may dump every +internal data in human readable form with @code{--dump} or @code{-d}. +The @code{--backupdir} @var{directory} or @code{-B} @var{directory} +specifies a back-up directory used to store, as e.g. @file{gt-*.h~} +files, the previous version of generated C files. + +During GCC build process, a file @file{gtyp-input.list} is built and +contains a long list of files to be parsed. Then, @code{gengtype} is +run once to generate its state with @code{gengtype -S +@file{source-directory} -I @file{gtyp-input.list} -w +@file{gtype.state}}. Immediately after, it is re-run to read its just +generated state and generate @file{gt*.[ch]} files with @code{gengtype +-r @file{gtype.state}}. + +Plugin developers using @code{GTY} in their plugin source code should +run @code{gengtype -P @var{gt-pluginout.h} -r @file{gtype.state} +@var{plugin*.c}} + + +@node Invoking the garbage collector +@section How to invoke the garbage collector +@cindex garbage collector, invocation +@findex ggc_collect + +The GCC garbage collector GGC is only invoked explicitly. In contrast +with many other garbage collectors, it is not implicitly invoked by +allocation routines when a lot of memory has been consumed. So the +only way to have GGC reclaim storage it to call the @code{ggc_collect} +function explicitly. This call is an expensive operation, as it may +have to scan the entire heap. Beware that local variables (on the GCC +call stack) are not followed by such an invocation (as many other +garbage collectors do): you should reference all your data from static +or external @code{GTY}-ed variables, and it is advised to call +@code{ggc_collect} with a shallow call stack. The GGC is an exact mark +and sweep garbage collector (so it does not scan the call stack for +pointers). In practice GCC passes don't often call @code{ggc_collect} +themselves, because it is called by the pass manager between passes.