Patchwork Patch: New GTY ((atomic)) option

Submitter Nicola Pero
Date May 16, 2011, 12:13 a.m.
Message ID <1305504836.57384542@www2.webmail.us>
Permalink /patch/95670/
State New

Comments

Nicola Pero - May 16, 2011, 12:13 a.m.
This patch adds a new GTY option, "atomic", which is similar to the option of the same name in Boehm GC
and which can be used with pointers to inform the GC/PCH machinery that they point to an area of memory that
contains no pointers (and hence needs no scanning).

The reason for adding this option is that, without it, it seems to be (surprisingly) impossible
to write code that keeps a GC pointer to a plain array of C stuff such as integers.  In my case,
I was experimenting with hash tables that can automatically cache hash values.  So I needed a plain
C array to store the cached hash values, but found that it is currently unsupported by GC/PCH! :-(

That is, at the moment you can't have a struct such as the following one --

struct GTY(()) my_struct {
  ...
  unsigned int * some_ints;
  size_t count;
  ...
};

because gengtype rejects it with the error "field `(*x).some_ints' is pointer to unimplemented type".

This patch basically implements it, but at this stage requires you to explicitly tell gengtype that the
pointer is atomic (and that it is safe for gengtype to ignore the memory it points to).  So, the following
now works as expected --

struct GTY(()) my_struct {
  ...
  unsigned int * GTY((atomic)) some_ints;
  size_t count;
  ...
};
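
For illustration only, here is a minimal sketch of how such a field might be populated. It is written to compile outside GCC, so several things are assumptions rather than GCC code: GTY is defined away as an empty macro, alloc_atomic_stub() is a hypothetical stand-in (backed by calloc) for a GC allocator of pointer-free memory such as ggc_alloc_atomic_stat(), and my_struct_init() is a made-up helper. Inside GCC the array would come from the GC and would never be freed by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Outside gengtype the GTY markers carry no meaning, so for this
   standalone sketch they expand to nothing.  */
#define GTY(x)

struct GTY(()) my_struct {
  unsigned int * GTY((atomic)) some_ints;  /* plain array, no pointers inside */
  size_t count;
};

/* Hypothetical stand-in for a GC allocator of pointer-free memory.
   In this sketch it is just calloc.  */
void *
alloc_atomic_stub (size_t size)
{
  return calloc (1, size);
}

/* Hypothetical helper: give S an array of COUNT cached values.
   Returns 0 on success and -1 if the allocation fails.  */
int
my_struct_init (struct my_struct *s, size_t count)
{
  s->some_ints
    = (unsigned int *) alloc_atomic_stub (count * sizeof (unsigned int));
  if (s->some_ints == NULL)
    return -1;
  s->count = count;
  for (size_t i = 0; i < count; i++)
    s->some_ints[i] = (unsigned int) i;  /* placeholder cached values */
  return 0;
}
```

The point of the sketch is only that the array holds plain integers, so a collector following some_ints has nothing further to scan.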

A nice next step would be to have gengtype automatically mark as "atomic" any pointer that it can safely determine
points to an area of memory that never contains any pointers.  But that's slightly more complicated.  For example, gengtype
currently does not distinguish between "unsigned int" and "void", so "unsigned int *" and "void *" would be treated
the same, whereas you'd want the first to be automatically marked as atomic and the second to generate an error,
since gengtype has no way to determine whether it's atomic unless it's explicitly marked as such.  So for now
I haven't implemented it; it could be a follow-up patch.  Even after implementing it, the explicit "atomic" option
would remain useful for "void *" pointers and the like, so it's a good starting point.

Btw, there are a few existing pointers in GCC that could be marked as atomic, for example the field "su" of struct
function in function.h.  The advantage of marking them as atomic would be a slight speedup of the GC marking by saving
a function call each time one of these structs is being walked; I suspect that alone wouldn't make any visible difference
in practice, but I haven't done any profiling or benchmarking to know for sure.

I have done some testing of this patch, and I want to do some more before I commit.  If anyone has good ideas on how
to perform thorough testing, they are welcome. :-)

Ok to commit ?

Thanks

PS: This patch does not include support for marking root/global variables with "atomic" (neither manually nor automatically);
only fields in a struct.  That would be useful too, but I'm leaving it for yet another patch.

2011-05-16  Nicola Pero  <nicola.pero@meta-innovation.com>

        * gengtype.c (walk_type): Implemented "atomic" GTY option.
        * doc/gty.texi (GTY Options): Document "atomic" GTY option.

Gabriel Dos Reis - May 16, 2011, 12:49 a.m.
On Sun, May 15, 2011 at 7:13 PM, Nicola Pero
<nicola.pero@meta-innovation.com> wrote:
> This patch adds a new GTY option, "atomic", which is similar to the identical option you have with Boehm GC
> and which can be used with pointers to inform the GC/PCH machinery that they point to an area of memory that
[...]
> This patch basically implements it, but at this stage requires you to explicitly tell gengtype that the
> pointer is atomic (and that is safe for gengtype to ignore the memory it points to).

then should you not name the attribute "ignore"?

-- Gaby

Nathan Froyd - May 16, 2011, 12:52 a.m.
On 05/15/2011 08:49 PM, Gabriel Dos Reis wrote:
> On Sun, May 15, 2011 at 7:13 PM, Nicola Pero
> <nicola.pero@meta-innovation.com> wrote:
>> This patch adds a new GTY option, "atomic", which is similar to the identical option you have with Boehm GC
>> and which can be used with pointers to inform the GC/PCH machinery that they point to an area of memory that
> [...]
>> This patch basically implements it, but at this stage requires you to explicitly tell gengtype that the
>> pointer is atomic (and that is safe for gengtype to ignore the memory it points to).
> 
> then should you not name the attribute "ignore"?

Or even the existing attribute "skip"?

-Nathan

Gabriel Dos Reis - May 16, 2011, 1:51 a.m.
On Sun, May 15, 2011 at 7:52 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
> On 05/15/2011 08:49 PM, Gabriel Dos Reis wrote:
>> On Sun, May 15, 2011 at 7:13 PM, Nicola Pero
>> <nicola.pero@meta-innovation.com> wrote:
>>> This patch adds a new GTY option, "atomic", which is similar to the identical option you have with Boehm GC
>>> and which can be used with pointers to inform the GC/PCH machinery that they point to an area of memory that
>> [...]
>>> This patch basically implements it, but at this stage requires you to explicitly tell gengtype that the
>>> pointer is atomic (and that is safe for gengtype to ignore the memory it points to).
>>
>> then should you not name the attribute "ignore"?
>
> Or even the existing attribute "skip"?

better, indeed. :-)

-- Gaby

Nicola Pero - May 16, 2011, 8:47 a.m.
>>> This patch adds a new GTY option, "atomic", which is similar to the identical option you have with Boehm GC
>>> and which can be used with pointers to inform the GC/PCH machinery that they point to an area of memory that
>> [...]
>>> This patch basically implements it, but at this stage requires you to explicitly tell gengtype that the
>>> pointer is atomic (and that is safe for gengtype to ignore the memory it points to).
>>
>> then should you not name the attribute "ignore"?
>
> Or even the existing attribute "skip"?

"skip" is different.  With "skip", the pointer is completely ignored by GC/PCH.  In this case, we don't want
to skip the pointer; we want the pointer itself to be managed by GC/PCH, but the memory *it points to* to not
be scanned for pointers. ;-)

In the example I gave,

struct GTY(()) my_struct {
 ...
 unsigned int * GTY((atomic)) some_ints;
 size_t count;
 ...
};

you'd allocate "some_ints" using, for example, ggc_alloc_atomic_stat(), which already exists in the GC (even
if at the moment there seems to be no particular support for atomic allocation other than the name of that
function, which allocates memory in the same way as for non-atomic allocation), and which would put the pointer
under control of the GC; so it is freed when the GC decides that it is no longer referenced; you don't free
it manually.  That is different from "skip", which would make the pointer simply invisible to the GC (you'd allocate
it using malloc()), and you'd have to free it manually (or never free it).

In practice, when the GC is doing its marking pass, and is marking a structure of type "my_struct", if the
"some_ints" pointer has the option "skip", the GC would not mark it at all; it's ignored.  The option "atomic"
would cause the GC to mark the pointer but ignore what it points to.  The default behaviour is different again;
it is to examine the memory it points to, mark any pointers in there, and then mark the pointer itself
too.  But because gengtype does not know, at the moment, how to examine unsigned ints (you don't examine them,
because the pointer is atomic!), it will print an error saying that the pointer type is unimplemented, and abort
(a further step, after introducing the "atomic" option, would be to have the GC automatically mark such pointers
as atomic, as explained in the original post).
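
The three behaviours described above can be modelled with a toy marking routine. This is a self-contained sketch, not GCC code and not gengtype output: the block and holder types, the field_mode enum and mark_field() are all hypothetical names invented for the illustration, and a simple boolean stands in for whatever bookkeeping the real collector uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: every GC-managed block carries a mark bit.  */
struct block {
  bool marked;
};

/* How the single pointer field of a holder is treated.  */
enum field_mode { MODE_SKIP, MODE_ATOMIC, MODE_DEFAULT };

/* A toy object with one pointer field.  INNER lists the pointers that
   would be found by scanning the memory TARGET points to.  */
struct holder {
  struct block self;
  struct block *target;
  struct block **inner;
  size_t n_inner;
};

/* Mark H and then handle its pointer field according to MODE.  */
void
mark_field (struct holder *h, enum field_mode mode)
{
  h->self.marked = true;

  /* skip: the field is invisible to the collector.  */
  if (mode == MODE_SKIP || h->target == NULL)
    return;

  /* default: scan the pointed-to memory and mark what it references.  */
  if (mode == MODE_DEFAULT)
    for (size_t i = 0; i < h->n_inner; i++)
      if (h->inner[i] != NULL)
        h->inner[i]->marked = true;

  /* atomic and default: the pointed-to block itself stays alive.  */
  h->target->marked = true;
}
```

In this model "skip" leaves the target unmarked, "atomic" marks the target without looking inside it, and the default marks both the target and everything found inside it.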

To clarify, I didn't invent the word "atomic" - AFAIK it is the standard GC naming convention for memory that contains
no pointers.  It's the name used for this in Boehm GC (the most popular C/C++ GC), where the function is called
GC_MALLOC_ATOMIC(), and it appears to be the name for it in the GCC GC too, since there already is a function
"ggc_alloc_atomic_stat()" which presumably is meant to allocate atomic memory (hard to say in the absence of
documentation and given that the implementation is identical to the other memory allocation at the moment, but it's
a safe guess).

What is the problem with "atomic" ?  I guess you find it confusing because it makes you think of "atomic access"
to memory ?  You are right that there is that potential for confusion. :-(

We could rename it, but then we'd want to rename the GCC ggc_alloc_atomic_stat() function too, and I'm not entirely
sure it would make anything clearer ... as "atomic" is the standard word for that.  I think the best we can do is
provide good documentation.

So, I guess what I take from your comments is that I should update the documentation in my patch to include
a short discussion of how "atomic" differs from "skip", since it doesn't seem to be that obvious for people. :-)

But please let me know if I'm missing something.

Thanks

Laurynas Biveinis - May 16, 2011, 9:59 a.m.
2011/5/16 Nicola Pero <nicola.pero@meta-innovation.com>:
> 2011-05-16  Nicola Pero  <nicola.pero@meta-innovation.com>
>
>        * gengtype.c (walk_type): Implemented "atomic" GTY option.
>        * doc/gty.texi (GTY Options): Document "atomic" GTY option.

The patch is OK, with the difference between the "skip" and "atomic" options
documented (this can be done as a follow-up patch).

Thanks,

Patch

Index: doc/gty.texi
===================================================================
--- doc/gty.texi        (revision 173768)
+++ doc/gty.texi        (working copy)
@@ -383,6 +383,42 @@  could be calculated as follows:
   size_t size = sizeof (struct sorted_fields_type) + n * sizeof (tree);
 @end smallexample
 
+@findex atomic
+@item atomic
+
+The @code{atomic} option can only be used with pointers.  It informs
+the GC machinery that the memory that the pointer points to does not
+contain any pointers, and hence it should be treated by the GC and PCH
+machinery as an ``atomic'' block of memory that does not need to be
+examined.  In particular, the machinery will not scan that memory for
+pointers to mark them as reachable (when marking pointers for GC) or
+to relocate them (when writing a PCH file).
+
+The @code{atomic} option must be used with great care, because all
+sorts of problems can occur if used incorrectly, that is, if the memory
+the pointer points to does actually contain a pointer.
+
+Here is an example of how to use it:
+@smallexample
+struct GTY(()) my_struct @{
+  int number_of_elements;
+  unsigned int GTY ((atomic)) * elements;
+@};
+@end smallexample
+In this case, @code{elements} is a pointer under GC, and the memory it
+points to needs to be allocated using the Garbage Collector, and will
+be freed automatically by the Garbage Collector when it is no longer
+referenced.  But the memory that the pointer points to is an array of
+@code{unsigned int} elements, and the GC does not need, and indeed
+must not, try to scan it to find pointers to mark or relocate, which
+is why it is marked with the @code{atomic} option.
+
+Note that, currently, global variables can not be marked with
+@code{atomic}; only fields of a struct can.  This is a known
+limitation.  It would be useful to be able to mark global pointers
+with @code{atomic} to make the PCH machinery aware of them so that
+they are saved and restored correctly to PCH files.
+
 @findex special
 @item special ("@var{name}")
 

Index: gengtype.c
===================================================================
--- gengtype.c  (revision 173768)
+++ gengtype.c  (working copy)
@@ -2386,6 +2386,7 @@  walk_type (type_p t, struct walk_type_data *d)
   int maybe_undef_p = 0;
   int use_param_num = -1;
   int use_params_p = 0;
+  int atomic_p = 0;
   options_p oo;
   const struct nested_ptr_data *nested_ptr_d = NULL;
 
@@ -2415,6 +2416,8 @@  walk_type (type_p t, struct walk_type_data *d)
       ;
     else if (strcmp (oo->name, "skip") == 0)
       ;
+    else if (strcmp (oo->name, "atomic") == 0)
+      atomic_p = 1;
     else if (strcmp (oo->name, "default") == 0)
       ;
     else if (strcmp (oo->name, "param_is") == 0)
@@ -2480,6 +2483,12 @@  walk_type (type_p t, struct walk_type_data *d)
       return;
     }
 
+  if (atomic_p && (t->kind != TYPE_POINTER))
+    {
+      error_at_line (d->line, "field `%s' has invalid option `atomic'\n", d->val);
+      return;
+    }
+
   switch (t->kind)
     {
     case TYPE_SCALAR:
@@ -2495,6 +2504,25 @@  walk_type (type_p t, struct walk_type_data *d)
            break;
          }
 
+       /* If a pointer type is marked as "atomic", we process the
+          field itself, but we don't walk the data that they point to.
+          
+          There are two main cases where we walk types: to mark
+          pointers that are reachable, and to relocate pointers when
+          writing a PCH file.  In both cases, an atomic pointer is
+          itself marked or relocated, but the memory that it points
+          to is left untouched.  In the case of PCH, that memory will
+          be read/written unchanged to the PCH file.  */
+       if (atomic_p)
+         {
+           oprintf (d->of, "%*sif (%s != NULL) {\n", d->indent, "", d->val);
+           d->indent += 2;
+           d->process_field (t, d);
+           d->indent -= 2;
+           oprintf (d->of, "%*s}\n", d->indent, "");
+           break;
+         }
+
        if (!length)
          {
            if (!UNION_OR_STRUCT_P (t->u.p)