diff mbox

[fortran] PR58007: unresolved fixup hell

Message ID 52C55062.3000809@sfr.fr
State New
Headers show

Commit Message

Mikael Morin Jan. 2, 2014, 11:41 a.m. UTC
Hello,

this patch fixes PR58007, where the compiler was not able to relate a
component pointer to any loaded derived type symbol.
The problem came from an optimization avoiding loading again a symbol
which had already been loaded, skipping by the way the association of
component pointers (if the symbol was a derived type) with the
corresponding module indexes.

The attached patch fixes this by reading the derived type symbol dump's
component list and do the pointer<->integer association by hand.  Then
the regular on demand module loading code works properly and some code
in mio_component_ref that was forcing the module loading of the
containing derived type can be removed.
To associate the component pointers with the module integers, one has to
parse the symbol, or at least its components.  It is done by hand in
this patch and to reduce the maintainance burden (for possible future
evolutions of symbol dumping format/content) the component list is moved
at the beginning.  More exactly just after the symbol attributes,
because the check_for_ambiguous function relies on the symbol attributes
appearing first in a symbol.
This changes the module format, so MOD_VERSION is  bumped.

Regression tested on x86_64-unknown-linux-gnu. OK for trunk?
I plan to submit a variant that doesn't change the module format for the
branches (doing more parsing by hand).

Mikael
2014-01-02  Mikael Morin  <mikael@gcc.gnu.org>

	PR fortran/58007
	* module.c (MOD_VERSION): Bump.
	(fp2, find_pointer2): Remove.
	(mio_component_ref): Don't forcedfully set the containing derived type
	symbol for loading.
	(mio_symbol): Dump components earlier.
	(skip_list): New argument nest_level.  Initialize level with the new
	argument.
	(read_module): Add forced pointer components association for derived
	type symbols.
	
2014-01-02  Mikael Morin  <mikael@gcc.gnu.org>

	PR fortran/58007
	* gfortran.dg/unresolved_fixup_1.f90

Comments

Janus Weil Jan. 4, 2014, 3:35 p.m. UTC | #1
Hi Mikael,

> this patch fixes PR58007, where the compiler was not able to relate a
> component pointer to any loaded derived type symbol.
> The problem came from an optimization avoiding loading again a symbol
> which had already been loaded, skipping by the way the association of
> component pointers (if the symbol was a derived type) with the
> corresponding module indexes.
>
> The attached patch fixes this by reading the derived type symbol dump's
> component list and do the pointer<->integer association by hand.  Then
> the regular on demand module loading code works properly and some code
> in mio_component_ref that was forcing the module loading of the
> containing derived type can be removed.
> To associate the component pointers with the module integers, one has to
> parse the symbol, or at least its components.  It is done by hand in
> this patch and to reduce the maintainance burden (for possible future
> evolutions of symbol dumping format/content) the component list is moved
> at the beginning.  More exactly just after the symbol attributes,
> because the check_for_ambiguous function relies on the symbol attributes
> appearing first in a symbol.
> This changes the module format, so MOD_VERSION is  bumped.
>
> Regression tested on x86_64-unknown-linux-gnu. OK for trunk?

Basically your patch looks ok to me.

However, I don't quite see the necessity for changing the module
format (apart from the fact that it makes your patch slightly
simpler). Anyway, since the module format has been changed already in
4.9, I think it doesn't matter much if we do it once more, so okay
with me.

One question, though: I don't understand how  the problem is specific
to OOP code (and why it did not come up before with F95-style code).
Do you think it would be possible to find a non-OOP case where it
fails? If not, could there be some problem in gfortran's OOP
implementation which causes the failure?


> I plan to submit a variant that doesn't change the module format for the
> branches (doing more parsing by hand).

Isn't that what you posted in comment 12 of the PR?

Which branches did you have in mind? The PR contains various test
cases which fail on either 4.7 or 4.8 and sometimes both, so
potentially both of them are affected by the bug, I guess?

Cheers,
Janus
Mikael Morin Jan. 11, 2014, 12:18 p.m. UTC | #2
Le 04/01/2014 16:35, Janus Weil a écrit :
> Hi Mikael,
> 
>> this patch fixes PR58007, where the compiler was not able to relate a
>> component pointer to any loaded derived type symbol.
>> The problem came from an optimization avoiding loading again a symbol
>> which had already been loaded, skipping by the way the association of
>> component pointers (if the symbol was a derived type) with the
>> corresponding module indexes.
>>
>> The attached patch fixes this by reading the derived type symbol dump's
>> component list and do the pointer<->integer association by hand.  Then
>> the regular on demand module loading code works properly and some code
>> in mio_component_ref that was forcing the module loading of the
>> containing derived type can be removed.
>> To associate the component pointers with the module integers, one has to
>> parse the symbol, or at least its components.  It is done by hand in
>> this patch and to reduce the maintainance burden (for possible future
>> evolutions of symbol dumping format/content) the component list is moved
>> at the beginning.  More exactly just after the symbol attributes,
>> because the check_for_ambiguous function relies on the symbol attributes
>> appearing first in a symbol.
>> This changes the module format, so MOD_VERSION is  bumped.
>>
>> Regression tested on x86_64-unknown-linux-gnu. OK for trunk?
> 
> Basically your patch looks ok to me.
> 
> However, I don't quite see the necessity for changing the module
> format (apart from the fact that it makes your patch slightly
> simpler). Anyway, since the module format has been changed already in
> 4.9, I think it doesn't matter much if we do it once more, so okay
> with me.
Well, I'm afraid of a future evolution where we change the module
format, update mio_symbol, and forget to change read_module as well, so
that the code reads something thinking it is the component list, but it
has become something else.
It is possible as empty parentheses `()' are common in the module
format, and it matches an empty component list.  Moving the component
list earlier only makes it less likely.  Another possibility to prevent
this is adding a recognizable pattern like "@# component-list #@" or
something, but the price to pay is the same: changing the module format.

> 
> One question, though: I don't understand how  the problem is specific
> to OOP code (and why it did not come up before with F95-style code).
> Do you think it would be possible to find a non-OOP case where it
> fails? If not, could there be some problem in gfortran's OOP
> implementation which causes the failure?
> 
I attach here a non-OOP variant, as well as the script that generated
it.  You can see that i'm no good at writing shell scripts. ;-)
Anyway the principle is:  the bugs happens when the symbols are read in
one particular (failing) order, so this script adds useless variables to
the modules to saturate the pointer tree and trigger a tree rebalance,
until it fails.
For those who cannot afford updating their compilers, a variant of this
script might be used to have this bug _not_ happen on their codes.  This
is left as an exercise; not sure how difficult it is. ;-)

> 
>> I plan to submit a variant that doesn't change the module format for the
>> branches (doing more parsing by hand).
> 
> Isn't that what you posted in comment 12 of the PR?
> 
It is, yes.

> Which branches did you have in mind? The PR contains various test
> cases which fail on either 4.7 or 4.8 and sometimes both, so
> potentially both of them are affected by the bug, I guess?
> 
Oops, I thought it was a 4.7/4.8 regression.
Actually, the testcase provided here also fails with 4.8, so I was
actually right. :-)

Mikael
Janus Weil Jan. 11, 2014, 9:48 p.m. UTC | #3
Hi,

>>> this patch fixes PR58007, where the compiler was not able to relate a
>>> component pointer to any loaded derived type symbol.
>>> The problem came from an optimization avoiding loading again a symbol
>>> which had already been loaded, skipping by the way the association of
>>> component pointers (if the symbol was a derived type) with the
>>> corresponding module indexes.
>>>
>>> The attached patch fixes this by reading the derived type symbol dump's
>>> component list and do the pointer<->integer association by hand.  Then
>>> the regular on demand module loading code works properly and some code
>>> in mio_component_ref that was forcing the module loading of the
>>> containing derived type can be removed.
>>> To associate the component pointers with the module integers, one has to
>>> parse the symbol, or at least its components.  It is done by hand in
>>> this patch and to reduce the maintainance burden (for possible future
>>> evolutions of symbol dumping format/content) the component list is moved
>>> at the beginning.  More exactly just after the symbol attributes,
>>> because the check_for_ambiguous function relies on the symbol attributes
>>> appearing first in a symbol.
>>> This changes the module format, so MOD_VERSION is  bumped.
>>>
>>> Regression tested on x86_64-unknown-linux-gnu. OK for trunk?
>>
>> Basically your patch looks ok to me.
>>
>> However, I don't quite see the necessity for changing the module
>> format (apart from the fact that it makes your patch slightly
>> simpler). Anyway, since the module format has been changed already in
>> 4.9, I think it doesn't matter much if we do it once more, so okay
>> with me.
> Well, I'm afraid of a future evolution where we change the module
> format, update mio_symbol, and forget to change read_module as well, so
> that the code reads something thinking it is the component list, but it
> has become something else.
> It is possible as empty parentheses `()' are common in the module
> format, and it matches an empty component list.  Moving the component
> list earlier only makes it less likely.  Another possibility to prevent
> this is adding a recognizable pattern like "@# component-list #@" or
> something, but the price to pay is the same: changing the module format.
>
>> One question, though: I don't understand how  the problem is specific
>> to OOP code (and why it did not come up before with F95-style code).
>> Do you think it would be possible to find a non-OOP case where it
>> fails? If not, could there be some problem in gfortran's OOP
>> implementation which causes the failure?
>>
> I attach here a non-OOP variant, as well as the script that generated
> it.  You can see that i'm no good at writing shell scripts. ;-)
> Anyway the principle is:  the bugs happens when the symbols are read in
> one particular (failing) order, so this script adds useless variables to
> the modules to saturate the pointer tree and trigger a tree rebalance,
> until it fails.

Good, thanks for checking. As written before, the patch is ok for
trunk from my side.


>> Which branches did you have in mind? The PR contains various test
>> cases which fail on either 4.7 or 4.8 and sometimes both, so
>> potentially both of them are affected by the bug, I guess?
>>
> Oops, I thought it was a 4.7/4.8 regression.
> Actually, the testcase provided here also fails with 4.8, so I was
> actually right. :-)

In fact your test case fails with all versions I tried (4.4, 4.6, 4.7,
4.8 and trunk). So, is it a regression at all?

Cheers,
Janus
diff mbox

Patch

diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 98e22df..20bf1ba 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -82,7 +82,7 @@  along with GCC; see the file COPYING3.  If not see
 
 /* Don't put any single quote (') in MOD_VERSION, if you want it to be
    recognized.  */
-#define MOD_VERSION "11"
+#define MOD_VERSION "12"
 
 
 /* Structure that describes a position within a module file.  */
@@ -390,37 +390,6 @@  get_integer (int integer)
 }
 
 
-/* Recursive function to find a pointer within a tree by brute force.  */
-
-static pointer_info *
-fp2 (pointer_info *p, const void *target)
-{
-  pointer_info *q;
-
-  if (p == NULL)
-    return NULL;
-
-  if (p->u.pointer == target)
-    return p;
-
-  q = fp2 (p->left, target);
-  if (q != NULL)
-    return q;
-
-  return fp2 (p->right, target);
-}
-
-
-/* During reading, find a pointer_info node from the pointer value.
-   This amounts to a brute-force search.  */
-
-static pointer_info *
-find_pointer2 (void *p)
-{
-  return fp2 (pi_root, p);
-}
-
-
 /* Resolve any fixups using a known pointer.  */
 
 static void
@@ -2588,45 +2557,13 @@  mio_pointer_ref (void *gp)
    the namespace and is not loaded again.  */
 
 static void
-mio_component_ref (gfc_component **cp, gfc_symbol *sym)
+mio_component_ref (gfc_component **cp)
 {
-  char name[GFC_MAX_SYMBOL_LEN + 1];
-  gfc_component *q;
   pointer_info *p;
 
   p = mio_pointer_ref (cp);
   if (p->type == P_UNKNOWN)
     p->type = P_COMPONENT;
-
-  if (iomode == IO_OUTPUT)
-    mio_pool_string (&(*cp)->name);
-  else
-    {
-      mio_internal_string (name);
-
-      if (sym && sym->attr.is_class)
-	sym = sym->components->ts.u.derived;
-
-      /* It can happen that a component reference can be read before the
-	 associated derived type symbol has been loaded. Return now and
-	 wait for a later iteration of load_needed.  */
-      if (sym == NULL)
-	return;
-
-      if (sym->components != NULL && p->u.pointer == NULL)
-	{
-	  /* Symbol already loaded, so search by name.  */
-	  q = gfc_find_component (sym, name, true, true);
-
-	  if (q)
-	    associate_integer_pointer (p, q);
-	}
-
-      /* Make sure this symbol will eventually be loaded.  */
-      p = find_pointer2 (sym);
-      if (p->u.rsym.state == UNUSED)
-	p->u.rsym.state = NEEDED;
-    }
 }
 
 
@@ -2983,7 +2920,7 @@  mio_ref (gfc_ref **rp)
 
     case REF_COMPONENT:
       mio_symbol_ref (&r->u.c.sym);
-      mio_component_ref (&r->u.c.component, r->u.c.sym);
+      mio_component_ref (&r->u.c.component);
       break;
 
     case REF_SUBSTRING:
@@ -3865,6 +3802,14 @@  mio_symbol (gfc_symbol *sym)
   mio_lparen ();
 
   mio_symbol_attribute (&sym->attr);
+
+  /* Note that components are always saved, even if they are supposed
+     to be private.  Component access is checked during searching.  */
+  mio_component_list (&sym->components, sym->attr.vtype);
+  if (sym->components != NULL)
+    sym->component_access
+      = MIO_NAME (gfc_access) (sym->component_access, access_types);
+
   mio_typespec (&sym->ts);
   if (sym->ts.type == BT_CLASS)
     sym->attr.class_ok = 1;
@@ -3893,15 +3838,6 @@  mio_symbol (gfc_symbol *sym)
   if (sym->attr.cray_pointee)
     mio_symbol_ref (&sym->cp_pointer);
 
-  /* Note that components are always saved, even if they are supposed
-     to be private.  Component access is checked during searching.  */
-
-  mio_component_list (&sym->components, sym->attr.vtype);
-
-  if (sym->components != NULL)
-    sym->component_access
-      = MIO_NAME (gfc_access) (sym->component_access, access_types);
-
   /* Load/save the f2k_derived namespace of a derived-type symbol.  */
   mio_full_f2k_derived (sym);
 
@@ -4000,11 +3936,11 @@  find_symbol (gfc_symtree *st, const char *name,
 /* Skip a list between balanced left and right parens.  */
 
 static void
-skip_list (void)
+skip_list (int nest_level = 0)
 {
   int level;
 
-  level = 0;
+  level = nest_level;
   do
     {
       switch (parse_atom ())
@@ -4638,7 +4574,6 @@  read_module (void)
       info->u.rsym.ns = atom_int;
 
       get_module_locus (&info->u.rsym.where);
-      skip_list ();
 
       /* See if the symbol has already been loaded by a previous module.
 	 If so, we reference the existing symbol and prevent it from
@@ -4649,10 +4584,44 @@  read_module (void)
 
       if (sym == NULL
 	  || (sym->attr.flavor == FL_VARIABLE && info->u.rsym.ns !=1))
-	continue;
+	{
+	  skip_list ();
+	  continue;
+	}
 
       info->u.rsym.state = USED;
       info->u.rsym.sym = sym;
+      /* The current symbol has already been loaded, so we can avoid loading
+	 it again.  However, if it is a derived type, some of its components
+	 can be used in expressions in the module.  To avoid the module loading
+	 failing, we need to associate the module's component pointer indexes
+	 with the existing symbol's component pointers.  */
+      if (sym->attr.flavor == FL_DERIVED)
+	{
+	  gfc_component *c;
+
+	  mio_lparen (); /* symbol opening.  */
+	  skip_list (); /* skip symbol attribute.  */
+
+	  mio_lparen (); /* component list opening.  */
+	  for (c = sym->components; c; c = c->next)
+	    {
+	      pointer_info *p;
+	      int n;
+
+	      mio_lparen (); /* component opening.  */
+	      mio_integer (&n);
+	      p = get_integer (n);
+	      if (p->u.pointer == NULL)
+		associate_integer_pointer (p, c);
+	      skip_list (1); /* component end.  */
+	    }
+	  mio_rparen (); /* component list closing.  */
+
+	  skip_list (1); /* symbol end.  */
+	}
+      else
+	skip_list ();
 
       /* Some symbols do not have a namespace (eg. formal arguments),
 	 so the automatic "unique symtree" mechanism must be suppressed