
[PATCHv2] Handle overlength string literals in the Fortran FE

Message ID AM5PR0701MB2657FF3F97522D1476134A6DE4360@AM5PR0701MB2657.eurprd07.prod.outlook.com
State New
Series [PATCHv2] Handle overlength string literals in the Fortran FE

Commit Message

Bernd Edlinger Aug. 24, 2018, 8:06 p.m. UTC
Hi!


This is an alternative approach to handling overlength strings in the Fortran FE.

The difference from the previous version is that an overlength
STRING_CST never has a TREE_STRING_LENGTH longer than its TYPE_DOMAIN
allows, so those STRING_CSTs are no longer zero-terminated.

The requirement that all string constants be zero-terminated internally
is dropped as well.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

Comments

Janne Blomqvist Sept. 3, 2018, 7:25 p.m. UTC | #1
On Fri, Aug 24, 2018 at 11:06 PM Bernd Edlinger <bernd.edlinger@hotmail.de>
wrote:

> Hi!
>
>
> This is an alternative approach to handle overlength strings in the
> Fortran FE.
>

Hi,

can you explain a little more what the problem that this patch tries to
solve is? What is an "overlength" string?

Bernd Edlinger Sept. 4, 2018, 7:05 a.m. UTC | #2
On 03/09/2018, 21:25 Janne Blomqvist wrote:
> can you explain a little more what the problem that this patch tries to
> solve is? What is an "overlength" string?

In the middle-end, STRING_CST objects have a TYPE_DOMAIN,
which specifies how much memory the string constant uses
and what kind of characters it consists of,
and a TREE_STRING_LENGTH, which specifies how many
bytes the string value contains.

Everything is fine if both sizes agree, or if the memory size
is larger than the string length, in which case the string is simply
padded with zero bytes to the full length.
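
The padding case can be illustrated with a small standalone sketch. It
is not GCC code: emit_padded is a hypothetical helper, and the buffer
only stands in for the memory that the type says the constant occupies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): lay out LEN string bytes in an
   object of SIZE bytes and zero-fill the tail.  This models the
   unproblematic case where the memory size is at least the string length.  */
static void
emit_padded (char *object, size_t size, const char *value, size_t len)
{
  assert (len <= size);
  memcpy (object, value, len);
  memset (object + len, 0, size - len);
}
```

With a 6-byte object and the 3-byte value "abc", the object ends up
holding 'a', 'b', 'c' followed by three zero bytes.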

But things get unnecessarily complicated if the memory size
is smaller than the string length.

In this situation there are two different use cases of STRING_CST
with contradictory rules:

For string literals and flexible arrays, the memory size is ignored
and the TREE_STRING_LENGTH specifies both the
string length and the memory size.  Fortran does not use those.

For a STRING_CST used in a CONSTRUCTOR of a string object,
the TREE_STRING_LENGTH is ignored and only the part of the
string value that fits into the memory size is used.  The situation
is similar to excess-precision floating-point values.

Now it happens that the middle-end sees an overlength STRING_CST
and wants to know whether the string constant is properly
zero-terminated.  That is impossible to tell, since any nul byte
at the end of the string value might be part of the ignored excess
precision; whether it is depends on where the string constant
actually came from.
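
A toy model may make the ambiguity concrete. The struct below is an
assumption made for illustration, not GCC's tree representation: its
fields only mimic TREE_STRING_POINTER, TREE_STRING_LENGTH and the
memory size from the type, and both predicates are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a STRING_CST; not GCC's real data structure.  */
struct toy_string_cst
{
  const char *value;     /* models TREE_STRING_POINTER */
  size_t string_length;  /* models TREE_STRING_LENGTH */
  size_t memory_size;    /* models the size implied by the type */
};

/* Rule for string literals and flexible arrays: the string length
   decides, so the last byte of the value is what counts.  */
static bool
terminated_as_literal (const struct toy_string_cst *s)
{
  assert (s->string_length > 0);
  return s->value[s->string_length - 1] == '\0';
}

/* Rule for CONSTRUCTOR elements: only the bytes that fit into the
   memory size count; a shorter value is padded with zero bytes.  */
static bool
terminated_as_constructor_element (const struct toy_string_cst *s)
{
  if (s->string_length < s->memory_size)
    return true;  /* zero padding supplies the terminator */
  assert (s->memory_size > 0);
  return s->value[s->memory_size - 1] == '\0';
}
```

For the bytes 'a', 'b', '\0' with a string length of 3 and a memory
size of 2, the first rule reports a terminated string and the second
does not, which is exactly the case that cannot be decided without
knowing where the constant came from.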

Therefore I started an effort to sanitize STRING_CSTs via
an assertion in varasm.c, where most string constants
eventually pass through.  It triggered in two Fortran test cases,
and in a few other languages of course.

This is what this patch tries to fix.

Bernd.

Janne Blomqvist Sept. 5, 2018, 6:16 p.m. UTC | #3

I guess I'm slightly confused about why this mismatch happens in the first
place (does the Fortran frontend do something dumb with string declarations?),
but OK for trunk.

Bernd Edlinger Sept. 6, 2018, 11:29 a.m. UTC | #4
On 09/05/18 20:16, Janne Blomqvist wrote:
> I guess, I'm slightly confused why this mismatch happens in the first place (does the Fortran frontend do something dumb wrt string declarations, or?), but, Ok for trunk.

This happens only in the test case that is mentioned in the comment.
If I remember correctly, the string constant is 3 characters long, and so is the
type info on the STRING_CST itself, but the type of the object has only 2 bytes
of space for the string.  Therefore the patch makes the string shorter and uses
the original type from the declaration.
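
The effect on that case can be sketched outside of GCC as follows.
toy_truncate is a hypothetical helper that only mimics the idea of
rebuilding the string with the object's size; it does not use GCC's
tree API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not GCC code): keep only the bytes of VALUE
   that fit into an object of OBJECT_SIZE bytes and return the new
   string length.  DST must have room for OBJECT_SIZE bytes.  */
static size_t
toy_truncate (char *dst, size_t object_size, const char *value, size_t len)
{
  size_t kept = len < object_size ? len : object_size;

  assert (dst != NULL && value != NULL);
  memcpy (dst, value, kept);
  return kept;
}
```

Applied to a 3-character value and a 2-byte object, this keeps the
first two characters and reports a length of 2, so the string length
and the memory size agree again.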

I am going to apply this together with the rest of the STRING_CST semantic patches,
once those are approved.


Thanks
Bernd.

Patch

2018-08-01  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* trans-array.c (gfc_conv_array_initializer): Remove excess precision
	from overlength string initializers.

Index: gcc/fortran/trans-array.c
===================================================================
--- gcc/fortran/trans-array.c	(revision 263807)
+++ gcc/fortran/trans-array.c	(working copy)
@@ -5964,6 +5964,26 @@  gfc_conv_array_initializer (tree type, gfc_expr *
 	    {
 	    case EXPR_CONSTANT:
 	      gfc_conv_constant (&se, c->expr);
+
+	      /* See gfortran.dg/charlen_15.f90 for instance.  */
+	      if (TREE_CODE (se.expr) == STRING_CST
+		  && TREE_CODE (type) == ARRAY_TYPE)
+		{
+		  tree atype = type;
+		  while (TREE_CODE (TREE_TYPE (atype)) == ARRAY_TYPE)
+		    atype = TREE_TYPE (atype);
+		  if (TREE_CODE (TREE_TYPE (atype)) == INTEGER_TYPE
+		      && tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (se.expr)))
+			 > tree_to_uhwi (TYPE_SIZE_UNIT (atype)))
+		    {
+		      unsigned HOST_WIDE_INT size
+			= tree_to_uhwi (TYPE_SIZE_UNIT (atype));
+		      const char *p = TREE_STRING_POINTER (se.expr);
+
+		      se.expr = build_string (size, p);
+		      TREE_TYPE (se.expr) = atype;
+		    }
+		}
 	      break;
 
 	    case EXPR_STRUCTURE: