diff mbox

[fortran,RFC] PR28105 Overflow check for ALLOCATE statement

Message ID AANLkTimWJMty0jEWVc8d3wZvmfw=aXxXY3tFCNa7tEaz@mail.gmail.com
State New
Headers show

Commit Message

Janne Blomqvist Nov. 22, 2010, 10:37 p.m. UTC
Hi,

the attached patch implements overflow checking for the ALLOCATE
statement. It also fixes the minor regression introduced a few days
ago when the old "< 0" overflow check was removed as a side-effect of
fixing the unsignedness of size_t.

Wrt whether these checks should be done always, or only when
-fcheck=mem is enabled, IMHO some benchmarks should be run and if no
difference can be seen they can be left enabled. So, does anybody know
of suitable benchmarks that are influenced by ALLOCATE performance?
Polyhedron, maybe not so much?

Regtested on x86_64-unknown-linux-gnu, and provided there is agreement
to always have it enabled, Ok for trunk?

2010-11-23  Janne Blomqvist  <jb@gcc.gnu.org>

	PR fortran/28105
	* trans-array.c (gfc_array_index_size): Check whether the size
	overflows.
	(gfc_array_allocate): Check whether size overflows and generate
	error.

Comments

Tobias Burnus Nov. 23, 2010, 3:50 p.m. UTC | #1
On 11/22/2010 11:37 PM, Janne Blomqvist wrote:
> the attached patch implements overflow checking for the ALLOCATE
> statement. It also fixes the minor regression introduced a few days
> ago when the old "<  0" overflow check was removed as a side-effect of
> fixing the unsignedness of size_t.

Frankly, I do not like the way the overflow check is done - namely by 
requiring an overflow:

         D.1506 = MAX_EXPR <my_size, 0>;
         D.1507 = D.1506 != 0 ? 2147483647 / D.1506 <= 0 ? 1 : 0 : 0;
         D.1508 = NON_LVALUE_EXPR <D.1506>;
         D.1509 = ((<unnamed-unsigned:32>) D.1508 > -1 ? 1 : 0) + D.1507;
         overflow.1 = D.1509;
         if (overflow.1 != 0)
             _gfortran_runtime_error (&"Integer overflow when 
calculating the amount of memory to allocate"[1]{lb: 1 sz: 1});

Namely, doing the "MAXSIZE/my_size <= 0" check where my_size is the 
signed requested size of the array. Explicitly asking for an overflow 
seems to be wrong. Additionally, I have no idea what the supposed result 
is; MAXSIZE (= "TYPE_MAX_VALUE (sizetype)") is unsigned and does not fit 
into a signed variable, however, the calculation is supposed to be done 
signed as both the D.1506 and the D.1507 is signed.

Additionally, you completely ignore one important ingredient: The size 
of the actual object which is to be allocated; using derived types those 
can be large. Example:

   use iso_c_binding, only: c_intptr_t
   integer(1), allocatable :: a(:)
   integer(4), allocatable :: b(:)
   allocate(a(huge(0_c_intptr_t)*i))  ! OK
   allocate(b(huge(0_c_intptr_t)*i))  ! Too big
   end

That's the reason I was thinking of:

  if (my_size > MAXSIZE/sizeof(type)) error()

if one wants to include the multiplications for 
allocate(array(sizeA,sizeB)) then one can do:
   if (sizeB > 0 && sizeA > MAXSIZE/sizeB)

That way one always avoids an overflow. I am not sure whether one 
rejects a value close to MAXSIZE, but at least one avoids 
signed/unsigned trickery.

For the conditions, one should consider to use __builtin_expect to give 
the compiler some additional optimization potential.

> Wrt whether these checks should be done always, or only when
> -fcheck=mem is enabled, IMHO some benchmarks should be run and if no
> difference can be seen they can be left enabled. So, does anybody know
> of suitable benchmarks that are influenced by ALLOCATE performance?
> Polyhedron, maybe not so much?

I don't. The issue is presumably largest for calls to small functions 
called in loops, which get later inlined. The less conditions and other 
trickery is done, the more likely the compiler can move the malloc out 
of the loop. I think it is trivial to construct such a program but it is 
hard to know how representative such a program would be for the real 
world. The probability has to be weighted against using a too large 
value for ALLOCATE without -check=all/-check=mem. I am inclined to make 
it conditional to -check=mem; but on the other hand, I live in a 64bit 
world where the chance of an overflow is rather low (for somewhat 
reasonable programs).

Tobias
Janne Blomqvist Nov. 24, 2010, 12:15 p.m. UTC | #2
On Tue, Nov 23, 2010 at 17:50, Tobias Burnus <burnus@net-b.de> wrote:
> On 11/22/2010 11:37 PM, Janne Blomqvist wrote:
>>
>> the attached patch implements overflow checking for the ALLOCATE
>> statement. It also fixes the minor regression introduced a few days
>> ago when the old "<  0" overflow check was removed as a side-effect of
>> fixing the unsignedness of size_t.
>
> Frankly, I do not like the way the overflow check is done - namely by
> requiring an overflow:
>
>        D.1506 = MAX_EXPR <my_size, 0>;
>        D.1507 = D.1506 != 0 ? 2147483647 / D.1506 <= 0 ? 1 : 0 : 0;
>        D.1508 = NON_LVALUE_EXPR <D.1506>;
>        D.1509 = ((<unnamed-unsigned:32>) D.1508 > -1 ? 1 : 0) + D.1507;
>        overflow.1 = D.1509;
>        if (overflow.1 != 0)
>            _gfortran_runtime_error (&"Integer overflow when calculating the
> amount of memory to allocate"[1]{lb: 1 sz: 1});
>
> Namely, doing the "MAXSIZE/my_size <= 0" check where my_size is the signed
> requested size of the array. Explicitly asking for an overflow seems to be
> wrong.

I suspect you are confused; I wondered the same thing when I was
developing the patch, and as far as I was able to determine, the issue
is that the tree dump is slightly different (although the result is
the same) than the actual code that the frontend generates. So, what
gfc_array_init_size does is roughly

// overflow = 0 is set in gfc_array_allocate() before calling
gfc_array_init_size().
stride = 1;
for (int i = 0; i < rank; i++)
{
    size = gfc_conv_array_extent_dim (...);  /* max(ubound + 1 - lbound, 0) */
    overflow += size != 0 ? (MAX/size < stride ? 1: 0): 0;
    stride *= size;
}

So, somehow for the D.1507 statement, the overflow check for the first
dimension which is then "MAX/size < 1" has been turned into "MAX/size
<= 0" which is of course equivalent but is not what the code generated
by the patch actually does! But for higher dimensions you'll see that
the tree dump corresponds to the pseudocode above.

> Additionally, I have no idea what the supposed result is; MAXSIZE (=
> "TYPE_MAX_VALUE (sizetype)") is unsigned and does not fit into a signed
> variable, however, the calculation is supposed to be done signed as both the
> D.1506 and the D.1507 is signed.

Ah, you're right. This should be fixed. Come to think of it, one might
as well instead convert size and stride to size_t right away and do
the overflow check and multiplication in that type, since in the end
it has to be converted to size_t anyway in order to call malloc()
(also, I think I read in the AMD64 software optimization manual that
unsigned divides are slightly faster than signed). In
gfc_conv_array_extent_dim() the calculation must still be done with
signed arithmetic since there the result might be negative, but that
function makes sure to return only a positive value, so size and
stride are guaranteed to be always positive in gfc_array_init_size().

> Additionally, you completely ignore one important ingredient: The size of
> the actual object which is to be allocated; using derived types those can be
> large.

No, AFAICS this is included in the patch. After the loop above, the
pseudocode continues as

tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
element_size = fold_convert (sizetype, tmp);
stride = fold_convert (sizetype, stride);
overflow += size != 0 ? (MAX/element_size < stride ? 1: 0): 0;

I believe this check corresponds to the line beginning with D.1509 in
your tree dump above.

> For the conditions, one should consider to use __builtin_expect to give the
> compiler some additional optimization potential.

Good point, I'll have to investigate how to do this.

>> Wrt whether these checks should be done always, or only when
>> -fcheck=mem is enabled, IMHO some benchmarks should be run and if no
>> difference can be seen they can be left enabled. So, does anybody know
>> of suitable benchmarks that are influenced by ALLOCATE performance?
>> Polyhedron, maybe not so much?
>
> I don't. The issue is presumably largest for calls to small functions called
> in loops, which get later inlined. The less conditions and other trickery is
> done, the more likely the compiler can move the malloc out of the loop. I
> think it is trivial to construct such a program but it is hard to know how
> representative such a program would be for the real world. The probability
> has to be weighted against using a too large value for ALLOCATE without
> -check=all/-check=mem. I am inclined to make it conditional to -check=mem;
> but on the other hand, I live in a 64bit world where the chance of an
> overflow is rather low (for somewhat reasonable programs).

I'll see if I can come up with some benchmarks.

Thanks for the review.
diff mbox

Patch

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 3d5e5ba..dbd336c 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3944,13 +3944,14 @@  gfc_conv_descriptor_size (tree desc, int rank)
 static tree
 gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset,
 		     gfc_expr ** lower, gfc_expr ** upper,
-		     stmtblock_t * pblock)
+		     stmtblock_t * pblock, tree * overflow)
 {
   tree type;
   tree tmp;
   tree size;
   tree offset;
   tree stride;
+  tree element_size;
   tree or_expr;
   tree thencase;
   tree elsecase;
@@ -4027,7 +4028,34 @@  gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset,
 
       /* Calculate size and check whether extent is negative.  */
       size = gfc_conv_array_extent_dim (conv_lbound, conv_ubound, &or_expr);
-
+      size = gfc_evaluate_now (size, pblock);
+
+      /* Check whether multiplying the stride by the number of
+	 elements in this dimension would overflow. We must also check
+	 whether the current dimension has zero size in order to avoid
+	 division by zero. The algorithm is thus
+
+	 overflow += size != 0 ? (M/size < stride ? 1: 0): 0;
+
+	 where M is the maximum value of gfc_array_index_type.
+      */
+      tmp = fold_build2_loc (input_location, TRUNC_DIV_EXPR, 
+			     gfc_array_index_type, 
+			     TYPE_MAX_VALUE (gfc_array_index_type), size);
+      tmp = fold_build3_loc (input_location, COND_EXPR, integer_type_node,
+			     fold_build2_loc (input_location, LT_EXPR, 
+					      boolean_type_node, tmp, stride),
+			     integer_one_node, integer_zero_node);
+      tmp = fold_build3_loc (input_location, COND_EXPR, integer_type_node,
+			     fold_build2_loc (input_location, NE_EXPR,
+					      boolean_type_node, size, 
+					      build_zero_cst (gfc_array_index_type)),
+			     tmp, integer_zero_node);
+      tmp = fold_build2_loc (input_location, PLUS_EXPR, integer_type_node,
+			     *overflow, tmp);
+      tmp = gfc_evaluate_now (tmp, pblock);
+      *overflow = tmp;
+      
       /* Multiply the stride by the number of elements in this dimension.  */
       stride = fold_build2_loc (input_location, MULT_EXPR,
 				gfc_array_index_type, stride, size);
@@ -4075,8 +4103,32 @@  gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset,
   /* The stride is the number of elements in the array, so multiply by the
      size of an element to get the total size.  */
   tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
-  size = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
-			  stride, fold_convert (gfc_array_index_type, tmp));
+  /* Convert to size_t.  */
+  element_size = fold_convert (sizetype, tmp);
+  stride = fold_convert (sizetype, stride);
+
+  /* First check for overflow. Since an array of type character can
+     have zero element_size, we must check for that before
+     dividing.  */
+  tmp = fold_build2_loc (input_location, TRUNC_DIV_EXPR, 
+			 sizetype, 
+			 TYPE_MAX_VALUE (sizetype), element_size);
+  tmp = fold_build3_loc (input_location, COND_EXPR, integer_type_node,
+			 fold_build2_loc (input_location, LT_EXPR, 
+					  boolean_type_node, tmp, stride),
+			 integer_one_node, integer_zero_node);
+  tmp = fold_build3_loc (input_location, COND_EXPR, integer_type_node,
+			 fold_build2_loc (input_location, NE_EXPR,
+					  boolean_type_node, element_size, 
+					  size_zero_node),
+			 tmp, integer_zero_node);
+  tmp = fold_build2_loc (input_location, PLUS_EXPR, integer_type_node,
+			 *overflow, tmp);
+  tmp = gfc_evaluate_now (tmp, pblock);
+  *overflow = tmp;
+
+  size = fold_build2_loc (input_location, MULT_EXPR, sizetype,
+			  stride, element_size);
 
   if (poffset != NULL)
     {
@@ -4091,7 +4143,7 @@  gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset,
 
   var = gfc_create_var (TREE_TYPE (size), "size");
   gfc_start_block (&thenblock);
-  gfc_add_modify (&thenblock, var, gfc_index_zero_node);
+  gfc_add_modify (&thenblock, var, size_zero_node);
   thencase = gfc_finish_block (&thenblock);
 
   gfc_start_block (&elseblock);
@@ -4117,6 +4169,12 @@  gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree pstat)
   tree pointer;
   tree offset;
   tree size;
+  tree msg;
+  tree error;
+  tree overflow; /* Boolean storing whether size calculation overflows.  */
+  tree var_overflow;
+  tree cond;
+  stmtblock_t elseblock;
   gfc_expr **lower;
   gfc_expr **upper;
   gfc_ref *ref, *prev_ref = NULL;
@@ -4184,21 +4242,60 @@  gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree pstat)
       break;
     }
 
+  overflow = integer_zero_node;
   size = gfc_array_init_size (se->expr, ref->u.ar.as->rank,
 			      ref->u.ar.as->corank, &offset, lower, upper,
-			      &se->pre);
+			      &se->pre, &overflow);
+
+  var_overflow = gfc_create_var (integer_type_node, "overflow");
+  gfc_add_modify (&se->pre, var_overflow, overflow);
+
+  /* Generate the block of code handling overflow.  */
+  msg = gfc_build_addr_expr (pchar_type_node, gfc_build_localized_cstring_const
+  			("Integer overflow when calculating the amount of "
+  			 "memory to allocate"));
+  error = build_call_expr_loc (input_location,
+  			   gfor_fndecl_runtime_error, 1, msg);
+
+  if (pstat != NULL_TREE && !integer_zerop (pstat))
+    {
+      /* Set the status variable if it's present.  */
+      stmtblock_t set_status_block;
+      tree status_type = pstat ? TREE_TYPE (TREE_TYPE (pstat)) : NULL_TREE;
 
+      gfc_start_block (&set_status_block);
+      gfc_add_modify (&set_status_block,
+  		      fold_build1_loc (input_location, INDIRECT_REF,
+  				       status_type, pstat),
+  			   build_int_cst (status_type, LIBERROR_ALLOCATION));
+
+      tmp = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node,
+  			     pstat, build_int_cst (TREE_TYPE (pstat), 0));
+      error = fold_build3_loc (input_location, COND_EXPR, void_type_node, tmp,
+  			       error, gfc_finish_block (&set_status_block));
+    }
+
+  gfc_start_block (&elseblock);
+  
   /* Allocate memory to store the data.  */
   pointer = gfc_conv_descriptor_data_get (se->expr);
   STRIP_NOPS (pointer);
 
   /* The allocate_array variants take the old pointer as first argument.  */
   if (allocatable_array)
-    tmp = gfc_allocate_array_with_status (&se->pre, pointer, size, pstat, expr);
+    tmp = gfc_allocate_array_with_status (&elseblock, pointer, size, pstat, expr);
   else
-    tmp = gfc_allocate_with_status (&se->pre, size, pstat);
+    tmp = gfc_allocate_with_status (&elseblock, size, pstat);
   tmp = fold_build2_loc (input_location, MODIFY_EXPR, void_type_node, pointer,
 			 tmp);
+
+  gfc_add_expr_to_block (&elseblock, tmp);
+
+  cond = fold_build2_loc (input_location, NE_EXPR, boolean_type_node,
+  			 var_overflow, integer_zero_node);
+  tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node, cond, 
+			 error, gfc_finish_block (&elseblock));
+
   gfc_add_expr_to_block (&se->pre, tmp);
 
   gfc_conv_descriptor_offset_set (&se->pre, se->expr, offset);