Message ID | AANLkTimWJMty0jEWVc8d3wZvmfw=aXxXY3tFCNa7tEaz@mail.gmail.com |
---|---|
State | New |
Headers | show |
On 11/22/2010 11:37 PM, Janne Blomqvist wrote: > the attached patch implements overflow checking for the ALLOCATE > statement. It also fixes the minor regression introduced a few days > ago when the old "< 0" overflow check was removed as a side-effect of > fixing the unsignedness of size_t. Frankly, I do not like the way the overflow check is done - namely by requiring an overflow: D.1506 = MAX_EXPR <my_size, 0>; D.1507 = D.1506 != 0 ? 2147483647 / D.1506 <= 0 ? 1 : 0 : 0; D.1508 = NON_LVALUE_EXPR <D.1506>; D.1509 = ((<unnamed-unsigned:32>) D.1508 > -1 ? 1 : 0) + D.1507; overflow.1 = D.1509; if (overflow.1 != 0) _gfortran_runtime_error (&"Integer overflow when calculating the amount of memory to allocate"[1]{lb: 1 sz: 1}); Namely, doing the "MAXSIZE/my_size <= 0" check where my_size is the signed requested size of the array. Explicitly asking for an overflow seems to be wrong. Additionally, I have no idea what the supposed result is; MAXSIZE (= "TYPE_MAX_VALUE (sizetype)") is unsigned and does not fit into a signed variable, however, the calculation is supposed to be done signed as both the D.1506 and the D.1507 is signed. Additionally, you completely ignore one important ingredient: The size of the actual object which is to be allocated; using derived types those can be large. Example: use iso_c_binding, only: c_intptr_t integer(1), allocatable :: a(:) integer(4), allocatable :: b(:) allocate(a(huge(0_c_intptr_t)*i)) ! OK allocate(b(huge(0_c_intptr_t)*i)) ! Too big end That's the reason I was thinking of: if (my_size > MAXSIZE/sizeof(type)) error() if one wants to include the multiplications for allocate(array(sizeA,sizeB)) then one can do: if (sizeB > 0 && sizeA > MAXSIZE/sizeB) That way one always avoids an overflow. I am not sure whether one rejects a value close to MAXSIZE, but at least one avoids signed/unsigned trickery. For the conditions, one should consider to use __builtin_expect to give the compiler some additional optimization potential. > Wrt whether these checks should be done always, or only when > -fcheck=mem is enabled, IMHO some benchmarks should be run and if no > difference can be seen they can be left enabled. So, does anybody know > of suitable benchmarks that are influenced by ALLOCATE performance? > Polyhedron, maybe not so much? I don't. The issue is presumably largest for calls to small functions called in loops, which get later inlined. The less conditions and other trickery is done, the more likely the compiler can move the malloc out of the loop. I think it is trivial to construct such a program but it is hard to know how representative such a program would be for the real world. The probability has to be weighted against using a too large value for ALLOCATE without -check=all/-check=mem. I am inclined to make it conditional to -check=mem; but on the other hand, I live in a 64bit world where the chance of an overflow is rather low (for somewhat reasonable programs). Tobias
On Tue, Nov 23, 2010 at 17:50, Tobias Burnus <burnus@net-b.de> wrote: > On 11/22/2010 11:37 PM, Janne Blomqvist wrote: >> >> the attached patch implements overflow checking for the ALLOCATE >> statement. It also fixes the minor regression introduced a few days >> ago when the old "< 0" overflow check was removed as a side-effect of >> fixing the unsignedness of size_t. > > Frankly, I do not like the way the overflow check is done - namely by > requiring an overflow: > > D.1506 = MAX_EXPR <my_size, 0>; > D.1507 = D.1506 != 0 ? 2147483647 / D.1506 <= 0 ? 1 : 0 : 0; > D.1508 = NON_LVALUE_EXPR <D.1506>; > D.1509 = ((<unnamed-unsigned:32>) D.1508 > -1 ? 1 : 0) + D.1507; > overflow.1 = D.1509; > if (overflow.1 != 0) > _gfortran_runtime_error (&"Integer overflow when calculating the > amount of memory to allocate"[1]{lb: 1 sz: 1}); > > Namely, doing the "MAXSIZE/my_size <= 0" check where my_size is the signed > requested size of the array. Explicitly asking for an overflow seems to be > wrong. I suspect you are confused; I wondered the same thing when I was developing the patch, and as far as I was able to determine, the issue is that the tree dump is slightly different (although the result is the same) than the actual code that the frontend generates. So, what gfc_array_init_size does is roughly // overflow = 0 is set in gfc_array_allocate() before calling gfc_array_init_size(). stride = 1; for (int i = 0; i < rank; i++) { size = gfc_conv_array_extent_dim (...); /* max(ubound + 1 - lbound, 0) */ overflow += size != 0 ? (MAX/size < stride ? 1: 0): 0; stride *= size; } So, somehow for the D.1507 statement, the overflow check for the first dimension which is then "MAX/size < 1" has been turned into "MAX/size <= 0" which is of course equivalent but is not what the code generated by the patch actually does! But for higher dimensions you'll see that the tree dump corresponds to the pseudocode above. > Additionally, I have no idea what the supposed result is; MAXSIZE (= > "TYPE_MAX_VALUE (sizetype)") is unsigned and does not fit into a signed > variable, however, the calculation is supposed to be done signed as both the > D.1506 and the D.1507 is signed. Ah, you're right. This should be fixed. Come to think of it, one might as well instead convert size and stride to size_t right away and do the overflow check and multiplication in that type, since in the end it has to be converted to size_t anyway in order to call malloc() (also, I think I read in the AMD64 software optimization manual that unsigned divides are slightly faster than signed). In gfc_conv_array_extent_dim() the calculation must still be done with signed arithmetic since there the result might be negative, but that function makes sure to return only a positive value, so size and stride are guaranteed to be always positive in gfc_array_init_size(). > Additionally, you completely ignore one important ingredient: The size of > the actual object which is to be allocated; using derived types those can be > large. No, AFAICS this is included in the patch. After the loop above, the pseudocode continues as tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type)); element_size = fold_convert (sizetype, tmp); stride = fold_convert (sizetype, stride); overflow += size != 0 ? (MAX/element_size < stride ? 1: 0): 0; I believe this check corresponds to the line beginning with D.1509 in your tree dump above. > For the conditions, one should consider to use __builtin_expect to give the > compiler some additional optimization potential. Good point, I'll have to investigate how to do this. >> Wrt whether these checks should be done always, or only when >> -fcheck=mem is enabled, IMHO some benchmarks should be run and if no >> difference can be seen they can be left enabled. So, does anybody know >> of suitable benchmarks that are influenced by ALLOCATE performance? >> Polyhedron, maybe not so much? > > I don't. The issue is presumably largest for calls to small functions called > in loops, which get later inlined. The less conditions and other trickery is > done, the more likely the compiler can move the malloc out of the loop. I > think it is trivial to construct such a program but it is hard to know how > representative such a program would be for the real world. The probability > has to be weighted against using a too large value for ALLOCATE without > -check=all/-check=mem. I am inclined to make it conditional to -check=mem; > but on the other hand, I live in a 64bit world where the chance of an > overflow is rather low (for somewhat reasonable programs). I'll see if I can come up with some benchmarks. Thanks for the review.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index 3d5e5ba..dbd336c 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -3944,13 +3944,14 @@ gfc_conv_descriptor_size (tree desc, int rank) static tree gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset, gfc_expr ** lower, gfc_expr ** upper, - stmtblock_t * pblock) + stmtblock_t * pblock, tree * overflow) { tree type; tree tmp; tree size; tree offset; tree stride; + tree element_size; tree or_expr; tree thencase; tree elsecase; @@ -4027,7 +4028,34 @@ gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset, /* Calculate size and check whether extent is negative. */ size = gfc_conv_array_extent_dim (conv_lbound, conv_ubound, &or_expr); - + size = gfc_evaluate_now (size, pblock); + + /* Check whether multiplying the stride by the number of + elements in this dimension would overflow. We must also check + whether the current dimension has zero size in order to avoid + division by zero. The algorithm is thus + + overflow += size != 0 ? (M/size < stride ? 1: 0): 0; + + where M is the maximum value of gfc_array_index_type. + */ + tmp = fold_build2_loc (input_location, TRUNC_DIV_EXPR, + gfc_array_index_type, + TYPE_MAX_VALUE (gfc_array_index_type), size); + tmp = fold_build3_loc (input_location, COND_EXPR, integer_type_node, + fold_build2_loc (input_location, LT_EXPR, + boolean_type_node, tmp, stride), + integer_one_node, integer_zero_node); + tmp = fold_build3_loc (input_location, COND_EXPR, integer_type_node, + fold_build2_loc (input_location, NE_EXPR, + boolean_type_node, size, + build_zero_cst (gfc_array_index_type)), + tmp, integer_zero_node); + tmp = fold_build2_loc (input_location, PLUS_EXPR, integer_type_node, + *overflow, tmp); + tmp = gfc_evaluate_now (tmp, pblock); + *overflow = tmp; + /* Multiply the stride by the number of elements in this dimension. */ stride = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type, stride, size); @@ -4075,8 +4103,32 @@ gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset, /* The stride is the number of elements in the array, so multiply by the size of an element to get the total size. */ tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type)); - size = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type, - stride, fold_convert (gfc_array_index_type, tmp)); + /* Convert to size_t. */ + element_size = fold_convert (sizetype, tmp); + stride = fold_convert (sizetype, stride); + + /* First check for overflow. Since an array of type character can + have zero element_size, we must check for that before + dividing. */ + tmp = fold_build2_loc (input_location, TRUNC_DIV_EXPR, + sizetype, + TYPE_MAX_VALUE (sizetype), element_size); + tmp = fold_build3_loc (input_location, COND_EXPR, integer_type_node, + fold_build2_loc (input_location, LT_EXPR, + boolean_type_node, tmp, stride), + integer_one_node, integer_zero_node); + tmp = fold_build3_loc (input_location, COND_EXPR, integer_type_node, + fold_build2_loc (input_location, NE_EXPR, + boolean_type_node, element_size, + size_zero_node), + tmp, integer_zero_node); + tmp = fold_build2_loc (input_location, PLUS_EXPR, integer_type_node, + *overflow, tmp); + tmp = gfc_evaluate_now (tmp, pblock); + *overflow = tmp; + + size = fold_build2_loc (input_location, MULT_EXPR, sizetype, + stride, element_size); if (poffset != NULL) { @@ -4091,7 +4143,7 @@ gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset, var = gfc_create_var (TREE_TYPE (size), "size"); gfc_start_block (&thenblock); - gfc_add_modify (&thenblock, var, gfc_index_zero_node); + gfc_add_modify (&thenblock, var, size_zero_node); thencase = gfc_finish_block (&thenblock); gfc_start_block (&elseblock); @@ -4117,6 +4169,12 @@ gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree pstat) tree pointer; tree offset; tree size; + tree msg; + tree error; + tree overflow; /* Boolean storing whether size calculation overflows. */ + tree var_overflow; + tree cond; + stmtblock_t elseblock; gfc_expr **lower; gfc_expr **upper; gfc_ref *ref, *prev_ref = NULL; @@ -4184,21 +4242,60 @@ gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree pstat) break; } + overflow = integer_zero_node; size = gfc_array_init_size (se->expr, ref->u.ar.as->rank, ref->u.ar.as->corank, &offset, lower, upper, - &se->pre); + &se->pre, &overflow); + + var_overflow = gfc_create_var (integer_type_node, "overflow"); + gfc_add_modify (&se->pre, var_overflow, overflow); + + /* Generate the block of code handling overflow. */ + msg = gfc_build_addr_expr (pchar_type_node, gfc_build_localized_cstring_const + ("Integer overflow when calculating the amount of " + "memory to allocate")); + error = build_call_expr_loc (input_location, + gfor_fndecl_runtime_error, 1, msg); + + if (pstat != NULL_TREE && !integer_zerop (pstat)) + { + /* Set the status variable if it's present. */ + stmtblock_t set_status_block; + tree status_type = pstat ? TREE_TYPE (TREE_TYPE (pstat)) : NULL_TREE; + gfc_start_block (&set_status_block); + gfc_add_modify (&set_status_block, + fold_build1_loc (input_location, INDIRECT_REF, + status_type, pstat), + build_int_cst (status_type, LIBERROR_ALLOCATION)); + + tmp = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node, + pstat, build_int_cst (TREE_TYPE (pstat), 0)); + error = fold_build3_loc (input_location, COND_EXPR, void_type_node, tmp, + error, gfc_finish_block (&set_status_block)); + } + + gfc_start_block (&elseblock); + /* Allocate memory to store the data. */ pointer = gfc_conv_descriptor_data_get (se->expr); STRIP_NOPS (pointer); /* The allocate_array variants take the old pointer as first argument. */ if (allocatable_array) - tmp = gfc_allocate_array_with_status (&se->pre, pointer, size, pstat, expr); + tmp = gfc_allocate_array_with_status (&elseblock, pointer, size, pstat, expr); else - tmp = gfc_allocate_with_status (&se->pre, size, pstat); + tmp = gfc_allocate_with_status (&elseblock, size, pstat); tmp = fold_build2_loc (input_location, MODIFY_EXPR, void_type_node, pointer, tmp); + + gfc_add_expr_to_block (&elseblock, tmp); + + cond = fold_build2_loc (input_location, NE_EXPR, boolean_type_node, + var_overflow, integer_zero_node); + tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node, cond, + error, gfc_finish_block (&elseblock)); + gfc_add_expr_to_block (&se->pre, tmp); gfc_conv_descriptor_offset_set (&se->pre, se->expr, offset);