Message ID | 1286731389.6166.7.camel@linux-fd1f.site |
---|---|
State | New |
Headers | show |
I wrote
> Comments?
.. apart from the fact that I forgot
_gfortran_transfer_character_wide_write
in gfortran.map.
Thomas
Thomas Koenig wrote: > this RFC patch separates reads from writes in Fortran, by establishing > separate transfer functions for writing, with a different spec. For > now, the write functions just call the original transfer functions, with > the spec making all the difference. I think you are currently trading non-clobber for having a function call overhead: The newly generated functions in libgfortran will probably not be inlined as the called procedure is also publicly visible. Though, they might be cloned and inlined - as least in the cases when the transfer function is small. The question is whether one can do without duplication of code, e.g., by having two different decls but with the same assembler name - though, I am not sure whether this will work - including with LTO. What do our middle end experts think? a) Status quo: Read and write I/O call the same libgfortran function, i.e. the non-clobbering is not taken into account. b) Thomas' patch: Adding a wrapper function for write, i.e. a function call overhead but allows to use fn-spec to mark the arguments as noclobber. c) Other ideas? Tobias
On 10/10/2010 10:23 AM, Thomas Koenig wrote: > Hello world, > > this RFC patch separates reads from writes in Fortran, by establishing > separate transfer functions for writing, with a different spec. For > now, the write functions just call the original transfer functions, with > the spec making all the difference. That patch seems straight forward, so I like that aspect of it. We can later consider further sepration on the library side. I do not understand what these specs wW and rW mean. How did you learn about this? > > Do you think this is the right approach? Did I get the specs right (I > didn't find any documentation, but I may have missed something)? > > This does speed up the test case from PR 31593, and should also at least > partially fix PR 20165. What is the difference in time with and without the patch? > > No changelog yet, and a test case probably isn't needed. I'm currently > regression-testing. If there is general agreement, I think this could > still go into 4.6. > I think this is OK for 4.6 if it does not break ABI. Since the original functions are retained it should be OK. What about the other patches discuss in pr31593? No longer needed? Jerry
On 10/10/2010 01:44 PM, Tobias Burnus wrote: > Thomas Koenig wrote: >> this RFC patch separates reads from writes in Fortran, by establishing >> separate transfer functions for writing, with a different spec. For >> now, the write functions just call the original transfer functions, with >> the spec making all the difference. > > I think you are currently trading non-clobber for having a function call > overhead: The newly generated functions in libgfortran will probably not be > inlined as the called procedure is also publicly visible. Though, they might be > cloned and inlined - as least in the cases when the transfer function is small. > > The question is whether one can do without duplication of code, e.g., by having > two different decls but with the same assembler name - though, I am not sure > whether this will work - including with LTO. > > What do our middle end experts think? > a) Status quo: Read and write I/O call the same libgfortran function, i.e. the > non-clobbering is not taken into account. > b) Thomas' patch: Adding a wrapper function for write, i.e. a function call > overhead but allows to use fn-spec to mark the arguments as noclobber. I don't think this function call overhead is that significant, however it would be nice to run some tests and measure it. Also, It would make sense to me to separate the READ and WRITE in the library. I will be looking into doing this soon. This would clear the over head issue, if any, and we woud also be able to get rid of runtime tests for whether we are doing READ or WRITE which is additional overhead we have now. So over all, it looks like a step in the right direction. > c) Other ideas? Yes I would really like to see other ideas. > > Tobias >
On 10/10/2010 11:58 PM, Jerry DeLisle wrote: > On 10/10/2010 10:23 AM, Thomas Koenig wrote: >> this RFC patch separates reads from writes in Fortran, by establishing >> separate transfer functions for writing, with a different spec. For >> now, the write functions just call the original transfer functions, with >> the spec making all the difference. > > That patch seems straight forward, so I like that aspect of it. We > can later consider further sepration on the library side. I think if one separates the library part, the patch is really profitable - though the library part is a post-4.6 item as it will break the ABI - unless we add also a read version and use the generic version to branch to either version. > I do not understand what these specs wW and rW mean. How did you > learn about this? That's in a way the middle-end equivalent to INTENT(IN) and to TARGET/POINTER vs. not-TARGET/POINTER: Namely, whether the value is modified (clobbered) or not and whether a pointer address can escape or not. This is already widely used by gfortran - but as for READ the same transfer function is used as for WRITE, one cannot tell the middle end that the value in a READ transfer is not modified. The annotation was introduced by Richard and works like "__attribute__((...))"; however, the attribute has a space in the name ("fn spec") which prevents the external, user-code usage. (The reason for the space is: It was a bit unclear how the syntax should be - if it is only internally usable, one can easily change it, if needed.) Cf. PR 43665 for the details of "fn spec". (That PR is about using "fn spec" in gfortran - thus, the changelog to Thomas' patch should in my opinion also list that PR.) >> This does speed up the test case from PR 31593, and should also at least >> partially fix PR 20165. > > What is the difference in time with and without the patch? That strongly depends on the test case; one tells the middle end that the argument does not get modified, implying that the middle end can do all kind of optimizations. I assume that most of the time the optimizations are not crucial, but for some programs it could make a huge difference. (However, one should, if possible, not use I/O in a hot loop ...) > What about the other patches discuss in pr31593? No longer needed? I think this patch is a bit orthogonal. If you use do i = 1, 10 call foo(i) ! Or "write(*,*) i" end do The "i" is assumed to be potentially modified by "foo" unless its first argument is known to be INTENT(IN) (or VALUE). If it is INTENT(IN) or a WRITE/PRINT statement (with Thomas' patch), the previous patches are not needed (and make things worse). The problem are procedure calls which have no (known) INTENT attribute - hence, a "fn spec" cannot be used* and thus doing something as in Tobias Schlueter's patches could still be useful. Tobias * The fact that the procedure is called within a DO loop does not help; for instance: subroutine foo(i) if (.not. in_do_loop) ! Host associated logical variable i = 42 end do end would be valid in DO loops. Thus, assuming that the argument is INTENT(IN) since it is used in a loop is wrong - leading to wrong code if "foo" is called with no loop variable as argument.
On Mon, Oct 11, 2010 at 01:07, Jerry DeLisle <jvdelisle@frontier.com> wrote: > On 10/10/2010 01:44 PM, Tobias Burnus wrote: >> >> Thomas Koenig wrote: >>> >>> this RFC patch separates reads from writes in Fortran, by establishing >>> separate transfer functions for writing, with a different spec. For >>> now, the write functions just call the original transfer functions, with >>> the spec making all the difference. >> >> I think you are currently trading non-clobber for having a function call >> overhead: The newly generated functions in libgfortran will probably not >> be >> inlined as the called procedure is also publicly visible. Though, they >> might be >> cloned and inlined - as least in the cases when the transfer function is >> small. >> >> The question is whether one can do without duplication of code, e.g., by >> having >> two different decls but with the same assembler name - though, I am not >> sure >> whether this will work - including with LTO. >> >> What do our middle end experts think? >> a) Status quo: Read and write I/O call the same libgfortran function, i.e. >> the >> non-clobbering is not taken into account. >> b) Thomas' patch: Adding a wrapper function for write, i.e. a function >> call >> overhead but allows to use fn-spec to mark the arguments as noclobber. > > I don't think this function call overhead is that significant, however it > would be nice to run some tests and measure it. Also, It would make sense > to me to separate the READ and WRITE in the library. I will be looking into > doing this soon. This would clear the over head issue, if any, and we woud > also be able to get rid of runtime tests for whether we are doing READ or > WRITE which is additional overhead we have now. So over all, it looks like a > step in the right direction. I agree with Jerry here; one function call more or less is insignificant compared to all the other work the IO library does. Indeed, all this work is AFAICS the reason why gfortran IO performance suffers for simple IO statements. Thanks to transfer_array() we do quite well when reading/writing large array (sections), but something very simple like WRITE (10) i end up doing quite a lot of work. To some extent this is inevitable due to the "impedance mismatch" between Fortran and POSIX IO models, but still. Say, take a look at the horror that is data_transfer_init(), finalize_transfer(), next_record_*() and so forth that is executed for every single READ/WRITE statement. Not that I have any good idea on how to tackle this.
On 10/12/2010 03:18 PM, Janne Blomqvist wrote: > On Mon, Oct 11, 2010 at 01:07, Jerry DeLisle<jvdelisle@frontier.com> wrote: >> On 10/10/2010 01:44 PM, Tobias Burnus wrote: >>> >>> Thomas Koenig wrote: >>>> >>>> this RFC patch separates reads from writes in Fortran, by establishing >>>> separate transfer functions for writing, with a different spec. For >>>> now, the write functions just call the original transfer functions, with >>>> the spec making all the difference. >>> >>> I think you are currently trading non-clobber for having a function call >>> overhead: The newly generated functions in libgfortran will probably not >>> be >>> inlined as the called procedure is also publicly visible. Though, they >>> might be >>> cloned and inlined - as least in the cases when the transfer function is >>> small. >>> >>> The question is whether one can do without duplication of code, e.g., by >>> having >>> two different decls but with the same assembler name - though, I am not >>> sure >>> whether this will work - including with LTO. >>> >>> What do our middle end experts think? >>> a) Status quo: Read and write I/O call the same libgfortran function, i.e. >>> the >>> non-clobbering is not taken into account. >>> b) Thomas' patch: Adding a wrapper function for write, i.e. a function >>> call >>> overhead but allows to use fn-spec to mark the arguments as noclobber. >> >> I don't think this function call overhead is that significant, however it >> would be nice to run some tests and measure it. Also, It would make sense >> to me to separate the READ and WRITE in the library. I will be looking into >> doing this soon. This would clear the over head issue, if any, and we woud >> also be able to get rid of runtime tests for whether we are doing READ or >> WRITE which is additional overhead we have now. So over all, it looks like a >> step in the right direction. > > I agree with Jerry here; one function call more or less is > insignificant compared to all the other work the IO library does. > Indeed, all this work is AFAICS the reason why gfortran IO performance > suffers for simple IO statements. Thanks to transfer_array() we do > quite well when reading/writing large array (sections), but something > very simple like > > WRITE (10) i > > end up doing quite a lot of work. To some extent this is inevitable > due to the "impedance mismatch" between Fortran and POSIX IO models, > but still. Say, take a look at the horror that is > data_transfer_init(), finalize_transfer(), next_record_*() and so > forth that is executed for every single READ/WRITE statement. > > Not that I have any good idea on how to tackle this. > I have thought for a long time to refactor all of this. Perhaps now is the time to start witth an eye toward 4.7. I will open a PR and assign to myself. One obvious benefit will be getting rid of a lot of conditionals at run-time that are known at compile time and don't change. I will plan on doing this as soon as I finish with the format checking revamp I am doing. Regards, Jerry
Index: libgfortran/gfortran.map =================================================================== --- libgfortran/gfortran.map (Revision 165124) +++ libgfortran/gfortran.map (Arbeitskopie) @@ -1141,6 +1141,12 @@ _gfortran_parity_l8; _gfortran_parity_l16; _gfortran_selected_real_kind2008; + _gfortran_transfer_array_write; + _gfortran_transfer_character_write; + _gfortran_transfer_complex_write; + _gfortran_transfer_integer_write; + _gfortran_transfer_logical_write; + _gfortran_transfer_real_write; } GFORTRAN_1.3; F2C_1.0 { Index: libgfortran/io/transfer.c =================================================================== --- libgfortran/io/transfer.c (Revision 165124) +++ libgfortran/io/transfer.c (Arbeitskopie) @@ -67,25 +67,48 @@ extern void transfer_integer (st_parameter_dt *, void *, int); export_proto(transfer_integer); +extern void transfer_integer_write (st_parameter_dt *, void *, int); +export_proto(transfer_integer_write); + extern void transfer_real (st_parameter_dt *, void *, int); export_proto(transfer_real); +extern void transfer_real_write (st_parameter_dt *, void *, int); +export_proto(transfer_real_write); + extern void transfer_logical (st_parameter_dt *, void *, int); export_proto(transfer_logical); +extern void transfer_logical_write (st_parameter_dt *, void *, int); +export_proto(transfer_logical_write); + extern void transfer_character (st_parameter_dt *, void *, int); export_proto(transfer_character); +extern void transfer_character_write (st_parameter_dt *, void *, int); +export_proto(transfer_character_write); + extern void transfer_character_wide (st_parameter_dt *, void *, int, int); export_proto(transfer_character_wide); +extern void transfer_character_wide_write (st_parameter_dt *, + void *, int, int); +export_proto(transfer_character_wide_write); + extern void transfer_complex (st_parameter_dt *, void *, int); export_proto(transfer_complex); +extern void transfer_complex_write (st_parameter_dt *, void *, int); +export_proto(transfer_complex_write); + extern void transfer_array (st_parameter_dt *, gfc_array_char *, int, gfc_charlen_type); export_proto(transfer_array); +extern void transfer_array_write (st_parameter_dt *, gfc_array_char *, int, + gfc_charlen_type); +export_proto(transfer_array_write); + static void us_read (st_parameter_dt *, int); static void us_write (st_parameter_dt *, int); static void next_record_r_unf (st_parameter_dt *, int); @@ -1847,6 +1870,11 @@ dtp->u.p.transfer (dtp, BT_INTEGER, p, kind, kind, 1); } +void +transfer_integer_write (st_parameter_dt *dtp, void *p, int kind) +{ + transfer_integer (dtp, p, kind); +} void transfer_real (st_parameter_dt *dtp, void *p, int kind) @@ -1858,6 +1886,11 @@ dtp->u.p.transfer (dtp, BT_REAL, p, kind, size, 1); } +void +transfer_real_write (st_parameter_dt *dtp, void *p, int kind) +{ + transfer_real (dtp, p, kind); +} void transfer_logical (st_parameter_dt *dtp, void *p, int kind) @@ -1867,6 +1900,11 @@ dtp->u.p.transfer (dtp, BT_LOGICAL, p, kind, kind, 1); } +void +transfer_logical_write (st_parameter_dt *dtp, void *p, int kind) +{ + transfer_logical (dtp, p, kind); +} void transfer_character (st_parameter_dt *dtp, void *p, int len) @@ -1887,6 +1925,12 @@ } void +transfer_character_write (st_parameter_dt *dtp, void *p, int len) +{ + transfer_character (dtp, p, len); +} + +void transfer_character_wide (st_parameter_dt *dtp, void *p, int len, int kind) { static char *empty_string[0]; @@ -1904,6 +1948,11 @@ dtp->u.p.transfer (dtp, BT_CHARACTER, p, kind, len, 1); } +void +transfer_character_wide_write (st_parameter_dt *dtp, void *p, int len, int kind) +{ + transfer_character_wide (dtp, p, len, kind); +} void transfer_complex (st_parameter_dt *dtp, void *p, int kind) @@ -1915,6 +1964,11 @@ dtp->u.p.transfer (dtp, BT_COMPLEX, p, kind, size, 1); } +void +transfer_complex_write (st_parameter_dt *dtp, void *p, int kind) +{ + transfer_complex (dtp, p, kind); +} void transfer_array (st_parameter_dt *dtp, gfc_array_char *desc, int kind, @@ -2020,6 +2074,12 @@ } } +void +transfer_array_write (st_parameter_dt *dtp, gfc_array_char *desc, int kind, + gfc_charlen_type charlen) +{ + transfer_array (dtp, desc, kind, charlen); +} /* Preposition a sequential unformatted file while reading. */ Index: gcc/fortran/trans-io.c =================================================================== --- gcc/fortran/trans-io.c (Revision 165124) +++ gcc/fortran/trans-io.c (Arbeitskopie) @@ -115,12 +115,19 @@ IOCALL_WRITE, IOCALL_WRITE_DONE, IOCALL_X_INTEGER, + IOCALL_X_INTEGER_WRITE, IOCALL_X_LOGICAL, + IOCALL_X_LOGICAL_WRITE, IOCALL_X_CHARACTER, + IOCALL_X_CHARACTER_WRITE, IOCALL_X_CHARACTER_WIDE, + IOCALL_X_CHARACTER_WIDE_WRITE, IOCALL_X_REAL, + IOCALL_X_REAL_WRITE, IOCALL_X_COMPLEX, + IOCALL_X_COMPLEX_WRITE, IOCALL_X_ARRAY, + IOCALL_X_ARRAY_WRITE, IOCALL_OPEN, IOCALL_CLOSE, IOCALL_INQUIRE, @@ -303,9 +310,7 @@ for (ptype = IOPARM_ptype_common; ptype < IOPARM_ptype_num; ptype++) gfc_build_st_parameter ((enum ioparam_type) ptype, types); - /* Define the transfer functions. - TODO: Split them between READ and WRITE to allow further - optimizations, e.g. by using aliases? */ + /* Define the transfer functions. */ dt_parm_type = build_pointer_type (st_parameter[IOPARM_ptype_dt].type); @@ -313,32 +318,63 @@ get_identifier (PREFIX("transfer_integer")), ".wW", void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node); + iocall[IOCALL_X_INTEGER_WRITE] = gfc_build_library_function_decl_with_spec ( + get_identifier (PREFIX("transfer_integer_write")), ".rW", + void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node); + iocall[IOCALL_X_LOGICAL] = gfc_build_library_function_decl_with_spec ( get_identifier (PREFIX("transfer_logical")), ".wW", void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node); + iocall[IOCALL_X_LOGICAL_WRITE] = gfc_build_library_function_decl_with_spec ( + get_identifier (PREFIX("transfer_logical_write")), ".rW", + void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node); + iocall[IOCALL_X_CHARACTER] = gfc_build_library_function_decl_with_spec ( get_identifier (PREFIX("transfer_character")), ".wW", void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node); + iocall[IOCALL_X_CHARACTER_WRITE] = gfc_build_library_function_decl_with_spec ( + get_identifier (PREFIX("transfer_character_write")), ".rW", + void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node); + iocall[IOCALL_X_CHARACTER_WIDE] = gfc_build_library_function_decl_with_spec ( get_identifier (PREFIX("transfer_character_wide")), ".wW", void_type_node, 4, dt_parm_type, pvoid_type_node, gfc_charlen_type_node, gfc_int4_type_node); + iocall[IOCALL_X_CHARACTER_WIDE_WRITE] = + gfc_build_library_function_decl_with_spec ( + get_identifier (PREFIX("transfer_character_wide_write")), ".rW", + void_type_node, 4, dt_parm_type, pvoid_type_node, + gfc_charlen_type_node, gfc_int4_type_node); + iocall[IOCALL_X_REAL] = gfc_build_library_function_decl_with_spec ( get_identifier (PREFIX("transfer_real")), ".wW", void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node); + iocall[IOCALL_X_REAL_WRITE] = gfc_build_library_function_decl_with_spec ( + get_identifier (PREFIX("transfer_real_write")), ".rW", + void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node); + iocall[IOCALL_X_COMPLEX] = gfc_build_library_function_decl_with_spec ( get_identifier (PREFIX("transfer_complex")), ".wW", void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node); + iocall[IOCALL_X_COMPLEX_WRITE] = gfc_build_library_function_decl_with_spec ( + get_identifier (PREFIX("transfer_complex_write")), ".rW", + void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node); + iocall[IOCALL_X_ARRAY] = gfc_build_library_function_decl_with_spec ( get_identifier (PREFIX("transfer_array")), ".wW", void_type_node, 4, dt_parm_type, pvoid_type_node, integer_type_node, gfc_charlen_type_node); + iocall[IOCALL_X_ARRAY_WRITE] = gfc_build_library_function_decl_with_spec ( + get_identifier (PREFIX("transfer_array_write")), ".rW", + void_type_node, 4, dt_parm_type, pvoid_type_node, + integer_type_node, gfc_charlen_type_node); + /* Library entry points */ iocall[IOCALL_READ] = gfc_build_library_function_decl_with_spec ( @@ -2037,22 +2073,38 @@ { case BT_INTEGER: arg2 = build_int_cst (NULL_TREE, kind); - function = iocall[IOCALL_X_INTEGER]; + if (last_dt == READ) + function = iocall[IOCALL_X_INTEGER]; + else + function = iocall[IOCALL_X_INTEGER_WRITE]; + break; case BT_REAL: arg2 = build_int_cst (NULL_TREE, kind); - function = iocall[IOCALL_X_REAL]; + if (last_dt == READ) + function = iocall[IOCALL_X_REAL]; + else + function = iocall[IOCALL_X_REAL_WRITE]; + break; case BT_COMPLEX: arg2 = build_int_cst (NULL_TREE, kind); - function = iocall[IOCALL_X_COMPLEX]; + if (last_dt == READ) + function = iocall[IOCALL_X_COMPLEX]; + else + function = iocall[IOCALL_X_COMPLEX_WRITE]; + break; case BT_LOGICAL: arg2 = build_int_cst (NULL_TREE, kind); - function = iocall[IOCALL_X_LOGICAL]; + if (last_dt == READ) + function = iocall[IOCALL_X_LOGICAL]; + else + function = iocall[IOCALL_X_LOGICAL_WRITE]; + break; case BT_CHARACTER: @@ -2069,7 +2121,11 @@ arg2 = fold_convert (gfc_charlen_type_node, arg2); } arg3 = build_int_cst (NULL_TREE, kind); - function = iocall[IOCALL_X_CHARACTER_WIDE]; + if (last_dt == READ) + function = iocall[IOCALL_X_CHARACTER_WIDE]; + else + function = iocall[IOCALL_X_CHARACTER_WIDE_WRITE]; + tmp = gfc_build_addr_expr (NULL_TREE, dt_parm); tmp = build_call_expr_loc (input_location, function, 4, tmp, addr_expr, arg2, arg3); @@ -2088,7 +2144,11 @@ gcc_assert (TREE_CODE (TREE_TYPE (tmp)) == ARRAY_TYPE); arg2 = TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (tmp))); } - function = iocall[IOCALL_X_CHARACTER]; + if (last_dt == READ) + function = iocall[IOCALL_X_CHARACTER]; + else + function = iocall[IOCALL_X_CHARACTER_WRITE]; + break; case BT_DERIVED: @@ -2139,7 +2199,7 @@ static void transfer_array_desc (gfc_se * se, gfc_typespec * ts, tree addr_expr) { - tree tmp, charlen_arg, kind_arg; + tree tmp, charlen_arg, kind_arg, io_call; if (ts->type == BT_CHARACTER) charlen_arg = se->string_length; @@ -2149,8 +2209,13 @@ kind_arg = build_int_cst (NULL_TREE, ts->kind); tmp = gfc_build_addr_expr (NULL_TREE, dt_parm); + if (last_dt == READ) + io_call = iocall[IOCALL_X_ARRAY]; + else + io_call = iocall[IOCALL_X_ARRAY_WRITE]; + tmp = build_call_expr_loc (UNKNOWN_LOCATION, - iocall[IOCALL_X_ARRAY], 4, + io_call, 4, tmp, addr_expr, kind_arg, charlen_arg); gfc_add_expr_to_block (&se->pre, tmp); gfc_add_block_to_block (&se->pre, &se->post);