Message ID | b656db0b-04da-2303-e07c-44f7237bfe47@charter.net |
---|---|
State | New |
Headers | show |
Series | [libgfortran] PR78351 comma not terminating READ of formatted input field | expand |
On Nov 03 2018, Jerry DeLisle <jvdelisle@charter.net> wrote: > + /* To support legacy code we have to scan the input string one byte > + at a time because we don't no where an early comma may be and the s/no/know/ Andreas.
On Sat, 3 Nov 2018 15:33:07 -0700 Jerry DeLisle <jvdelisle@charter.net> wrote: > diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c > index 31198a3cc39..0d26101cef0 100644 > --- a/libgfortran/io/transfer.c > +++ b/libgfortran/io/transfer.c > @@ -260,22 +250,80 @@ read_sf_internal (st_parameter_dt *dtp, size_t *length) > sseek (dtp->u.p.current_unit->s, -1, SEEK_CUR); > } > > - lorig = *length; > - if (is_char4_unit(dtp)) > + /* To support legacy code we have to scan the input string one byte > + at a time because we don't no where an early comma may be and the As Andreas said s/no/know/ > + requested length could go passed the end of a comma shortened s/passed/past/ > + /* Get the first chracter of the string to establish the base s/chracter/character/ > + /* Now we scan the rest and exit deal with an end-of-file s/ exit// ?
On 11/4/18 1:51 AM, Bernhard Reutner-Fischer wrote: > On Sat, 3 Nov 2018 15:33:07 -0700 > Jerry DeLisle <jvdelisle@charter.net> wrote: > >> diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c >> index 31198a3cc39..0d26101cef0 100644 >> --- a/libgfortran/io/transfer.c >> +++ b/libgfortran/io/transfer.c > >> @@ -260,22 +250,80 @@ read_sf_internal (st_parameter_dt *dtp, size_t *length) >> sseek (dtp->u.p.current_unit->s, -1, SEEK_CUR); >> } >> >> - lorig = *length; >> - if (is_char4_unit(dtp)) >> + /* To support legacy code we have to scan the input string one byte >> + at a time because we don't no where an early comma may be and the > > As Andreas said s/no/know/ > >> + requested length could go passed the end of a comma shortened > > s/passed/past/ > >> + /* Get the first chracter of the string to establish the base > > s/chracter/character/ > >> + /* Now we scan the rest and exit deal with an end-of-file > > s/ exit// ? > > > Fixed all my typos, thanks. Jerry
Hi Jerry! > Hi all, > > The attached patch adds code in read_sf_internal to handle early > termination of reads in the presence of comma's. This is to support > legacy codes which are not standard conforming as far as we can tell. > > The additions are executed only if -std=legacy is given at compile time. > It does not support kind=4 internal units since in legacy years there > should be no kind=4 internal units. > > I have provuded a simplified test case for various combinations of comma > embedded strings. > > This has been regression tested on x86_64-pc-linux-gnu. > > OK for trunk? Yes, OK. I like the fact that the old, slow behavior is restricted to -std=legacy :-) > This use to work way back in early versions so should probably go to 7 > and 8 branches. Opinions welcome. I agree, it should go in. Thanks a lot for the patch! Regards Thomas
diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c index 31198a3cc39..0d26101cef0 100644 --- a/libgfortran/io/transfer.c +++ b/libgfortran/io/transfer.c @@ -241,16 +241,6 @@ read_sf_internal (st_parameter_dt *dtp, size_t *length) && dtp->u.p.current_unit->pad_status == PAD_NO) hit_eof (dtp); - /* If we have seen an eor previously, return a length of 0. The - caller is responsible for correctly padding the input field. */ - if (dtp->u.p.sf_seen_eor) - { - *length = 0; - /* Just return something that isn't a NULL pointer, otherwise the - caller thinks an error occurred. */ - return (char*) empty_string; - } - /* There are some cases with mixed DTIO where we have read a character and saved it in the last character buffer, so we need to backup. */ if (unlikely (dtp->u.p.current_unit->child_dtio > 0 && @@ -260,22 +250,80 @@ read_sf_internal (st_parameter_dt *dtp, size_t *length) sseek (dtp->u.p.current_unit->s, -1, SEEK_CUR); } - lorig = *length; - if (is_char4_unit(dtp)) + /* To support legacy code we have to scan the input string one byte + at a time because we don't no where an early comma may be and the + requested length could go passed the end of a comma shortened + string. We only do this if -std=legacy was given at compile + time. We also do not support this on kind=4 strings. */ + if (unlikely(compile_options.warn_std == 0)) // the slow legacy way. { - gfc_char4_t *p = (gfc_char4_t *) mem_alloc_r4 (dtp->u.p.current_unit->s, - length); - base = fbuf_alloc (dtp->u.p.current_unit, lorig); - for (size_t i = 0; i < *length; i++, p++) - base[i] = *p > 255 ? '?' : (unsigned char) *p; - } - else - base = mem_alloc_r (dtp->u.p.current_unit->s, length); + size_t n; + size_t tmp = 1; + char *q; + + /* If we have seen an eor previously, return a length of 0. The + caller is responsible for correctly padding the input field. */ + if (dtp->u.p.sf_seen_eor) + { + *length = 0; + /* Just return something that isn't a NULL pointer, otherwise the + caller thinks an error occurred. */ + return (char*) empty_string; + } + + /* Get the first chracter of the string to establish the base + address and check for comma or end-of-record condition. */ + base = mem_alloc_r (dtp->u.p.current_unit->s, &tmp); + if (tmp == 0) + { + dtp->u.p.sf_seen_eor = 1; + *length = 0; + return (char*) empty_string; + } + if (*base == ',') + { + dtp->u.p.current_unit->bytes_left--; + *length = 0; + return (char*) empty_string; + } - if (unlikely (lorig > *length)) + /* Now we scan the rest and exit deal with an end-of-file + condition or the comma. */ + for (n = 1; n < *length; n++) + { + q = mem_alloc_r (dtp->u.p.current_unit->s, &tmp); + if (tmp == 0) + { + hit_eof (dtp); + return NULL; + } + if (*q == ',') + { + dtp->u.p.current_unit->bytes_left -= n; + *length = n; + break; + } + } + } + else // the fast way { - hit_eof (dtp); - return NULL; + lorig = *length; + if (is_char4_unit(dtp)) + { + gfc_char4_t *p = (gfc_char4_t *) mem_alloc_r4 (dtp->u.p.current_unit->s, + length); + base = fbuf_alloc (dtp->u.p.current_unit, lorig); + for (size_t i = 0; i < *length; i++, p++) + base[i] = *p > 255 ? '?' : (unsigned char) *p; + } + else + base = mem_alloc_r (dtp->u.p.current_unit->s, length); + + if (unlikely (lorig > *length)) + { + hit_eof (dtp); + return NULL; + } } dtp->u.p.current_unit->bytes_left -= *length;