From patchwork Sat Nov 3 22:33:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jerry DeLisle X-Patchwork-Id: 992686 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-488950-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=charter.net Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="boC3XDww"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 42nYck4sQFzB4hj for ; Sun, 4 Nov 2018 09:33:24 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=KUWAH1K7qD/W2vhzkng/bUIURVl1J99b9QMIMaWHeNA6bjQra2 F4/PNi1Wet6CO/+hcsgCXwXizS32biEmMdG7RKT04v6df5NttETWIY4BrUS3vcfp T5UK8ijtP4zkyTYvH8IxsuS38ruyq7PZDcJUQZzmpGhufWSStUsf0xi6Q= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:message-id:date:mime-version:content-type; s= default; bh=tH7Tshw6AUdpeAYnAbyeUhK2d6s=; b=boC3XDwwXFJe/itRPTgs cULvxinLQ4taehS235q6WNKCcZgFVTc1it2G7FK36ah9/A4U+Jd6KqVemnmnjNyz Sk54qetlSC5KBhb6GENt/BMo2o1AbrRjvF0sFER8NQupMelcg11sRk8h+QCLZV9/ gzoQAKkc9qhfGA1wG0tN1to= Received: (qmail 16701 invoked by alias); 3 Nov 2018 22:33:15 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 16666 invoked by uid 89); 3 Nov 2018 22:33:13 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-24.3 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy=Hx-spam-relays-external:sk:2018110, H*RU:sk:2018110, H*r:InterMail, H*r:sk:2018110 X-HELO: mtaout002-public.msg.strl.va.charter.net Received: from mtaout002-public.msg.strl.va.charter.net (HELO mtaout002-public.msg.strl.va.charter.net) (68.114.190.27) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 03 Nov 2018 22:33:10 +0000 Received: from impout003 ([68.114.189.18]) by mtaout002.msg.strl.va.charter.net (InterMail vM.9.00.023.01 201-2473-194) with ESMTP id <20181103223308.PSBG7360.mtaout002.msg.strl.va.charter.net@impout003>; Sat, 3 Nov 2018 17:33:08 -0500 Received: from [192.168.1.6] ([66.191.41.128]) by impout003 with charter.net id vaZ71y00G2lujD601aZ8pf; Sat, 03 Nov 2018 17:33:08 -0500 X-Auth-id: anZkZWxpc2xlQGNoYXJ0ZXIubmV0 To: "fortran@gcc.gnu.org" Cc: GCC Patches From: Jerry DeLisle Subject: [patch, libgfortran] PR78351 comma not terminating READ of formatted input field Message-ID: Date: Sat, 3 Nov 2018 15:33:07 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 Hi all, The attached patch adds code in read_sf_internal to handle early termination of reads in the presence of comma's. This is to support legacy codes which are not standard conforming as far as we can tell. The additions are executed only if -std=legacy is given at compile time. It does not support kind=4 internal units since in legacy years there should be no kind=4 internal units. I have provuded a simplified test case for various combinations of comma embedded strings. This has been regression tested on x86_64-pc-linux-gnu. OK for trunk? This use to work way back in early versions so should probably go to 7 and 8 branches. Opinions welcome. Regards, Jerry 2018-11-04 Jerry DeLisle * io/transfer.c (read_sf_internal): Add support for early comma termination of internal unit formatted reads. diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c index 31198a3cc39..0d26101cef0 100644 --- a/libgfortran/io/transfer.c +++ b/libgfortran/io/transfer.c @@ -241,16 +241,6 @@ read_sf_internal (st_parameter_dt *dtp, size_t *length) && dtp->u.p.current_unit->pad_status == PAD_NO) hit_eof (dtp); - /* If we have seen an eor previously, return a length of 0. The - caller is responsible for correctly padding the input field. */ - if (dtp->u.p.sf_seen_eor) - { - *length = 0; - /* Just return something that isn't a NULL pointer, otherwise the - caller thinks an error occurred. */ - return (char*) empty_string; - } - /* There are some cases with mixed DTIO where we have read a character and saved it in the last character buffer, so we need to backup. */ if (unlikely (dtp->u.p.current_unit->child_dtio > 0 && @@ -260,22 +250,80 @@ read_sf_internal (st_parameter_dt *dtp, size_t *length) sseek (dtp->u.p.current_unit->s, -1, SEEK_CUR); } - lorig = *length; - if (is_char4_unit(dtp)) + /* To support legacy code we have to scan the input string one byte + at a time because we don't no where an early comma may be and the + requested length could go passed the end of a comma shortened + string. We only do this if -std=legacy was given at compile + time. We also do not support this on kind=4 strings. */ + if (unlikely(compile_options.warn_std == 0)) // the slow legacy way. { - gfc_char4_t *p = (gfc_char4_t *) mem_alloc_r4 (dtp->u.p.current_unit->s, - length); - base = fbuf_alloc (dtp->u.p.current_unit, lorig); - for (size_t i = 0; i < *length; i++, p++) - base[i] = *p > 255 ? '?' : (unsigned char) *p; - } - else - base = mem_alloc_r (dtp->u.p.current_unit->s, length); + size_t n; + size_t tmp = 1; + char *q; + + /* If we have seen an eor previously, return a length of 0. The + caller is responsible for correctly padding the input field. */ + if (dtp->u.p.sf_seen_eor) + { + *length = 0; + /* Just return something that isn't a NULL pointer, otherwise the + caller thinks an error occurred. */ + return (char*) empty_string; + } + + /* Get the first chracter of the string to establish the base + address and check for comma or end-of-record condition. */ + base = mem_alloc_r (dtp->u.p.current_unit->s, &tmp); + if (tmp == 0) + { + dtp->u.p.sf_seen_eor = 1; + *length = 0; + return (char*) empty_string; + } + if (*base == ',') + { + dtp->u.p.current_unit->bytes_left--; + *length = 0; + return (char*) empty_string; + } - if (unlikely (lorig > *length)) + /* Now we scan the rest and exit deal with an end-of-file + condition or the comma. */ + for (n = 1; n < *length; n++) + { + q = mem_alloc_r (dtp->u.p.current_unit->s, &tmp); + if (tmp == 0) + { + hit_eof (dtp); + return NULL; + } + if (*q == ',') + { + dtp->u.p.current_unit->bytes_left -= n; + *length = n; + break; + } + } + } + else // the fast way { - hit_eof (dtp); - return NULL; + lorig = *length; + if (is_char4_unit(dtp)) + { + gfc_char4_t *p = (gfc_char4_t *) mem_alloc_r4 (dtp->u.p.current_unit->s, + length); + base = fbuf_alloc (dtp->u.p.current_unit, lorig); + for (size_t i = 0; i < *length; i++, p++) + base[i] = *p > 255 ? '?' : (unsigned char) *p; + } + else + base = mem_alloc_r (dtp->u.p.current_unit->s, length); + + if (unlikely (lorig > *length)) + { + hit_eof (dtp); + return NULL; + } } dtp->u.p.current_unit->bytes_left -= *length;