From patchwork Sat Jun 14 09:49:52 2014
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 359761
Delivered-To: mailing list gcc-patches@gcc.gnu.org
From: Richard Sandiford
To: Bernd Schmidt
Subject: Re: RFA: speeding up dg-extract-results.sh
References: <878utfe5g0.fsf@talisman.default> <53807FD2.6060805@t-online.de>
 <87tx8efbff.fsf@talisman.default> <5399CD05.1040406@codesourcery.com>
Date: Sat, 14 Jun 2014 10:49:52 +0100
In-Reply-To: <5399CD05.1040406@codesourcery.com> (Bernd Schmidt's message of
 "Thu, 12 Jun 2014 17:53:41 +0200")
Message-ID: <877g4jvn0f.fsf@talisman.default>

Bernd Schmidt writes:
> On 05/25/2014 11:35 AM, Richard Sandiford wrote:
>> Bernd Schmidt writes:
>>> On 02/13/2014 10:18 AM, Richard Sandiford wrote:
>>>> contrib/
>>>> 	* dg-extract-results.py: New file.
>>>> 	* dg-extract-results.sh: Use it if the environment seems suitable.
>>>
>>> I'm now seeing the following:
>>>
>>> Traceback (most recent call last):
>>>   File "../../git/gcc/../contrib/dg-extract-results.py", line 581, in <module>
>>>     Prog().main()
>>>   File "../../git/gcc/../contrib/dg-extract-results.py", line 569, in main
>>>     self.output_tool (self.runs[name])
>>>   File "../../git/gcc/../contrib/dg-extract-results.py", line 534, in output_tool
>>>     self.output_variation (tool, variation)
>>>   File "../../git/gcc/../contrib/dg-extract-results.py", line 483, in output_variation
>>>     for harness in sorted (variation.harnesses.values()):
>>> TypeError: unorderable types: HarnessRun() < HarnessRun()
>>>
>>> $ /usr/bin/python --version
>>> Python 3.3.3
>>
>> Sorry, thought I'd tested it with python3, but obviously not.
>> I've applied the fix below after testing that it didn't change the
>> output for python 2.6 and python 2.7.
>
> I've recently been trying to add ada to my set of tested languages, and
> I now encounter the following:
>
> Traceback (most recent call last):
>   File "../../git/gcc/../contrib/dg-extract-results.py", line 580, in <module>
>     Prog().main()
>   File "../../git/gcc/../contrib/dg-extract-results.py", line 544, in main
>     self.parse_file (filename, file)
>   File "../../git/gcc/../contrib/dg-extract-results.py", line 427, in parse_file
>     self.parse_acats_run (filename, file)
>   File "../../git/gcc/../contrib/dg-extract-results.py", line 342, in parse_acats_run
>     self.parse_run (filename, file, tool, variation, 1)
>   File "../../git/gcc/../contrib/dg-extract-results.py", line 242, in parse_run
>     line = file.readline()
>   File "/usr/lib64/python3.3/codecs.py", line 301, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position
> 5227: invalid continuation byte

Bah.  I'm seriously beginning to regret choosing Python for this.
Getting code to work with both Python 2 and Python 3 is like the bad old
days of getting stuff to work with both K&R and ANSI C.

I see the weird character is coming from C250002, which is specifically
testing that some arbitrary byte above 127 can be used in identifier names.
The actual choice of byte or its meaning in the locale encoding doesn't
seem to be relevant.

I committed the fix below after checking it against an Ada log for both
python2 and python3.

Thanks,
Richard


contrib/
	* dg-extract-results.py: For Python 3, force sys.stdout to handle
	surrogate escape sequences.
	(safe_open): New function.
	(output_segment, main): Use it.

Index: contrib/dg-extract-results.py
===================================================================
--- contrib/dg-extract-results.py	2014-06-14 10:17:41.698438403 +0100
+++ contrib/dg-extract-results.py	2014-06-14 10:45:12.586546139 +0100
@@ -10,6 +10,7 @@
 import sys
 import getopt
 import re
+import io
 from datetime import datetime
 from operator import attrgetter
 
@@ -21,6 +22,18 @@ strict = False
 # they should keep the original order.
 sort_logs = True
 
+# A version of open() that is safe against whatever binary output
+# might be added to the log.
+def safe_open (filename):
+    if sys.version_info >= (3, 0):
+        return open (filename, 'r', errors = 'surrogateescape')
+    return open (filename, 'r')
+
+# Force stdout to handle escape sequences from a safe_open file.
+if sys.version_info >= (3, 0):
+    sys.stdout = io.TextIOWrapper (sys.stdout.buffer,
+                                   errors = 'surrogateescape')
+
 class Named:
     def __init__ (self, name):
         self.name = name
@@ -457,7 +470,7 @@ class Prog:
 
     # Output a segment of text.
     def output_segment (self, segment):
-        with open (segment.filename, 'r') as file:
+        with safe_open (segment.filename) as file:
             file.seek (segment.start)
             for i in range (segment.lines):
                 sys.stdout.write (file.readline())
@@ -540,7 +553,7 @@ class Prog:
         try:
             # Parse the input files.
             for filename in self.files:
-                with open (filename, 'r') as file:
+                with safe_open (filename) as file:
                     self.parse_file (filename, file)
 
             # Decide what to output.
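
For anyone wanting to check the surrogateescape behaviour outside the script, here is a minimal Python 3 sketch. It is not part of the patch. The sample log text, the temporary file and the explicit encoding="utf-8" are assumptions made for illustration; the patch itself relies on the locale's default encoding. The sketch shows that a strict read fails on a stray 0xe1 byte, that a read with errors='surrogateescape' succeeds, and that writing the text back through a wrapper using the same handler reproduces the original bytes.

```python
import io
import os
import tempfile

# Hypothetical log content: plain ASCII plus a lone 0xe1 byte, which is
# not valid UTF-8 on its own (similar in spirit to the C250002 output).
raw = b"PASS: c250002\nidentifier with byte \xe1 here\n"

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# A strict text-mode read raises UnicodeDecodeError on the stray byte.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With surrogateescape the byte is carried through as the lone
# surrogate U+DCE1 instead of raising.
with open(path, "r", encoding="utf-8", errors="surrogateescape") as f:
    text = f.read()

# Writing through a wrapper with the same handler turns the surrogate
# back into the original byte.  newline="" disables newline translation
# so the comparison is byte-for-byte on every platform.
sink = io.BytesIO()
out = io.TextIOWrapper(sink, encoding="utf-8",
                       errors="surrogateescape", newline="")
out.write(text)
out.flush()
round_tripped = sink.getvalue()

os.remove(path)
```

The wrapper around sink plays the role that the replaced sys.stdout plays in the patch: without the matching error handler on the output side, writing the escaped character would raise UnicodeEncodeError.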