RFA: speeding up dg-extract-results.sh

Message ID 877g4jvn0f.fsf@talisman.default
State New

Commit Message

Richard Sandiford June 14, 2014, 9:49 a.m. UTC
Bernd Schmidt <bernds@codesourcery.com> writes:
> On 05/25/2014 11:35 AM, Richard Sandiford wrote:
>> Bernd Schmidt <bernds_cb1@t-online.de> writes:
>>> On 02/13/2014 10:18 AM, Richard Sandiford wrote:
>>>> contrib/
>>>> 	* dg-extract-results.py: New file.
>>>> 	* dg-extract-results.sh: Use it if the environment seems suitable.
>>>
>>> I'm now seeing the following:
>>>
>>> Traceback (most recent call last):
>>>     File "../../git/gcc/../contrib/dg-extract-results.py", line 581, in
>>> <module>
>>>       Prog().main()
>>>     File "../../git/gcc/../contrib/dg-extract-results.py", line 569, in main
>>>       self.output_tool (self.runs[name])
>>>     File "../../git/gcc/../contrib/dg-extract-results.py", line 534, in
>>> output_tool
>>>       self.output_variation (tool, variation)
>>>     File "../../git/gcc/../contrib/dg-extract-results.py", line 483, in
>>> output_variation
>>>       for harness in sorted (variation.harnesses.values()):
>>> TypeError: unorderable types: HarnessRun() < HarnessRun()
>>>
>>> $ /usr/bin/python --version
>>> Python 3.3.3
>>
>> Sorry, thought I'd tested it with python3, but obviously not.
>> I've applied the fix below after testing that it didn't change the
>> output for python 2.6 and python 2.7.
>
> I've recently been trying to add ada to my set of tested languages, and 
> I now encounter the following:
>
> Traceback (most recent call last):
>    File "../../git/gcc/../contrib/dg-extract-results.py", line 580, in 
> <module>
>      Prog().main()
>    File "../../git/gcc/../contrib/dg-extract-results.py", line 544, in main
>      self.parse_file (filename, file)
>    File "../../git/gcc/../contrib/dg-extract-results.py", line 427, in 
> parse_file
>      self.parse_acats_run (filename, file)
>    File "../../git/gcc/../contrib/dg-extract-results.py", line 342, in 
> parse_acats_run
>      self.parse_run (filename, file, tool, variation, 1)
>    File "../../git/gcc/../contrib/dg-extract-results.py", line 242, in 
> parse_run
>      line = file.readline()
>    File "/usr/lib64/python3.3/codecs.py", line 301, in decode
>      (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 
> 5227: invalid continuation byte

Bah.  I'm seriously beginning to regret choosing Python for this.
Getting code to work with both Python 2 and Python 3 is like the bad
old days of getting stuff to work with both K&R and ANSI C.

I see the weird character is coming from C250002, which is specifically
testing that some arbitrary byte above 127 can be used in identifier names.
The actual choice of byte or its meaning in the locale encoding doesn't
seem to be relevant.
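
For illustration only (this is not part of the committed patch): a minimal
Python 3 sketch of why a lone byte such as 0xe1 breaks strict UTF-8 decoding,
and how the 'surrogateescape' error handler carries it through unchanged.
The log line here is made up; only the byte value is taken from the
traceback above.

```python
# Made-up log line containing a stray byte 0xe1 (an assumption for the
# example; the real ACATS output is not reproduced here).
raw = b"PASS: c250002 \xe1bc\n"

# Strict UTF-8 decoding fails: 0xe1 starts a multi-byte sequence, but the
# next byte ('b') is not a valid continuation byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape the undecodable byte is mapped to the lone
# surrogate U+DCE1 instead of raising an error ...
text = raw.decode("utf-8", errors="surrogateescape")

# ... and encoding with the same handler restores the original byte.
round_trip = text.encode("utf-8", errors="surrogateescape")
```

The point is that the text layer never needs to know what the byte means,
which matches the observation that its meaning in the locale encoding is
irrelevant.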

I committed the fix below after checking it against an Ada log for
both python2 and python3.

Thanks,
Richard


contrib/
	* dg-extract-results.py: For Python 3, force sys.stdout to handle
	surrogate escape sequences.
	(safe_open): New function.
	(output_segment, main): Use it.

Patch

Index: contrib/dg-extract-results.py
===================================================================
--- contrib/dg-extract-results.py	2014-06-14 10:17:41.698438403 +0100
+++ contrib/dg-extract-results.py	2014-06-14 10:45:12.586546139 +0100
@@ -10,6 +10,7 @@ 
 import sys
 import getopt
 import re
+import io
 from datetime import datetime
 from operator import attrgetter
 
@@ -21,6 +22,18 @@  strict = False
 # they should keep the original order.
 sort_logs = True
 
+# A version of open() that is safe against whatever binary output
+# might be added to the log.
+def safe_open (filename):
+    if sys.version_info >= (3, 0):
+        return open (filename, 'r', errors = 'surrogateescape')
+    return open (filename, 'r')
+
+# Force stdout to handle escape sequences from a safe_open file.
+if sys.version_info >= (3, 0):
+    sys.stdout = io.TextIOWrapper (sys.stdout.buffer,
+                                   errors = 'surrogateescape')
+
 class Named:
     def __init__ (self, name):
         self.name = name
@@ -457,7 +470,7 @@  class Prog:
 
     # Output a segment of text.
     def output_segment (self, segment):
-        with open (segment.filename, 'r') as file:
+        with safe_open (segment.filename) as file:
             file.seek (segment.start)
             for i in range (segment.lines):
                 sys.stdout.write (file.readline())
@@ -540,7 +553,7 @@  class Prog:
         try:
             # Parse the input files.
             for filename in self.files:
-                with open (filename, 'r') as file:
+                with safe_open (filename) as file:
                     self.parse_file (filename, file)
 
             # Decide what to output.
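
The read-and-write pairing in the patch can be tried in isolation. The
following is a rough Python 3 sketch, not the script's actual code: the
temporary file, its contents, and the in-memory `sink` standing in for
sys.stdout.buffer are all assumptions made for the example. Unlike the
patch, it passes an explicit encoding and newline="" so that the result does
not depend on the locale or platform.

```python
import io
import os
import tempfile

# Hypothetical log content with one byte that is not valid UTF-8.
data = b"line one\nbad byte: \xe1x\nline three\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

# Stand-in for sys.stdout.buffer so the written bytes can be inspected.
sink = io.BytesIO()
out = io.TextIOWrapper(sink, encoding="utf-8",
                       errors="surrogateescape", newline="")

try:
    # Same idea as safe_open() on Python 3: undecodable bytes are kept as
    # lone surrogates instead of raising UnicodeDecodeError.
    with open(path, "r", encoding="utf-8",
              errors="surrogateescape", newline="") as f:
        for line in f:
            out.write(line)
    # Push buffered text through to the underlying byte stream.
    out.flush()
    copied = sink.getvalue()
finally:
    os.remove(path)
```

Both sides need the handler: reading with 'surrogateescape' but writing
through a strict stream would simply move the failure from decode to encode.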