[gitdm,2/2] logparser.py: Try and be more robust with unicode handling

Message ID 20220707192215.509444-2-trini@konsulko.com
State Deferred
Delegated to: Tom Rini
Series [gitdm,1/2] Merge branch 'master' into u-boot

Commit Message

Tom Rini July 7, 2022, 7:22 p.m. UTC
Given the sometimes oddly formatted data that can come through when
removing code, we need to be as flexible as possible when handling it.
Set our encoding to unicode_escape and if we still run into a problem,
it's likely going to be OK to ignore it.

Signed-off-by: Tom Rini <trini@konsulko.com>
---
I've emailed this to Jonathan Corbet as well as he's the upstream for
the project, and this does work for me.  But I'm not a python guru by
any means.  But trying to run the stats for v2022.04..v2022.07-rc6 blows
up in places otherwise.

 logparser.py | 1 +
 1 file changed, 1 insertion(+)

Comments

Simon Glass July 12, 2022, 10:58 a.m. UTC | #1
On Thu, 7 Jul 2022 at 13:22, Tom Rini <trini@konsulko.com> wrote:
>
> Given the sometimes oddly formatted data that can come through when
> removing code, we need to be as flexible as possible when handling it.
> Set our encoding to unicode_escape and if we still run into a problem,
> it's likely going to be OK to ignore it.
>
> Signed-off-by: Tom Rini <trini@konsulko.com>
> ---
> I've emailed this to Jonathan Corbet as well as he's the upstream for
> the project, and this does work for me.  But I'm not a python guru by
> any means.  But trying to run the stats for v2022.04..v2022.07-rc6 blows
> up in places otherwise.
>
>  logparser.py | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Simon Glass <sjg@chromium.org>

BTW I have found that using binary is helpful in many places, then
converting to UTF-8 when displaying things.
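
A minimal sketch of that binary-first approach, for illustration only. It
is not taken from logparser.py: the helper names iter_log_lines and display
and the sample data are hypothetical, and errors="replace" is an assumed
choice. In a real run the input would more likely be sys.stdin.buffer than
an in-memory stream.

```python
import io


def iter_log_lines(binary_stream):
    """Yield each line as bytes; nothing is decoded at this stage."""
    for raw in binary_stream:
        yield raw.rstrip(b"\n")


def display(raw_line):
    """Decode only when showing the line, replacing undecodable bytes."""
    return raw_line.decode("utf-8", errors="replace")


# Hypothetical sample: one valid UTF-8 line, one with a stray Latin-1 byte.
sample = io.BytesIO("Author: Jos\u00e9\n".encode("utf-8") + b"Author: Jos\xe9\n")

lines = [display(raw) for raw in iter_log_lines(sample)]
for line in lines:
    print(line)
```

Because the parsing side only ever sees bytes, a badly encoded line cannot
raise UnicodeDecodeError until it is displayed, and at that point the
replacement character marks the damage instead of silently dropping it.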


>
> diff --git a/logparser.py b/logparser.py
> index efbc72f868eb..d5906e97689d 100644
> --- a/logparser.py
> +++ b/logparser.py
> @@ -37,6 +37,7 @@ class LogPatchSplitter:
>          self.fd = fd
>          self.buffer = None
>          self.patch = []
> +        sys.stdin.reconfigure(encoding='unicode_escape', errors='ignore')
>
>      def __iter__(self):
>          return self
> --
> 2.25.1
>
Tom Rini July 12, 2022, 11:05 a.m. UTC | #2
On Tue, Jul 12, 2022 at 04:58:46AM -0600, Simon Glass wrote:
> On Thu, 7 Jul 2022 at 13:22, Tom Rini <trini@konsulko.com> wrote:
> >
> > Given the sometimes oddly formatted data that can come through when
> > removing code, we need to be as flexible as possible when handling it.
> > Set our encoding to unicode_escape and if we still run into a problem,
> > it's likely going to be OK to ignore it.
> >
> > Signed-off-by: Tom Rini <trini@konsulko.com>
> > ---
> > I've emailed this to Jonathan Corbet as well as he's the upstream for
> > the project, and this does work for me.  But I'm not a python guru by
> > any means.  But trying to run the stats for v2022.04..v2022.07-rc6 blows
> > up in places otherwise.
> >
> >  logparser.py | 1 +
> >  1 file changed, 1 insertion(+)
> 
> Reviewed-by: Simon Glass <sjg@chromium.org>
> 
> BTW I have found that using binary is helpful in many places, then
> converting to UTF-8 when displaying things.
> 
> 
> >
> > diff --git a/logparser.py b/logparser.py
> > index efbc72f868eb..d5906e97689d 100644
> > --- a/logparser.py
> > +++ b/logparser.py
> > @@ -37,6 +37,7 @@ class LogPatchSplitter:
> >          self.fd = fd
> >          self.buffer = None
> >          self.patch = []
> > +        sys.stdin.reconfigure(encoding='unicode_escape', errors='ignore')
> >
> >      def __iter__(self):
> >          return self

So, I followed up with Jonathan, but hadn't yet on the list.
unicode_escape works, but then the results don't read right.  It turned
out utf-8 was the right encoding, but the first time I tried testing it
I had some other problem locally.
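
For anyone following along, a small sketch of why unicode_escape "works"
but the output reads wrong. This is an illustration under assumptions, not
the final gitdm change: the sample bytes are made up, errors="replace" is
an assumed handler, and an in-memory io.TextIOWrapper stands in for
sys.stdin (reconfigure() needs Python 3.7 or later).

```python
import io

# Hypothetical name as it would appear in UTF-8 encoded git log output.
raw = "Jos\u00e9".encode("utf-8")

# unicode_escape treats each non-escape byte as Latin-1, so the two-byte
# UTF-8 sequence for the accented letter becomes two separate characters.
as_escape = raw.decode("unicode_escape", errors="ignore")

# utf-8 decodes the same bytes back to the intended text.
as_utf8 = raw.decode("utf-8", errors="replace")

# unicode_escape also interprets backslash sequences found in the data,
# which would alter any removed code that contains a literal "\n".
escaped_code = b'printf("a\\n");'.decode("unicode_escape")

# The same switch on a text stream, using reconfigure() before any read.
stream = io.TextIOWrapper(io.BytesIO(raw + b"\n"), encoding="ascii")
stream.reconfigure(encoding="utf-8", errors="replace")
text = stream.readline().rstrip("\n")

print(repr(as_escape), repr(as_utf8), repr(escaped_code), repr(text))
```

So with unicode_escape nothing raises, but non-ASCII names come out as
mojibake and backslashes in patch content are rewritten, which matches the
results not reading right.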
Patch

diff --git a/logparser.py b/logparser.py
index efbc72f868eb..d5906e97689d 100644
--- a/logparser.py
+++ b/logparser.py
@@ -37,6 +37,7 @@  class LogPatchSplitter:
         self.fd = fd
         self.buffer = None
         self.patch = []
+        sys.stdin.reconfigure(encoding='unicode_escape', errors='ignore')
 
     def __iter__(self):
         return self