[06/10] parser: better date parsing

Message ID: 20170628074852.15254-7-dja@axtens.net
State: Accepted

Commit Message

Daniel Axtens June 28, 2017, 7:48 a.m. UTC
It turns out that a lot can go wrong when parsing a date:
OverflowError, ValueError and OSError have all been observed.

If any of these is raised, substitute the current datetime.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 patchwork/parser.py | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

Comments

Andrew Donnellan June 28, 2017, 8:36 a.m. UTC | #1
On 28/06/17 17:48, Daniel Axtens wrote:
> It turns out that there is a lot that can go wrong in parsing a
> date. OverflowError, ValueError and OSError have all been observed.
>
> If these go wrong, substitute the current datetime.
>
> Signed-off-by: Daniel Axtens <dja@axtens.net>

You could merge all those into one except block.

Regardless,

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

> ---
>  patchwork/parser.py | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 203e11584504..80450c2e4860 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -344,10 +344,33 @@ def find_date(mail):
>      h = clean_header(mail.get('Date', ''))
>      if not h:
>          return datetime.datetime.utcnow()
> +
>      t = parsedate_tz(h)
>      if not t:
>          return datetime.datetime.utcnow()
> -    return datetime.datetime.utcfromtimestamp(mktime_tz(t))
> +
> +    try:
> +        d = datetime.datetime.utcfromtimestamp(mktime_tz(t))
> +    except OverflowError:
> +        # If you have a date like:
> +        # Date: Wed, 4 Jun 207777777777777777777714 17:50:46 0
> +        # then you can end up with:
> +        # OverflowError: Python int too large to convert to C long
> +        d = datetime.datetime.utcnow()
> +    except ValueError:
> +        # If you have a date like:
> +        # Date:, 11 Sep 2016 23:22:904070804030804 +0100
> +        # then you can end up with:
> +        # ValueError: year is out of range
> +        d = datetime.datetime.utcnow()
> +    except OSError:
> +        # If you have a date like:
> +        # Date:, 11 Sep 2016 407080403080105:04 +0100
> +        # then you can end up with (in py3)
> +        # OSError: [Errno 75] Value too large for defined data type
> +        d = datetime.datetime.utcnow()
> +
> +    return d
>
>
>  def find_headers(mail):
>
Daniel Axtens June 28, 2017, 2:02 p.m. UTC | #2
Andrew Donnellan <andrew.donnellan@au1.ibm.com> writes:

> On 28/06/17 17:48, Daniel Axtens wrote:
>> It turns out that there is a lot that can go wrong in parsing a
>> date. OverflowError, ValueError and OSError have all been observed.
>>
>> If these go wrong, substitute the current datetime.
>>
>> Signed-off-by: Daniel Axtens <dja@axtens.net>
>
> You could merge all those into one except block.

Yeah true. Stephen, if you want to do that at merge time that'd be
fine with me.

>
> Regardless,
>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
>
>> ---
>>  patchwork/parser.py | 25 ++++++++++++++++++++++++-
>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>> index 203e11584504..80450c2e4860 100644
>> --- a/patchwork/parser.py
>> +++ b/patchwork/parser.py
>> @@ -344,10 +344,33 @@ def find_date(mail):
>>      h = clean_header(mail.get('Date', ''))
>>      if not h:
>>          return datetime.datetime.utcnow()
>> +
>>      t = parsedate_tz(h)
>>      if not t:
>>          return datetime.datetime.utcnow()
>> -    return datetime.datetime.utcfromtimestamp(mktime_tz(t))
>> +
>> +    try:
>> +        d = datetime.datetime.utcfromtimestamp(mktime_tz(t))
>> +    except OverflowError:
>> +        # If you have a date like:
>> +        # Date: Wed, 4 Jun 207777777777777777777714 17:50:46 0
>> +        # then you can end up with:
>> +        # OverflowError: Python int too large to convert to C long
>> +        d = datetime.datetime.utcnow()
>> +    except ValueError:
>> +        # If you have a date like:
>> +        # Date:, 11 Sep 2016 23:22:904070804030804 +0100
>> +        # then you can end up with:
>> +        # ValueError: year is out of range
>> +        d = datetime.datetime.utcnow()
>> +    except OSError:
>> +        # If you have a date like:
>> +        # Date:, 11 Sep 2016 407080403080105:04 +0100
>> +        # then you can end up with (in py3)
>> +        # OSError: [Errno 75] Value too large for defined data type
>> +        d = datetime.datetime.utcnow()
>> +
>> +    return d
>>
>>
>>  def find_headers(mail):
>>
>
> -- 
> Andrew Donnellan              OzLabs, ADL Canberra
> andrew.donnellan@au1.ibm.com  IBM Australia Limited
Stephen Finucane June 28, 2017, 8:14 p.m. UTC | #3
On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote:
> It turns out that there is a lot that can go wrong in parsing a
> date. OverflowError, ValueError and OSError have all been observed.
> 
> If these go wrong, substitute the current datetime.
> 
> Signed-off-by: Daniel Axtens <dja@axtens.net>

Reviewed-by: Stephen Finucane <stephen@that.guru>

...and applied with one change.

> ---
>  patchwork/parser.py | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 203e11584504..80450c2e4860 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -344,10 +344,33 @@ def find_date(mail):
>      h = clean_header(mail.get('Date', ''))
>      if not h:
>          return datetime.datetime.utcnow()
> +
>      t = parsedate_tz(h)
>      if not t:
>          return datetime.datetime.utcnow()
> -    return datetime.datetime.utcfromtimestamp(mktime_tz(t))
> +
> +    try:
> +        d = datetime.datetime.utcfromtimestamp(mktime_tz(t))
> +    except OverflowError:
> +        # If you have a date like:
> +        # Date: Wed, 4 Jun 207777777777777777777714 17:50:46 0
> +        # then you can end up with:
> +        # OverflowError: Python int too large to convert to C long
> +        d = datetime.datetime.utcnow()
> +    except ValueError:
> +        # If you have a date like:
> +        # Date:, 11 Sep 2016 23:22:904070804030804 +0100
> +        # then you can end up with:
> +        # ValueError: year is out of range
> +        d = datetime.datetime.utcnow()
> +    except OSError:
> +        # If you have a date like:
> +        # Date:, 11 Sep 2016 407080403080105:04 +0100
> +        # then you can end up with (in py3)
> +        # OSError: [Errno 75] Value too large for defined data type
> +        d = datetime.datetime.utcnow()

Merged all of these into one, as suggested by Andrew.

Stephen
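
For reference, a minimal sketch of how the three handlers could collapse into one except block, as suggested in the review. It is not the code that was actually applied, which is not shown on this page. The function name find_date_from_header is hypothetical, and it takes the raw Date header string directly so the example does not depend on clean_header() or a mail object.

```python
import datetime
from email.utils import mktime_tz, parsedate_tz


def find_date_from_header(h):
    """Hypothetical stand-in for find_date(): takes the raw Date value."""
    if not h:
        return datetime.datetime.utcnow()

    t = parsedate_tz(h)
    if not t:
        return datetime.datetime.utcnow()

    try:
        d = datetime.datetime.utcfromtimestamp(mktime_tz(t))
    except (OverflowError, ValueError, OSError):
        # Absurd years, hours or seconds can raise any of these three,
        # depending on the platform and Python version, so they share
        # one fallback: the current UTC time.
        d = datetime.datetime.utcnow()

    return d
```

A well-formed header is converted to a naive UTC datetime, while the malformed examples from the patch comments fall back to the current time instead of raising.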
Patch

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 203e11584504..80450c2e4860 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -344,10 +344,33 @@  def find_date(mail):
     h = clean_header(mail.get('Date', ''))
     if not h:
         return datetime.datetime.utcnow()
+
     t = parsedate_tz(h)
     if not t:
         return datetime.datetime.utcnow()
-    return datetime.datetime.utcfromtimestamp(mktime_tz(t))
+
+    try:
+        d = datetime.datetime.utcfromtimestamp(mktime_tz(t))
+    except OverflowError:
+        # If you have a date like:
+        # Date: Wed, 4 Jun 207777777777777777777714 17:50:46 0
+        # then you can end up with:
+        # OverflowError: Python int too large to convert to C long
+        d = datetime.datetime.utcnow()
+    except ValueError:
+        # If you have a date like:
+        # Date:, 11 Sep 2016 23:22:904070804030804 +0100
+        # then you can end up with:
+        # ValueError: year is out of range
+        d = datetime.datetime.utcnow()
+    except OSError:
+        # If you have a date like:
+        # Date:, 11 Sep 2016 407080403080105:04 +0100
+        # then you can end up with (in py3)
+        # OSError: [Errno 75] Value too large for defined data type
+        d = datetime.datetime.utcnow()
+
+    return d
 
 
 def find_headers(mail):