diff mbox

[v4,BZ,#16141] strptime %z: fix rounding, extend range to +/-9959

Message ID 1439339237-22204-1-git-send-email-james@loowit.net
State New
Headers show

Commit Message

James Perkins Aug. 12, 2015, 12:27 a.m. UTC
Fixes [BZ #16141] strptime %z offset restriction.

Topic: strptime supports a %z input field descriptor, which parses a
time zone offset from UTC time into the broken-out time field tm_gmtoff.

Problems:

1) In the current implementation, the minutes portion calculation is
correct only for minutes evenly divisible by 3. This is because the
minutes value is converted to decimal time, but inadequate precision
leads to rounding which calculates results that are too low for
some values.

For example, due to rounding, a +1159 offset string results in an
incorrect tm_gmtoff of 43128 (== 11 * 3600 + 58.8 * 60) seconds,
instead of 43140 (== 11 * 3600 + 59 * 60) seconds. In contrast,
a +1157 offset (minutes divisible by 3) does not cause the bug,
and results in a correct tm_gmtoff of 43020.

2) strptime's %z specifier will not parse time offsets less than
-1200 or greater than +1200, or if only hour digits are present, less
than -12 or greater than +12. It will return NULL for offsets outside
that range. These limits do not meet historical and modern use cases:

  * Present day exceeds the +1200 limit:
    - Pacific/Auckland (New Zealand) summer time is +1300.
    - Pacific/Kiritimati (Christmas Island) is +1400.
    - Pacific/Apia (Samoa) summer time is +1400.
  * Historical offsets exceeded +1500/-1500.
  * POSIX supports -2459 to +2559.
  * Offsets up to +/-9959 may occasionally be useful.
  * Paul Eggert's notes provide additional detail:
    - https://sourceware.org/ml/libc-alpha/2014-12/msg00068.html
    - https://sourceware.org/ml/libc-alpha/2014-12/msg00072.html

3) tst-strptime2, part of the 'make check' test suite, does not test
for the above problems.

Corrective actions:

1) In time/strptime_l.c, calculate the offset from the hour and
minute portions directly, without the rounding errors introduced by
decimal time.

2) Remove the +/-1200 range limit, permitting strptime to parse offsets
from -9959 through +9959.

3) Add zone offset values to time/tst-strptime2.c.

  * Test minutes evenly divisible by three (+1157) and not evenly
    divisible by three (+1158 and +1159).
  * Test offsets near the old and new range limits (-1201, -1330, -2459,
    -2500, -99, -9959, +1201, +1330, +1400, +1401, +2559, +2600, +99,
    and +9959)

The revised strptime passes all old and new tst-strptime2 tests.

James

2015-08-11  James Perkins  <james@loowit.net>

	[BZ #16141]
	* time/strptime_l.c (__strptime_internal): Fix %z minutes
	calculation, removing incorrect decimal time rounding, so that
	all minute values result in a valid seconds value.
	* time/strptime_l.c (__strptime_internal): Extend %z time zone
	offset range limits to UTC-99:59 through UTC+99:59 to parse
	current and historical use cases.
	* time/strptime2.c (tests): Modify and add tests for the
	strptime %z input field descriptor, specifically conversion of
	minutes to seconds and validating an offset range of -9959 to
	+9959.

Signed-off-by: James Perkins <james@loowit.net>
---
 time/strptime_l.c    | 12 +++---------
 time/tst-strptime2.c | 21 +++++++++++++++++++--
 2 files changed, 22 insertions(+), 11 deletions(-)

Comments

Carlos O'Donell Aug. 13, 2015, 4:45 p.m. UTC | #1
On 08/11/2015 08:27 PM, James Perkins wrote:
> Fixes [BZ #16141] strptime %z offset restriction.
> 
> Topic: strptime supports a %z input field descriptor, which parses a
> time zone offset from UTC time into the broken-out time field tm_gmtoff.

Overall the patch looks great. The high-level idea is correct.

The architecture of the changes is almost right, I think though that we
need a more thorough test case that covers every value in +HHMM to -HHMM
(skipping the invalid ones) to make sure we never ever break this again
when someone tries to optimize the integer math.

After we fix the testing we can do a thorough polish of any final details.

> Problems:
> 
> 1) In the current implementation, the minutes portion calculation is
> correct only for minutes evenly divisible by 3. This is because the
> minutes value is converted to decimal time, but inadequate precision
> leads to rounding which calculates results that are too low for
> some values.

Agreed, the 50/30 value is not sufficiently accurate.

> For example, due to rounding, a +1159 offset string results in an
> incorrect tm_gmtoff of 43128 (== 11 * 3600 + 58.8 * 60) seconds,
> instead of 43140 (== 11 * 3600 + 59 * 60) seconds. In contrast,
> a +1157 offset (minutes divisible by 3) does not cause the bug,
> and results in a correct tm_gmtoff of 43020.

Agreed.

> 2) strptime's %z specifier will not parse time offsets less than
> -1200 or greater than +1200, or if only hour digits are present, less
> than -12 or greater than +12. It will return NULL for offsets outside
> that range. These limits do not meet historical and modern use cases:
> 
>   * Present day exceeds the +1200 limit:
>     - Pacific/Auckland (New Zealand) summer time is +1300.
>     - Pacific/Kiritimati (Christmas Island) is +1400.
>     - Pacific/Apia (Samoa) summer time is +1400.

Agreed.

>   * Historical offsets exceeded +1500/-1500.

Agreed.

>   * POSIX supports -2459 to +2559.

Agreed, but note that POSIX has no %z support in strptime, and this
is a glibc extension to support %z in strptime (like it does in strftime).

>   * Offsets up to +/-9959 may occasionally be useful.

Agreed.

>   * Paul Eggert's notes provide additional detail:
>     - https://sourceware.org/ml/libc-alpha/2014-12/msg00068.html
>     - https://sourceware.org/ml/libc-alpha/2014-12/msg00072.html

OK.

> 
> 3) tst-strptime2, part of the 'make check' test suite, does not test
> for the above problems.

As mentioned above, this needs to be more thorough.

> Corrective actions:
> 
> 1) In time/strptime_l.c, calculate the offset from the hour and
> minute portions directly, without the rounding errors introduced by
> decimal time.

OK.

> 2) Remove the +/-1200 range limit, permitting strptime to parse offsets
> from -9959 through +9959.

OK.

> 3) Add zone offset values to time/tst-strptime2.c.
> 
>   * Test minutes evenly divisible by three (+1157) and not evenly
>     divisible by three (+1158 and +1159).
>   * Test offsets near the old and new range limits (-1201, -1330, -2459,
>     -2500, -99, -9959, +1201, +1330, +1400, +1401, +2559, +2600, +99,
>     and +9959)

Not enough IMO. We might break this again in the future. I'd like to see
a thorough iteration of the values.

> The revised strptime passes all old and new tst-strptime2 tests.

Good thanks for clarifying that.

> 2015-08-11  James Perkins  <james@loowit.net>
> 
> 	[BZ #16141]
> 	* time/strptime_l.c (__strptime_internal): Fix %z minutes
> 	calculation, removing incorrect decimal time rounding, so that
> 	all minute values result in a valid seconds value.
> 	* time/strptime_l.c (__strptime_internal): Extend %z time zone
> 	offset range limits to UTC-99:59 through UTC+99:59 to parse
> 	current and historical use cases.
> 	* time/strptime2.c (tests): Modify and add tests for the
> 	strptime %z input field descriptor, specifically conversion of
> 	minutes to seconds and validating an offset range of -9959 to
> 	+9959.
> 
> Signed-off-by: James Perkins <james@loowit.net>
> ---
>  time/strptime_l.c    | 12 +++---------
>  time/tst-strptime2.c | 21 +++++++++++++++++++--
>  2 files changed, 22 insertions(+), 11 deletions(-)
> 
> diff --git a/time/strptime_l.c b/time/strptime_l.c
> index 5640cce..4203ad8 100644
> --- a/time/strptime_l.c
> +++ b/time/strptime_l.c
> @@ -770,16 +770,10 @@ __strptime_internal (rp, fmt, tmp, statep LOCALE_PARAM)
>  	    else if (n != 4)
>  	      /* Only two or four digits recognized.  */
>  	      return NULL;
> -	    else
> -	      {
> -		/* We have to convert the minutes into decimal.  */
> -		if (val % 100 >= 60)
> -		  return NULL;
> -		val = (val / 100) * 100 + ((val % 100) * 50) / 30;

OK, I agree that 50/30 does result in a rounding error.

> -	      }
> -	    if (val > 1200)
> +	    else if (val % 100 >= 60)
> +	      /* Minutes valid range is 0 through 59.  */
>  	      return NULL;

OK.

> -	    tm->tm_gmtoff = (val * 3600) / 100;
> +	    tm->tm_gmtoff = (val / 100) * 3600 + (val % 100) * 60;

OK.

I created a small test case to isolate this code and validated it for
all values that were valid and your change matches the expected values
computed using double precision.

>  	    if (neg)
>  	      tm->tm_gmtoff = -tm->tm_gmtoff;
>  	  }
> diff --git a/time/tst-strptime2.c b/time/tst-strptime2.c
> index a08e6d7..4db8321 100644
> --- a/time/tst-strptime2.c
> +++ b/time/tst-strptime2.c
> @@ -17,8 +17,25 @@ static const struct
>      { "1113472456 -1030", -37800 },
>      { "1113472456 +0030", 1800 },
>      { "1113472456 -0030", -1800 },
> -    { "1113472456 -1330", LONG_MAX },
> -    { "1113472456 +1330", LONG_MAX },
> +    { "1113472456 +1157", 43020 },
> +    { "1113472456 +1158", 43080 },
> +    { "1113472456 +1159", 43140 },
> +    { "1113472456 +1200", 43200 },
> +    { "1113472456 -1200", -43200 },
> +    { "1113472456 +1201", 43260 },
> +    { "1113472456 -1201", -43260 },
> +    { "1113472456 +1330", 48600 },
> +    { "1113472456 -1330", -48600 },
> +    { "1113472456 +1400", 50400 },
> +    { "1113472456 +1401", 50460 },
> +    { "1113472456 -2459", -89940 },
> +    { "1113472456 -2500", -90000 },
> +    { "1113472456 +2559", 93540 },
> +    { "1113472456 +2600", 93600 },
> +    { "1113472456 -99", -356400 },
> +    { "1113472456 +99", 356400 },
> +    { "1113472456 -9959", -359940 },
> +    { "1113472456 +9959", 359940 },
>      { "1113472456 -1060", LONG_MAX },
>      { "1113472456 +1060", LONG_MAX },
>      { "1113472456  1030", LONG_MAX },

This test needs to be extended to cover all somehow. I don't want to
come back to this in 5 years when it breaks again.

Is that asking too much?

Cheers,
Carlos.
James Perkins Aug. 14, 2015, 12:47 a.m. UTC | #2
Thanks for the extensive and thorough review, Carlos.

On Thu, Aug 13, 2015 at 9:45 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 08/11/2015 08:27 PM, James Perkins wrote:

> After we fix the testing we can do a thorough polish of any final details.

If you feel that base patch needs any final tweaks, I would very much
like to hear the details.

> As mentioned above, this needs to be more thorough.

I feel significant test enhancements are separate from the base patch
which fixes the problem in strptime and adds a few tests for it.

>> 3) Add zone offset values to time/tst-strptime2.c.
>>
>>   * Test minutes evenly divisible by three (+1157) and not evenly
>>     divisible by three (+1158 and +1159).
>>   * Test offsets near the old and new range limits (-1201, -1330, -2459,
>>     -2500, -99, -9959, +1201, +1330, +1400, +1401, +2559, +2600, +99,
>>     and +9959)
>
> Not enough IMO. We might break this again in the future. I'd like to see
> a thorough iteration of the values.

>>       * time/strptime2.c (tests): Modify and add tests for the
>>       strptime %z input field descriptor, specifically conversion of
>>       minutes to seconds and validating an offset range of -9959 to
>>       +9959.

I see a small omission in my strptime_v4 patch here where it's
actually time/tst-strptime2.c. So, that's one thing I need to fix.

> I created a small test case to isolate this code and validated it for
> all values that were valid and your change matches the expected values
> computed using double precision.

In exploring test enhancements, I didn't need to use floating point.

> This test needs to be extended to cover all somehow. I don't want to
> come back to this in 5 years when it breaks again.
>
> Is that asking too much?

I'll send a patch right along for an extended/iterative test case.

The test rework I did covers the entire range of digit values from
-9999 to +9999, also -999 to +999, -99 to +99, -9 to +9, and the lack
of any digit found. In short, it goes a bit beyond what you asked, but
I feel better testing an entire swath of input that users, valid and
invalid, that users may be expected to provide strptime.

Cheers,
James
Carlos O'Donell Aug. 14, 2015, 2:30 p.m. UTC | #3
On 08/13/2015 08:47 PM, James Perkins wrote:
> Thanks for the extensive and thorough review, Carlos.
> 
> On Thu, Aug 13, 2015 at 9:45 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 08/11/2015 08:27 PM, James Perkins wrote:
> 
>> After we fix the testing we can do a thorough polish of any final details.
> 
> If you feel that base patch needs any final tweaks, I would very much
> like to hear the details.

My apologies, I did a thorough review, so there shouldn't be anything left,
but I usually say the above as "boiler plate" since something might come
up and I don't want to say "It's perfect" until we get to that last version
before commit.

>> As mentioned above, this needs to be more thorough.
> 
> I feel significant test enhancements are separate from the base patch
> which fixes the problem in strptime and adds a few tests for it.

I see you extended the test case to cover -9999 to +9999 which exceeds
my expectations. I appreciate your professional response to this, and
I guarantee you the additional testing will reap benefits in the future.

Keep in mind that as the reviewer I see (a) an extension of the accepted
inputs and (b) only light coverage of those new inputs, where what I really
want is for this problem never to come back again.

I appreciate deeply your commitment to testing the entire range.

>>> 3) Add zone offset values to time/tst-strptime2.c.
>>>
>>>   * Test minutes evenly divisible by three (+1157) and not evenly
>>>     divisible by three (+1158 and +1159).
>>>   * Test offsets near the old and new range limits (-1201, -1330, -2459,
>>>     -2500, -99, -9959, +1201, +1330, +1400, +1401, +2559, +2600, +99,
>>>     and +9959)
>>
>> Not enough IMO. We might break this again in the future. I'd like to see
>> a thorough iteration of the values.
> 
>>>       * time/strptime2.c (tests): Modify and add tests for the
>>>       strptime %z input field descriptor, specifically conversion of
>>>       minutes to seconds and validating an offset range of -9959 to
>>>       +9959.
> 
> I see a small omission in my strptime_v4 patch here where it's
> actually time/tst-strptime2.c. So, that's one thing I need to fix.
> 
>> I created a small test case to isolate this code and validated it for
>> all values that were valid and your change matches the expected values
>> computed using double precision.
> 
> In exploring test enhancements, I didn't need to use floating point.

That's OK. I was simply giving some background to the independent testing
that I did on my end. As someone who has also worked on compilers I like
to use two distinct types since it avoids common mode failures.

>> This test needs to be extended to cover all somehow. I don't want to
>> come back to this in 5 years when it breaks again.
>>
>> Is that asking too much?
> 
> I'll send a patch right along for an extended/iterative test case.
> 
> The test rework I did covers the entire range of digit values from
> -9999 to +9999, also -999 to +999, -99 to +99, -9 to +9, and the lack
> of any digit found. In short, it goes a bit beyond what you asked, but
> I feel better testing an entire swath of input that users, valid and
> invalid, that users may be expected to provide strptime.

Exactly. Thank you. I'll review the test case, and then commit the
test case and the new fixes at the same time.

Cheers,
Carlos.
James Perkins Aug. 15, 2015, 6:33 p.m. UTC | #4
On Fri, Aug 14, 2015 at 7:30 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 08/13/2015 08:47 PM, James Perkins wrote:
> > On Thu, Aug 13, 2015 at 9:45 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> >> On 08/11/2015 08:27 PM, James Perkins wrote:
> >
> >> After we fix the testing we can do a thorough polish of any final details.
> >
> > If you feel that base patch needs any final tweaks, I would very much
> > like to hear the details.
>
> My apologies, I did a thorough review, so there shouldn't be anything left,
> but I usually say the above as "boiler plate" since something might come
> up and I don't want to say "It's perfect" until we get to that last version
> before commit.

Completely understandable.... I now feel confident that you mentioned
every thing you could think of at the time you reviewed it. It's always fine
with me if something was overlooked the first time around.

> >> As mentioned above, this needs to be more thorough.
> >
> > I feel significant test enhancements are separate from the base patch
> > which fixes the problem in strptime and adds a few tests for it.
>
> I see you extended the test case to cover -9999 to +9999 which exceeds
> my expectations. I appreciate your professional response to this, and
> I guarantee you the additional testing will reap benefits in the future.

I appreciate constructive feedback which may challenge my base
assumptions, so that I can make the most useful contribution (even if it is
a small contribution)... so this works for me. Thank you for the constructive
and professional review.

> Keep in mind that as the reviewer I see (a) an extension of the accepted
> inputs and (b) only light coverage of those new inputs, where what I really
> want is for this problem never to come back again.
>
> I appreciate deeply your commitment to testing the entire range.

I agree that I want to fix this code and see it stay fixed. Your suggestion
to iteratively test the input range was a good one. It seems to me the code
to test the gamut of valid and invalid inputs isn't significantly more work
than just the valid input set.

> > I see a small omission in my strptime_v4 patch here where it's
> > actually time/tst-strptime2.c. So, that's one thing I need to fix.

This issue is fixed in my V5 patchset (coming very soon).

> >> I created a small test case to isolate this code and validated it for
> >> all values that were valid and your change matches the expected values
> >> computed using double precision.
> >
> > In exploring test enhancements, I didn't need to use floating point.
>
> That's OK. I was simply giving some background to the independent testing
> that I did on my end. As someone who has also worked on compilers I like
> to use two distinct types since it avoids common mode failures.

Avoiding common mode failures is a good point. I appreciate hearing your
background. I'm somewhat naive with my test writing skills as well as
floating point precision, rounding modes, and fuzzy equality tests, or I
might apply those techniques.

In my defense, I come from the embedded operating system architecture
support direction and that makes me tend to avoid running thousands of
floating point operations on processors that have none (anemic MIPS4kc
CPUs, for example). I've waited hours for test results many times.

> > I feel better testing an entire swath of input that users, valid and
> > invalid, that users may be expected to provide strptime.
>
> Exactly. Thank you. I'll review the test case, and then commit the
> test case and the new fixes at the same time.

Great! I see after doing them that my tst-strptime2.c rework logically
must follow the V4 strptime fix plus test additions, so, indeed, you were
right to suggest from the start that the test rework goes logically with the
strptime fix.

I will be posting a V5 series of two patches. The first is not significantly
changed from the V4 strptime fix plus test additions. The second updates
the tst-strptime2.c changes that I proposed.

Cheers,
James
diff mbox

Patch

diff --git a/time/strptime_l.c b/time/strptime_l.c
index 5640cce..4203ad8 100644
--- a/time/strptime_l.c
+++ b/time/strptime_l.c
@@ -770,16 +770,10 @@  __strptime_internal (rp, fmt, tmp, statep LOCALE_PARAM)
 	    else if (n != 4)
 	      /* Only two or four digits recognized.  */
 	      return NULL;
-	    else
-	      {
-		/* We have to convert the minutes into decimal.  */
-		if (val % 100 >= 60)
-		  return NULL;
-		val = (val / 100) * 100 + ((val % 100) * 50) / 30;
-	      }
-	    if (val > 1200)
+	    else if (val % 100 >= 60)
+	      /* Minutes valid range is 0 through 59.  */
 	      return NULL;
-	    tm->tm_gmtoff = (val * 3600) / 100;
+	    tm->tm_gmtoff = (val / 100) * 3600 + (val % 100) * 60;
 	    if (neg)
 	      tm->tm_gmtoff = -tm->tm_gmtoff;
 	  }
diff --git a/time/tst-strptime2.c b/time/tst-strptime2.c
index a08e6d7..4db8321 100644
--- a/time/tst-strptime2.c
+++ b/time/tst-strptime2.c
@@ -17,8 +17,25 @@  static const struct
     { "1113472456 -1030", -37800 },
     { "1113472456 +0030", 1800 },
     { "1113472456 -0030", -1800 },
-    { "1113472456 -1330", LONG_MAX },
-    { "1113472456 +1330", LONG_MAX },
+    { "1113472456 +1157", 43020 },
+    { "1113472456 +1158", 43080 },
+    { "1113472456 +1159", 43140 },
+    { "1113472456 +1200", 43200 },
+    { "1113472456 -1200", -43200 },
+    { "1113472456 +1201", 43260 },
+    { "1113472456 -1201", -43260 },
+    { "1113472456 +1330", 48600 },
+    { "1113472456 -1330", -48600 },
+    { "1113472456 +1400", 50400 },
+    { "1113472456 +1401", 50460 },
+    { "1113472456 -2459", -89940 },
+    { "1113472456 -2500", -90000 },
+    { "1113472456 +2559", 93540 },
+    { "1113472456 +2600", 93600 },
+    { "1113472456 -99", -356400 },
+    { "1113472456 +99", 356400 },
+    { "1113472456 -9959", -359940 },
+    { "1113472456 +9959", 359940 },
     { "1113472456 -1060", LONG_MAX },
     { "1113472456 +1060", LONG_MAX },
     { "1113472456  1030", LONG_MAX },