time: Avoid alignment gaps in __tzfile_read

Message ID 87o97t7rzm.fsf@oldenburg2.str.redhat.com
State New
Headers show
Series
  • time: Avoid alignment gaps in __tzfile_read
Related show

Commit Message

Florian Weimer Feb. 2, 2019, 10:19 p.m.
By ordering the suballocations by decreasing alignment, alignment
gaps can be avoided.

Also use __glibc_unlikely for reading the transitions and type
indexes.  In the 8-byte case, two reads are now needed because the
transitions and type indexes are no longer adjacent.

2019-02-02  Florian Weimer  <fweimer@redhat.com>

	* time/tzfile.c (__tzfile_read): Reorder suballocations to avoid
	alignment gaps.

Comments

Paul Eggert Feb. 3, 2019, 2 a.m. | #1
Florian Weimer wrote:
> By ordering the suballocations by decreasing alignment, alignment
> gaps can be avoided.
> 
> ... In the 8-byte case, two reads are now needed because the
> transitions and type indexes are no longer adjacent.

Is the idea to slightly decrease the amount of memory used, at the expense of a 
slight increase in CPU time because of the two fread calls? If so, I don't see 
the benefit of the change; usually we'd rather save CPU time even if a bit more 
memory is used.
Florian Weimer Feb. 3, 2019, 8:29 a.m. | #2
* Paul Eggert:

> Florian Weimer wrote:
>> By ordering the suballocations by decreasing alignment, alignment
>> gaps can be avoided.
>>
>> ... In the 8-byte case, two reads are now needed because the
>> transitions and type indexes are no longer adjacent.
>
> Is the idea to slightly decrease the amount of memory used, at the
> expense of a slight increase in CPU time because of the two fread
> calls?

Essentially yes.

> If so, I don't see the benefit of the change; usually we'd
> rather save CPU time even if a bit more memory is used.

It also simplifies the code because it removes one of the cases that
depends on trans_width.

Thanks,
Florian
Florian Weimer Feb. 3, 2019, 8:34 a.m. | #3
* Florian Weimer:

> * Paul Eggert:
>
>> Florian Weimer wrote:
>>> By ordering the suballocations by decreasing alignment, alignment
>>> gaps can be avoided.
>>>
>>> ... In the 8-byte case, two reads are now needed because the
>>> transitions and type indexes are no longer adjacent.
>>
>> Is the idea to slightly decrease the amount of memory used, at the
>> expense of a slight increase in CPU time because of the two fread
>> calls?
>
> Essentially yes.
>
>> If so, I don't see the benefit of the change; usually we'd
>> rather save CPU time even if a bit more memory is used.
>
> It also simplifies the code because it removes one of the cases that
> depends on trans_width.

Sorry, I misread what you wrote.  This is a long-term allocation, and
the reading of the file happens only once per allocation, so I do think
that the additional __fread_unlocked call is not a problem from a
performance perspective.

Thanks,
Florian
Paul Eggert Feb. 3, 2019, 5:06 p.m. | #4
Florian Weimer wrote:
> This is a long-term allocation, and
> the reading of the file happens only once per allocation, so I do think
> that the additional __fread_unlocked call is not a problem from a
> performance perspective.

The upstream code uses a single 'read' call, bypassing stdio entirely. It 
doesn't even stat the file; all it does is 'access' (if a relative filename, for 
security reasons), 'open', 'read', 'close'. I wonder why glibc went to stdio for 
this? Seems overkill. In practice, I'd think going back to syscalls would gain 
more performance than any of the stuff we've talked about so far.

Anyway - if we're moving things around, how about putting 'leaps' before 
'transitions' instead of after? This will eliminate alignment gaps just as well 
in practice, and it will allow us to do one fread instead of two.

PS. While you're in the neighborhood, the leap second correction field (struct 
leap.change) can be 'int' instead of 'long int', as the TZif format allows only 
32 bits of (signed) correction.
Florian Weimer Feb. 3, 2019, 5:44 p.m. | #5
* Paul Eggert:

> Florian Weimer wrote:
>> This is a long-term allocation, and
>> the reading of the file happens only once per allocation, so I do think
>> that the additional __fread_unlocked call is not a problem from a
>> performance perspective.
>
> The upstream code uses a single 'read' call, bypassing stdio
> entirely. It doesn't even stat the file; all it does is 'access' (if a
> relative filename, for security reasons), 'open', 'read', 'close'. I
> wonder why glibc went to stdio for this? Seems overkill. In practice,
> I'd think going back to syscalls would gain more performance than any
> of the stuff we've talked about so far.

It's not about performance, it's about simplifying the code in
preparation of eliminating most of the locking.

> Anyway - if we're moving things around, how about putting 'leaps'
> before 'transitions' instead of after? This will eliminate alignment
> gaps just as well in practice, and it will allow us to do one fread
> instead of two.

This is not possible because struct leap has alignment 4 on i386, while
the transitions array has alignment 8.  Sorry.

+     The order of the suballocations is important for alignment
+     purposes.  __time64_t outside a struct may require more alignment
+     then inside a struct on some architectures, so it must come
+     first. */

> PS. While you're in the neighborhood, the leap second correction field
> (struct leap.change) can be 'int' instead of 'long int', as the TZif
> format allows only 32 bits of (signed) correction.

I know, but I'm pondering whether we should leap second support
altogether because it leads to non-conformance with POSIX.

Thanks,
Florian
Paul Eggert Feb. 3, 2019, 7:32 p.m. | #6
Florian Weimer wrote:

>> Anyway - if we're moving things around, how about putting 'leaps'
>> before 'transitions' instead of after? This will eliminate alignment
>> gaps just as well in practice, and it will allow us to do one fread
>> instead of two.
> This is not possible because struct leap has alignment 4 on i386, while
> the transitions array has alignment 8.

If that's the issue, we can easily change struct leap to have an alignment of 8 
on x86-64 by using an alignment directive, or (less good) change the transitions 
array to have an alignment of 4 by packaging each transition value inside a struct.

> It's not about performance, it's about simplifying the code in
> preparation of eliminating most of the locking.

I guess I'm not seeing the point. The code isn't that much simpler and it is 
surely a bit fatter and slower, and I don't see how the simplification would 
make it easier to eliminate most locking. Perhaps it'll become more obvious in 
later patches....

> I'm pondering whether we should leap second support
> altogether because it leads to non-conformance with POSIX.

Oh my, that would be a bigger deal. Strictly speaking glibc currently conforms 
to POSIX in this respect, so what you're pondering is removing an extension to 
POSIX rather than fixing a conformance issue. Has anybody else ever done that; 
that is, has any other C library supported tzdb-style leap seconds and then 
removed the support?

PS. It would be ironic for glibc to remove leap second support at about the same 
time that Microsoft Windows finally added support. See:

Cuomo D. Top 10 Networking Features in Windows Server 2019: #10 Accurate Network 
Time. [Microsoft] Networking Blog. 2018-07-18. 
https://blogs.technet.microsoft.com/networking/2018/07/18/top10-ws2019-hatime/
Florian Weimer Feb. 3, 2019, 9:33 p.m. | #7
* Paul Eggert:

> Florian Weimer wrote:
>
>>> Anyway - if we're moving things around, how about putting 'leaps'
>>> before 'transitions' instead of after? This will eliminate alignment
>>> gaps just as well in practice, and it will allow us to do one fread
>>> instead of two.
>> This is not possible because struct leap has alignment 4 on i386, while
>> the transitions array has alignment 8.
>
> If that's the issue, we can easily change struct leap to have an
> alignment of 8 on x86-64 by using an alignment directive, or (less
> good) change the transitions array to have an alignment of 4 by
> packaging each transition value inside a struct.

Well, or we could use this patch.

>> It's not about performance, it's about simplifying the code in
>> preparation of eliminating most of the locking.
>
> I guess I'm not seeing the point. The code isn't that much simpler and
> it is surely a bit fatter and slower,

With the old allocation order:

0000000000000000 0000000000000d4c T __tzfile_read

With the new allocation order:

0000000000000000 0000000000000cd8 T __tzfile_read

So it's not just the allocations that are more compact, the code is as
well.

>> I'm pondering whether we should leap second support
>> altogether because it leads to non-conformance with POSIX.
>
> Oh my, that would be a bigger deal. Strictly speaking glibc currently
> conforms to POSIX in this respect, so what you're pondering is
> removing an extension to POSIX rather than fixing a conformance
> issue. Has anybody else ever done that; that is, has any other C
> library supported tzdb-style leap seconds and then removed the
> support?
>
> PS. It would be ironic for glibc to remove leap second support at
> about the same time that Microsoft Windows finally added support. See:
>
> Cuomo D. Top 10 Networking Features in Windows Server 2019: #10
> Accurate Network Time. [Microsoft] Networking
> Blog. 2018-07-18. https://blogs.technet.microsoft.com/networking/2018/07/18/top10-ws2019-hatime/

It's not possible to do this on Linux because if the kernel clock does
not run on an UTC clock, many programs will not report the correct time.

Even if all these applications were magically fixed to use the glibc
functions (which includes gmtime_r with its locking overhead, something
many programs avoid for performance reasons), then the current code is
still broken because it does not reload the time zone data when new leap
second information is added to it.

Anyway, this is a separate discussion.

Thanks,
Florian
Paul Eggert Feb. 4, 2019, 6:25 a.m. | #8
Florian Weimer wrote:

> it's not just the allocations that are more compact, the code is as
> well.

Ah, if that's the goal, then I guess the patch is OK.

Patch

diff --git a/time/tzfile.c b/time/tzfile.c
index fb1202b80c..d0054459fe 100644
--- a/time/tzfile.c
+++ b/time/tzfile.c
@@ -246,25 +246,32 @@  __tzfile_read (const char *file, size_t extra, char **extrap)
      the following arrays:
 
      __time64_t transitions[num_transitions];
+     struct leap leaps[num_leaps];
      struct ttinfo types[num_types];
      unsigned char type_idxs[num_types];
      char zone_names[chars];
-     struct leap leaps[num_leaps];
-     char extra_array[extra]; // Stored into *pextras if requested.
      char tzspec[tzspec_len];
+     char extra_array[extra]; // Stored into *pextras if requested.
 
      The piece-wise allocations from buf below verify that no
-     overflow/wraparound occurred in these computations.  */
+     overflow/wraparound occurred in these computations.
+
+     The order of the suballocations is important for alignment
+     purposes.  __time64_t outside a struct may require more alignment
+     then inside a struct on some architectures, so it must come
+     first. */
+  _Static_assert (__alignof (__time64_t) >= __alignof (struct leap),
+		  "alignment of __time64_t");
+  _Static_assert (__alignof (struct leap) >= __alignof (struct ttinfo),
+		  "alignment of struct leap");
   struct alloc_buffer buf;
   {
-    size_t total_size = num_transitions * (sizeof (__time64_t) + 1);
-    total_size = ((total_size + __alignof__ (struct ttinfo) - 1)
-		  & ~(__alignof__ (struct ttinfo) - 1));
-    total_size += num_types * sizeof (struct ttinfo) + chars;
-    total_size = ((total_size + __alignof__ (struct leap) - 1)
-		  & ~(__alignof__ (struct leap) - 1));
-    total_size += num_leaps * sizeof (struct leap) + tzspec_len + extra;
-
+    size_t total_size = (num_transitions * sizeof (__time64_t)
+			 + num_leaps * sizeof (struct leap)
+			 + num_types * sizeof (struct ttinfo)
+			 + num_transitions /* type_idxs */
+			 + chars /* zone_names */
+			 + tzspec_len + extra);
     transitions = malloc (total_size);
     if (transitions == NULL)
       goto lose;
@@ -277,35 +284,25 @@  __tzfile_read (const char *file, size_t extra, char **extrap)
        (misaligned heap pointer, should not happen), or we had an
        overflow.  */
     goto lose;
-  type_idxs = alloc_buffer_alloc_array (&buf, unsigned char, num_transitions);
+  leaps = alloc_buffer_alloc_array (&buf, struct leap, num_leaps);
   types = alloc_buffer_alloc_array (&buf, struct ttinfo, num_types);
+  type_idxs = alloc_buffer_alloc_array (&buf, unsigned char, num_transitions);
   zone_names = alloc_buffer_alloc_array (&buf, char, chars);
-  leaps = alloc_buffer_alloc_array (&buf, struct leap, num_leaps);
+  if (trans_width == 8)
+    tzspec = alloc_buffer_alloc_array (&buf, char, tzspec_len);
+  else
+    tzspec = NULL;
   if (extra > 0)
     *extrap = alloc_buffer_alloc_array (&buf, char, extra);
-  if (trans_width == 8)
-    tzspec = alloc_buffer_alloc_array (&buf, char, tzspec_len);
-  else
-    tzspec = NULL;
   if (alloc_buffer_has_failed (&buf))
     goto lose;
 
-  if (__builtin_expect (trans_width == 8, 1))
-    {
-      if (__builtin_expect (__fread_unlocked (transitions, trans_width + 1,
-					      num_transitions, f)
-			    != num_transitions, 0))
+  if (__glibc_unlikely (__fread_unlocked (transitions, trans_width,
+					  num_transitions, f)
+			!= num_transitions)
+      || __glibc_unlikely (__fread_unlocked (type_idxs, 1, num_transitions, f)
+			   != num_transitions))
 	goto lose;
-    }
-  else
-    {
-      if (__builtin_expect (__fread_unlocked (transitions, 4,
-					      num_transitions, f)
-			    != num_transitions, 0)
-	  || __builtin_expect (__fread_unlocked (type_idxs, 1, num_transitions,
-						 f) != num_transitions, 0))
-	goto lose;
-    }
 
   /* Check for bogus indices in the data file, so we can hereafter
      safely use type_idxs[T] as indices into `types' and never crash.  */