Patchwork qemu-thread: add TLS wrappers

Submitter Paolo Bonzini
Date Aug. 16, 2011, 6:51 p.m.
Message ID <1313520667-26507-1-git-send-email-pbonzini@redhat.com>
Permalink /patch/110215/
State New

Comments

Paolo Bonzini - Aug. 16, 2011, 6:51 p.m.
Win32 emulated TLS is slow and is not available on all versions of GCC.
Actually, Win32 does have support for decent TLS, only GCC does not map
__thread to it.  The good news is, it's perfectly possible to declare
TLS variables with simple C code!

This patch adds wrappers to qemu-thread that will use __thread on POSIX
systems, and the .tls segment on Windows.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	It does kinda uglify the declarations, but not too much IMO.
	Do we want to go this way?  It will certainly limit the number
	of bug reports we get.

 coroutine-win32.c   |    7 +++++--
 qemu-thread-posix.h |    4 ++++
 qemu-thread-win32.c |   17 +++++++++++++++++
 qemu-thread-win32.h |   40 +++++++++++++++++++++++++++++++++++++++-
 qemu-thread.h       |    2 ++
 5 files changed, 67 insertions(+), 3 deletions(-)
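As a rough illustration of how the POSIX side of these wrappers is meant to be used, here is a standalone sketch. It copies the three macros from the qemu-thread-posix.h hunk of the patch. The variable tls_counter and the helpers bump, worker and run_demo are hypothetical, and GCC's __thread extension plus pthreads are assumed.

```c
#include <assert.h>
#include <pthread.h>

/* Same macro bodies as the POSIX hunk of the patch (GCC __thread). */
#define DEFINE_TLS(type, x)  __thread type x
#define DECLARE_TLS(type, x) extern DEFINE_TLS(type, x)
#define get_tls(x)           (x)

/* A header would carry the declaration; one .c file the definition.
   tls_counter is a hypothetical example variable. */
DECLARE_TLS(int, tls_counter);
DEFINE_TLS(int, tls_counter);

/* Hypothetical helper: bump this thread's copy n times, return it. */
static int bump(int n)
{
    for (int i = 0; i < n; i++) {
        get_tls(tls_counter)++;
    }
    return get_tls(tls_counter);
}

static void *worker(void *arg)
{
    int *result = arg;
    *result = bump(5);   /* starts from this thread's own zero */
    return NULL;
}

/* Runs bump(3) on the calling thread and bump(5) on a new thread;
   returns the new thread's final count. */
static int run_demo(int *main_count)
{
    pthread_t t;
    int other = -1;
    *main_count = bump(3);
    int rc = pthread_create(&t, NULL, worker, &other);
    assert(rc == 0);
    pthread_join(t, NULL);
    return other;
}
```

Declaring with DECLARE_TLS in a header and defining once with DEFINE_TLS keeps call sites identical on both platforms; only the macro bodies differ.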
Brad - Aug. 17, 2011, 4:36 a.m.
On 16/08/11 2:51 PM, Paolo Bonzini wrote:
> Win32 emulated TLS is slow and is not available on all versions of GCC.
> Actually, Win32 does have support for decent TLS, only GCC does not map
> __thread to it.  The good news is, it's perfectly possible to declare
> TLS variables with simple C code!
>
> This patch adds wrappers to qemu-thread that will use __thread on POSIX
> systems, and the .tls segment on Windows.

The actual use of TLS within QEMU on POSIX systems will be optional 
though, right?
Paolo Bonzini - Aug. 17, 2011, 6:55 a.m.
On 08/16/2011 09:36 PM, Brad wrote:
>
>> Win32 emulated TLS is slow and is not available on all versions of GCC.
>> Actually, Win32 does have support for decent TLS, only GCC does not map
>> __thread to it.  The good news is, it's perfectly possible to declare
>> TLS variables with simple C code!
>>
>> This patch adds wrappers to qemu-thread that will use __thread on POSIX
>> systems, and the .tls segment on Windows.
>
> The actual use of TLS within QEMU on POSIX systems will be optional
> though, right?

Actually, TLS is already mandatory for both POSIX and Win32 systems.

This patch tries to improve the situation on Windows, both by allowing 
older GCC versions, and by making TLS faster.  On ELF systems, TLS has 
been supported by GCC for perhaps 10 years; there's no reason to care.

Paolo
Stefan Hajnoczi - Aug. 17, 2011, 4:51 p.m.
On Tue, Aug 16, 2011 at 7:51 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Win32 emulated TLS is slow and is not available on all versions of GCC.
> Actually, Win32 does have support for decent TLS, only GCC does not map
> __thread to it.  The good news is, it's perfectly possible to declare
> TLS variables with simple C code!
>
> This patch adds wrappers to qemu-thread that will use __thread on POSIX
> systems, and the .tls segment on Windows.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>        It does kinda uglify the declarations, but not too much IMO.
>        Do we want to go this way?  It will certainly limit the number
>        of bug reports we get.
>
>  coroutine-win32.c   |    7 +++++--
>  qemu-thread-posix.h |    4 ++++
>  qemu-thread-win32.c |   17 +++++++++++++++++
>  qemu-thread-win32.h |   40 +++++++++++++++++++++++++++++++++++++++-
>  qemu-thread.h       |    2 ++
>  5 files changed, 67 insertions(+), 3 deletions(-)

Hmm...this is pretty ugly low-level stuff.  glib provides portable TLS
although I have found __thread significantly faster on Linux.

I think our options are:
1. Use __thread and require recent gcc versions on Windows.  Not as
fast as native Windows TLS.
2. Implement our own TLS abstraction to use native TLS on each platform.
3. Use glib TLS (GStaticPrivate).  It's slower than __thread on Linux
but seems portable.

Options 1 and 3 seem better because they do not involve us
reimplementing yet more platform abstractions.  Option 1 would be
nicest.

Do you know which Windows gcc versions support __thread and if it
would be reasonable to require those versions?

Stefan
Paolo Bonzini - Aug. 17, 2011, 5:13 p.m.
On 08/17/2011 09:51 AM, Stefan Hajnoczi wrote:
> Hmm...this is pretty ugly low-level stuff.  glib provides portable TLS
> although I have found __thread significantly faster on Linux.
>
> I think our options are:
> 1. Use __thread and require recent gcc versions on Windows.  Not as
> fast as native Windows TLS.
> 2. Implement our own TLS abstraction to use native TLS on each platform.
> 3. Use glib TLS (GStaticPrivate).  It's slower than __thread on Linux
> but seems portable.
>
> Options 1 and 3 seem better because they do not involve us
> reimplementing yet more platform abstractions.

True, but I'm afraid we'll need to reimplement more of them sooner or 
later, for example for PI locks.  QEMU's needs in this area are just 
more low level than GLib's, though we could surely use glib's other 
threading primitives such as the thread pool.

> Do you know which Windows gcc versions support __thread and if it
> would be reasonable to require those versions?

It's probably 4.3 and newer, or something like that.

Paolo
Stefan Hajnoczi - Aug. 17, 2011, 5:26 p.m.
On Wed, Aug 17, 2011 at 6:13 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/17/2011 09:51 AM, Stefan Hajnoczi wrote:
>>
>> Hmm...this is pretty ugly low-level stuff.  glib provides portable TLS
>> although I have found __thread significantly faster on Linux.
>>
>> I think our options are:
>> 1. Use __thread and require recent gcc versions on Windows.  Not as
>> fast as native Windows TLS.
>> 2. Implement our own TLS abstraction to use native TLS on each platform.
>> 3. Use glib TLS (GStaticPrivate).  It's slower than __thread on Linux
>> but seems portable.
>>
>> Options 1 and 3 seem better because they do not involve us
>> reimplementing yet more platform abstractions.
>
> True, but I'm afraid we'll need to reimplement more of them sooner or later,
> for example for PI locks.  QEMU's needs in this area are just more low level
> than GLib's, though we could surely use glib's other threading primitives
> such as the thread pool.
>
>> Do you know which Windows gcc versions support __thread and if it
>> would be reasonable to require those versions?
>
> It's probably 4.3 and newer, or something like that.

I suggest we simply require that minimum gcc version.

Stefan
Blue Swirl - Aug. 17, 2011, 5:28 p.m.
On Wed, Aug 17, 2011 at 6:55 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/16/2011 09:36 PM, Brad wrote:
>>
>>> Win32 emulated TLS is slow and is not available on all versions of GCC.
>>> Actually, Win32 does have support for decent TLS, only GCC does not map
>>> __thread to it.  The good news is, it's perfectly possible to declare
>>> TLS variables with simple C code!
>>>
>>> This patch adds wrappers to qemu-thread that will use __thread on POSIX
>>> systems, and the .tls segment on Windows.
>>
>> The actual use of TLS within QEMU on POSIX systems will be optional
>> though, right?
>
> Actually, TLS is already mandatory for both POSIX and Win32 systems.

Not true, we have avoided TLS.

> This patch tries to improve the situation on Windows, both by allowing older
> GCC versions, and by making TLS faster.  On ELF systems, TLS has been
> supported by GCC for perhaps 10 years, there's no reason to care.

GCC may support it but it can't use it on a platform if the other
parts are missing (libc etc.).
Paolo Bonzini - Aug. 17, 2011, 5:28 p.m.
On 08/17/2011 10:26 AM, Stefan Hajnoczi wrote:
>>> >>  Do you know which Windows gcc versions support __thread and if it
>>> >>  would be reasonable to require those versions?
>> >
>> >  It's probably 4.3 and newer, or something like that.
> I suggest we simply require that minimum gcc version.

I would agree (I just did it for fun and to see what it looks like), but 
perhaps we can do something for OpenBSD similar to what we do for Windows.

Though I'm quite disappointed, and I must say OpenBSD should enter the 
21st century.

Paolo
Blue Swirl - Aug. 17, 2011, 5:34 p.m.
On Wed, Aug 17, 2011 at 5:13 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/17/2011 09:51 AM, Stefan Hajnoczi wrote:
>>
>> Hmm...this is pretty ugly low-level stuff.  glib provides portable TLS
>> although I have found __thread significantly faster on Linux.
>>
>> I think our options are:
>> 1. Use __thread and require recent gcc versions on Windows.  Not as
>> fast as native Windows TLS.
>> 2. Implement our own TLS abstraction to use native TLS on each platform.
>> 3. Use glib TLS (GStaticPrivate).  It's slower than __thread on Linux
>> but seems portable.
>>
>> Options 1 and 3 seem better because they do not involve us
>> reimplementing yet more platform abstractions.
>
> True, but I'm afraid we'll need to reimplement more of them sooner or later,
> for example for PI locks.  QEMU's needs in this area are just more low level
> than GLib's, though we could surely use glib's other threading primitives
> such as the thread pool.
>
>> Do you know which Windows gcc versions support __thread and if it
>> would be reasonable to require those versions?
>
> It's probably 4.3 and newer, or something like that.

No, GCC 3.3.6 manual already mentions it:
http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Thread_002dLocal.html#Thread_002dLocal
Paolo Bonzini - Aug. 17, 2011, 5:37 p.m.
On 08/17/2011 10:28 AM, Blue Swirl wrote:
>>> >>  The actual use of TLS within QEMU on POSIX systems will be optional
>>> >>  though, right?
>> >
>> >  Actually, TLS is already mandatory for both POSIX and Win32 systems.
> Not true, we have avoided TLS.
>
>> >  This patch tries to improve the situation on Windows, both by allowing older
>> >  GCC versions, and by making TLS faster.  On ELF systems, TLS has been
>> >  supported by GCC for perhaps 10 years, there's no reason to care.
> GCC may support it but it can't use it on a platform if the other
> parts are missing (libc etc.).

Again, welcome to the 21st century.  There's no way you can have 
something that is scalable and not use fast TLS.  IMNSHO if OpenBSD 
still wants to miss features that all sane OSes (and Windows _is_ sane) 
have nowadays, fine, it will be stuck with an old QEMU or with 
single-thread user-level emulation.

As far as portability is concerned, FreeBSD provides a much more 
interesting target.

Paolo
Blue Swirl - Aug. 17, 2011, 5:37 p.m.
On Wed, Aug 17, 2011 at 5:28 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/17/2011 10:26 AM, Stefan Hajnoczi wrote:
>>>>
>>>> >>  Do you know which Windows gcc versions support __thread and if it
>>>> >>  would be reasonable to require those versions?
>>>
>>> >
>>> >  It's probably 4.3 and newer, or something like that.
>>
>> I suggest we simply require that minimum gcc version.
>
> I would agree (just did it for fun and to see how it looks like), but
> perhaps we can do something similar to what Windows does for OpenBSD too.
>
> Though I'm quite disappointed and I must say OpenBSD should enter the 21st
> century.

I can't justify their design decisions, but isn't __thread a
non-standard extension by GCC anyway whereas POSIX threads are the
standard? Are other compilers supporting it?
Paolo Bonzini - Aug. 17, 2011, 5:38 p.m.
On 08/17/2011 10:34 AM, Blue Swirl wrote:
>>>  Do you know which Windows gcc versions support __thread and if it
>>>  would be reasonable to require those versions?
>>
>>  It's probably 4.3 and newer, or something like that.
>
> No, GCC 3.3.6 manual already mentions it:
> http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Thread_002dLocal.html#Thread_002dLocal

Not on Windows though.

Paolo
Paolo Bonzini - Aug. 17, 2011, 5:45 p.m.
On 08/17/2011 10:37 AM, Blue Swirl wrote:
> >  I would agree (just did it for fun and to see how it looks like), but
> >  perhaps we can do something similar to what Windows does for OpenBSD too.
> >
> >  Though I'm quite disappointed and I must say OpenBSD should enter the 21st
> >  century.
>
> I can't justify their design decisions, but isn't __thread a
> non-standard extension by GCC anyway whereas POSIX threads are the
> standard? Are other compilers supporting it?

Visual C++ supports __declspec(thread).

pthread TLS is orders of magnitude slower than __thread.  __thread in 
an executable is just as fast as a non-TLS variable access, 2 or 3 
machine language instructions at most.
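To make the comparison concrete, the sketch below keeps the same per-thread counter two ways: once as a __thread variable and once behind a POSIX key. It shows only the shape of each access path and implies no timings; the names fast_counter, slow_counter, count_both and run_on_new_thread are illustrative, not QEMU code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Variant A: compiler TLS.  In an executable this is typically a
   thread-pointer-relative access with no function call. */
static __thread long fast_counter;

/* Variant B: POSIX keys.  Every access goes through pthread_getspecific(). */
static pthread_key_t slow_key;
static pthread_once_t slow_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    int rc = pthread_key_create(&slow_key, free);  /* free() at thread exit */
    assert(rc == 0);
}

static long *slow_counter(void)
{
    pthread_once(&slow_once, make_key);
    long *p = pthread_getspecific(slow_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);   /* lazily allocate this thread's slot */
        assert(p != NULL);
        pthread_setspecific(slow_key, p);
    }
    return p;
}

/* Increment both counters n times on the calling thread; return the
   __thread value and store the key-based value in *slow_out. */
static long count_both(int n, long *slow_out)
{
    for (int i = 0; i < n; i++) {
        fast_counter++;
        (*slow_counter())++;
    }
    *slow_out = *slow_counter();
    return fast_counter;
}

static void *worker(void *arg)
{
    long *out = arg;   /* out[0] = __thread count, out[1] = key count */
    out[0] = count_both(7, &out[1]);
    return NULL;
}

/* Runs worker on a fresh thread; out receives its two counts. */
static void run_on_new_thread(long out[2])
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, worker, out);
    assert(rc == 0);
    pthread_join(t, NULL);
}
```

Both variants give each thread its own value; the difference is that variant B needs key setup, lazy allocation and a library call on every access.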

Some pthread_getspecific implementations do a _linear walk_ of the keys.
In that case it may work to have something like Windows plus a single
pthread_getspecific key.  On the other hand, Windows provides very good
support for that thanks to its ordered subsections (.tls$something).  It
only needs a bit of preprocessor magic.  Things such as linker scripts
would be way out of scope.
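One way to read the "Windows plus a single pthread_getspecific key" idea: gather every TLS variable into one block, give each thread its own copy initialized from a template, and use a single key to find that copy. This is a hypothetical sketch under that assumption; TlsBlock, tls_template, tls_block and the get_tls below are illustrative names, with struct member access standing in for the offset-from-_tls_start arithmetic of the Windows version.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical: every TLS variable of the module is a member of one
   struct, which plays the role of the Windows .tls segment. */
typedef struct TlsBlock {
    int depth;
    const char *name;
} TlsBlock;

/* Initial contents, copied into each thread's block (like the loader
   copying .tls into a fresh per-thread segment). */
static const TlsBlock tls_template = { 0, "unnamed" };

static pthread_key_t tls_key;
static pthread_once_t tls_once = PTHREAD_ONCE_INIT;

static void tls_key_init(void)
{
    int rc = pthread_key_create(&tls_key, free);
    assert(rc == 0);
}

/* One pthread_getspecific() call yields the base of this thread's
   block, however many variables the block holds. */
static TlsBlock *tls_block(void)
{
    pthread_once(&tls_once, tls_key_init);
    TlsBlock *b = pthread_getspecific(tls_key);
    if (b == NULL) {
        b = malloc(sizeof *b);
        assert(b != NULL);
        *b = tls_template;          /* per-thread copy of the template */
        pthread_setspecific(tls_key, b);
    }
    return b;
}

/* Like the patch's get_tls: expands to an lvalue. */
#define get_tls(x) (tls_block()->x)

static void *worker(void *arg)
{
    get_tls(name) = "worker";       /* touches only this thread's copy */
    get_tls(depth) += 2;
    *(int *)arg = get_tls(depth);
    return NULL;
}

static int depth_on_new_thread(void)
{
    pthread_t t;
    int result = -1;
    int rc = pthread_create(&t, NULL, worker, &result);
    assert(rc == 0);
    pthread_join(t, NULL);
    return result;
}
```

With only one key in use, QEMU contributes a single entry to any linear key walk, whatever the number of TLS variables.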

Paolo
Blue Swirl - Aug. 17, 2011, 5:51 p.m.
On Wed, Aug 17, 2011 at 5:45 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/17/2011 10:37 AM, Blue Swirl wrote:
>>
>> >  I would agree (just did it for fun and to see how it looks like), but
>> >  perhaps we can do something similar to what Windows does for OpenBSD
>> > too.
>> >
>> >  Though I'm quite disappointed and I must say OpenBSD should enter the
>> > 21st
>> >  century.
>>
>> I can't justify their design decisions, but isn't __thread a
>> non-standard extension by GCC anyway whereas POSIX threads are the
>> standard? Are other compilers supporting it?
>
> Visual C++ supports __declspec(thread).
>
> pthread TLS is orders of magnitudes slower than __thread.  __thread in an
> executable is just as fast as a non-TLS variable access, 2 or 3 machine
> language instructions at most.
>
> Some pthread_getspecific implementation do a _linear walk_ of the keys.  In
> that case it may work to have something like Windows plus a single
> pthread_getspecific key.  On the other hand, Windows provides very good
> support for that thanks to its ordered subsections (.tls$something).  It
> only needs a bit of preprocessor magic.  Thinks such as linker scripts would
> be way out of scope.

According to this blog, pthread_getspecific on OSX can also amount to
just three instructions, except on PPC:
http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html
Paolo Bonzini - Aug. 17, 2011, 6:03 p.m.
On 08/17/2011 10:51 AM, Blue Swirl wrote:
> >  Visual C++ supports __declspec(thread).
> >
> >  pthread TLS is orders of magnitudes slower than __thread.  __thread in an
> >  executable is just as fast as a non-TLS variable access, 2 or 3 machine
> >  language instructions at most.
> >
> >  Some pthread_getspecific implementation do a _linear walk_ of the keys.  In
> >  that case it may work to have something like Windows plus a single
> >  pthread_getspecific key.  On the other hand, Windows provides very good
> >  support for that thanks to its ordered subsections (.tls$something).  It
> >  only needs a bit of preprocessor magic.  Thinks such as linker scripts would
> >  be way out of scope.
>
> According to this blog, also pthread_getspecific on OSX can amount to
> three instructions except on PPC:
> http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html

I'll try to work out a DEFINE_TLS/DECLARE_TLS version for pthreads too.

BTW, __thread is in C1X too though I don't remember how they called it.

Paolo
malc - Aug. 17, 2011, 6:32 p.m.
On Wed, 17 Aug 2011, Paolo Bonzini wrote:

> On 08/17/2011 10:51 AM, Blue Swirl wrote:
> > >  Visual C++ supports __declspec(thread).
> > >
> > >  pthread TLS is orders of magnitudes slower than __thread.  __thread in an
> > >  executable is just as fast as a non-TLS variable access, 2 or 3 machine
> > >  language instructions at most.
> > >
> > >  Some pthread_getspecific implementation do a _linear walk_ of the keys.  In
> > >  that case it may work to have something like Windows plus a single
> > >  pthread_getspecific key.  On the other hand, Windows provides very good
> > >  support for that thanks to its ordered subsections (.tls$something).  It
> > >  only needs a bit of preprocessor magic.  Thinks such as linker scripts would
> > >  be way out of scope.
> > 
> > According to this blog, also pthread_getspecific on OSX can amount to
> > three instructions except on PPC:
> > http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html
> 
> I'll try to work out a DEFINE_TLS/DECLARE_TLS version for pthreads too.
> 
> BTW, __thread is in C1X too though I don't remember how they called it.
> 

thread_local, section 3.7.2 in n3290.pdf
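For reference, n3290 is the C++0x working draft, where 3.7.2 defines thread storage duration and the thread_local keyword. The C1X draft spells the keyword _Thread_local, with thread_local provided as a macro by <threads.h> (C23 later made thread_local a keyword). A minimal sketch, assuming a C11 compiler and pthreads; nest and nest_on_new_thread are illustrative names.

```c
#include <assert.h>
#include <pthread.h>

/* Standard C11 spelling of GCC's __thread. */
static _Thread_local int tls_depth = 1;   /* each thread starts at 1 */

static int nest(int levels)
{
    tls_depth += levels;                  /* modifies this thread's copy only */
    return tls_depth;
}

static void *worker(void *arg)
{
    *(int *)arg = nest(10);
    return NULL;
}

static int nest_on_new_thread(void)
{
    pthread_t t;
    int out = 0;
    int rc = pthread_create(&t, NULL, worker, &out);
    assert(rc == 0);
    pthread_join(t, NULL);
    return out;
}
```

Unlike a pthread key, the initializer (1 here) is applied to every thread's copy without any per-thread setup code.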
Richard Henderson - Aug. 17, 2011, 7 p.m.
On 08/17/2011 10:28 AM, Blue Swirl wrote:
> GCC may support it but it can't use it on a platform if the other
> parts are missing (libc etc.).

With a new-enough gcc, we provide the support in libgcc.


r~
Brad - Aug. 17, 2011, 11:41 p.m.
----- Original message -----
> On Wed, Aug 17, 2011 at 5:28 PM, Paolo Bonzini <pbonzini@redhat.com>
> wrote:
> > On 08/17/2011 10:26 AM, Stefan Hajnoczi wrote:
> > > > > 
> > > > > > >  Do you know which Windows gcc versions support __thread and
> > > > > > > if it would be reasonable to require those versions?
> > > > 
> > > > > 
> > > > >  It's probably 4.3 and newer, or something like that.
> > > 
> > > I suggest we simply require that minimum gcc version.
> > 
> > I would agree (just did it for fun and to see how it looks like), but
> > perhaps we can do something similar to what Windows does for OpenBSD
> > too.
> > 
> > Though I'm quite disappointed and I must say OpenBSD should enter the
> > 21st century.
> 
> I can't justify their design decisions, but isn't __thread a
> non-standard extension by GCC anyway whereas POSIX threads are the
> standard? Are other compilers supporting it?

It's not a "design decision" so much as that it'll be supported when someone does the work. There is work under way to add support, but it's far from trivial and requires some fairly heavy lifting to get there.
Brad - Aug. 17, 2011, 11:50 p.m.
----- Original message -----
> On 08/17/2011 10:28 AM, Blue Swirl wrote:
> > GCC may support it but it can't use it on a platform if the other
> > parts are missing (libc etc.).
> 
> With a new-enough gcc, we provide the support in libgcc.

I doubt gcc 4.2.1 fits the bill.
Paolo Bonzini - Aug. 17, 2011, 11:51 p.m.
On 08/17/2011 04:50 PM, Brad wrote:
> ----- Original message -----
>> On 08/17/2011 10:28 AM, Blue Swirl wrote:
>>> GCC may support it but it can't use it on a platform if the other
>>> parts are missing (libc etc.).
>>
>> With a new-enough gcc, we provide the support in libgcc.
>
> I doubt gcc 4.2.1 fits the bill.

Nope, I'll generalize this patch to include pthread too.

Paolo

Patch

diff --git a/coroutine-win32.c b/coroutine-win32.c
index 0e29448..7f58432 100644
--- a/coroutine-win32.c
+++ b/coroutine-win32.c
@@ -24,6 +24,7 @@ 
 
 #include "qemu-common.h"
 #include "qemu-coroutine-int.h"
+#include "qemu-thread.h"
 
 typedef struct
 {
@@ -33,8 +34,10 @@  typedef struct
     CoroutineAction action;
 } CoroutineWin32;
 
-static __thread CoroutineWin32 leader;
-static __thread Coroutine *current;
+static DEFINE_TLS(CoroutineWin32, tls_leader);
+static DEFINE_TLS(Coroutine *, tls_current);
+#define leader get_tls(tls_leader)
+#define current get_tls(tls_current)
 
 CoroutineAction qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
                                       CoroutineAction action)
diff --git a/qemu-thread-posix.h b/qemu-thread-posix.h
index d781ca6..2a302da 100644
--- a/qemu-thread-posix.h
+++ b/qemu-thread-posix.h
@@ -27,4 +27,8 @@  struct QemuThread {
     pthread_t thread;
 };
 
+#define DEFINE_TLS(type, x)	__thread type x
+#define DECLARE_TLS(type, x)	extern DEFINE_TLS(type, x)
+#define get_tls(x)		(x)
+
 #endif
diff --git a/qemu-thread-win32.c b/qemu-thread-win32.c
index f5891bd..3eb6b89 100644
--- a/qemu-thread-win32.c
+++ b/qemu-thread-win32.c
@@ -17,6 +17,23 @@ 
 #include <assert.h>
 #include <limits.h>
 
+/* TLS support, not exported by the mingw run-time library.  */
+
+int __attribute__((section(".tls$000"))) _tls_start = 0;
+int __attribute__((section(".tls$ZZZ"))) _tls_end = 0;
+int _tls_index = 0;
+
+const IMAGE_TLS_DIRECTORY _tls_used __attribute__((used, section(".rdata$T"))) =
+{
+ (ULONG)(ULONG_PTR) &_tls_start, // start of tls data
+ (ULONG)(ULONG_PTR) &_tls_end,   // end of tls data
+ (ULONG)(ULONG_PTR) &_tls_index, // address of tls_index
+ (ULONG) 0,                      // pointer to callbacks
+ (ULONG) 0,                      // size of tls zero fill
+ (ULONG) 0                       // characteristics
+};
+
+
 static void error_exit(int err, const char *msg)
 {
     char *pstr;
diff --git a/qemu-thread-win32.h b/qemu-thread-win32.h
index 6cb248f..2ffe7de 100644
--- a/qemu-thread-win32.h
+++ b/qemu-thread-win32.h
@@ -1,6 +1,7 @@ 
 #ifndef __QEMU_THREAD_WIN32_H
 #define __QEMU_THREAD_WIN32_H 1
-#include "windows.h"
+#include <windows.h>
+#include <winnt.h>
 
 struct QemuMutex {
     CRITICAL_SECTION lock;
@@ -27,4 +28,41 @@  struct QemuThread {
     void *ret;
 };
 
+typedef struct _TEB {
+  NT_TIB NtTib;
+  void* EnvironmentPointer;
+  void *x[3];
+  char **ThreadLocalStoragePointer;
+} TEB, *PTEB;
+
+/* 1) The initial contents of TLS variables are placed in the .tls section.  */
+
+#define DEFINE_TLS(type, x)	type tls__##x __attribute__((section(".tls$BBB")))
+
+/* 2) _tls_index holds the number of our module.  The executable should be
+   zero, DLLs are numbered 1 and up.  The loader fills it in for us.  */
+
+extern int _tls_index;
+extern int _tls_start;
+
+/* 3) Thus, Teb->ThreadLocalStoragePointer[_tls_index] is the base of
+   the TLS segment for this (thread, module) pair.  Each segment has
+   the same layout as this module's .tls segment and is initialized
+   with the content of the .tls segment; offset 0 is the _tls_start variable.
+   So, get_tls passes us the offset of the passed variable relative to
+   _tls_start, and we return that same offset plus the base of the segment.  */
+
+static inline __attribute__((__pure__)) void *_get_tls(int offset)
+{
+    PTEB Teb = NtCurrentTeb();
+    return (char *)(Teb->ThreadLocalStoragePointer[_tls_index]) + offset;
+}
+
+/* 4) get_tls, in addition to computing the offset, returns an lvalue.
+      "I got it.  Magic."  */
+
+#define get_tls(x)		\
+  (*(__typeof__(tls__##x) *)    \
+    _get_tls((ULONG_PTR)&(tls__##x) - (ULONG_PTR)&_tls_start))
+
 #endif
diff --git a/qemu-thread.h b/qemu-thread.h
index ae75638..858c8cc 100644
--- a/qemu-thread.h
+++ b/qemu-thread.h
@@ -49,4 +49,6 @@  void qemu_thread_get_self(QemuThread *thread);
 int qemu_thread_is_self(QemuThread *thread);
 void qemu_thread_exit(void *retval);
 
+#define DECLARE_TLS(type, x)	extern DEFINE_TLS(type, x)
+
 #endif
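The Windows get_tls arithmetic in qemu-thread-win32.h can be modeled off Windows. In this sketch, assumed for illustration only, a struct stands in for the .tls section (laid out .tls$000, .tls$BBB, .tls$ZZZ, as the linker sorts those subsections), FakeTeb stands in for the TEB's ThreadLocalStoragePointer array, and fake_thread_start plays the loader's role of copying the section for each thread. fake_get_tls and fake_tls reproduce the patch's offset-from-_tls_start computation; none of these names exist in Windows or QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for this module's .tls section. */
struct tls_section {
    int tls_start;         /* .tls$000: _tls_start, offset 0 */
    int tls__counter;      /* .tls$BBB: DEFINE_TLS(int, counter), init 42 */
    double tls__ratio;     /* .tls$BBB: DEFINE_TLS(double, ratio), init 0.5 */
    int tls_end;           /* .tls$ZZZ: _tls_end */
};
static struct tls_section image = { 0, 42, 0.5, 0 };

/* Stand-in for one thread's TEB->ThreadLocalStoragePointer: slot 0 is
   the executable, higher slots would belong to DLLs. */
typedef struct FakeTeb {
    char *ThreadLocalStoragePointer[2];
} FakeTeb;

static int tls_index = 0;   /* filled in by the loader in the real code */

/* What the loader does when a thread starts: copy the section template. */
static void fake_thread_start(FakeTeb *teb)
{
    char *seg = malloc(sizeof image);
    assert(seg != NULL);
    memcpy(seg, &image, sizeof image);
    for (size_t i = 0; i < 2; i++) {
        teb->ThreadLocalStoragePointer[i] = NULL;
    }
    teb->ThreadLocalStoragePointer[tls_index] = seg;
}

/* Same arithmetic as the patch's _get_tls(): base of this (thread,
   module) segment plus the variable's offset from _tls_start. */
static void *fake_get_tls(FakeTeb *teb, ptrdiff_t offset)
{
    return teb->ThreadLocalStoragePointer[tls_index] + offset;
}

/* Like the patch's get_tls: computes the offset and yields an lvalue. */
#define fake_tls(teb, x) \
    (*(__typeof__(image.tls__##x) *) \
     fake_get_tls((teb), (char *)&image.tls__##x - (char *)&image.tls_start))
```

Each fake thread sees its own copy at the same offset, and the template in `image` is never modified, which mirrors how the real segment copies relate to the module's .tls section.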