Message ID | 1313520667-26507-1-git-send-email-pbonzini@redhat.com |
---|---|
State | New |

On 16/08/11 2:51 PM, Paolo Bonzini wrote:
> Win32 emulated TLS is slow and is not available on all versions of GCC.
> Actually, Win32 does have support for decent TLS, only GCC does not map
> __thread to it. The good news is, it's perfectly possible to declare
> TLS variables with simple C code!
>
> This patch adds wrappers to qemu-thread that will use __thread on POSIX
> systems, and the .tls segment on Windows.

The actual use of TLS within QEMU on POSIX systems will be optional though, right?

On 08/16/2011 09:36 PM, Brad wrote:
>> Win32 emulated TLS is slow and is not available on all versions of GCC.
>> Actually, Win32 does have support for decent TLS, only GCC does not map
>> __thread to it. The good news is, it's perfectly possible to declare
>> TLS variables with simple C code!
>>
>> This patch adds wrappers to qemu-thread that will use __thread on POSIX
>> systems, and the .tls segment on Windows.
>
> The actual use of TLS within QEMU on POSIX systems will be optional
> though, right?

Actually, TLS is already mandatory for both POSIX and Win32 systems.
This patch tries to improve the situation on Windows, both by allowing
older GCC versions, and by making TLS faster. On ELF systems, TLS has been
supported by GCC for perhaps 10 years, there's no reason to care.

Paolo

On Tue, Aug 16, 2011 at 7:51 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Win32 emulated TLS is slow and is not available on all versions of GCC.
> Actually, Win32 does have support for decent TLS, only GCC does not map
> __thread to it. The good news is, it's perfectly possible to declare
> TLS variables with simple C code!
>
> This patch adds wrappers to qemu-thread that will use __thread on POSIX
> systems, and the .tls segment on Windows.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> It does kinda uglify the declarations, but not too much IMO.
> Do we want to go this way? It will certainly limit the number
> of bug reports we get.
>
> coroutine-win32.c | 7 +++++--
> qemu-thread-posix.h | 4 ++++
> qemu-thread-win32.c | 17 +++++++++++++++++
> qemu-thread-win32.h | 40 +++++++++++++++++++++++++++++++++++++++-
> qemu-thread.h | 2 ++
> 5 files changed, 67 insertions(+), 3 deletions(-)

Hmm...this is pretty ugly low-level stuff. glib provides portable TLS
although I have found __thread significantly faster on Linux.

I think our options are:
1. Use __thread and require recent gcc versions on Windows. Not as
   fast as native Windows TLS.
2. Implement our own TLS abstraction to use native TLS on each platform.
3. Use glib TLS (GStaticPrivate). It's slower than __thread on Linux
   but seems portable.

Options 1 and 3 seem better because they do not involve us
reimplementing yet more platform abstractions. Option 1 would be nicest.

Do you know which Windows gcc versions support __thread and if it
would be reasonable to require those versions?

Stefan

On 08/17/2011 09:51 AM, Stefan Hajnoczi wrote:
> Hmm...this is pretty ugly low-level stuff. glib provides portable TLS
> although I have found __thread significantly faster on Linux.
>
> I think our options are:
> 1. Use __thread and require recent gcc versions on Windows. Not as
> fast as native Windows TLS.
> 2. Implement our own TLS abstraction to use native TLS on each platform.
> 3. Use glib TLS (GStaticPrivate). It's slower than __thread on Linux
> but seems portable.
>
> Options 1 and 3 seem better because they do not involve us
> reimplementing yet more platform abstractions.

True, but I'm afraid we'll need to reimplement more of them sooner or
later, for example for PI locks. QEMU's needs in this area are just more
low level than GLib's, though we could surely use glib's other threading
primitives such as the thread pool.

> Do you know which Windows gcc versions support __thread and if it
> would be reasonable to require those versions?

It's probably 4.3 and newer, or something like that.

Paolo

On Wed, Aug 17, 2011 at 6:13 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/17/2011 09:51 AM, Stefan Hajnoczi wrote:
>> Hmm...this is pretty ugly low-level stuff. glib provides portable TLS
>> although I have found __thread significantly faster on Linux.
>>
>> I think our options are:
>> 1. Use __thread and require recent gcc versions on Windows. Not as
>> fast as native Windows TLS.
>> 2. Implement our own TLS abstraction to use native TLS on each platform.
>> 3. Use glib TLS (GStaticPrivate). It's slower than __thread on Linux
>> but seems portable.
>>
>> Options 1 and 3 seem better because they do not involve us
>> reimplementing yet more platform abstractions.
>
> True, but I'm afraid we'll need to reimplement more of them sooner or later,
> for example for PI locks. QEMU's needs in this area are just more low level
> than GLib's, though we could surely use glib's other threading primitives
> such as the thread pool.
>
>> Do you know which Windows gcc versions support __thread and if it
>> would be reasonable to require those versions?
>
> It's probably 4.3 and newer, or something like that.

I suggest we simply require that minimum gcc version.

Stefan

On Wed, Aug 17, 2011 at 6:55 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/16/2011 09:36 PM, Brad wrote:
>>> Win32 emulated TLS is slow and is not available on all versions of GCC.
>>> Actually, Win32 does have support for decent TLS, only GCC does not map
>>> __thread to it. The good news is, it's perfectly possible to declare
>>> TLS variables with simple C code!
>>>
>>> This patch adds wrappers to qemu-thread that will use __thread on POSIX
>>> systems, and the .tls segment on Windows.
>>
>> The actual use of TLS within QEMU on POSIX systems will be optional
>> though, right?
>
> Actually, TLS is already mandatory for both POSIX and Win32 systems.

Not true, we have avoided TLS.

> This patch tries to improve the situation on Windows, both by allowing older
> GCC versions, and by making TLS faster. On ELF systems, TLS has been
> supported by GCC for perhaps 10 years, there's no reason to care.

GCC may support it but it can't use it on a platform if the other
parts are missing (libc etc.).

On 08/17/2011 10:26 AM, Stefan Hajnoczi wrote:
>>> Do you know which Windows gcc versions support __thread and if it
>>> would be reasonable to require those versions?
>>
>> It's probably 4.3 and newer, or something like that.
>
> I suggest we simply require that minimum gcc version.

I would agree (just did it for fun and to see how it looks like), but
perhaps we can do something similar to what Windows does for OpenBSD too.

Though I'm quite disappointed and I must say OpenBSD should enter the 21st
century.

Paolo

On Wed, Aug 17, 2011 at 5:13 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/17/2011 09:51 AM, Stefan Hajnoczi wrote:
>> Hmm...this is pretty ugly low-level stuff. glib provides portable TLS
>> although I have found __thread significantly faster on Linux.
>>
>> I think our options are:
>> 1. Use __thread and require recent gcc versions on Windows. Not as
>> fast as native Windows TLS.
>> 2. Implement our own TLS abstraction to use native TLS on each platform.
>> 3. Use glib TLS (GStaticPrivate). It's slower than __thread on Linux
>> but seems portable.
>>
>> Options 1 and 3 seem better because they do not involve us
>> reimplementing yet more platform abstractions.
>
> True, but I'm afraid we'll need to reimplement more of them sooner or later,
> for example for PI locks. QEMU's needs in this area are just more low level
> than GLib's, though we could surely use glib's other threading primitives
> such as the thread pool.
>
>> Do you know which Windows gcc versions support __thread and if it
>> would be reasonable to require those versions?
>
> It's probably 4.3 and newer, or something like that.

No, GCC 3.3.6 manual already mentions it:
http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Thread_002dLocal.html#Thread_002dLocal

On 08/17/2011 10:28 AM, Blue Swirl wrote:
>>> The actual use of TLS within QEMU on POSIX systems will be optional
>>> though, right?
>>
>> Actually, TLS is already mandatory for both POSIX and Win32 systems.
>
> Not true, we have avoided TLS.
>
>> This patch tries to improve the situation on Windows, both by allowing older
>> GCC versions, and by making TLS faster. On ELF systems, TLS has been
>> supported by GCC for perhaps 10 years, there's no reason to care.
>
> GCC may support it but it can't use it on a platform if the other
> parts are missing (libc etc.).

Again, welcome to the 21st century. There's no way you can have something
that is scalable and not use fast TLS. IMNSHO if OpenBSD still wants to
miss features that all sane OSes (and Windows _is_ sane) have nowadays,
fine, it will be stuck with an old QEMU or with single-thread user-level
emulation. As far as portability is concerned, FreeBSD provides a much
more interesting target.

Paolo

On Wed, Aug 17, 2011 at 5:28 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/17/2011 10:26 AM, Stefan Hajnoczi wrote:
>>>> Do you know which Windows gcc versions support __thread and if it
>>>> would be reasonable to require those versions?
>>>
>>> It's probably 4.3 and newer, or something like that.
>>
>> I suggest we simply require that minimum gcc version.
>
> I would agree (just did it for fun and to see how it looks like), but
> perhaps we can do something similar to what Windows does for OpenBSD too.
>
> Though I'm quite disappointed and I must say OpenBSD should enter the 21st
> century.

I can't justify their design decisions, but isn't __thread a
non-standard extension by GCC anyway whereas POSIX threads are the
standard? Are other compilers supporting it?

On 08/17/2011 10:34 AM, Blue Swirl wrote:
>>> Do you know which Windows gcc versions support __thread and if it
>>> would be reasonable to require those versions?
>>
>> It's probably 4.3 and newer, or something like that.
>
> No, GCC 3.3.6 manual already mentions it:
> http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Thread_002dLocal.html#Thread_002dLocal

Not on Windows though.

Paolo

On 08/17/2011 10:37 AM, Blue Swirl wrote:
>> I would agree (just did it for fun and to see how it looks like), but
>> perhaps we can do something similar to what Windows does for OpenBSD too.
>>
>> Though I'm quite disappointed and I must say OpenBSD should enter the 21st
>> century.
>
> I can't justify their design decisions, but isn't __thread a
> non-standard extension by GCC anyway whereas POSIX threads are the
> standard? Are other compilers supporting it?

Visual C++ supports __declspec(thread).

pthread TLS is orders of magnitude slower than __thread. __thread in an
executable is just as fast as a non-TLS variable access, 2 or 3 machine
language instructions at most.

Some pthread_getspecific implementations do a _linear walk_ of the keys. In
that case it may work to have something like Windows plus a single
pthread_getspecific key. On the other hand, Windows provides very good
support for that thanks to its ordered subsections (.tls$something). It
only needs a bit of preprocessor magic. Things such as linker scripts would
be way out of scope.

Paolo

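Paolo's fallback idea, a single `pthread_getspecific` key for the whole program instead of one key per variable, can be sketched in portable C. This is an illustrative assumption, not code from the thread or from QEMU: the names `TLSBlock`, `tls_block()`, `get_tls_field()` and `run_worker()` are hypothetical. All per-thread variables are gathered into one struct, so even a key lookup that walks the key list linearly only ever has one key to find, no matter how many TLS variables the program defines.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: every per-thread variable is a field of one struct,
 * so a single pthread key serves the whole program. */
typedef struct TLSBlock {
    int counter;
    void *current;
} TLSBlock;

static pthread_key_t tls_key;
static pthread_once_t tls_key_once = PTHREAD_ONCE_INIT;

static void tls_key_init(void)
{
    /* free() releases a thread's block when that thread exits. */
    if (pthread_key_create(&tls_key, free) != 0) {
        abort();
    }
}

/* Return the calling thread's block, allocating it (zeroed) on first use. */
static TLSBlock *tls_block(void)
{
    TLSBlock *b;

    pthread_once(&tls_key_once, tls_key_init);
    b = pthread_getspecific(tls_key);
    if (b == NULL) {
        b = calloc(1, sizeof(*b));
        if (b == NULL || pthread_setspecific(tls_key, b) != 0) {
            abort();
        }
    }
    return b;
}

/* Lvalue accessor, in the spirit of the patch's get_tls(x). */
#define get_tls_field(x) (tls_block()->x)

/* Thread body: add arg to this thread's counter and return the result. */
static void *add_to_counter(void *arg)
{
    get_tls_field(counter) += (int)(intptr_t)arg;
    return (void *)(intptr_t)get_tls_field(counter);
}

/* Run add_to_counter(n) in a fresh thread; return the counter it saw. */
static int run_worker(int n)
{
    pthread_t t;
    void *ret = NULL;

    if (pthread_create(&t, NULL, add_to_counter, (void *)(intptr_t)n) != 0) {
        abort();
    }
    pthread_join(t, &ret);
    return (int)(intptr_t)ret;
}
```

Every access still costs a `pthread_getspecific` call, so this only bounds the cost of slow key lookups; it does not match the two or three instructions of `__thread`.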
On Wed, Aug 17, 2011 at 5:45 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/17/2011 10:37 AM, Blue Swirl wrote:
>>> I would agree (just did it for fun and to see how it looks like), but
>>> perhaps we can do something similar to what Windows does for OpenBSD
>>> too.
>>>
>>> Though I'm quite disappointed and I must say OpenBSD should enter the
>>> 21st century.
>>
>> I can't justify their design decisions, but isn't __thread a
>> non-standard extension by GCC anyway whereas POSIX threads are the
>> standard? Are other compilers supporting it?
>
> Visual C++ supports __declspec(thread).
>
> pthread TLS is orders of magnitude slower than __thread. __thread in an
> executable is just as fast as a non-TLS variable access, 2 or 3 machine
> language instructions at most.
>
> Some pthread_getspecific implementations do a _linear walk_ of the keys. In
> that case it may work to have something like Windows plus a single
> pthread_getspecific key. On the other hand, Windows provides very good
> support for that thanks to its ordered subsections (.tls$something). It
> only needs a bit of preprocessor magic. Things such as linker scripts would
> be way out of scope.

According to this blog, also pthread_getspecific on OSX can amount to
three instructions except on PPC:
http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html

On 08/17/2011 10:51 AM, Blue Swirl wrote:
>> Visual C++ supports __declspec(thread).
>>
>> pthread TLS is orders of magnitude slower than __thread. __thread in an
>> executable is just as fast as a non-TLS variable access, 2 or 3 machine
>> language instructions at most.
>>
>> Some pthread_getspecific implementations do a _linear walk_ of the keys. In
>> that case it may work to have something like Windows plus a single
>> pthread_getspecific key. On the other hand, Windows provides very good
>> support for that thanks to its ordered subsections (.tls$something). It
>> only needs a bit of preprocessor magic. Things such as linker scripts would
>> be way out of scope.
>
> According to this blog, also pthread_getspecific on OSX can amount to
> three instructions except on PPC:
> http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html

I'll try to work out a DEFINE_TLS/DECLARE_TLS version for pthreads too.

BTW, __thread is in C1X too, though I don't remember what they called it.

Paolo

On Wed, 17 Aug 2011, Paolo Bonzini wrote:
> On 08/17/2011 10:51 AM, Blue Swirl wrote:
>>> Visual C++ supports __declspec(thread).
>>>
>>> pthread TLS is orders of magnitude slower than __thread. __thread in an
>>> executable is just as fast as a non-TLS variable access, 2 or 3 machine
>>> language instructions at most.
>>>
>>> Some pthread_getspecific implementations do a _linear walk_ of the keys. In
>>> that case it may work to have something like Windows plus a single
>>> pthread_getspecific key. On the other hand, Windows provides very good
>>> support for that thanks to its ordered subsections (.tls$something). It
>>> only needs a bit of preprocessor magic. Things such as linker scripts would
>>> be way out of scope.
>>
>> According to this blog, also pthread_getspecific on OSX can amount to
>> three instructions except on PPC:
>> http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html
>
> I'll try to work out a DEFINE_TLS/DECLARE_TLS version for pthreads too.
>
> BTW, __thread is in C1X too, though I don't remember what they called it.

thread_local, 3.7.2 in n3290.pdf

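For comparison with `__thread`, here is a minimal sketch of the standardized spelling. C11 names the storage class `_Thread_local` (and `<threads.h>` defines a `thread_local` convenience macro for it), while `thread_local` as a keyword is the C++11 spelling described in n3290 §3.7.2. The sketch assumes a C11 compiler and POSIX threads; `bump()` and `hits_in_new_thread()` are hypothetical helpers that only exist to show each thread starting from its own zero-initialized copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* C11 keyword spelling; <threads.h> would also provide the
 * thread_local macro, but the keyword needs no extra header. */
static _Thread_local int hits;

/* Thread body: increment this thread's copy of `hits` n times. */
static void *bump(void *arg)
{
    int n = (int)(intptr_t)arg;

    for (int i = 0; i < n; i++) {
        hits++;                 /* touches only the calling thread's copy */
    }
    return (void *)(intptr_t)hits;
}

/* Start one thread running bump(n) and return the value it saw.
 * Each new thread starts from its own zero-initialized `hits`. */
static int hits_in_new_thread(int n)
{
    pthread_t t;
    void *ret = NULL;

    if (pthread_create(&t, NULL, bump, (void *)(intptr_t)n) != 0) {
        return -1;
    }
    pthread_join(t, &ret);
    return (int)(intptr_t)ret;
}
```

On GCC targets with native TLS, `_Thread_local` and `__thread` end up using the same mechanism; the keyword mainly buys portability to other compilers.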
On 08/17/2011 10:28 AM, Blue Swirl wrote:
> GCC may support it but it can't use it on a platform if the other
> parts are missing (libc etc.).

With a new-enough gcc, we provide the support in libgcc.

r~

----- Original message -----
> On Wed, Aug 17, 2011 at 5:28 PM, Paolo Bonzini <pbonzini@redhat.com>
> wrote:
>> On 08/17/2011 10:26 AM, Stefan Hajnoczi wrote:
>>>>> Do you know which Windows gcc versions support __thread and
>>>>> if it would be reasonable to require those versions?
>>>>
>>>> It's probably 4.3 and newer, or something like that.
>>>
>>> I suggest we simply require that minimum gcc version.
>>
>> I would agree (just did it for fun and to see how it looks like), but
>> perhaps we can do something similar to what Windows does for OpenBSD
>> too.
>>
>> Though I'm quite disappointed and I must say OpenBSD should enter the
>> 21st century.
>
> I can't justify their design decisions, but isn't __thread a
> non-standard extension by GCC anyway whereas POSIX threads are the
> standard? Are other compilers supporting it?

It's not a "design decision" so much as: when someone does the work, it'll
be supported. There is work under way to add support, but it's far from
trivial and requires somewhat heavy lifting to get there.

----- Original message -----
> On 08/17/2011 10:28 AM, Blue Swirl wrote:
>> GCC may support it but it can't use it on a platform if the other
>> parts are missing (libc etc.).
>
> With a new-enough gcc, we provide the support in libgcc.

I doubt gcc 4.2.1 fits the bill.

On 08/17/2011 04:50 PM, Brad wrote:
> ----- Original message -----
>> On 08/17/2011 10:28 AM, Blue Swirl wrote:
>>> GCC may support it but it can't use it on a platform if the other
>>> parts are missing (libc etc.).
>>
>> With a new-enough gcc, we provide the support in libgcc.
>
> I doubt gcc 4.2.1 fits the bill.

Nope, I'll generalize this patch to include pthread too.

Paolo

```diff
diff --git a/coroutine-win32.c b/coroutine-win32.c
index 0e29448..7f58432 100644
--- a/coroutine-win32.c
+++ b/coroutine-win32.c
@@ -24,6 +24,7 @@
 
 #include "qemu-common.h"
 #include "qemu-coroutine-int.h"
+#include "qemu-thread.h"
 
 typedef struct
 {
@@ -33,8 +34,10 @@ typedef struct
     CoroutineAction action;
 } CoroutineWin32;
 
-static __thread CoroutineWin32 leader;
-static __thread Coroutine *current;
+static DEFINE_TLS(CoroutineWin32, tls_leader);
+static DEFINE_TLS(Coroutine *, tls_current);
+#define leader get_tls(tls_leader)
+#define current get_tls(tls_current)
 
 CoroutineAction qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
                                       CoroutineAction action)
diff --git a/qemu-thread-posix.h b/qemu-thread-posix.h
index d781ca6..2a302da 100644
--- a/qemu-thread-posix.h
+++ b/qemu-thread-posix.h
@@ -27,4 +27,8 @@ struct QemuThread {
     pthread_t thread;
 };
 
+#define DEFINE_TLS(type, x) __thread type x
+#define DECLARE_TLS(type, x) extern DEFINE_TLS(type, x)
+#define get_tls(x) (x)
+
 #endif
diff --git a/qemu-thread-win32.c b/qemu-thread-win32.c
index f5891bd..3eb6b89 100644
--- a/qemu-thread-win32.c
+++ b/qemu-thread-win32.c
@@ -17,6 +17,23 @@
 #include <assert.h>
 #include <limits.h>
 
+/* TLS support, not exported by the mingw run-time library.  */
+
+int __attribute__((section(".tls$000"))) _tls_start = 0;
+int __attribute__((section(".tls$ZZZ"))) _tls_end = 0;
+int _tls_index = 0;
+
+const IMAGE_TLS_DIRECTORY _tls_used __attribute__((used, section(".rdata$T"))) =
+{
+ (ULONG)(ULONG_PTR) &_tls_start, // start of tls data
+ (ULONG)(ULONG_PTR) &_tls_end, // end of tls data
+ (ULONG)(ULONG_PTR) &_tls_index, // address of tls_index
+ (ULONG) 0, // pointer to callbacks
+ (ULONG) 0, // size of tls zero fill
+ (ULONG) 0 // characteristics
+};
+
+
 static void error_exit(int err, const char *msg)
 {
     char *pstr;
diff --git a/qemu-thread-win32.h b/qemu-thread-win32.h
index 6cb248f..2ffe7de 100644
--- a/qemu-thread-win32.h
+++ b/qemu-thread-win32.h
@@ -1,6 +1,7 @@
 #ifndef __QEMU_THREAD_WIN32_H
 #define __QEMU_THREAD_WIN32_H 1
-#include "windows.h"
+#include <windows.h>
+#include <winnt.h>
 
 struct QemuMutex {
     CRITICAL_SECTION lock;
@@ -27,4 +28,41 @@ struct QemuThread {
     void *ret;
 };
 
+typedef struct _TEB {
+    NT_TIB NtTib;
+    void *EnvironmentPointer;
+    void *x[3];
+    char **ThreadLocalStoragePointer;
+} TEB, *PTEB;
+
+/* 1) The initial contents of TLS variables are placed in the .tls section.  */
+
+#define DEFINE_TLS(type, x) type tls__##x __attribute__((section(".tls$BBB")))
+
+/* 2) _tls_index holds the number of our module.  The executable should be
+   zero, DLLs are numbered 1 and up.  The loader fills it in for us.  */
+
+extern int _tls_index;
+extern int _tls_start;
+
+/* 3) Thus, Teb->ThreadLocalStoragePointer[_tls_index] is the base of
+   the TLS segment for this (thread, module) pair.  Each segment has
+   the same layout as this module's .tls segment and is initialized
+   with the content of the .tls segment; 0 is the _tls_start variable.
+   So, get_tls passes us the offset of the passed variable relative to
+   _tls_start, and we return that same offset plus the base of the segment.  */
+
+static inline __attribute__((__pure__)) void *_get_tls(int offset)
+{
+    PTEB Teb = NtCurrentTeb();
+    return (char *)(Teb->ThreadLocalStoragePointer[_tls_index]) + offset;
+}
+
+/* 4) get_tls, in addition to computing the offset, returns an lvalue.
+      "I got it. Magic."  */
+
+#define get_tls(x) \
+  (*(__typeof__(tls__##x) *) \
+    _get_tls((ULONG_PTR)&(tls__##x) - (ULONG_PTR)&_tls_start))
+
 #endif
diff --git a/qemu-thread.h b/qemu-thread.h
index ae75638..858c8cc 100644
--- a/qemu-thread.h
+++ b/qemu-thread.h
@@ -49,4 +49,6 @@ void qemu_thread_get_self(QemuThread *thread);
 int qemu_thread_is_self(QemuThread *thread);
 void qemu_thread_exit(void *retval);
 
+#define DECLARE_TLS(type, x) extern DEFINE_TLS(type, x)
+
 #endif
```
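The offset arithmetic behind the Windows `get_tls()` (comment 3 in qemu-thread-win32.h) can be simulated portably to see why it works. In this hypothetical sketch, which is not QEMU or Windows API code, the struct `TLSTemplate` stands in for the linked `.tls` section (`_tls_start` first, the `DEFINE_TLS` variables next, `_tls_end` last, mirroring the `.tls$000` < `.tls$BBB` < `.tls$ZZZ` ordering). `new_thread_segment()` plays the loader that copies the template for each thread, and `sim_get_tls()` adds the variable's offset from `_tls_start` to that thread's segment base.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the linked .tls section.  The linker sorts
 * .tls$000 < .tls$BBB < .tls$ZZZ, so _tls_start comes first, the
 * DEFINE_TLS variables follow, and _tls_end closes the section. */
typedef struct TLSTemplate {
    int tls_start;   /* plays _tls_start (.tls$000) */
    int counter;     /* plays DEFINE_TLS(int, counter) (.tls$BBB) */
    long limit;      /* plays DEFINE_TLS(long, limit) (.tls$BBB) */
    int tls_end;     /* plays _tls_end (.tls$ZZZ) */
} TLSTemplate;

/* Initial values, as they would be stored in the image's .tls section. */
static const TLSTemplate tls_template = { 0, 7, 1000L, 0 };

/* Plays the loader: every new thread gets a private copy of the
 * template; its address would go in ThreadLocalStoragePointer[_tls_index]. */
static char *new_thread_segment(void)
{
    char *seg = malloc(sizeof tls_template);

    if (seg == NULL) {
        abort();
    }
    memcpy(seg, &tls_template, sizeof tls_template);
    return seg;
}

/* Same computation as the patch's get_tls(): the variable's offset from
 * _tls_start in the template, added to this thread's segment base.
 * The result is an lvalue of the given type. */
#define sim_get_tls(seg, field, type)                        \
    (*(type *)((seg) + ((const char *)&tls_template.field -  \
                        (const char *)&tls_template.tls_start)))
```

On real Windows the base comes from `NtCurrentTeb()->ThreadLocalStoragePointer[_tls_index]` and the loader does the copying. The sketch only models the offset calculation and the per-thread copy of the initial values.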
Win32 emulated TLS is slow and is not available on all versions of GCC.
Actually, Win32 does have support for decent TLS, only GCC does not map
__thread to it. The good news is, it's perfectly possible to declare
TLS variables with simple C code!

This patch adds wrappers to qemu-thread that will use __thread on POSIX
systems, and the .tls segment on Windows.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
It does kinda uglify the declarations, but not too much IMO.
Do we want to go this way? It will certainly limit the number
of bug reports we get.

 coroutine-win32.c | 7 +++++--
 qemu-thread-posix.h | 4 ++++
 qemu-thread-win32.c | 17 +++++++++++++++++
 qemu-thread-win32.h | 40 +++++++++++++++++++++++++++++++++++++++-
 qemu-thread.h | 2 ++
 5 files changed, 67 insertions(+), 3 deletions(-)
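On POSIX hosts the wrappers reduce to plain `__thread`. The sketch below reuses the three macros from the qemu-thread-posix.h hunk together with the rename trick from coroutine-win32.c (`#define current get_tls(tls_current)`), so code that reads and writes `current` keeps compiling unchanged. The helpers `set_and_read()` and `value_seen_by_new_thread()` are hypothetical and only show that a second thread gets its own copy. The sketch assumes a GCC-compatible compiler and POSIX threads.

```c
#include <assert.h>
#include <pthread.h>

/* POSIX definitions, as in the qemu-thread-posix.h hunk of the patch. */
#define DEFINE_TLS(type, x)  __thread type x
#define DECLARE_TLS(type, x) extern DEFINE_TLS(type, x)
#define get_tls(x)           (x)

/* Same pattern as coroutine-win32.c: the storage is named tls_current
 * and a macro keeps the old spelling "current" working. */
static DEFINE_TLS(const char *, tls_current);
#define current get_tls(tls_current)

/* Thread body: store the argument in this thread's copy, read it back. */
static void *set_and_read(void *arg)
{
    current = arg;              /* expands to (tls_current) = arg */
    return (void *)current;
}

/* Hypothetical helper: run set_and_read() in a new thread and return
 * the value that thread saw in its own copy of `current`. */
static const char *value_seen_by_new_thread(const char *name)
{
    pthread_t t;
    void *ret = NULL;

    if (pthread_create(&t, NULL, set_and_read, (void *)name) != 0) {
        return NULL;
    }
    pthread_join(t, &ret);
    return ret;
}
```

Because `get_tls(x)` is just `(x)` here, the indirection compiles to exactly the same code as the original `static __thread` declarations; only the Windows build pays for the offset computation.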