Message ID | 561E1E63.3020404@arm.com |
---|---|
State | New |
Headers | show |
On Wed, 14 Oct 2015, Szabolcs Nagy wrote: > _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0, > but it assumes the original argument is still in r0 after the call. > This can cause crash in case of concurrent TLS access when TLSDESC > is in use (-mtls-dialect=gnu2). I presume this issue was user-visible in a release, in which case a bug should be filed in Bugzilla for it. Should I take it that it's hard to write a testcase for this bug that fails reliably without the patch?
On 14/10/15 13:17, Joseph Myers wrote: > On Wed, 14 Oct 2015, Szabolcs Nagy wrote: > >> _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0, >> but it assumes the original argument is still in r0 after the call. >> This can cause crash in case of concurrent TLS access when TLSDESC >> is in use (-mtls-dialect=gnu2). > > I presume this issue was user-visible in a release, in which case a bug > should be filed in Bugzilla for it. Should I take it that it's hard to > write a testcase for this bug that fails reliably without the patch? > ok it's BZ 19129 i have a test that triggers this or the other tlsdesc race i was about to fix, but it is a bit expensive. (666 tls objects accessed from 66 threads, 2 versions to get tlsdesc entries 8byte-unaligned, running 10x triggers the failure, a smaller test triggers less frequently.) i'm not sure what's the glibc policy for expensive tests. and i think -mtls-dialect=gnu2 is not available in all supported gcc versions. (the test makes sense on all archs where lazy tls initialization may be racy).
On Wed, Oct 14, 2015 at 2:51 PM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote: > On 14/10/15 13:17, Joseph Myers wrote: >> >> On Wed, 14 Oct 2015, Szabolcs Nagy wrote: >> >>> _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0, >>> but it assumes the original argument is still in r0 after the call. >>> This can cause crash in case of concurrent TLS access when TLSDESC >>> is in use (-mtls-dialect=gnu2). >> >> >> I presume this issue was user-visible in a release, in which case a bug >> should be filed in Bugzilla for it. Should I take it that it's hard to >> write a testcase for this bug that fails reliably without the patch? >> > > ok it's BZ 19129 > > i have a test that triggers this or the other tlsdesc race > i was about to fix, but it is a bit expensive. > > (666 tls objects accessed from 66 threads, 2 versions to > get tlsdesc entries 8byte-unaligned, running 10x triggers > the failure, a smaller test triggers less frequently.) > > i'm not sure what's the glibc policy for expensive tests. > > and i think -mtls-dialect=gnu2 is not available in all > supported gcc versions. Support for this appeared in FSF GCC 4.7 Ramana > > (the test makes sense on all archs where lazy tls > initialization may be racy). >
On Wed, 14 Oct 2015, Szabolcs Nagy wrote: > On 14/10/15 13:17, Joseph Myers wrote: > > On Wed, 14 Oct 2015, Szabolcs Nagy wrote: > > > > > _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0, > > > but it assumes the original argument is still in r0 after the call. > > > This can cause crash in case of concurrent TLS access when TLSDESC > > > is in use (-mtls-dialect=gnu2). > > > > I presume this issue was user-visible in a release, in which case a bug > > should be filed in Bugzilla for it. Should I take it that it's hard to > > write a testcase for this bug that fails reliably without the patch? > > > > ok it's BZ 19129 Thanks. The patch is OK with the usual [BZ #19129] notation in the ChangeLog, addition to the list of fixed bugs in NEWS and closing of the bug as FIXED with the right milestone. > i have a test that triggers this or the other tlsdesc race > i was about to fix, but it is a bit expensive. > > (666 tls objects accessed from 66 threads, 2 versions to > get tlsdesc entries 8byte-unaligned, running 10x triggers > the failure, a smaller test triggers less frequently.) > > i'm not sure what's the glibc policy for expensive tests. Well, at least it should be attached to the bug in Bugzilla so it's available to other people. What are the actual memory / CPU time requirements of this test?
On 14/10/15 16:09, Joseph Myers wrote: > On Wed, 14 Oct 2015, Szabolcs Nagy wrote: >> On 14/10/15 13:17, Joseph Myers wrote: >>> On Wed, 14 Oct 2015, Szabolcs Nagy wrote: >> i have a test that triggers this or the other tlsdesc race >> i was about to fix, but it is a bit expensive. >> >> (666 tls objects accessed from 66 threads, 2 versions to >> get tlsdesc entries 8byte-unaligned, running 10x triggers >> the failure, a smaller test triggers less frequently.) >> >> i'm not sure what's the glibc policy for expensive tests. > > Well, at least it should be attached to the bug in Bugzilla so it's > available to other people. What are the actual memory / CPU time > requirements of this test? > i can run the test 10 times under a second, the memory usage is small (but the address space usage of the thread stacks may cause problems with memory overcommit disabled). native compilation of the 666 sections takes some time (i put each tls object into different section so each gets its own tlsdesc relocations). (the code is partly generated i see if i can cut it down a bit more for inclusion into glibc).
diff --git a/sysdeps/arm/dl-tlsdesc.S b/sysdeps/arm/dl-tlsdesc.S index e42ca68..33a2695 100644 --- a/sysdeps/arm/dl-tlsdesc.S +++ b/sysdeps/arm/dl-tlsdesc.S @@ -196,21 +196,30 @@ _dl_tlsdesc_lazy_resolver: eabi_fnstart .align 2 _dl_tlsdesc_resolve_hold: - eabi_save ({r2,r3,ip,lr}) - push {r2, r3, ip, lr} - cfi_adjust_cfa_offset (16) - cfi_rel_offset (r2, 0) - cfi_rel_offset (r3, 4) - cfi_rel_offset (ip, 8) - cfi_rel_offset (lr, 12) + /* r0 is saved so its original value can be used after the call and + r1 is saved only to keep the stack aligned. (r0 points to the tls + descriptor, it is passed to _dl_tlsdesc_resolve_hold_fixup which + is a void function that may clobber r0, later r0 is used to load + the new resolver.) */ + eabi_save ({r0,r1,r2,r3,ip,lr}) + push {r0, r1, r2, r3, ip, lr} + cfi_adjust_cfa_offset (24) + cfi_rel_offset (r0, 0) + cfi_rel_offset (r1, 4) + cfi_rel_offset (r2, 8) + cfi_rel_offset (r3, 12) + cfi_rel_offset (ip, 16) + cfi_rel_offset (lr, 20) adr r1, _dl_tlsdesc_resolve_hold bl _dl_tlsdesc_resolve_hold_fixup - pop {r2, r3, ip, lr} - cfi_adjust_cfa_offset (-16) + pop {r0, r1, r2, r3, ip, lr} + cfi_adjust_cfa_offset (-24) cfi_restore (lr) cfi_restore (ip) cfi_restore (r3) cfi_restore (r2) + cfi_restore (r1) + cfi_restore (r0) sfi_breg r0, \ ldr r1, [\B, #4] BX (r1)