Message ID: 1400863633.12948.152.camel@otta
State: New
On Fri, 2014-05-23 at 11:47 -0500, Peter Bergner wrote:
> The following patch gets bootstrap working again, but there seems to
> be a large number of testsuite failures I will look into before
> submitting the patch to the LLVM mailing list.

FYI, it looks like a huge number of the failures are from the following
in 32-bit mode:

/home/bergner/gcc/build/gcc-fsf-mainline-asan-debug/powerpc64-linux/32/libsanitizer/asan/.libs/libasan.so: undefined reference to `__sync_fetch_and_add_8'
collect2: error: ld returned 1 exit status

Kosta, we don't have a __sync_fetch_and_add_8 in 32-bit, since we don't
have the necessary hardware instructions to do a 64-bit fetch and add.
How is this supposed to work?  There are some libatomic routines we
could call (eg, __atomic_fetch_add_8).  Is that an option?  ...or maybe
we shouldn't even be attempting to call a 64-bit fetch/add for things
larger than a register size?

Peter
On Fri, May 23, 2014 at 01:54:47PM -0500, Peter Bergner wrote:
> On Fri, 2014-05-23 at 11:47 -0500, Peter Bergner wrote:
> > The following patch gets bootstrap working again, but there seems to
> > be a large number of testsuite failures I will look into before
> > submitting the patch to the LLVM mailing list.
>
> FYI, it looks like a huge number of the failures are from the following
> in 32-bit mode:
>
> /home/bergner/gcc/build/gcc-fsf-mainline-asan-debug/powerpc64-linux/32/libsanitizer/asan/.libs/libasan.so: undefined reference to `__sync_fetch_and_add_8'
> collect2: error: ld returned 1 exit status
>
> Kosta, we don't have a __sync_fetch_and_add_8 in 32-bit, since
> we don't have the necessary hardware instructions to do a 64-bit
> fetch and add.  How is this supposed to work?  There are some
> libatomic routines we could call (eg, __atomic_fetch_add_8).

But libatomic in the end uses locking and non-atomic access for those
cases anyway.  On i?86, 64-bit atomics are available starting with
i586; on sparc, starting with v9; but I believe on most architectures,
if you have atomics at all, you don't have atomics larger than word
size.  For GCC, if you want to test in the preprocessor whether a CPU
has 64-bit atomics, you'd check

#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8

Can you use locking instead otherwise?

	Jakub
Hi Peter,

Last time I tried, asan did not work on ppc32 for a large number of
different reasons.  In the upstream build system ppc32 is simply
disabled, so imho it should also be disabled in the GCC build.  If
there is enough interest in ppc32, please work with us on fixing
upstream and enabling the build bots -- then we can talk about
supporting ppc32 in GCC (it will actually come for free then).
(I can't say whether ppc64 works today, but at least it worked for me
at some point.)

--kcc

On Fri, May 23, 2014 at 10:54 PM, Peter Bergner <bergner@vnet.ibm.com> wrote:
> On Fri, 2014-05-23 at 11:47 -0500, Peter Bergner wrote:
>> The following patch gets bootstrap working again, but there seems to
>> be a large number of testsuite failures I will look into before
>> submitting the patch to the LLVM mailing list.
>
> FYI, it looks like a huge number of the failures are from the following
> in 32-bit mode:
>
> /home/bergner/gcc/build/gcc-fsf-mainline-asan-debug/powerpc64-linux/32/libsanitizer/asan/.libs/libasan.so: undefined reference to `__sync_fetch_and_add_8'
> collect2: error: ld returned 1 exit status
>
> Kosta, we don't have a __sync_fetch_and_add_8 in 32-bit, since
> we don't have the necessary hardware instructions to do a 64-bit
> fetch and add.  How is this supposed to work?  There are some
> libatomic routines we could call (eg, __atomic_fetch_add_8).
> Is that an option?  ...or maybe we shouldn't even be attempting
> to call a 64-bit fetch/add for things larger than a register size?
>
> Peter
On Mon, May 26, 2014 at 08:57:11AM +0400, Konstantin Serebryany wrote:
> Last time I tried, asan did not work on ppc32 for a large number of
> different reasons.

???
Comparing my 4.9.0 ppc/ppc64 testresults, for 32-bit I see:

FAIL: c-c++-common/asan/null-deref-1.c -O0 output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/null-deref-1.c -O1 output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/null-deref-1.c -O2 output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/null-deref-1.c -O3 -fomit-frame-pointer output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/null-deref-1.c -O3 -g output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/null-deref-1.c -Os output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/pr59063-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
UNRESOLVED: c-c++-common/asan/pr59063-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects compilation failed to produce executable

and for 64-bit:

FAIL: c-c++-common/asan/null-deref-1.c -O2 output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/null-deref-1.c -O3 -fomit-frame-pointer output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/null-deref-1.c -O3 -g output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects output pattern test, is ASAN:SIGSEGV
FAIL: c-c++-common/asan/pr59063-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
UNRESOLVED: c-c++-common/asan/pr59063-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects compilation failed to produce executable
FAIL: c-c++-common/asan/swapcontext-test-1.c -O0 execution test
FAIL: c-c++-common/asan/swapcontext-test-1.c -O1 execution test
FAIL: c-c++-common/asan/swapcontext-test-1.c -O2 execution test
FAIL: c-c++-common/asan/swapcontext-test-1.c -O3 -fomit-frame-pointer execution test
FAIL: c-c++-common/asan/swapcontext-test-1.c -O3 -fomit-frame-pointer -funroll-loops execution test
FAIL: c-c++-common/asan/swapcontext-test-1.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test
FAIL: c-c++-common/asan/swapcontext-test-1.c -O3 -g execution test
FAIL: c-c++-common/asan/swapcontext-test-1.c -Os execution test
FAIL: c-c++-common/asan/swapcontext-test-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: c-c++-common/asan/swapcontext-test-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test

Doesn't look like the ppc32 port would be in any worse shape than the
64-bit one.
Peter has brought up a real problem, in particular the allocator now
newly relying on 2 x word size atomics, which is definitely
non-portable; I don't see why the answer to that should be "disable
your port or build a buildbot".

	Jakub
On Mon, May 26, 2014 at 9:57 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, May 26, 2014 at 08:57:11AM +0400, Konstantin Serebryany wrote:
>> Last time I tried, asan did not work on ppc32 for a large number of
>> different reasons.
>
> ???
> Comparing my 4.9.0 ppc/ppc64 testresults, for 32-bit I see:
>
> FAIL: c-c++-common/asan/null-deref-1.c -O0 output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/null-deref-1.c -O1 output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/null-deref-1.c -O2 output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/null-deref-1.c -O3 -fomit-frame-pointer output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/null-deref-1.c -O3 -g output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/null-deref-1.c -Os output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/pr59063-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
> UNRESOLVED: c-c++-common/asan/pr59063-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects compilation failed to produce executable
>
> and for 64-bit:
>
> FAIL: c-c++-common/asan/null-deref-1.c -O2 output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/null-deref-1.c -O3 -fomit-frame-pointer output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/null-deref-1.c -O3 -g output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects output pattern test, is ASAN:SIGSEGV
> FAIL: c-c++-common/asan/pr59063-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
> UNRESOLVED: c-c++-common/asan/pr59063-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects compilation failed to produce executable
> FAIL: c-c++-common/asan/swapcontext-test-1.c -O0 execution test
> FAIL: c-c++-common/asan/swapcontext-test-1.c -O1 execution test
> FAIL: c-c++-common/asan/swapcontext-test-1.c -O2 execution test
> FAIL: c-c++-common/asan/swapcontext-test-1.c -O3 -fomit-frame-pointer execution test
> FAIL: c-c++-common/asan/swapcontext-test-1.c -O3 -fomit-frame-pointer -funroll-loops execution test
> FAIL: c-c++-common/asan/swapcontext-test-1.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test
> FAIL: c-c++-common/asan/swapcontext-test-1.c -O3 -g execution test
> FAIL: c-c++-common/asan/swapcontext-test-1.c -Os execution test
> FAIL: c-c++-common/asan/swapcontext-test-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
> FAIL: c-c++-common/asan/swapcontext-test-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
>
> Doesn't look like the ppc32 port would be in any worse shape than the 64-bit
> one.
> Peter has brought a real problem, in particular the allocator now newly relying on
> 2 x word size atomics which is definitely non-portable, I don't see why the answer
> to that should be disable your port or build a buildbot.

Because this is my default reply to any such case. :)

>
> Jakub
> On Mon, May 26, 2014 at 9:57 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > Doesn't look like the ppc32 port would be in any worse shape than the 64-bit
> > one.
> > Peter has brought a real problem, in particular the allocator now newly relying on
> > 2 x word size atomics which is definitely non-portable, I don't see why the answer
> > to that should be disable your port or build a buildbot.

Right, the ppc32 results definitely show it's on par with the ppc64
results.

On Mon, 2014-05-26 at 10:36 +0400, Konstantin Serebryany wrote:
> Because this is my default reply to any such case. :)

I hope that is a humorous reply and not a serious one.

In one of my other posts, I asked whether 32-bit ports should even
attempt to use the 2 * word_size atomics.  What is the code doing such
that it wants to use a 2 * word_size atomic?  Is it as simple as
commenting that code out for 32-bit builds of the library, or do we
really have to support that?

Peter
On Tue, May 27, 2014 at 6:25 AM, Peter Bergner <bergner@vnet.ibm.com> wrote:
>> On Mon, May 26, 2014 at 9:57 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > Doesn't look like the ppc32 port would be in any worse shape than the 64-bit
>> > one.
>> > Peter has brought a real problem, in particular the allocator now newly relying on
>> > 2 x word size atomics which is definitely non-portable, I don't see why the answer
>> > to that should be disable your port or build a buildbot.
>
> Right, the ppc32 results definitely show it's on par with the ppc64 results.
>
> On Mon, 2014-05-26 at 10:36 +0400, Konstantin Serebryany wrote:
>> Because this is my default reply to any such case. :)
>
> I hope that is a humorous reply and not a serious one.

Not really humorous.  Our position is and always was:
"If you are adding support for a new architecture/platform, we
encourage you to set up a public build bot, otherwise we can not
guarantee that we will keep your code in working conditions."
https://code.google.com/p/address-sanitizer/wiki/HowToContribute

> In one of my other posts, I asked should 32-bit ports even attempt
> to use the 2 * word_size atomics.  What is the code doing such that
> it wants to use a 2 * word_size atomic?  Is it as simple as commenting
> that code out for 32-bit builds of the library or do we really have
> to support that?

Frankly, I don't remember where asan can use a 2 * word_size atomic.
This might be some mistake in new code indeed.  Do you see what
function calls these atomics?

--kcc

> Peter
On May 26, 2014, at 10:13 PM, Konstantin Serebryany <konstantin.s.serebryany@gmail.com> wrote:
>> On Mon, 2014-05-26 at 10:36 +0400, Konstantin Serebryany wrote:
>>> Because this is my default reply to any such case. :)
>>
>> I hope that is a humorous reply and not a serious one.
>
> Not really humorous. Our position is and always was

We don't expect a guarantee that you keep code working.  Only that when
_you_ break things, you try and help out as you can, and if you cannot,
merely ask others for help.  That doesn't seem to me to be an
unreasonable position, and it seems to have worked fairly well for the
past 27 years.

So, the right way to treat a regression that you hear about from the
gcc side is exactly the same way you handle a green-to-red transition
on a build bot.

So, let me ask: when you break a build bot, is your first response to
want to disable the port the regression is found with?  If not, then
why treat a regression found on the gcc side any differently?
On Mon, May 26, 2014 at 09:25:37PM -0500, Peter Bergner wrote:
> In one of my other posts, I asked should 32-bit ports even attempt
> to use the 2 * word_size atomics.  What is the code doing such that
> it wants to use a 2 * word_size atomic?  Is it as simple as commenting
> that code out for 32-bit builds of the library or do we really have
> to support that?

BTW, just checked, and I don't see any 2 * word_size atomics in i686
libasan.so.1 - no cmpxchg8b insns anywhere.  Thus I think it would be
really nice if you could point out where exactly the 64-bit atomic is
really needed (or gcc options + preprocessed source so that it can be
investigated in a cross-compiler).

	Jakub
On Tue, May 27, 2014 at 9:53 PM, Mike Stump <mikestump@comcast.net> wrote:
> On May 26, 2014, at 10:13 PM, Konstantin Serebryany <konstantin.s.serebryany@gmail.com> wrote:
>>> On Mon, 2014-05-26 at 10:36 +0400, Konstantin Serebryany wrote:
>>>> Because this is my default reply to any such case. :)
>>>
>>> I hope that is a humorous reply and not a serious one.
>>
>> Not really humorous. Our position is and always was
>
> We don't expect a guarantee that you keep code working.  Only that when
> _you_ break things that you try and help out as you can, and if you
> cannot, merely to ask others for help.  Doesn't seem to me to be an
> unreasonable position and seems to have worked fairly well for the past
> 27 years.
>
> So, the right way to treat a regression that you hear about from the
> gcc side, is exactly the same way you handle a green to red transition
> on a build bot.
>
> So, let me ask, when you break a build bot, is your first response to
> want to disable the port the regression is found with?  If not, then
> why treat the regression found on the gcc side any different?

If a bot breaks, we know about it within tens of minutes.  The context
is still fresh, and fixing such a failure is typically a matter of
minutes and costs us nothing.  Of course, the bot should be accompanied
by a person who supports it and can help us if we don't understand the
failure.  I am extremely happy to see the FreeBSD build bot, which
became green recently:
http://lab.llvm.org:8011/builders/sanitizer_x86_64-freeBSD9.2

If a problem is discovered 5 months after it has been introduced, the
cost of fixing it is much higher, as it involves more testing, more
code reviews, etc.  Every time a platform owner sends us a patch here,
we need to remind them that the patch has to be sent upstream, and
every second time we need to explain why.  Our code is unusual in many
ways, and we have to edit half of the patches we receive.  We have our
preferred way of sending patches, which some of the contributors tend
to ignore (I have strong opinions about the rules for sending patches
in GCC, but I keep these opinions to myself and try to follow the
rules; some of the GCC community members neglect to follow *our*
rules).

Moreover, if we know that something breaks on a given platform, we may
change our design decisions quickly.  If we learn about breakage months
later, when the design is finalized, changing the design is much more
work.  I remember Jakub asked to redesign a part of the code to make it
more portable.  The idea was reasonable, but I would not even attempt
it w/o having bots for all supported platforms.

Finally, there is a psychological aspect -- I get sad when I learn that
I broke someone's code, even when that someone is very polite (not all
someones are polite, though).  I want to know about a breakage first --
it feels better.

--kcc
On May 27, 2014, at 11:16 AM, Konstantin Serebryany <konstantin.s.serebryany@gmail.com> wrote:
> If a bot breaks, we know about it within tens of minutes.

I appreciate that.  I also appreciate that if all bugs in a checkin can
be pointed out in 2 minutes, that is very nice indeed.  All I can say
is, we welcome your contribution to make that happen... but, despite
that, it is generally impolite to put in bugs and say let's remove the
port that exposes the bug.  If you can do no better, then at least
ignore it; this is more polite.  There will always be bugs, and you
will always put some in.  Life goes on.

> If we learn about breakage months later when design is finalized,

Software is never finalized, and neither are designs.

> changing the design is much more work.

Yes, software is hard. :-)  Just wait til you try and solve a problem
that autoconf was designed to solve, then you will have truly turned to
the dark side.

> Finally, there is a psychological aspect -- I get sad when I learn
> that I broke someone's code

Don't be sad.  Treat them as you would a slow-running build bot.  Try
and make them green. :-)
On Tue, 2014-05-27 at 20:07 +0200, Jakub Jelinek wrote:
> On Mon, May 26, 2014 at 09:25:37PM -0500, Peter Bergner wrote:
> > In one of my other posts, I asked should 32-bit ports even attempt
> > to use the 2 * word_size atomics.  What is the code doing such that
> > it wants to use a 2 * word_size atomic?  Is it as simple as commenting
> > that code out for 32-bit builds of the library or do we really have
> > to support that?
>
> BTW, just checked and I don't see any 2 * word_size atomics in i686
> libasan.so.1 - no cmpxchg8b insns anywhere.  Thus I think it would
> be really nice if you could point out where exactly is the 64-bit
> atomic really needed (or gcc options + preprocessed source so that
> it can be investigated in a cross-compiler).

It's being called from basically two files:

[bergner@makalu-lp1 gcc-fsf-mainline-asan-debug]$ find . -name '*.o' | xargs nm -AC | grep sync_fetch_and_add_8
./powerpc64-linux/32/libsanitizer/sanitizer_common/.libs/sanitizer_allocator.o: U __sync_fetch_and_add_8
./powerpc64-linux/32/libsanitizer/sanitizer_common/sanitizer_allocator.o: U __sync_fetch_and_add_8
./powerpc64-linux/32/libsanitizer/asan/.libs/asan_allocator2.o: U __sync_fetch_and_add_8
./powerpc64-linux/32/libsanitizer/asan/asan_allocator2.o: U __sync_fetch_and_add_8

It seems to be an expansion of the atomic_load template in both object
files:

00000000 <__sanitizer::atomic_uint64_t::Type __sanitizer::atomic_load<__sanitizer::atomic_uint64_t>(__sanitizer::atomic_uint64_t const volatile*, __sanitizer::memory_order)>:
   0:	94 21 ff d0 	stwu    r1,-48(r1)
   4:	7c 08 02 a6 	mflr    r0
   8:	90 01 00 34 	stw     r0,52(r1)
   c:	93 c1 00 28 	stw     r30,40(r1)
  10:	42 9f 00 05 	bcl     20,4*cr7+so,14 <__sanitizer::atomic_uint64_t::Type __sanitizer::atomic_load<__sanitizer::atomic_uint64_t>(__sanitizer::atomic_uint64_t const volatile*, __sanitizer::memory_order)+0x14>
  14:	7f c8 02 a6 	mflr    r30
  18:	3f de 00 00 	addis   r30,r30,0
			1a: R_PPC_REL16_HA	.got2+0x8006
  1c:	3b de 00 00 	addi    r30,r30,0
			1e: R_PPC_REL16_LO	.got2+0x800a
  20:	90 61 00 18 	stw     r3,24(r1)
  24:	90 81 00 1c 	stw     r4,28(r1)
  28:	81 21 00 18 	lwz     r9,24(r1)
  2c:	38 a0 00 00 	li      r5,0
  30:	38 c0 00 00 	li      r6,0
  34:	7d 23 4b 78 	mr      r3,r9
  38:	48 00 00 01 	bl      38 <__sanitizer::atomic_uint64_t::Type __sanitizer::atomic_load<__sanitizer::atomic_uint64_t>(__sanitizer::atomic_uint64_t const volatile*, __sanitizer::memory_order)+0x38>
			38: R_PPC_PLTREL24	__sync_fetch_and_add_8+0x8000
  3c:	90 61 00 20 	stw     r3,32(r1)
  40:	90 81 00 24 	stw     r4,36(r1)
  44:	c8 01 00 20 	lfd     f0,32(r1)
  48:	d8 01 00 08 	stfd    f0,8(r1)
  4c:	81 21 00 08 	lwz     r9,8(r1)
  50:	81 41 00 0c 	lwz     r10,12(r1)
  54:	7d 23 4b 78 	mr      r3,r9
  58:	7d 44 53 78 	mr      r4,r10
  5c:	80 01 00 34 	lwz     r0,52(r1)
  60:	7c 08 03 a6 	mtlr    r0
  64:	83 c1 00 28 	lwz     r30,40(r1)
  68:	38 21 00 30 	addi    r1,r1,48
  6c:	4e 80 00 20 	blr

This template comes from the ./sanitizer_common/sanitizer_atomic_clang_other.h
header file:

template<typename T>
INLINE typename T::Type atomic_load(
    const volatile T *a, memory_order mo) {
  DCHECK(mo & (memory_order_relaxed | memory_order_consume
      | memory_order_acquire | memory_order_seq_cst));
  DCHECK(!((uptr)a % sizeof(*a)));
  typename T::Type v;
  if (sizeof(*a) < 8 || sizeof(void*) == 8) {
    // Assume that aligned loads are atomic.
    if (mo == memory_order_relaxed) {
      v = a->val_dont_use;
    } else if (mo == memory_order_consume) {
      // Assume that processor respects data dependencies
      // (and that compiler won't break them).
      __asm__ __volatile__("" ::: "memory");
      v = a->val_dont_use;
      __asm__ __volatile__("" ::: "memory");
    } else if (mo == memory_order_acquire) {
      __asm__ __volatile__("" ::: "memory");
      v = a->val_dont_use;
      __sync_synchronize();
    } else {  // seq_cst
      // E.g. on POWER we need a hw fence even before the store.
      __sync_synchronize();
      v = a->val_dont_use;
      __sync_synchronize();
    }
  } else {
    // 64-bit load on 32-bit platform.
    // Gross, but simple and reliable.
    // Assume that it is not in read-only memory.
    v = __sync_fetch_and_add(
        const_cast<typename T::Type volatile *>(&a->val_dont_use), 0);
  }
  return v;
}

It seems x86 has its own version in sanitizer_atomic_clang_x86.h, which
maybe explains why i686 doesn't see this call?  It does:

  ...
  } else {
    // 64-bit load on 32-bit platform.
    __asm__ __volatile__(
        "movq %1, %%mm0;"  // Use mmx reg for 64-bit atomic moves
        "movq %%mm0, %0;"  // (ptr could be read-only)
        "emms;"            // Empty mmx state/Reset FP regs
        : "=m" (v)
        : "m" (a->val_dont_use)
        : // mark the FP stack and mmx registers as clobbered
          "st", "st(1)", "st(2)", "st(3)", "st(4)", "st(5)", "st(6)",
          "st(7)",
#ifdef __MMX__
          "mm0", "mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm7",
#endif  // #ifdef __MMX__
          "memory");

Peter
On Tue, May 27, 2014 at 04:02:08PM -0500, Peter Bergner wrote:
> It's being called from basically two files:
>
> [bergner@makalu-lp1 gcc-fsf-mainline-asan-debug]$ find . -name '*.o' | xargs nm -AC | grep sync_fetch_and_add_8
> ./powerpc64-linux/32/libsanitizer/sanitizer_common/.libs/sanitizer_allocator.o: U __sync_fetch_and_add_8
> ./powerpc64-linux/32/libsanitizer/sanitizer_common/sanitizer_allocator.o: U __sync_fetch_and_add_8
> ./powerpc64-linux/32/libsanitizer/asan/.libs/asan_allocator2.o: U __sync_fetch_and_add_8
> ./powerpc64-linux/32/libsanitizer/asan/asan_allocator2.o: U __sync_fetch_and_add_8

Does ppc32 have any atomic 64-bit loads/stores (in the sense that the
aligned 64 bits are written as one memory transaction, not each 32-bit
word separately)?
In any case, atomic_uint64_t there seems to be used only for some
statistics counter, and it isn't really atomic anyway, as additions
aren't performed using atomic instructions, but just an atomic load,
followed by normal arithmetic, followed by an atomic store.
Can't 32-bit counters be used instead on 32-bit arches?

I see there is another spot with atomic_uint64_t in
sanitizer_lfstack.h, but for some reason it isn't used now at all
(there it would want to use a 64-bit compare and exchange).

Also note that the use of MMX (ugh) means that it is expensive (emms)
and will not work on pre-MMX chips.

	Jakub
On Tue, May 27, 2014 at 11:50:33PM +0200, Jakub Jelinek wrote:
> On Tue, May 27, 2014 at 04:02:08PM -0500, Peter Bergner wrote:
> > It's being called from basically two files:
> >
> > [bergner@makalu-lp1 gcc-fsf-mainline-asan-debug]$ find . -name '*.o' | xargs nm -AC | grep sync_fetch_and_add_8
> > ./powerpc64-linux/32/libsanitizer/sanitizer_common/.libs/sanitizer_allocator.o: U __sync_fetch_and_add_8
> > ./powerpc64-linux/32/libsanitizer/sanitizer_common/sanitizer_allocator.o: U __sync_fetch_and_add_8
> > ./powerpc64-linux/32/libsanitizer/asan/.libs/asan_allocator2.o: U __sync_fetch_and_add_8
> > ./powerpc64-linux/32/libsanitizer/asan/asan_allocator2.o: U __sync_fetch_and_add_8
>
> Does ppc32 have any atomic 64-bit loads/stores (in the sense that the aligned
> 64 bits are written as one memory transaction, not each 32-bit word
> separately)?
> In any case, atomic_uint64_t there seems to be used only for some statistic
> counter and not really atomic anyway, as additions aren't performed using
> atomic instructions, but just atomic load, followed by normal arithmetics,
> followed by atomic store.
> Can't 32-bit counters be used instead on 32-bit arches?
>
> I see there is another spot with atomic_uint64_t in sanitizer_lfstack.h,
> but for some reason it isn't used now at all (there it would want to use
> 64-bit compare and exchange).

On ARM, while it supposedly links, because __sync_compare_and_exchange_8
is defined in libgcc.a, it will only work with post-2011 kernels and is
going to be very slow (because you do a separate compare and exchange to
load and a separate compare and exchange to store, i.e. twice as much
effort to actually not achieve atomicity, sometimes even 2 syscalls).

	Jakub
On Tue, 2014-05-27 at 23:50 +0200, Jakub Jelinek wrote:
> On Tue, May 27, 2014 at 04:02:08PM -0500, Peter Bergner wrote:
> > It's being called from basically two files:
> >
> > [bergner@makalu-lp1 gcc-fsf-mainline-asan-debug]$ find . -name '*.o' | xargs nm -AC | grep sync_fetch_and_add_8
> > ./powerpc64-linux/32/libsanitizer/sanitizer_common/.libs/sanitizer_allocator.o: U __sync_fetch_and_add_8
> > ./powerpc64-linux/32/libsanitizer/sanitizer_common/sanitizer_allocator.o: U __sync_fetch_and_add_8
> > ./powerpc64-linux/32/libsanitizer/asan/.libs/asan_allocator2.o: U __sync_fetch_and_add_8
> > ./powerpc64-linux/32/libsanitizer/asan/asan_allocator2.o: U __sync_fetch_and_add_8
>
> Does ppc32 have any atomic 64-bit loads/stores (in the sense that the aligned
> 64 bits are written as one memory transaction, not each 32-bit word
> separately)?

The only option I can think of would be using the floating point
load/store instructions lfd/stfd.  Of course, if we're going to operate
on them, then we'd need to copy them back to memory and then into the
integer registers again... before copying them back to the FP registers
(thru memory again) so we can store them with the stfd.

> In any case, atomic_uint64_t there seems to be used only for some statistic
> counter and not really atomic anyway, as additions aren't performed using
> atomic instructions, but just atomic load, followed by normal arithmetics,
> followed by atomic store.
> Can't 32-bit counters be used instead on 32-bit arches?

Using 32-bit counters would be easiest if we can, but Kostya will have
to answer that.

Peter
On Tue, May 27, 2014 at 05:04:52PM -0500, Peter Bergner wrote:
> On Tue, 2014-05-27 at 23:50 +0200, Jakub Jelinek wrote:
> > Does ppc32 have any atomic 64-bit loads/stores (in the sense that the aligned
> > 64 bits are written as one memory transaction, not each 32-bit word
> > separately)?
>
> The only option I can think of would be using the floating point
> load/store instructions lfd/stfd.  Of course if we're going to
> operate on them, then we'd need to copy them back to memory and
> then into the integer registers again....before copying them
> back to the FP registers (thru memory again) so we can store
> them with the stfd.

I think that is pretty much comparable to the use of MMX in the i?86
sanitizer.  That will not help on soft-fp ppc, though I'm not sure
anybody with soft-fp builds libsanitizer.

> > In any case, atomic_uint64_t there seems to be used only for some statistic
> > counter and not really atomic anyway, as additions aren't performed using
> > atomic instructions, but just atomic load, followed by normal arithmetics,
> > followed by atomic store.
> > Can't 32-bit counters be used instead on 32-bit arches?
>
> Using 32-bit counters would be easiest if we can, but Kostya will have
> to answer that.

I agree with that.  atomic_uintptr_t would be the type to use, I think.

	Jakub
On Tue, 2014-05-27 at 17:04 -0500, Peter Bergner wrote:
> On Tue, 2014-05-27 at 23:50 +0200, Jakub Jelinek wrote:
> > Does ppc32 have any atomic 64-bit loads/stores (in the sense that the aligned
> > 64 bits are written as one memory transaction, not each 32-bit word
> > separately)?
>
> The only option I can think of would be using the floating point
> load/store instructions lfd/stfd.  Of course if we're going to
> operate on them, then we'd need to copy them back to memory and
> then into the integer registers again....before copying them
> back to the FP registers (thru memory again) so we can store
> them with the stfd.

Looking at the atomic_store template expansion for atomic_uint64_t,
which calls __sync_val_compare_and_swap, it does use lfd/stfd.

Peter
On Tue, May 27, 2014 at 11:00 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, May 27, 2014 at 11:50:33PM +0200, Jakub Jelinek wrote:
>> On Tue, May 27, 2014 at 04:02:08PM -0500, Peter Bergner wrote:
>> > It's being called from basically two files:
>> >
>> > [bergner@makalu-lp1 gcc-fsf-mainline-asan-debug]$ find . -name '*.o' | xargs nm -AC | grep sync_fetch_and_add_8
>> > ./powerpc64-linux/32/libsanitizer/sanitizer_common/.libs/sanitizer_allocator.o: U __sync_fetch_and_add_8
>> > ./powerpc64-linux/32/libsanitizer/sanitizer_common/sanitizer_allocator.o: U __sync_fetch_and_add_8
>> > ./powerpc64-linux/32/libsanitizer/asan/.libs/asan_allocator2.o: U __sync_fetch_and_add_8
>> > ./powerpc64-linux/32/libsanitizer/asan/asan_allocator2.o: U __sync_fetch_and_add_8
>>
>> Does ppc32 have any atomic 64-bit loads/stores (in the sense that the aligned
>> 64 bits are written as one memory transaction, not each 32-bit word
>> separately)?
>> In any case, atomic_uint64_t there seems to be used only for some statistic
>> counter and not really atomic anyway, as additions aren't performed using
>> atomic instructions, but just atomic load, followed by normal arithmetics,
>> followed by atomic store.
>> Can't 32-bit counters be used instead on 32-bit arches?
>>
>> I see there is another spot with atomic_uint64_t in sanitizer_lfstack.h,
>> but for some reason it isn't used now at all (there it would want to use
>> 64-bit compare and exchange).
>
> On ARM, while it supposedly links, because __sync_compare_and_exchange_8
> is defined in libgcc.a, it will only work with post-2011 kernels and is
> going to be very slow (because you do a separate compare and exchange

FTR, this call down to the library function should only be for legacy
architectures.

On ARM we have a 64-bit atomic compare and exchange, which can be done
with the ldrexd / strexd instructions at the right architecture level
(v6k and above, IIRC).

Ramana
Dmitry,

You've introduced atomic_uint64_t stats_[AllocatorStatCount]; in
http://llvm.org/viewvc/llvm-project?view=revision&revision=173332
Do you mind changing it to atomic_uintptr_t?
There is of course a chance of overflow on a 32-bit arch (the number of
mallocs in a process may easily go over 2^32 in a long run),
but this is just stats; I think we can tolerate it.

--kcc

On Wed, May 28, 2014 at 2:41 AM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Tue, May 27, 2014 at 11:00 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Tue, May 27, 2014 at 11:50:33PM +0200, Jakub Jelinek wrote:
>>> On Tue, May 27, 2014 at 04:02:08PM -0500, Peter Bergner wrote:
>>> > It's being called from basically two files:
>>> >
>>> > [bergner@makalu-lp1 gcc-fsf-mainline-asan-debug]$ find . -name '*.o' | xargs nm -AC | grep sync_fetch_and_add_8
>>> > ./powerpc64-linux/32/libsanitizer/sanitizer_common/.libs/sanitizer_allocator.o: U __sync_fetch_and_add_8
>>> > ./powerpc64-linux/32/libsanitizer/sanitizer_common/sanitizer_allocator.o: U __sync_fetch_and_add_8
>>> > ./powerpc64-linux/32/libsanitizer/asan/.libs/asan_allocator2.o: U __sync_fetch_and_add_8
>>> > ./powerpc64-linux/32/libsanitizer/asan/asan_allocator2.o: U __sync_fetch_and_add_8
>>>
>>> Does ppc32 have any atomic 64-bit loads/stores (in the sense that the
>>> aligned 64 bits are written as one memory transaction, not each 32-bit
>>> word separately)?
>>> In any case, atomic_uint64_t there seems to be used only for some
>>> statistics counter and isn't really atomic anyway, as additions aren't
>>> performed using atomic instructions, but just an atomic load, followed
>>> by normal arithmetic, followed by an atomic store.
>>> Can't 32-bit counters be used instead on 32-bit arches?
>>>
>>> I see there is another spot with atomic_uint64_t in sanitizer_lfstack.h,
>>> but for some reason it isn't used now at all (there it would want to use
>>> a 64-bit compare and exchange).
>>
>> On ARM, while it supposedly links, because __sync_compare_and_exchange_8
>> is defined in libgcc.a, it will only work with post-2011 kernels and is
>> going to be very slow (because you do a separate compare and exchange
>
> FTR, this call down to the library function should only be needed for
> legacy architectures.
>
> On ARM we have a 64-bit atomic compare and exchange which can be done
> with the ldrexd/strexd instructions at the right architecture level
> (v6k and above, IIRC).
>
> Ramana
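Kostya's suggestion amounts to keeping the allocator statistics in word-sized counters, which every 32-bit target can update with its native fetch-and-add; the counter may then wrap past 2^32 on a 32-bit arch, which is acceptable for statistics. A hedged sketch of that idea (the names here are illustrative, not libsanitizer's actual API):

```c
#include <stdint.h>

/* Sketch of the proposed change: a word-sized stats counter bumped
   with the native fetch-and-add, instead of a 64-bit counter that
   32-bit targets cannot update atomically.  On a 32-bit arch the
   count can overflow, which is tolerable for statistics.  */
static uintptr_t stat_add(uintptr_t *counter, uintptr_t n) {
  /* Relaxed ordering is enough for a statistics counter.  */
  return __atomic_fetch_add(counter, n, __ATOMIC_RELAXED);
}
```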
On Tue, May 27, 2014 at 05:04:52PM -0500, Peter Bergner wrote:
> On Tue, 2014-05-27 at 23:50 +0200, Jakub Jelinek wrote:
> > On Tue, May 27, 2014 at 04:02:08PM -0500, Peter Bergner wrote:
> > > It's being called from basically two files:
> > >
> > > [bergner@makalu-lp1 gcc-fsf-mainline-asan-debug]$ find . -name '*.o' | xargs nm -AC | grep sync_fetch_and_add_8
> > > ./powerpc64-linux/32/libsanitizer/sanitizer_common/.libs/sanitizer_allocator.o: U __sync_fetch_and_add_8
> > > ./powerpc64-linux/32/libsanitizer/sanitizer_common/sanitizer_allocator.o: U __sync_fetch_and_add_8
> > > ./powerpc64-linux/32/libsanitizer/asan/.libs/asan_allocator2.o: U __sync_fetch_and_add_8
> > > ./powerpc64-linux/32/libsanitizer/asan/asan_allocator2.o: U __sync_fetch_and_add_8
> >
> > Does ppc32 have any atomic 64-bit loads/stores (in the sense that the
> > aligned 64 bits are written as one memory transaction, not each 32-bit
> > word separately)?
>
> The only option I can think of would be using the floating point
> load/store instructions lfd/stfd.  Of course, if we're going to
> operate on them, then we'd need to copy them back to memory and
> then into the integer registers again... before copying them
> back to the FP registers (through memory again) so we can store
> them with the stfd.

OT, if lfd/stfd is atomic (perhaps on a subset of CPUs), why doesn't
config/rs6000/sync.md implement 32-bit atomic_{load,store}di using that
insn?  Though, e.g.
http://lists.apple.com/archives/perfoptimization-dev/2008/Nov/msg00026.html
suggests that it might not be atomic on some CPUs.

	Jakub
On Wed, May 28, 2014 at 8:36 AM, Konstantin Serebryany
<konstantin.s.serebryany@gmail.com> wrote:
> Dmitry,
> You've introduced atomic_uint64_t stats_[AllocatorStatCount]; in
> http://llvm.org/viewvc/llvm-project?view=revision&revision=173332
> Do you mind changing it to atomic_uintptr_t?
> There is of course a chance of overflow on a 32-bit arch (the number of
> mallocs in a process may easily go over 2^32 in a long run),
> but this is just stats; I think we can tolerate it.

I removed 64-bit atomics from stats in llvm r209744.

> On Wed, May 28, 2014 at 2:41 AM, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>> On Tue, May 27, 2014 at 11:00 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Tue, May 27, 2014 at 11:50:33PM +0200, Jakub Jelinek wrote:
>>>> On Tue, May 27, 2014 at 04:02:08PM -0500, Peter Bergner wrote:
>>>> > It's being called from basically two files:
>>>> >
>>>> > [bergner@makalu-lp1 gcc-fsf-mainline-asan-debug]$ find . -name '*.o' | xargs nm -AC | grep sync_fetch_and_add_8
>>>> > ./powerpc64-linux/32/libsanitizer/sanitizer_common/.libs/sanitizer_allocator.o: U __sync_fetch_and_add_8
>>>> > ./powerpc64-linux/32/libsanitizer/sanitizer_common/sanitizer_allocator.o: U __sync_fetch_and_add_8
>>>> > ./powerpc64-linux/32/libsanitizer/asan/.libs/asan_allocator2.o: U __sync_fetch_and_add_8
>>>> > ./powerpc64-linux/32/libsanitizer/asan/asan_allocator2.o: U __sync_fetch_and_add_8
>>>>
>>>> Does ppc32 have any atomic 64-bit loads/stores (in the sense that the
>>>> aligned 64 bits are written as one memory transaction, not each 32-bit
>>>> word separately)?
>>>> In any case, atomic_uint64_t there seems to be used only for some
>>>> statistics counter and isn't really atomic anyway, as additions aren't
>>>> performed using atomic instructions, but just an atomic load, followed
>>>> by normal arithmetic, followed by an atomic store.
>>>> Can't 32-bit counters be used instead on 32-bit arches?
>>>>
>>>> I see there is another spot with atomic_uint64_t in sanitizer_lfstack.h,
>>>> but for some reason it isn't used now at all (there it would want to use
>>>> a 64-bit compare and exchange).
>>>
>>> On ARM, while it supposedly links, because __sync_compare_and_exchange_8
>>> is defined in libgcc.a, it will only work with post-2011 kernels and is
>>> going to be very slow (because you do a separate compare and exchange
>>
>> FTR, this call down to the library function should only be needed for
>> legacy architectures.
>>
>> On ARM we have a 64-bit atomic compare and exchange which can be done
>> with the ldrexd/strexd instructions at the right architecture level
>> (v6k and above, IIRC).
>>
>> Ramana
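Earlier in the thread Jakub pointed at __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 as the preprocessor test for whether a target has native 64-bit atomics. A hedged sketch of how code could pick a counter width at compile time on that basis (the typedef and helper names are hypothetical):

```c
#include <stdint.h>

/* Sketch: select a counter width the target can update atomically.
   GCC predefines __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 when the 64-bit
   __sync builtins expand to native instructions.  Names here are
   hypothetical, not from libsanitizer.  */
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
typedef uint64_t stat_counter_t;   /* native 64-bit atomics available */
#else
typedef uintptr_t stat_counter_t;  /* fall back to word size */
#endif

static stat_counter_t stat_inc(stat_counter_t *c) {
  return __sync_fetch_and_add(c, 1);
}
```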
Index: libsanitizer/asan/asan_linux.cc
===================================================================
--- libsanitizer/asan/asan_linux.cc	(revision 210861)
+++ libsanitizer/asan/asan_linux.cc	(working copy)
@@ -186,6 +186,13 @@ void GetPcSpBp(void *context, uptr *pc,
   *bp = ucontext->uc_mcontext.gregs[REG_EBP];
   *sp = ucontext->uc_mcontext.gregs[REG_ESP];
 # endif
+#elif defined(__powerpc__) || defined(__powerpc64__)
+  ucontext_t *ucontext = (ucontext_t*)context;
+  *pc = ucontext->uc_mcontext.regs->nip;
+  *sp = ucontext->uc_mcontext.regs->gpr[PT_R1];
+  // The powerpc{,64}-linux ABIs do not specify r31 as the frame
+  // pointer, but GCC always uses r31 when we need a frame pointer.
+  *bp = ucontext->uc_mcontext.regs->gpr[PT_R31];
 #elif defined(__sparc__)
   ucontext_t *ucontext = (ucontext_t*)context;
   uptr *stk_ptr;
Index: libsanitizer/asan/asan_mapping.h
===================================================================
--- libsanitizer/asan/asan_mapping.h	(revision 210861)
+++ libsanitizer/asan/asan_mapping.h	(working copy)
@@ -85,6 +85,7 @@ static const u64 kDefaultShadowOffset64
 static const u64 kDefaultShort64bitShadowOffset = 0x7FFF8000;  // < 2G.
 static const u64 kAArch64_ShadowOffset64 = 1ULL << 36;
 static const u64 kMIPS32_ShadowOffset32 = 0x0aaa8000;
+static const u64 kPPC64_ShadowOffset64 = 1ULL << 41;
 static const u64 kFreeBSD_ShadowOffset32 = 1ULL << 30;  // 0x40000000
 static const u64 kFreeBSD_ShadowOffset64 = 1ULL << 46;  // 0x400000000000
 
@@ -107,6 +108,8 @@ static const u64 kFreeBSD_ShadowOffset64
 # else
 #  if defined(__aarch64__)
 #    define SHADOW_OFFSET kAArch64_ShadowOffset64
+#  elif defined(__powerpc64__)
+#    define SHADOW_OFFSET kPPC64_ShadowOffset64
 #  elif SANITIZER_FREEBSD
 #    define SHADOW_OFFSET kFreeBSD_ShadowOffset64
 #  elif SANITIZER_MAC
Index: libsanitizer/sanitizer_common/sanitizer_common.h
===================================================================
--- libsanitizer/sanitizer_common/sanitizer_common.h	(revision 210861)
+++ libsanitizer/sanitizer_common/sanitizer_common.h	(working copy)
@@ -26,7 +26,11 @@ struct StackTrace;
 
 const uptr kWordSize = SANITIZER_WORDSIZE / 8;
 const uptr kWordSizeInBits = 8 * kWordSize;
 
-const uptr kCacheLineSize = 64;
+#if defined(__powerpc__) || defined(__powerpc64__)
+  const uptr kCacheLineSize = 128;
+#else
+  const uptr kCacheLineSize = 64;
+#endif
 
 const uptr kMaxPathLength = 512;
Index: libsanitizer/sanitizer_common/sanitizer_stacktrace.cc
===================================================================
--- libsanitizer/sanitizer_common/sanitizer_stacktrace.cc	(revision 210861)
+++ libsanitizer/sanitizer_common/sanitizer_stacktrace.cc	(working copy)
@@ -16,11 +16,14 @@ namespace __sanitizer {
 
 uptr StackTrace::GetPreviousInstructionPc(uptr pc) {
-#ifdef __arm__
+#if defined(__arm__)
   // Cancel Thumb bit.
   pc = pc & (~1);
-#endif
-#if defined(__sparc__)
+#endif
+#if defined(__powerpc__) || defined(__powerpc64__)
+  // PCs are always 4 byte aligned.
+  return pc - 4;
+#elif defined(__sparc__)
   return pc - 8;
 #else
   return pc - 1;
 #endif