Message ID | Pine.LNX.4.64.1008181402450.9287@hs20-bc2-1.build.redhat.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
From: Mikulas Patocka <mpatocka@redhat.com> Date: Wed, 18 Aug 2010 14:03:37 -0400 (EDT) > Simple microoptimizations for sparc64 atomic functions: > Save one instruction by using a delay slot. > Use %g1 instead of %g7, because %g1 is written earlier. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Does it really make things faster? What timings have you done on these code sequences? -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 18 Aug 2010, David Miller wrote: > From: Mikulas Patocka <mpatocka@redhat.com> > Date: Wed, 18 Aug 2010 14:03:37 -0400 (EDT) > > > Simple microoptimizations for sparc64 atomic functions: > > Save one instruction by using a delay slot. > > Use %g1 instead of %g7, because %g1 is written earlier. > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > Does it really make things faster? What timings have you > done on these code sequences? I don't think such microoptimizations can be measured. It may save an I-cacheline --- but who knows if exactly this cacheline makes some effect or not? The point is that making microoptimizations is *much* easier than measuring them, so I usually do them and never ask if they help. If you do 100 microoptimizations, it may have some effect in code size reduction, but you never pinpoint it to a single optimization. Mikulas -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Mikulas Patocka <mpatocka@redhat.com> Date: Wed, 18 Aug 2010 18:08:49 -0400 (EDT) > I don't think such microoptimizations can be measured. It may save an > I-cacheline --- but who knows if exactly this cacheline makes some effect > or not? These routines, when contention backoff is disabled, have intentionally been coded to be perfectly 8 instructions, which is exactly 32 bytes, which is exactly 1 I-cache line. You'll find that much of the by-hand sparc64 assembler routines have been written to be a multiple of 8 instructions. Because if you don't start a function on an I-cache line you get a partial fetch when it's called, therefore making it impossible to fill the pipeline even if the instructions could be executed in parallel. So actually you're changes are likely to hurt performance from a cache line and pipelining viewpoint. Furthermore, talking about saving one cycle (which I don't even think you'll get) when the CAS instruction itself is going to stall the chip for ~50 cycles is not all that worthwhile either. The UltraSPARC-I,II,III et al. programming manuals are pretty clear about code generation guidelines, I've been reading them for 10+ years, and that is what I've used to guide the writing of the assembler code. I've also run the code through simulators (when possible) and done cycle analysis (both hot and cold cache cases) on real hardware for these routines. So I basically expect the same kind of considerations from you if you want to "optimize" this code :-) I value your contribution but seriously I think the code is fine and optimal as-is. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 18 Aug 2010, David Miller wrote: > From: Mikulas Patocka <mpatocka@redhat.com> > Date: Wed, 18 Aug 2010 18:08:49 -0400 (EDT) > > > I don't think such microoptimizations can be measured. It may save an > > I-cacheline --- but who knows if exactly this cacheline makes some effect > > or not? > > These routines, when contention backoff is disabled, have > intentionally been coded to be perfectly 8 instructions, which is > exactly 32 bytes, which is exactly 1 I-cache line. You'll find that > much of the by-hand sparc64 assembler routines have been written to be > a multiple of 8 instructions. They are not: 0000000000000000 <atomic_add>: 0000000000000028 <atomic_sub>: 0000000000000050 <atomic_add_ret>: 000000000000007c <atomic_sub_ret>: 00000000000000a8 <atomic64_add>: 00000000000000d0 <atomic64_sub>: 00000000000000f8 <atomic64_add_ret>: 0000000000000124 <atomic64_sub_ret>: (on UP compile without backoff). That dummy backoff code produces jump forward and backward. > Because if you don't start a function on an I-cache line you get a > partial fetch when it's called, therefore making it impossible to fill > the pipeline even if the instructions could be executed in parallel. So add .align there. > So actually you're changes are likely to hurt performance from a cache > line and pipelining viewpoint. > > Furthermore, talking about saving one cycle (which I don't even think > you'll get) when the CAS instruction itself is going to stall the chip > for ~50 cycles is not all that worthwhile either. If it's like x86 --- i.e. flush the whole pipe and execute microcode, that it doesn't make much sense to optimize ticks in the pipeline. Optimizing for cache pollution could make some sense. Mikulas > The UltraSPARC-I,II,III et al. programming manuals are pretty clear > about code generation guidelines, I've been reading them for 10+ > years, and that is what I've used to guide the writing of the > assembler code. I've also run the code through simulators (when > possible) and done cycle analysis (both hot and cold cache cases) on > real hardware for these routines. > > So I basically expect the same kind of considerations from you if you > want to "optimize" this code :-) > > I value your contribution but seriously I think the code is fine and > optimal as-is. > -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Mikulas Patocka <mpatocka@redhat.com> Date: Wed, 18 Aug 2010 19:44:05 -0400 (EDT) > > > On Wed, 18 Aug 2010, David Miller wrote: > >> These routines, when contention backoff is disabled, have >> intentionally been coded to be perfectly 8 instructions, which is >> exactly 32 bytes, which is exactly 1 I-cache line. You'll find that >> much of the by-hand sparc64 assembler routines have been written to be >> a multiple of 8 instructions. > > They are not: > 0000000000000000 <atomic_add>: > 0000000000000028 <atomic_sub>: > 0000000000000050 <atomic_add_ret>: > 000000000000007c <atomic_sub_ret>: > 00000000000000a8 <atomic64_add>: > 00000000000000d0 <atomic64_sub>: > 00000000000000f8 <atomic64_add_ret>: > 0000000000000124 <atomic64_sub_ret>: > > (on UP compile without backoff). That dummy backoff code produces jump > forward and backward. That's a bug, it should be just an empty macro expansion. >> So actually you're changes are likely to hurt performance from a cache >> line and pipelining viewpoint. >> >> Furthermore, talking about saving one cycle (which I don't even think >> you'll get) when the CAS instruction itself is going to stall the chip >> for ~50 cycles is not all that worthwhile either. > > If it's like x86 --- i.e. flush the whole pipe and execute microcode, that > it doesn't make much sense to optimize ticks in the pipeline. Yes, CAS bascially puts a bunch of micro-ops into the pipeline to do the load/compare/conditional-store atomically. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Mikulas Patocka <mpatocka@redhat.com> Date: Wed, 18 Aug 2010 14:03:37 -0400 (EDT) > Simple microoptimizations for sparc64 atomic functions: > Save one instruction by using a delay slot. > Use %g1 instead of %g7, because %g1 is written earlier. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Actually after some more consideration and studying of this code I've decided to apply your patch, thanks a lot Mikulas. I'll then add the BACKOFF macro fixup for the UP case. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
BTW. two years ago, I reported a bug that userspace programs on sparc64 would randomly crash in a few hours with preempt enabled ( http://marc.info/?l=linux-sparc&m=121222607414906&w=4 ) Now I tried the same workload with 2.6.35 and it is stable with preempt. Are you aware that the bug was fixed? Or if not, it may be still lurking there and some unrelated change hid the race condition. Should I bisect the kernel to find out which patch actually fixed (or hid) the bug? Mikulas -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Mikulas Patocka <mpatocka@redhat.com> Date: Thu, 19 Aug 2010 09:15:16 -0400 (EDT) > BTW. two years ago, I reported a bug that userspace programs on sparc64 > would randomly crash in a few hours with preempt enabled > ( http://marc.info/?l=linux-sparc&m=121222607414906&w=4 ) > > Now I tried the same workload with 2.6.35 and it is stable with preempt. > > Are you aware that the bug was fixed? Or if not, it may be still lurking > there and some unrelated change hid the race condition. Should I bisect > the kernel to find out which patch actually fixed (or hid) the bug? I can't think of anything we did explicitly to fix that. I'm sure you can find more useful things to do than to try and bisect that thing down, really :-) -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 19 Aug 2010, David Miller wrote: > From: Mikulas Patocka <mpatocka@redhat.com> > Date: Thu, 19 Aug 2010 09:15:16 -0400 (EDT) > > > BTW. two years ago, I reported a bug that userspace programs on sparc64 > > would randomly crash in a few hours with preempt enabled > > ( http://marc.info/?l=linux-sparc&m=121222607414906&w=4 ) > > > > Now I tried the same workload with 2.6.35 and it is stable with preempt. > > > > Are you aware that the bug was fixed? Or if not, it may be still lurking > > there and some unrelated change hid the race condition. Should I bisect > > the kernel to find out which patch actually fixed (or hid) the bug? > > I can't think of anything we did explicitly to fix that. > > I'm sure you can find more useful things to do than to try and > bisect that thing down, really :-) From my experience with development --- no race should be silently ignored. Because if you ignore a race, there is a possibility that the race is still lurking there, only hitting under different conditions. I did the bisect anyway, the race was fixed between 2.6.28 and 2.6.29-rc1. Here "bad" really means "doesn't have a bug" and "good" means "does have the bug", because bisect tool was written to search for bugs that were created, not fixed. "X" means "doesn't work, I guessed". bad: 47676 590cf28580c999c8ba70dc39b40bab09d69e2630 bad: 61357 0191b625ca5a46206d2fb862bb08f36f2fcb3b31 bad: 61391 54a696bd07c14d3b1192d03ce7269bc59b45209a bad: 61578 bb26c6c29b7cc9f39e491b074b09f3c284738d36 good: 65849 a65056205cdf7efb96fb2558e4f1ec6bae2582ed good: 61638 cb10ea549fdc0ab2dd8988adab5bf40b4fa642f3 good: 113707 745ca2475a6ac596e3d8d37c2759c0fbe2586227 - X good: 66223 d8a5e2e9f4e70ade136c67ce8242f0db4c2cddc7 bad: 86684 94d6a5f7341ebaff53d4e41cc81fab37f0d9fbed good: 112956 e50a906e0200084f04f8f3b7c3a14b0442d1347f bad: 102852 6ded6ab9be4f6164aef1c527407c1b94f0929799 bad: 109140 7596b27dbd8de7bcfa7a80b2756114b49bd5c018 bad: 111150 f3a5c547012a09f38f7c27b17a8e3150b69cd259 I very much don't understand the bisect tool, I thought that it will do a binary search between the entries in the ChangeLog file, but it skips very randomly (the second columnt is the line in the ChangeLog file). How does it really work? Do you know how to interpret it and which patch really fixed the bug? Mikulas -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Mikulas Patocka <mpatocka@redhat.com> Date: Fri, 10 Sep 2010 14:31:27 -0400 (EDT) > I very much don't understand the bisect tool, I thought that it will do a > binary search between the entries in the ChangeLog file, but it skips very > randomly (the second columnt is the line in the ChangeLog file). > > How does it really work? Do you know how to interpret it and which patch > really fixed the bug? When there are multiple lines of development, it has to hop in and out of the merged branches. Sometimes that aspect can make it go "out of order". The best you can do is do the full bisect, trust what it comes up with, and start your analysis with the commit it comes up with after the whole bisect is complete. What commit did it print out when you completed the bisect run? Also, there is another problematic case, sometimes bisect gives you a merge commit. And this means that in the tree in which the development of the branch was done, the bisect condition doesn't trigger, but once it gets integrated into what happened upstream meanwhile, it does trigger. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 10 Sep 2010, David Miller wrote: > From: Mikulas Patocka <mpatocka@redhat.com> > Date: Fri, 10 Sep 2010 14:31:27 -0400 (EDT) > > > I very much don't understand the bisect tool, I thought that it will do a > > binary search between the entries in the ChangeLog file, but it skips very > > randomly (the second columnt is the line in the ChangeLog file). > > > > How does it really work? Do you know how to interpret it and which patch > > really fixed the bug? > > When there are multiple lines of development, it has to hop in and out > of the merged branches. > > Sometimes that aspect can make it go "out of order". Bisecting over a graph, that is totally confusing. > The best you can do is do the full bisect, trust what it comes up > with, and start your analysis with the commit it comes up with after > the whole bisect is complete. > > What commit did it print out when you completed the bisect run? > > Also, there is another problematic case, sometimes bisect gives you > a merge commit. And this means that in the tree in which the > development of the branch was done, the bisect condition doesn't > trigger, but once it gets integrated into what happened upstream > meanwhile, it does trigger. good: 112956 e50a906e0200084f04f8f3b7c3a14b0442d1347f bad: 102852 6ded6ab9be4f6164aef1c527407c1b94f0929799 bad: 109140 7596b27dbd8de7bcfa7a80b2756114b49bd5c018 bad: 111150 f3a5c547012a09f38f7c27b17a8e3150b69cd259 I don't know what came up. The last "good" one and the last "bad" one are more than 1000 lines apart. After these tries it refused to bisect further and pretended that the bug is found. If you can't interpret it --- there happened sparc+sparc64 -> sparc merge between this and it looks like a probable cause. Could you give me your git tree with linear commits that I could try? Mikulas -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Mikulas Patocka <mpatocka@redhat.com> Date: Fri, 10 Sep 2010 14:46:39 -0400 (EDT) > If you can't interpret it --- there happened sparc+sparc64 -> sparc merge > between this and it looks like a probable cause. Could you give me your > git tree with linear commits that I could try? I rebase my tree every time the merge window completes, so I can't give that to you, sorry. But you don't need my tree from back then, you can git bisect between the commits on my line of development branch when Linus pulled my tree in. And even my original tree was linear, I would pull Linus's tree in now and again to deal with merge conflicts and similar. Anyways, all the information is there, you just need to give GIT the properl commit ranges. BTW, if you're so disappointed about how it walks the graph during git bisect, feel free to suggest alternatives to the GIT developers. I think they considered long and hard about this, and therefore the current behavior isn't arbitrary :-) -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 10 Sep 2010, David Miller wrote: > From: Mikulas Patocka <mpatocka@redhat.com> > Date: Fri, 10 Sep 2010 14:46:39 -0400 (EDT) > > > If you can't interpret it --- there happened sparc+sparc64 -> sparc merge > > between this and it looks like a probable cause. Could you give me your > > git tree with linear commits that I could try? > > I rebase my tree every time the merge window completes, so > I can't give that to you, sorry. > > But you don't need my tree from back then, you can git bisect between > the commits on my line of development branch when Linus pulled my tree > in. > > And even my original tree was linear, I would pull Linus's tree in now > and again to deal with merge conflicts and similar. I assume that there was some commit ID in your tree when you started to do sparc+sparc64 unification and there was some commit ID when you finished it. So give me these two and I hope git will go linearly between them. > Anyways, all the information is there, you just need to give GIT the > properl commit ranges. > > BTW, if you're so disappointed about how it walks the graph during git > bisect, feel free to suggest alternatives to the GIT developers. I > think they considered long and hard about this, and therefore the > current behavior isn't arbitrary :-) If you understand git better than me, please explain to me how to interpret bisect result --- i.e. I sent a list of commit IDs that are result of bisecting, so tell me which fixed the bug and how did you find it. Mikulas -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Mikulas Patocka <mpatocka@redhat.com> Date: Fri, 10 Sep 2010 15:33:08 -0400 (EDT) > I assume that there was some commit ID in your tree when you started to do > sparc+sparc64 unification and there was some commit ID when you finished > it. So give me these two and I hope git will go linearly between them. Maybe not this specific portion, but some parts of the unification work was not done linearly, it was partly done in a GIT tree maintained by Sam Ravnborg and partly done in mine. Non-linear development is just how things are much of the time. And regardless, you could figure out the commits you're looking for by looking in GITK at the merge commit when Linus brought the my tree in during the merge window, walk up the line that represents my branch going back in time to the point where the line connects back up to the mainline. Start with something like "gitk v2.6.28..v2.6.29-rc1 -- arch/sparc arch/spar64" it's pretty clear what the sequence of commits are in that branch when you look at that window. The boundaries (inclusive) you come up with will be something like: 4696b64d234b84b5b70ffd49a76833aa5c49cb61 86821d18f9d8fd42fa3180ad3703d491aecc52d9 Btw, "git bisect visualize" uses gitk to show you where the good and bad points are in the graph during bisection. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: linux-2.6.35-preempt/arch/sparc/lib/atomic_64.S =================================================================== --- linux-2.6.35-preempt.orig/arch/sparc/lib/atomic_64.S 2010-08-18 15:10:47.000000000 +0200 +++ linux-2.6.35-preempt/arch/sparc/lib/atomic_64.S 2010-08-18 19:40:25.000000000 +0200 @@ -52,10 +52,9 @@ atomic_add_ret: /* %o0 = increment, %o1 cas [%o1], %g1, %g7 cmp %g1, %g7 bne,pn %icc, 2f - add %g7, %o0, %g7 - sra %g7, 0, %o0 + add %g1, %o0, %g1 retl - nop + sra %g1, 0, %o0 2: BACKOFF_SPIN(%o2, %o3, 1b) .size atomic_add_ret, .-atomic_add_ret @@ -68,10 +67,9 @@ atomic_sub_ret: /* %o0 = decrement, %o1 cas [%o1], %g1, %g7 cmp %g1, %g7 bne,pn %icc, 2f - sub %g7, %o0, %g7 - sra %g7, 0, %o0 + sub %g1, %o0, %g1 retl - nop + sra %g1, 0, %o0 2: BACKOFF_SPIN(%o2, %o3, 1b) .size atomic_sub_ret, .-atomic_sub_ret @@ -114,10 +112,9 @@ atomic64_add_ret: /* %o0 = increment, %o casx [%o1], %g1, %g7 cmp %g1, %g7 bne,pn %xcc, 2f - add %g7, %o0, %g7 - mov %g7, %o0 - retl nop + retl + add %g1, %o0, %o0 2: BACKOFF_SPIN(%o2, %o3, 1b) .size atomic64_add_ret, .-atomic64_add_ret @@ -130,9 +127,8 @@ atomic64_sub_ret: /* %o0 = decrement, %o casx [%o1], %g1, %g7 cmp %g1, %g7 bne,pn %xcc, 2f - sub %g7, %o0, %g7 - mov %g7, %o0 - retl nop + retl + sub %g1, %o0, %o0 2: BACKOFF_SPIN(%o2, %o3, 1b) .size atomic64_sub_ret, .-atomic64_sub_ret
Simple microoptimizations for sparc64 atomic functions: Save one instruction by using a delay slot. Use %g1 instead of %g7, because %g1 is written earlier. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- arch/sparc/lib/atomic_64.S | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html