
atomic: add atomic_inc_not_zero_hint()

Message ID 1288984844.2665.52.camel@edumazet-laptop
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Nov. 5, 2010, 7:20 p.m. UTC
On Friday, November 5, 2010 at 11:28 -0700, Andrew Morton wrote:
> But we haven't established that there _is_ duplicated code which needs
> that treatment.
> 
> Scanning arch/x86/include/asm/atomic.h, perhaps ATOMIC_INIT() is a
> candidate.  But I'm not sure that it _should_ be hoisted up - if every
> architecture happens to do it the same way then that's just a fluke.
> 
> 

I'm not sure I understand you. I was trying to avoid recursive includes, but
that should be protected against anyway. I see a lot of code that could be
factored out into this new header (atomic_inc_not_zero(), for example).

Thanks

[PATCH v3] atomic: add atomic_inc_not_zero_hint()

Follow-up to the perf tools session at Netfilter Workshop 2010

In the network stack we make heavy use of atomic_inc_not_zero() in
contexts where we know the probable value of the atomic before the
increment (2 for UDP sockets, for example).

Using a special version of atomic_inc_not_zero() that takes this hint
can help the processor use fewer bus transactions.

On x86 (MESI protocol) for example, this avoids entering the Shared
state, because "lock cmpxchg" issues an RFO (Read For Ownership).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
---
V3: add the include of <asm/atomic.h>
    if hint is zero, use atomic_inc_not_zero() (Paul's suggestion)
V2: add #ifndef atomic_inc_not_zero_hint
    kerneldoc changes
    test that hint is not zero
    meant to be included at the end of arch/*/asm/atomic.h files




Comments

Andrew Morton Nov. 5, 2010, 7:39 p.m. UTC | #1
On Fri, 05 Nov 2010 20:20:44 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Friday, November 5, 2010 at 11:28 -0700, Andrew Morton wrote:
> > But we haven't established that there _is_ duplicated code which needs
> > that treatment.
> > 
> > Scanning arch/x86/include/asm/atomic.h, perhaps ATOMIC_INIT() is a
> > candidate.  But I'm not sure that it _should_ be hoisted up - if every
> > architecture happens to do it the same way then that's just a fluke.
> > 
> > 
> 
> Not sure I understand you. I was trying to avoid recursive includes, but
> that should be protected anyway. I see a lot of code that could be
> factorized in this new header (atomic_inc_not_zero() for example)

Ah.  I wasn't able to see much duplicated code at all, so I wasn't sure
that we needed to bother about this issue.

yup, atomic_inc_not_zero() looks like a candidate.

> [PATCH v3] atomic: add atomic_inc_not_zero_hint()

Let's go with this for now ;)

I'll assume that you intend to make use of this function soon, and it
looks safe enough to sneak it into 2.6.37-rc2, IMO.  If Linus shouts at
me then we could merge it into 2.6.38-rc1 via net-next, but I think
straight-to-mainline is best.

Eric Dumazet Nov. 5, 2010, 7:46 p.m. UTC | #2
On Friday, November 5, 2010 at 12:39 -0700, Andrew Morton wrote:

> Ah.  I wasn't able to see much duplicated code at all, so I wasn't sure
> that we needed to bother about this issue.
> 
> yup, atomic_inc_not_zero() looks like a candidate.

yes, and atomic_add_unless()...

> 
> > [PATCH v3] atomic: add atomic_inc_not_zero_hint()
> 
> Let's go with this for now ;)
> 
> I'll assume that you intend to make use of this function soon, and it
> looks safe enough to sneak it into 2.6.37-rc2, IMO.  If Linus shouts at
> me then we could merge it into 2.6.38-rc1 via net-next, but I think
> straight-to-mainline is best.
> 

Well, I don't expect to use it before 2.6.38, so no hurry, Andrew, but it
can probably be merged earlier since it has no users yet. It will certainly
help our work.

Thanks


Paul E. McKenney Nov. 5, 2010, 7:51 p.m. UTC | #3
On Fri, Nov 05, 2010 at 08:20:44PM +0100, Eric Dumazet wrote:
> On Friday, November 5, 2010 at 11:28 -0700, Andrew Morton wrote:
> > But we haven't established that there _is_ duplicated code which needs
> > that treatment.
> > 
> > Scanning arch/x86/include/asm/atomic.h, perhaps ATOMIC_INIT() is a
> > candidate.  But I'm not sure that it _should_ be hoisted up - if every
> > architecture happens to do it the same way then that's just a fluke.
> > 
> > 
> 
> Not sure I understand you. I was trying to avoid recursive includes, but
> that should be protected anyway. I see a lot of code that could be
> factorized in this new header (atomic_inc_not_zero() for example)
> 
> Thanks
> 
> [PATCH v3] atomic: add atomic_inc_not_zero_hint()
> 
> Followup of perf tools session in Netfilter WorkShop 2010
> 
> In network stack we make high usage of atomic_inc_not_zero() in contexts
> we know the probable value of atomic before increment (2 for udp sockets
> for example)
> 
> Using a special version of atomic_inc_not_zero() giving this hint can
> help processor to use less bus transactions.
> 
> On x86 (MESI protocol) for example, this avoids entering Shared state,
> because "lock cmpxchg" issues an RFO (Read For Ownership)
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
> Cc: David Miller <davem@davemloft.net>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Looks quite good to me!

Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

> Cc: Nick Piggin <npiggin@kernel.dk>
> ---
> V3: adds the include <asm/atomic.h>
>     if hint is null, use atomic_inc_not_zero() (Paul suggestion)
> V2: add #ifndef atomic_inc_not_zero_hint
>     kerneldoc changes
>     test that hint is not null
>     Meant to be included at end of arch/*/asm/atomic.h files
> 
> diff --git a/include/linux/atomic.h b/include/linux/atomic.h
> new file mode 100644
> index 0000000..5a7df87
> --- /dev/null
> +++ b/include/linux/atomic.h
> @@ -0,0 +1,37 @@
> +#ifndef _LINUX_ATOMIC_H
> +#define _LINUX_ATOMIC_H
> +#include <asm/atomic.h>
> +
> +/**
> + * atomic_inc_not_zero_hint - increment if not null
> + * @v: pointer of type atomic_t
> + * @hint: probable value of the atomic before the increment
> + *
> + * This version of atomic_inc_not_zero() gives a hint of probable
> + * value of the atomic. This helps processor to not read the memory
> + * before doing the atomic read/modify/write cycle, lowering
> + * number of bus transactions on some arches.
> + *
> + * Returns: 0 if increment was not done, 1 otherwise.
> + */
> +#ifndef atomic_inc_not_zero_hint
> +static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint)
> +{
> +	int val, c = hint;
> +
> +	/* sanity test, should be removed by compiler if hint is a constant */
> +	if (!hint)
> +		return atomic_inc_not_zero(v);
> +
> + 	do {
> +		val = atomic_cmpxchg(v, c, c + 1);
> +		if (val == c)
> +			return 1;
> +		c = val;
> +	} while (c);
> +
> +	return 0;
> +}
> +#endif
> +
> +#endif /* _LINUX_ATOMIC_H */
> 
> 
Christoph Lameter (Ampere) Nov. 12, 2010, 7:14 p.m. UTC | #4
prefetchw() would be too much overhead?



Paul E. McKenney Nov. 13, 2010, 10:26 p.m. UTC | #5
On Fri, Nov 12, 2010 at 01:14:12PM -0600, Christoph Lameter wrote:
> 
> prefetchw() would be too much overhead?

No idea.  Where do you believe that prefetchw() should be added?

							Thanx, Paul
Christoph Lameter (Ampere) Nov. 15, 2010, 1:57 p.m. UTC | #6
On Sat, 13 Nov 2010, Paul E. McKenney wrote:

> On Fri, Nov 12, 2010 at 01:14:12PM -0600, Christoph Lameter wrote:
> >
> > prefetchw() would be too much overhead?
>
> No idea.  Where do you believe that prefetchw() should be added?

It is another way to get an exclusive cache line
for situations like this. No need to give a hint.

Andi Kleen Nov. 15, 2010, 2:07 p.m. UTC | #7
On Mon, Nov 15, 2010 at 07:57:10AM -0600, Christoph Lameter wrote:
> On Sat, 13 Nov 2010, Paul E. McKenney wrote:
> 
> > On Fri, Nov 12, 2010 at 01:14:12PM -0600, Christoph Lameter wrote:
> > >
> > > prefetchw() would be too much overhead?
> >
> > No idea.  Where do you believe that prefetchw() should be added?
> 
> It is another way to get an exclusive cache line
> for situations like this. No need to give a hint.

prefetchw doesn't work on Intel (or rather, it is equivalent to prefetch);
on Intel you always need to explicitly write to get an exclusive
line.

-Andi
Christoph Lameter (Ampere) Nov. 15, 2010, 2:16 p.m. UTC | #8
On Mon, 15 Nov 2010, Andi Kleen wrote:

> > It is another way to get an exclusive cache line
> > for situations like this. No need to give a hint.
>
> prefetchw doesn't work on Intel (or rather is equivalent to prefetch),
> for Intel you always need to explicitely write to get an exclusive
> line.

Argh. You mean x86. Itanium could do it, and it is also from Intel. Could you
please change that for x86 as well? Otherwise we will get more of these
weird code twisters.


Eric Dumazet Nov. 15, 2010, 2:17 p.m. UTC | #9
On Monday, November 15, 2010 at 07:57 -0600, Christoph Lameter wrote:
> On Sat, 13 Nov 2010, Paul E. McKenney wrote:
> 
> > On Fri, Nov 12, 2010 at 01:14:12PM -0600, Christoph Lameter wrote:
> > >
> > > prefetchw() would be too much overhead?
> >
> > No idea.  Where do you believe that prefetchw() should be added?
> 
> It is another way to get an exclusive cache line
> for situations like this. No need to give a hint.
> 

Exclusive access? As soon as another CPU takes it again, you lose.

It's not really the same thing... Maybe you're missing the intent of the
'hint' entirely. We know the probable value of the counter; we don't want to read it.

In fact, prefetchw() is useful when you can issue it many cycles before
the memory read you are going to perform [before the write]. On
contended cache lines, it's a waste, because by the time your CPU
reads the memory and then performs the atomic compare-and-exchange, another
CPU might have dirtied the location again. This is what we noticed
during Netfilter Workshop 2010: a high performance cost at both
atomic_read() and atomic_cmpxchg(). We tried prefetchw() and it was a
performance drop. That was with only 16 CPUs contending on a neighbour
refcnt, and 5 million frames per second (5 million atomic increments,
5 million atomic decrements).

prefetchw() should be used in very specific spots, when a CPU is going
to write into a private area (not potentially accessed by other CPUs).
We use it, for example, in __alloc_skb(), a bit before the memset().
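
Roughly, a simplified sketch of that pattern (not the exact __alloc_skb()
code) looks like this:

	/* prefetchw() comes from <linux/prefetch.h>. The freshly allocated
	 * skb is still private to this CPU, so asking for the line in
	 * exclusive state early is pure win: nobody else can dirty it
	 * in the meantime.
	 */
	prefetchw(skb);
	/* ... other initialization work while the prefetch completes ... */
	memset(skb, 0, offsetof(struct sk_buff, tail));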

By the way, atomic_inc_not_zero_hint() is less code than
[prefetchw(), atomic_inc_not_zero()]. Using one instruction [cmpxchg]
that touches the memory location is better than three [prefetchw(), read(),
cmpxchg()], particularly if you have high contention on the cache line.
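
That is, as a rough sketch of the two sequences on the contended line:

	/* prefetchw() + atomic_inc_not_zero(), roughly: */
	prefetchw(v);			/* 1: request the line */
	c = atomic_read(v);		/* 2: read the current value */
	atomic_cmpxchg(v, c, c + 1);	/* 3: RMW; the line may have moved again */

	/* atomic_inc_not_zero_hint(), roughly: */
	atomic_cmpxchg(v, hint, hint + 1);	/* single RFO, no prior read */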



Christoph Lameter (Ampere) Nov. 15, 2010, 2:25 p.m. UTC | #10
On Mon, 15 Nov 2010, Eric Dumazet wrote:

> Exclusive access ? As soon as another cpu takes it again, you lose.

Sure, but you want to avoid the fetch in shared mode here.

> Its not really the same thing... Maybe you miss the 'hint' intention at
> all. We know the probable value of the counter, we dont want to read it.

OK, maybe in this case you can predict the value, but in general it is
difficult to always provide an expected value. It would be easier to be
able to tell the processor that the cache line should not be fetched as
shared but brought in immediately in exclusive state.

> atomic_read() and atomic_cmpxchg(). We tried prefetchw() and it was a
> performance drop. It was with only 16 cpus contending on neighbour

Does prefetchw() work? Andi claims that prefetchw() does not work on
x86, and I doubt that you ran tests on Itanium.
Andi Kleen Nov. 15, 2010, 2:39 p.m. UTC | #11
> > atomic_read() and atomic_cmpxchg(). We tried prefetchw() and it was a
> > performance drop. It was with only 16 cpus contending on neighbour
> 
> Does prefetchw work? Andi claims that prefetchw is not working on
> x86 and I doubt that you ran tests on Itanium.

AMD supports it thanks to their MOESI protocol, but it's not supported
by MESIF as used by Intel QPI. On Intel, the kernel maps it to an
ordinary prefetch.

-Andi
Eric Dumazet Nov. 15, 2010, 2:47 p.m. UTC | #12
On Monday, November 15, 2010 at 08:25 -0600, Christoph Lameter wrote:
> On Mon, 15 Nov 2010, Eric Dumazet wrote:
> 
> > Exclusive access ? As soon as another cpu takes it again, you lose.
> 
> Sure but you want to avoid the fetch in shared mode here.
> 

Yes, this is what cmpxchg() does for sure.

> > Its not really the same thing... Maybe you miss the 'hint' intention at
> > all. We know the probable value of the counter, we dont want to read it.
> 
> Ok may be in thise case you can predict the value but in general it is
> difficult to always provide an expected value. It would be easier to be
> able to tell the processor that the cacheline should not be fetched as
> shared but immediately in exclusive state.
> 

Maybe it's not clear, but atomic_inc_not_zero_hint() is going to be used
only in contexts where we know the expected value, not as a generic
replacement for atomic_inc_not_zero(). Even if the cache line is already
hot in this CPU's cache, it should be as fast or faster.

Then, in high-contention contexts, using atomic_inc_not_zero_hint() with
whatever initial hint might also be a win over atomic_inc_not_zero(),
but we try to remove such contexts ;)

And two atomic_cmpxchg() calls are probably slower in non-contended contexts,
in particular if the cache line is already hot in this CPU's cache.

> > atomic_read() and atomic_cmpxchg(). We tried prefetchw() and it was a
> > performance drop. It was with only 16 cpus contending on neighbour
> 
> Does prefetchw work? Andi claims that prefetchw is not working on
> x86 and I doubt that you ran tests on Itanium.

In fact, in benchmarks, prefetch() and prefetchw() are a pain on x86, or
at least "perf tools" show artifacts around them (a high number of cycles
consumed on these instructions).

Andi had a patch to disable prefetch() in list iterators, and it's a win.

I don't have an Itanium platform to run tests on. Is cmpxchg() that bad on
ia64? I also have old AMD CPUs, so I cannot say if recent ones handle
prefetchw() better...




Patch

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
new file mode 100644
index 0000000..5a7df87
--- /dev/null
+++ b/include/linux/atomic.h
@@ -0,0 +1,37 @@ 
+#ifndef _LINUX_ATOMIC_H
+#define _LINUX_ATOMIC_H
+#include <asm/atomic.h>
+
+/**
+ * atomic_inc_not_zero_hint - increment if not zero
+ * @v: pointer of type atomic_t
+ * @hint: probable value of the atomic before the increment
+ *
+ * This version of atomic_inc_not_zero() gives a hint of the probable
+ * value of the atomic. This helps the processor avoid reading the memory
+ * before doing the atomic read/modify/write cycle, lowering the
+ * number of bus transactions on some arches.
+ *
+ * Returns: 0 if the increment was not done, 1 otherwise.
+ */
+#ifndef atomic_inc_not_zero_hint
+static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint)
+{
+	int val, c = hint;
+
+	/* sanity test, should be removed by compiler if hint is a constant */
+	if (!hint)
+		return atomic_inc_not_zero(v);
+
+	do {
+		val = atomic_cmpxchg(v, c, c + 1);
+		if (val == c)
+			return 1;
+		c = val;
+	} while (c);
+
+	return 0;
+}
+#endif
+
+#endif /* _LINUX_ATOMIC_H */