[RFC,net] tcp: metrics: Avoid duplicate entries with the same destination-IP

Message ID 1389626736-3143-1-git-send-email-christoph.paasch@uclouvain.be
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Christoph Paasch Jan. 13, 2014, 3:25 p.m. UTC
Note: This patch is based on "net" and so the source-IP is not yet part
of the tcp-metrics.

Because the tcp-metrics cache is an RCU-protected list, two
soft-interrupts may be inside __tcp_get_metrics() for the same
destination-IP at the same time. If this destination-IP is not yet part of
the tcp-metrics, both soft-interrupts will end up in tcpm_new() and each
create a new entry for this IP.
So, we end up with two tcp-metrics entries with the same destination-IP in the list.

This patch now takes the tcp_metrics_lock before calling
__tcp_get_metrics().

I post this as an RFC because this patch makes the TCP stack take
the tcp_metrics_lock even if the metric is already in the cache.

I tested it with ApacheBench on a short file with
ab -c 100 -n 100000 server/1KB
IMO this should stress the code path the most, but there was no
significant performance regression.

Another solution might be to leave tcp_get_metrics() as it is, and have
tcpm_new() do another call to __tcp_get_metrics() while holding the
spin-lock. We would then call __tcp_get_metrics() twice for new entries,
but we would no longer hold the spin-lock needlessly.

So, it's a tradeoff between taking the tcp_metrics_lock more often vs.
calling __tcp_get_metrics twice for new entries.

Suggestions are very welcome.

Fixes: 51c5d0c4b169b ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
---
 net/ipv4/tcp_metrics.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Eric Dumazet Jan. 13, 2014, 4:47 p.m. UTC | #1
On Mon, 2014-01-13 at 16:25 +0100, Christoph Paasch wrote:

> Another solution might be to leave tcp_get_metrics() as it is, and have
> tcpm_new() do another call to __tcp_get_metrics() while holding the
> spin-lock. We would then call __tcp_get_metrics() twice for new entries,
> but we would no longer hold the spin-lock needlessly.

This is the only solution if you want to fix this.
The cost of the lookup is the cache-line misses.
Avoiding the spinlock is a must.

The second 'lookup' is basically free, as the first one has populated
the CPU caches.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 15, 2014, 8:18 p.m. UTC | #2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 13 Jan 2014 08:47:49 -0800

> On Mon, 2014-01-13 at 16:25 +0100, Christoph Paasch wrote:
> 
>> Another solution might be to leave tcp_get_metrics() as it is, and have
>> tcpm_new() do another call to __tcp_get_metrics() while holding the
>> spin-lock. We would then call __tcp_get_metrics() twice for new entries,
>> but we would no longer hold the spin-lock needlessly.
> 
> This is the only solution if you want to fix this.
> The cost of the lookup is the cache-line misses.
> Avoiding the spinlock is a must.
> 
> The second 'lookup' is basically free, as the first one has populated
> the CPU caches.

Indeed, taking the lock in tcp_get_metrics() is to be avoided at all
costs.
Christoph Paasch Jan. 15, 2014, 11:21 p.m. UTC | #3
On 15/01/14 - 12:18:56, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 13 Jan 2014 08:47:49 -0800
> 
> > On Mon, 2014-01-13 at 16:25 +0100, Christoph Paasch wrote:
> > 
> >> Another solution might be to leave tcp_get_metrics() as it is, and have
> >> tcpm_new() do another call to __tcp_get_metrics() while holding the
> >> spin-lock. We would then call __tcp_get_metrics() twice for new entries,
> >> but we would no longer hold the spin-lock needlessly.
> > 
> > This is the only solution if you want to fix this.
> > The cost of the lookup is the cache-line misses.
> > Avoiding the spinlock is a must.
> > 
> > The second 'lookup' is basically free, as the first one has populated
> > the CPU caches.
> 
> Indeed, taking the lock in tcp_get_metrics() is to be avoided at all
> costs.

Yes, I will send the updated patch tomorrow. I didn't yet have the time
to send it.


Christoph

Patch

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 06493736fbc8..7748a5d9f37a 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -138,7 +138,6 @@  static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst,
 	struct tcp_metrics_block *tm;
 	struct net *net;
 
-	spin_lock_bh(&tcp_metrics_lock);
 	net = dev_net(dst->dev);
 	if (unlikely(reclaim)) {
 		struct tcp_metrics_block *oldest;
@@ -153,7 +152,7 @@  static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst,
 	} else {
 		tm = kmalloc(sizeof(*tm), GFP_ATOMIC);
 		if (!tm)
-			goto out_unlock;
+			return NULL;
 	}
 	tm->tcpm_addr = *addr;
 
@@ -164,8 +163,6 @@  static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst,
 		rcu_assign_pointer(net->ipv4.tcp_metrics_hash[hash].chain, tm);
 	}
 
-out_unlock:
-	spin_unlock_bh(&tcp_metrics_lock);
 	return tm;
 }
 
@@ -303,6 +300,8 @@  static struct tcp_metrics_block *tcp_get_metrics(struct sock *sk,
 	net = dev_net(dst->dev);
 	hash = hash_32(hash, net->ipv4.tcp_metrics_hash_log);
 
+	spin_lock_bh(&tcp_metrics_lock);
+
 	tm = __tcp_get_metrics(&addr, net, hash);
 	reclaim = false;
 	if (tm == TCP_METRICS_RECLAIM_PTR) {
@@ -314,6 +313,8 @@  static struct tcp_metrics_block *tcp_get_metrics(struct sock *sk,
 	else
 		tcpm_check_stamp(tm, dst);
 
+	spin_unlock_bh(&tcp_metrics_lock);
+
 	return tm;
 }