ARP table question

Message ID 491B5452.6020709@candelatech.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Ben Greear Nov. 12, 2008, 10:10 p.m. UTC
Ben Greear wrote:
> I have 500 mac-vlans on a system talking to 500 other
> mac-vlans.  My problem is that the arp-table gets extremely
> huge because every time an arp-request comes in on all mac-vlans,
> a stale arp entry is added for each mac-vlan.  I have filtering
> turned on, but that doesn't help because the neigh_event_ns call
> below will cause a stale neighbor entry to be created regardless
> of whether a reply will be sent or not.
> 
> Maybe the neigh_event code should be below the checks for dont_send,
> and only call neigh_event_ns if !dont_send?

The attached patch makes it work much better for me.  The patch
will cause the code to NOT create a stale neighbor entry if we
are not going to respond to the ARP request.  The old code
*would* create a stale entry even if we are not going to respond.

This is against 2.6.25.15.

Signed-off-by: Ben Greear <greearb@candelatech.com>

Thanks,
Ben
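
The reordering described in the message above can be modelled in user space. This is only a minimal sketch, not kernel code: toy_neigh_table, toy_neigh_event_ns() and the two toy_arp_request_*() functions are hypothetical stand-ins, and the single `filtered` flag stands for the combined result of the arp_ignore() and arp_filter() checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the neighbour table: it only counts created entries. */
struct toy_neigh_table {
	size_t entries;
};

/* Hypothetical stand-in for neigh_event_ns(): here it always adds an entry. */
void toy_neigh_event_ns(struct toy_neigh_table *tbl)
{
	tbl->entries++;
}

/*
 * Old ordering: the entry is created before the filters are consulted,
 * so a filtered request still leaves an entry behind.
 * Returns true if a reply would be sent.
 */
bool toy_arp_request_old(struct toy_neigh_table *tbl, bool filtered)
{
	toy_neigh_event_ns(tbl);
	return !filtered;
}

/*
 * New ordering: the filters are consulted first, and the entry is only
 * created when a reply is actually going to be sent.
 */
bool toy_arp_request_new(struct toy_neigh_table *tbl, bool filtered)
{
	if (filtered)
		return false;
	toy_neigh_event_ns(tbl);
	return true;
}
```

Feeding the same 500 requests to both functions, with all but one of them filtered, leaves 500 entries in the table under the old ordering and a single entry under the new one.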

Comments

David Miller Nov. 17, 2008, 3:16 a.m. UTC | #1
From: Ben Greear <greearb@candelatech.com>
Date: Wed, 12 Nov 2008 14:10:26 -0800

> Ben Greear wrote:
> > I have 500 mac-vlans on a system talking to 500 other
> > mac-vlans.  My problem is that the arp-table gets extremely
> > huge because every time an arp-request comes in on all mac-vlans,
> > a stale arp entry is added for each mac-vlan.  I have filtering
> > turned on, but that doesn't help because the neigh_event_ns call
> > below will cause a stale neighbor entry to be created regardless
> > of whether a reply will be sent or not.
> > Maybe the neigh_event code should be below the checks for dont_send,
> > and only call neigh_event_ns if !dont_send?
> 
> The attached patch makes it work much better for me.  The patch
> will cause the code to NOT create a stale neighbor entry if we
> are not going to respond to the ARP request.  The old code
> *would* create a stale entry even if we are not going to respond.
> 
> This is against 2.6.25.15.
> 
> Signed-off-by: Ben Greear <greearb@candelatech.com>

This change makes a lot of sense to me, I'll add it to net-next-2.6
so it can cook in there for a while just in case there are some
unwanted side-effects.

Thanks.
Ben Greear Nov. 17, 2008, 6:17 p.m. UTC | #2
David Miller wrote:

> This change makes a lot of sense to me, I'll add it to net-next-2.6
> so it can cook in there for a while just in case there are some
> unwanted side-effects.

Thanks Dave.

I think I found another problem as well:  If I start 1 TCP and 1 UDP connection
between each of the 500 interfaces on mac-vlans, the ARP tables will not converge.

This seems to happen because mac-vlan has to copy each broadcast packet to
every mac-vlan on a physical device, so there are simply too many packets:

500 mac-vlans ARPing once per second means 500 pkts per second arriving
at the other NIC.
The other NIC must copy each of these 500 times,
so 250000 packets per second in each direction are
processed by the stack (at least they are not all on the wire).
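
The arithmetic above can be written out as a small helper. This is only an illustration; broadcast_copies_per_sec() and its parameter names are hypothetical and do not appear in the kernel.

```c
#include <stdint.h>

/*
 * Packets per second handed to the stack on the receiving host when each of
 * `senders` interfaces broadcasts `rate_hz` ARP requests per second and the
 * receiving device copies every broadcast to `receivers` mac-vlans.
 */
uint64_t broadcast_copies_per_sec(uint64_t senders, uint64_t rate_hz,
				  uint64_t receivers)
{
	return senders * rate_hz * receivers;
}
```

With 500 senders, one request per second each, and 500 receiving mac-vlans, the helper returns 500 * 1 * 500 = 250000, matching the figure quoted above. Only senders * rate_hz = 500 of those packets per second cross the wire.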

A few get through and those UDP/TCP connections start consuming
bandwidth, which clogs up the 1G link enough that other responses
are lost most of the time.

I'm going to try to work on some sort of random backoff for ARP that can
be enabled in this situation next.

Thanks,
Ben
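
The random backoff mentioned above is not part of this thread or of the patch below. As a rough sketch of the general idea only, and not of whatever was eventually implemented, a retry delay could be drawn at random from a window that doubles on each attempt up to a cap. All names here (xorshift32, arp_retry_delay_ms, base_ms, max_ms) are hypothetical, and the small PRNG is used only to keep the example free of external dependencies.

```c
#include <stdint.h>

/*
 * Small xorshift32 PRNG so the sketch has no external dependencies.
 * The state must be seeded with a non-zero value.
 */
uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Hypothetical helper: delay in milliseconds before retry number `attempt`
 * (0-based). The window starts at base_ms and doubles per attempt, capped
 * at max_ms; the returned delay lies in [0, window).
 */
uint32_t arp_retry_delay_ms(uint32_t *rng, unsigned int attempt,
			    uint32_t base_ms, uint32_t max_ms)
{
	uint32_t window = base_ms;

	/* Double the window once per attempt without overflowing past max_ms. */
	while (attempt-- > 0 && window < max_ms)
		window = (window > max_ms / 2) ? max_ms : window * 2;

	if (window > max_ms)
		window = max_ms;
	if (window == 0)
		return 0;

	return xorshift32(rng) % window;
}
```

Spreading retries over a random window keeps 500 interfaces from all re-sending their ARP requests in the same instant, which is the congestion pattern described in the message. Whether that is enough for the tables to converge in this setup is not something the thread establishes.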

Patch

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 8c16b42..dd454b7 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -872,18 +872,18 @@  static int arp_process(struct sk_buff *skb)
 		addr_type = rt->rt_type;
 
 		if (addr_type == RTN_LOCAL) {
-			n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
-			if (n) {
-				int dont_send = 0;
-
-				if (!dont_send)
-					dont_send |= arp_ignore(in_dev,sip,tip);
-				if (!dont_send && IN_DEV_ARPFILTER(in_dev))
-					dont_send |= arp_filter(sip,tip,dev);
-				if (!dont_send)
-					arp_send(ARPOP_REPLY,ETH_P_ARP,sip,dev,tip,sha,dev->dev_addr,sha);
+			int dont_send = 0;
 
-				neigh_release(n);
+			if (!dont_send)
+				dont_send |= arp_ignore(in_dev,sip,tip);
+			if (!dont_send && IN_DEV_ARPFILTER(in_dev))
+				dont_send |= arp_filter(sip,tip,dev);
+			if (!dont_send) {
+				n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
+				if (n) {
+					arp_send(ARPOP_REPLY,ETH_P_ARP,sip,dev,tip,sha,dev->dev_addr,sha);
+					neigh_release(n);
+				}
 			}
 			goto out;
 		} else if (IN_DEV_FORWARD(in_dev)) {