From patchwork Mon Mar 8 13:20:00 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 47114 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id F3632B7CF0 for ; Tue, 9 Mar 2010 00:20:40 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754155Ab0CHNUJ (ORCPT ); Mon, 8 Mar 2010 08:20:09 -0500 Received: from mail-bw0-f209.google.com ([209.85.218.209]:59828 "EHLO mail-bw0-f209.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751497Ab0CHNUG (ORCPT ); Mon, 8 Mar 2010 08:20:06 -0500 Received: by bwz1 with SMTP id 1so2766741bwz.21 for ; Mon, 08 Mar 2010 05:20:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:subject:from:to:cc :content-type:date:message-id:mime-version:x-mailer :content-transfer-encoding; bh=JvrCxfALfpmcpQlPztSC8PvdC5JyV78OCehYauraR44=; b=eNZGpCzOQjRA79B9ezfpyJ2hrRSFEDtFBqh55MVaM3wA94PJ5NEQ+fQTYSDnuXCauy I3iPKgN/tuaBXLwwYQgpglBR3J3W8o9uNz8oECH5RswLmAQmWjo7B/eT2gnktMRj8ewW g5wkgDGzCf8eVx8u+Xe+mxucs9Ift4mx/6xpM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=subject:from:to:cc:content-type:date:message-id:mime-version :x-mailer:content-transfer-encoding; b=BgFtA19BCesmwWW8XoNcX24WBJSsXaYQftWTNeGPzAaeE9f1QImfqOqgchdbofVzXM a9TrDo7NDDCF7f26ph7r0uZ7PUEPyNoUUldhTmENV3mUZezLafGw+Ncjq2hggvdKNrCP 3kKzk1nBE9UoqNX27IVsT+gFmCP2K+iVAVL1Y= Received: by 10.204.36.70 with SMTP id s6mr3049487bkd.22.1268054403631; Mon, 08 Mar 2010 05:20:03 -0800 (PST) Received: from [127.0.0.1] (gw1.cosmosbay.com [212.99.114.194]) by mx.google.com with ESMTPS id 15sm2147401bwz.8.2010.03.08.05.20.02 (version=SSLv3 cipher=RC4-MD5); Mon, 08 Mar 2010 05:20:03 -0800 (PST) Subject: [PATCH] net: fix route cache rebuilds From: Eric Dumazet To: David Miller Cc: netdev , =?UTF-8?Q?Pawe=C5=82?= Staszewski , Neil Horman Date: Mon, 08 Mar 2010 14:20:00 +0100 Message-ID: <1268054400.6148.99.camel@edumazet-laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.28.1 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org We added an automatic route cache rebuilding in commit 1080d709fb9d8cd43 but had to correct few bugs. One of the assumption of original patch, was that entries where kept sorted in a given way. This assumption is known to be wrong (commit 1ddbcb005c395518 gave an explanation of this and corrected a leak) and expensive to respect. Paweł Staszewski reported to me one of his machine got its routing cache disabled after few messages like : [ 2677.850065] Route hash chain too long! [ 2677.850080] Adjust your secret_interval! [82839.662993] Route hash chain too long! [82839.662996] Adjust your secret_interval! [155843.731650] Route hash chain too long! [155843.731664] Adjust your secret_interval! [155843.811881] Route hash chain too long! [155843.811891] Adjust your secret_interval! [155843.858209] vlan0811: 5 rebuilds is over limit, route caching disabled [155843.858212] Route hash chain too long! [155843.858213] Adjust your secret_interval! This is because rt_intern_hash() might be fooled when computing a chain length, because multiple entries with same keys can differ because of TOS (or mark/oif) bits. In the rare case the fast algorithm see a too long chain, and before taking expensive path, we call a helper function in order to not count duplicates of same routes, that only differ with tos/mark/oif bits. This helper works with data already in cpu cache and is not be very expensive, despite its O(N^2) implementation. Paweł Staszewski sucessfully tested this patch on his loaded router. Reported-and-tested-by: Paweł Staszewski Signed-off-by: Eric Dumazet Acked-by: Neil Horman --- net/ipv4/route.c | 50 ++++++++++++++++++++++++++++++++++----------- 1 file changed, 38 insertions(+), 12 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/ipv4/route.c b/net/ipv4/route.c index b2ba558..d9b4024 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -146,7 +146,6 @@ static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst); static void ipv4_link_failure(struct sk_buff *skb); static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu); static int rt_garbage_collect(struct dst_ops *ops); -static void rt_emergency_hash_rebuild(struct net *net); static struct dst_ops ipv4_dst_ops = { @@ -780,11 +779,30 @@ static void rt_do_flush(int process_context) #define FRACT_BITS 3 #define ONE (1UL << FRACT_BITS) +/* + * Given a hash chain and an item in this hash chain, + * find if a previous entry has the same hash_inputs + * (but differs on tos, mark or oif) + * Returns 0 if an alias is found. + * Returns ONE if rth has no alias before itself. + */ +static int has_noalias(const struct rtable *head, const struct rtable *rth) +{ + const struct rtable *aux = head; + + while (aux != rth) { + if (compare_hash_inputs(&aux->fl, &rth->fl)) + return 0; + aux = aux->u.dst.rt_next; + } + return ONE; +} + static void rt_check_expire(void) { static unsigned int rover; unsigned int i = rover, goal; - struct rtable *rth, *aux, **rthp; + struct rtable *rth, **rthp; unsigned long samples = 0; unsigned long sum = 0, sum2 = 0; unsigned long delta; @@ -835,15 +853,7 @@ nofree: * attributes don't unfairly skew * the length computation */ - for (aux = rt_hash_table[i].chain;;) { - if (aux == rth) { - length += ONE; - break; - } - if (compare_hash_inputs(&aux->fl, &rth->fl)) - break; - aux = aux->u.dst.rt_next; - } + length += has_noalias(rt_hash_table[i].chain, rth); continue; } } else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout)) @@ -1073,6 +1083,21 @@ work_done: out: return 0; } +/* + * Returns number of entries in a hash chain that have different hash_inputs + */ +static int slow_chain_length(const struct rtable *head) +{ + int length = 0; + const struct rtable *rth = head; + + while (rth) { + length += has_noalias(head, rth); + rth = rth->u.dst.rt_next; + } + return length >> FRACT_BITS; +} + static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp, struct sk_buff *skb) { @@ -1185,7 +1210,8 @@ restart: rt_free(cand); } } else { - if (chain_length > rt_chain_length_max) { + if (chain_length > rt_chain_length_max && + slow_chain_length(rt_hash_table[hash].chain) > rt_chain_length_max) { struct net *net = dev_net(rt->u.dst.dev); int num = ++net->ipv4.current_rt_cache_rebuild_count; if (!rt_caching(dev_net(rt->u.dst.dev))) {