[net-next] net: change fib behavior based on interface link status

This patch adds the ability to have the Linux kernel track whether or
not a particular route should be used based on the link-status of the
interface associated with the next-hop.

Before this patch, a link failure on an interface serving as a gateway
for some systems could isolate those systems from the rest of the
network, because the stack would keep trying to send frames out of an
interface whose link is actually down.  When the kernel is responsible
for all forwarding, it should also be responsible for taking action when
traffic can no longer be forwarded; there is no real need to outsource
link monitoring to userspace anymore.

This feature is enabled only when the new sysctl is set (it is off by
default):
net.core.kill_routes_on_linkdown = 1
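
As a rough illustration of setting this from a script, the sketch below
maps the dotted sysctl name to its file under /proc/sys and writes the
value.  The helper names sysctl_path and set_sysctl are hypothetical,
and the file only exists on a kernel carrying this patch; the sketch
assumes the usual name-to-path mapping and root privileges.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under root (hypothetical helper)."""
    return Path(root).joinpath(*name.split("."))


def set_sysctl(name, value, root="/proc/sys"):
    """Write value to the sysctl file; on a real system this needs root."""
    sysctl_path(name, root).write_text(f"{value}\n")
```

Calling set_sysctl("net.core.kill_routes_on_linkdown", 1) would then
have the same effect as setting the sysctl by hand.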

When this is set, the following behavior can be observed (interface p8p1
is link-down):

# ip route show 
default via 10.0.5.2 dev p9p1 
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15 
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1 
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead 
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead 
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2 
# ip route get 90.0.0.1 
90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1 
    cache 
# ip route get 80.0.0.1 
local 80.0.0.1 dev lo  src 80.0.0.1 
    cache <local> 
# ip route get 80.0.0.2
80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15 
    cache 

The route remains in the table, so it can still be modified if needed
rather than being wiped away as it would be if IFF_UP were cleared, but
the proper next-hop is chosen automatically while the link is down.
Now interface p8p1 is linked up:

# ip route show 
default via 10.0.5.2 dev p9p1 
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15 
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1 
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2 
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2 
# ip route get 90.0.0.1 
90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1 
    cache 
# ip route get 80.0.0.1 
local 80.0.0.1 dev lo  src 80.0.0.1 
    cache <local> 
# ip route get 80.0.0.2
80.0.0.2 dev p8p1  src 80.0.0.1 
    cache 

and the output changes to what one would expect.
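
The selection behavior shown in the two transcripts can be modeled in
user space.  The sketch below is not the kernel code in this patch: it
is a simplified model that assumes longest-prefix match followed by
lowest metric, and skips any route whose device is link-down when the
sysctl is on.  The names Route, ROUTES and lookup are made up for
illustration, the route list is copied from the first transcript, and
the local table (the 80.0.0.1 lookup) is not modeled.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str                 # destination prefix, e.g. "90.0.0.0/24"
    dev: str                    # output interface
    via: Optional[str] = None   # gateway; None for a directly connected route
    metric: int = 0


# Main-table routes copied from the first `ip route show` output.
ROUTES = [
    Route("0.0.0.0/0", "p9p1", via="10.0.5.2"),
    Route("10.0.5.0/24", "p9p1"),
    Route("70.0.0.0/24", "p7p1"),
    Route("80.0.0.0/24", "p8p1"),
    Route("90.0.0.0/24", "p8p1", via="80.0.0.2", metric=1),
    Route("90.0.0.0/24", "p7p1", via="70.0.0.2", metric=2),
]


def lookup(dst, routes, link_down=frozenset(), kill_routes_on_linkdown=True):
    """Return the best usable route for dst, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    candidates = []
    for route in routes:
        net = ipaddress.ip_network(route.prefix)
        if addr not in net:
            continue
        if kill_routes_on_linkdown and route.dev in link_down:
            continue  # "dead" route: stays in the table but is not used
        candidates.append((-net.prefixlen, route.metric, route))
    if not candidates:
        return None
    # Longest prefix wins; ties are broken by the lowest metric.
    return min(candidates, key=lambda c: (c[0], c[1]))[2]
```

With link_down={"p8p1"}, a lookup for 90.0.0.1 falls back to the
metric 2 route via 70.0.0.2, and 80.0.0.2 falls through to the default
route, matching the first transcript.  With no link down, both resolve
through p8p1 as in the second.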

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Suggested-by: Dinesh Dutt <ddutt@cumulusnetworks.com>

---
When this was discussed in Ottawa earlier this year, some preferred to
have no configuration option and to make this behavior the default,
since "it was time to do this."  I still wanted to propose the config
option to preserve the current behavior for those who desire it.  I'll
happily remove it if Dave and Linus approve.

An IPv6 implementation is also needed (DECnet too!), but I wanted to
start with the IPv4 implementation to get people comfortable with the
idea before moving forward.  If this is accepted the IPv6 implementation
can be posted shortly.  

FWIW, we have been running this patch with the sysctl setting above and
our customers have been happily using a backported version for IPv4 and
IPv6 for >6 months.

 include/linux/netdevice.h      |  1 +
 include/net/fib_rules.h        |  1 +
 include/net/ip_fib.h           |  1 +
 include/uapi/linux/rtnetlink.h |  1 +
 include/uapi/linux/sysctl.h    |  1 +
 kernel/sysctl_binary.c         |  1 +
 net/core/dev.c                 |  2 ++
 net/core/sysctl_net_core.c     |  7 +++++++
 net/ipv4/fib_frontend.c        | 12 +++++++++--
 net/ipv4/fib_rules.c           |  7 ++++++-
 net/ipv4/fib_semantics.c       | 46 ++++++++++++++++++++++++++++++++++++------
 net/ipv4/fib_trie.c            | 19 +++++++++++++----
 12 files changed, 86 insertions(+), 13 deletions(-)

Message ID	1433300839-18511-1-git-send-email-gospo@cumulusnetworks.com
State	Changes Requested, archived
Delegated to:	David Miller
From: Andy Gospodarek <gospo@cumulusnetworks.com>
To: netdev@vger.kernel.org, davem@davemloft.net
Cc: ddutt@cumulusnetworks.com, Andy Gospodarek <gospo@cumulusnetworks.com>
Subject: [PATCH net-next] net: change fib behavior based on interface link status
Date: Tue, 2 Jun 2015 23:07:19 -0400
