From patchwork Sun Oct 11 18:29:34 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roopa Prabhu
X-Patchwork-Id: 528789
X-Patchwork-Delegate: davem@davemloft.net
From: Roopa Prabhu
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, ebiederm@xmission.com, rshearma@brocade.com
Subject: [PATCH net-next v3 2/2] mpls: flow-based multipath selection
Date: Sun, 11 Oct 2015 11:29:34 -0700
Message-Id: <1444588174-44663-3-git-send-email-roopa@cumulusnetworks.com>
X-Mailer: git-send-email 1.9.1
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Robert Shearman

Change the selection of a multipath route to use a flow-based hash.
This is more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN), whilst still allowing a good distribution of
traffic given enough flows.

Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
   including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
   payload, if present.

Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.

Signed-off-by: Robert Shearman
---
 net/mpls/af_mpls.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 83 insertions(+), 5 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 4d819df..15dd2eb 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -22,6 +22,11 @@
 #include
 #include "internal.h"
 
+/* Maximum number of labels to look ahead at when selecting a path of
+ * a multipath route
+ */
+#define MAX_MP_SELECT_LABELS 4
+
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -77,10 +82,78 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+					     struct sk_buff *skb, bool bos)
 {
-	/* assume single nexthop for now */
-	return &rt->rt_nh[0];
+	struct mpls_entry_decoded dec;
+	struct mpls_shim_hdr *hdr;
+	bool eli_seen = false;
+	int label_index;
+	int nh_index = 0;
+	u32 hash = 0;
+
+	/* No need to look further into packet if there's only
+	 * one path
+	 */
+	if (rt->rt_nhn == 1)
+		goto out;
+
+	for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
+	     label_index++) {
+		if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
+			break;
+
+		/* Read and decode the current label */
+		hdr = mpls_hdr(skb) + label_index;
+		dec = mpls_entry_decode(hdr);
+
+		/* RFC6790 - reserved labels MUST NOT be used as keys
+		 * for the load-balancing function
+		 */
+		if (dec.label == MPLS_LABEL_ENTROPY) {
+			eli_seen = true;
+		} else if (dec.label >= MPLS_LABEL_FIRST_UNRESERVED) {
+			hash = jhash_1word(dec.label, hash);
+
+			/* The entropy label follows the entropy label
+			 * indicator, so this means that the entropy
+			 * label was just added to the hash - no need to
+			 * go any deeper either in the label stack or in the
+			 * payload
+			 */
+			if (eli_seen)
+				break;
+		}
+
+		bos = dec.bos;
+		if (bos && pskb_may_pull(skb, sizeof(*hdr) * label_index +
+					 sizeof(struct iphdr))) {
+			const struct iphdr *v4hdr;
+
+			v4hdr = (const struct iphdr *)(mpls_hdr(skb) +
+						       label_index);
+			if (v4hdr->version == 4) {
+				hash = jhash_3words(ntohl(v4hdr->saddr),
+						    ntohl(v4hdr->daddr),
+						    v4hdr->protocol, hash);
+			} else if (v4hdr->version == 6 &&
+				pskb_may_pull(skb, sizeof(*hdr) * label_index +
+					      sizeof(struct ipv6hdr))) {
+				const struct ipv6hdr *v6hdr;
+
+				v6hdr = (const struct ipv6hdr *)(mpls_hdr(skb) +
+								label_index);
+
+				hash = __ipv6_addr_jhash(&v6hdr->saddr, hash);
+				hash = __ipv6_addr_jhash(&v6hdr->daddr, hash);
+				hash = jhash_1word(v6hdr->nexthdr, hash);
+			}
+		}
+	}
+
+	nh_index = hash % rt->rt_nhn;
+out:
+	return &rt->rt_nh[nh_index];
 }
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
@@ -145,7 +218,6 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	unsigned int new_header_size;
 	unsigned int mtu;
 	int err;
-	int nhidx;
 
 	/* Careful this entire function runs inside of an rcu critical section */
 
@@ -176,7 +248,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	if (!rt)
 		goto drop;
 
-	nh = mpls_select_multipath(rt);
+	nh = mpls_select_multipath(rt, skb, dec.bos);
 	if (!nh)
 		goto drop;
 
@@ -545,6 +617,12 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
 		if (!rtnh_ok(rtnh, remaining))
 			goto errout;
 
+		/* neither weighted multipath nor any flags
+		 * are supported
+		 */
+		if (rtnh->rtnh_hops || rtnh->rtnh_flags)
+			goto errout;
+
 		attrlen = rtnh_attrlen(rtnh);
 		if (attrlen > 0) {
 			struct nlattr *attrs = rtnh_attrs(rtnh);
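
For experimenting with the selection logic outside the kernel, here is a
minimal user-space sketch of the hash walk described in the commit message.
It is not the kernel code. The names select_path(), mix() and be32() are
hypothetical, and mix() is an arbitrary stand-in for jhash that will not
reproduce the kernel's hash values. The sketch also assumes that the buffer
starts at the first label to be examined, that each entry is bounds-checked
before it is read, and that the payload starts right after the
bottom-of-stack entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MP_SELECT_LABELS	4	/* same limit as the patch */
#define LABEL_ELI		7	/* entropy label indicator, RFC 6790 */
#define LABEL_FIRST_UNRESERVED	16

/* Hypothetical stand-in for jhash_1word(); NOT the kernel's function. */
static uint32_t mix(uint32_t word, uint32_t seed)
{
	uint32_t h = seed ^ (word * 0x9e3779b1u);

	h ^= h >> 15;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	return h;
}

/* Read a big-endian 32-bit word from a byte buffer. */
static uint32_t be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Pick a nexthop index in [0, nhn) from the label stack and IP payload. */
static unsigned int select_path(const uint8_t *pkt, size_t len,
				unsigned int nhn)
{
	bool eli_seen = false;
	uint32_t hash = 0;
	size_t off = 0, w;
	int i;

	assert(nhn > 0);
	if (nhn == 1)		/* only one path, nothing to hash */
		return 0;

	for (i = 0; i < MAX_MP_SELECT_LABELS; i++) {
		uint32_t entry, label;
		bool bos;

		if (len - off < 4)	/* no complete entry left */
			break;
		entry = be32(pkt + off);
		off += 4;
		label = entry >> 12;		/* top 20 bits */
		bos = (entry >> 8) & 1;		/* bottom-of-stack bit */

		if (label == LABEL_ELI) {
			eli_seen = true;
		} else if (label >= LABEL_FIRST_UNRESERVED) {
			/* Reserved labels are never used as hash keys. */
			hash = mix(label, hash);
			if (eli_seen)	/* entropy label was just hashed */
				break;
		}
		if (!bos)
			continue;

		/* Bottom of stack: fold in (src, dst, proto) if IP follows. */
		if (len - off >= 20 && (pkt[off] >> 4) == 4) {
			hash = mix(be32(pkt + off + 12), hash);	/* saddr */
			hash = mix(be32(pkt + off + 16), hash);	/* daddr */
			hash = mix(pkt[off + 9], hash);		/* protocol */
		} else if (len - off >= 40 && (pkt[off] >> 4) == 6) {
			for (w = 8; w < 40; w += 4)	/* saddr, then daddr */
				hash = mix(be32(pkt + off + w), hash);
			hash = mix(pkt[off + 6], hash);	/* next header */
		}
		break;
	}
	return hash % nhn;
}
```

Calling select_path(pkt, len, nhn) on two packets that carry the same
entropy label yields the same index whatever follows that label, which is
the property that keeps a flow on a single path.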