From patchwork Thu Jun 20 16:35:03 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Veaceslav Falico X-Patchwork-Id: 253031 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 5E5092C023D for ; Fri, 21 Jun 2013 02:35:32 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965625Ab3FTQf3 (ORCPT ); Thu, 20 Jun 2013 12:35:29 -0400 Received: from mx1.redhat.com ([209.132.183.28]:28153 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965608Ab3FTQf2 (ORCPT ); Thu, 20 Jun 2013 12:35:28 -0400 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r5KGYnUA024581 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 20 Jun 2013 12:34:49 -0400 Received: from darkmag.usersys.redhat.com (dhcp-27-204.brq.redhat.com [10.34.27.204]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r5KGYaZC015726; Thu, 20 Jun 2013 12:34:47 -0400 From: Veaceslav Falico To: netdev@vger.kernel.org Cc: vfalico@redhat.com, fubar@us.ibm.com, andy@greyhouse.net, davem@davemloft.net, linux@8192.net, nicolas.2p.debian@free.fr, rick.jones2@hp.com, nikolay@redhat.com Subject: [PATCH v2 net-next 4/6] bonding: don't trust arp requests unless active slave really works Date: Thu, 20 Jun 2013 18:35:03 +0200 Message-Id: <1371746105-2482-5-git-send-email-vfalico@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Currently, if we receive any arp packet on a backup slave in active-backup mode and arp_validate enabled, we suppose that it's an arp request, swap source/target ip and try to validate it. This optimization gives us virtually no downtime in the most common situation (active and backup slaves are in the same broadcast domain and the active slave failed). However, if we can't reach the arp_ip_target(s), we end up in an endless loop of reselecting slaves, because we receive our arp requests, sent by the active slave, and think that backup slaves are up, thus selecting them as active and, again, sending arp requests, which fool our backup slaves. Fix this by not validating the swapped arp packets if the current active slave didn't receive any arp reply after it was selected as active. This way we will only accept arp requests if we know that the current active slave can actually reach arp_ip_target. Signed-off-by: Veaceslav Falico --- drivers/net/bonding/bond_main.c | 7 ++++++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 7fe9802..f51c379 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -2677,10 +2677,15 @@ static int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond, * configuration, the ARP probe will (hopefully) travel from * the active, through one switch, the router, then the other * switch before reaching the backup. + * + * We 'trust' the arp requests if there is an active slave and + * it received valid arp reply(s) after it became active. This + * is done to avoid endless looping when we can't reach the + * arp_ip_target and fool ourselves with our own arp requests. */ if (bond_is_active_slave(slave)) bond_validate_arp(bond, slave, sip, tip); - else + else if (bond->curr_active_slave && time_after(slave_last_rx(bond, bond->curr_active_slave), bond->curr_active_slave->jiffies)) bond_validate_arp(bond, slave, tip, sip); out_unlock: