[RFC] bonding: fix workqueue re-arming races

Hi,

this is another attempt to solve the bonding workqueue re-arming
races.

The issue has been thoroughly discussed here:
http://article.gmane.org/gmane.linux.network/146949 "[PATCH]
bonding: cancel_delayed_work() -> cancel_delayed_work_sync()"
However, the only outcome was a proposal for an ugly hack with
busy-waiting on the rtnl.

The problem:
Bonding uses delayed work that automatically re-arms itself,
e.g.: bond_mii_monitor().

A dev->close() quickly followed by dev->open() on the bonding
master races with the delayed work. bond_close() sets
kill_timers=1 and calls cancel_delayed_work(), hoping that
bond_mii_monitor() will not re-arm itself again. There are two
problems with this:

- bond->kill_timers is not re-checked after re-acquiring
  bond->lock (this would be easy to fix)

- bond_open() resets bond->kill_timers to 0. If this happens
  before bond_mii_monitor() notices the flag and exits, the
  monitor re-arms itself. bond_open() then tries to schedule the
  same delayed work again, which causes a BUG.
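
To make the second interleaving concrete, here is a small
user-space toy model of it. This is only a sketch and not kernel
code: struct bond_model and all model_*() helpers are made-up
names, and the "collision" is just a boolean standing in for
bond_open() finding the work already queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; nothing in here is a kernel API. */
struct bond_model {
	bool kill_timers;	/* stands in for bond->kill_timers */
	bool work_pending;	/* delayed work is queued */
	bool work_running;	/* handler is executing right now */
};

/* Model of queueing the delayed work: fails if it is already queued. */
static bool model_queue(struct bond_model *b)
{
	if (b->work_pending)
		return false;
	b->work_pending = true;
	return true;
}

/* Model of bond_close(): set the flag, cancel without waiting. */
static void model_close(struct bond_model *b)
{
	b->kill_timers = true;
	b->work_pending = false;	/* a running handler is not affected */
}

/* The handler leaves the queue and starts running. */
static void model_handler_begin(struct bond_model *b)
{
	b->work_pending = false;
	b->work_running = true;
}

/* Tail of the handler: look at the flag, re-arm if it is clear. */
static void model_handler_end(struct bond_model *b)
{
	if (!b->kill_timers)
		model_queue(b);
	b->work_running = false;
}

/*
 * Replay close() followed by open() while the handler is running.
 * Returns true if open() finds the work already queued.
 */
static bool replay_close_open(bool open_resets_flag_first)
{
	struct bond_model b = { .work_pending = true };

	model_handler_begin(&b);
	model_close(&b);
	if (open_resets_flag_first) {
		b.kill_timers = false;	/* open() resets the flag ... */
		model_handler_end(&b);	/* ... handler sees 0, re-arms */
	} else {
		model_handler_end(&b);	/* handler sees 1, exits */
		b.kill_timers = false;	/* open() resets the flag */
	}
	return !model_queue(&b);	/* open() queues the work */
}

/* Usage: only the "open resets the flag first" ordering collides. */
static void check_race_model(void)
{
	assert(replay_close_open(true));
	assert(!replay_close_open(false));
}
```

The model is deliberately sequential: it replays one fixed
ordering at a time instead of using threads, so the bad case
shows up every time.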

The issue would be solved by calling cancel_delayed_work_sync(),
but this cannot be done from bond_close(): it is called under
rtnl and the delayed work takes rtnl itself, so waiting for the
work to finish would deadlock.

My proposal is to move all the "commit" work that requires rtnl
to a separate work item and schedule it on the bonding wq. Thus,
the re-arming work does not need rtnl and can be cancelled using
cancel_delayed_work_sync().
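
A rough user-space model of why the split helps, again only a
sketch: struct work_model, model_cancel_sync() and the two
close_*() helpers are invented for illustration. It also assumes
that the commit work is merely dequeued on close and not waited
for, which is my assumption for the model, not something the
description above settles.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; nothing in here is a kernel API. */
struct work_model {
	bool pending;		/* queued, handler not started yet */
	bool running;		/* handler is executing right now */
	bool needs_rtnl;	/* handler must take rtnl to finish */
};

/*
 * Model of a synchronous cancel: dequeue the work, then wait for a
 * running handler. Returns false if the wait could never end because
 * the caller holds rtnl while the handler is blocked on it.
 */
static bool model_cancel_sync(struct work_model *w, bool caller_holds_rtnl)
{
	w->pending = false;
	if (w->running && w->needs_rtnl && caller_holds_rtnl)
		return false;	/* deadlock */
	w->running = false;	/* handler finished, cannot re-arm */
	return true;
}

/* Current layout: one self re-arming work that also takes rtnl. */
static bool close_with_single_work(void)
{
	struct work_model monitor = { .running = true, .needs_rtnl = true };

	/* bond_close() runs under rtnl */
	return model_cancel_sync(&monitor, true);
}

/*
 * Proposed layout: the monitor re-arms but never takes rtnl; the
 * commit work takes rtnl but never re-arms.
 */
static bool close_with_split_work(void)
{
	struct work_model monitor = { .running = true, .needs_rtnl = false };
	struct work_model commit = { .pending = true, .needs_rtnl = true };
	bool ok = model_cancel_sync(&monitor, true);

	/* Assumption: commit is only dequeued here, not waited for. */
	commit.pending = false;
	return ok && !monitor.running && !commit.pending;
}

/* Usage: the single work deadlocks in the model, the split does not. */
static void check_layouts(void)
{
	assert(!close_with_single_work());
	assert(close_with_split_work());
}
```

The point of the model is only the lock dependency: once the
re-arming work no longer needs rtnl, a synchronous cancel from
under rtnl can complete.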

Comments?

[Note: this does not deal with bond_loadbalance_arp_mon(), where
rtnl is now taken as well in net-next; I'll do this if you think
the idea is good.]

Signed-off-by: Jiri Bohac <jbohac@suse.cz>

Message ID	20100831170752.GA9743@midget.suse.cz
State	RFC, archived
Delegated to:	David Miller
Date	Tue, 31 Aug 2010 19:07:52 +0200
From	Jiri Bohac <jbohac@suse.cz>
To	Jay Vosburgh <fubar@us.ibm.com>
Cc	bonding-devel@lists.sourceforge.net, markine@google.com, jarkao2@gmail.com, chavey@google.com, netdev@vger.kernel.org
