From patchwork Tue Oct 15 21:54:11 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 283792
X-Patchwork-Delegate: davem@davemloft.net
From: Alexei Starovoitov
To: "David S. Miller"
Cc: Jesse Gross, Pravin B Shelar, Jiri Pirko, Cong Wang,
	dev@openvswitch.org, netdev@vger.kernel.org
Subject: [PATCH v4 net-next] openvswitch: fix vport-netdev unregister
Date: Tue, 15 Oct 2013 14:54:11 -0700
Message-Id: <1381874051-4175-1-git-send-email-ast@plumgrid.com>
X-Mailer: git-send-email 1.7.9.5
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

The combination of two commits:

commit 8e4e1713e4 ("openvswitch: Simplify datapath locking.")
commit 2537b4dd0a ("openvswitch:: link upper device for port devices")

introduced a bug where upper_dev wasn't unlinked upon
netdev_unregister notification

The following steps:

  modprobe openvswitch
  ovs-dpctl add-dp test
  ip tuntap add dev tap1 mode tap
  ovs-dpctl add-if test tap1
  ip tuntap del dev tap1 mode tap

are causing multiple warnings:

[ 62.747557] gre: GRE over IPv4 demultiplexor driver
[ 62.749579] openvswitch: Open vSwitch switching datapath
[ 62.755087] device test entered promiscuous mode
[ 62.765911] device tap1 entered promiscuous mode
[ 62.766033] IPv6: ADDRCONF(NETDEV_UP): tap1: link is not ready
[ 62.769017] ------------[ cut here ]------------
[ 62.769022] WARNING: CPU: 1 PID: 3267 at net/core/dev.c:5501 rollback_registered_many+0x20f/0x240()
[ 62.769023] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[ 62.769051] CPU: 1 PID: 3267 Comm: ip Not tainted 3.12.0-rc3+ #60
[ 62.769052] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[ 62.769053]  0000000000000009 ffff8807f25cbd28 ffffffff8175e575 0000000000000006
[ 62.769055]  0000000000000000 ffff8807f25cbd68 ffffffff8105314c ffff8807f25cbd58
[ 62.769057]  ffff8807f2634000 ffff8807f25cbdc8 ffff8807f25cbd88 ffff8807f25cbdc8
[ 62.769059] Call Trace:
[ 62.769062]  [] dump_stack+0x55/0x76
[ 62.769065]  [] warn_slowpath_common+0x8c/0xc0
[ 62.769067]  [] warn_slowpath_null+0x1a/0x20
[ 62.769069]  [] rollback_registered_many+0x20f/0x240
[ 62.769071]  [] rollback_registered+0x31/0x40
[ 62.769073]  [] unregister_netdevice_queue+0x58/0x90
[ 62.769075]  [] __tun_detach+0x140/0x340
[ 62.769077]  [] tun_chr_close+0x36/0x60
[ 62.769080]  [] __fput+0xff/0x260
[ 62.769082]  [] ____fput+0xe/0x10
[ 62.769084]  [] task_work_run+0xb5/0xe0
[ 62.769087]  [] do_notify_resume+0x59/0x80
[ 62.769089]  [] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 62.769091]  [] int_signal+0x12/0x17
[ 62.769093] ---[ end trace 838756c62e156ffb ]---
[ 62.769481] ------------[ cut here ]------------
[ 62.769485] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
[ 62.769486] sysfs: can not remove 'master', no directory
[ 62.769486] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[ 62.769514] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G W 3.12.0-rc3+ #60
[ 62.769515] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[ 62.769518] Workqueue: events ovs_dp_notify_wq [openvswitch]
[ 62.769519]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
[ 62.769521]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b28
[ 62.769523]  0000000000000000 ffffffff81a87a1f ffff8807f2634000 ffff880037038500
[ 62.769525] Call Trace:
[ 62.769528]  [] dump_stack+0x55/0x76
[ 62.769529]  [] warn_slowpath_common+0x8c/0xc0
[ 62.769531]  [] warn_slowpath_fmt+0x46/0x50
[ 62.769533]  [] sysfs_hash_and_remove+0xa9/0xb0
[ 62.769535]  [] sysfs_remove_link+0x26/0x30
[ 62.769538]  [] __netdev_adjacent_dev_remove+0xf7/0x150
[ 62.769540]  [] __netdev_adjacent_dev_unlink_lists+0x27/0x50
[ 62.769542]  [] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
[ 62.769544]  [] netdev_upper_dev_unlink+0x3d/0x140
[ 62.769548]  [] netdev_destroy+0x4b/0x80 [openvswitch]
[ 62.769550]  [] ovs_vport_del+0x46/0x60 [openvswitch]
[ 62.769552]  [] ovs_dp_detach_port+0x44/0x60 [openvswitch]
[ 62.769555]  [] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
[ 62.769557]  [] process_one_work+0x1d8/0x6a0
[ 62.769559]  [] ? process_one_work+0x178/0x6a0
[ 62.769562]  [] worker_thread+0x11b/0x370
[ 62.769564]  [] ? rescuer_thread+0x350/0x350
[ 62.769566]  [] kthread+0xea/0xf0
[ 62.769568]  [] ? flush_kthread_worker+0x150/0x150
[ 62.769570]  [] ret_from_fork+0x7c/0xb0
[ 62.769572]  [] ? flush_kthread_worker+0x150/0x150
[ 62.769573] ---[ end trace 838756c62e156ffc ]---
[ 62.769574] ------------[ cut here ]------------
[ 62.769576] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
[ 62.769577] sysfs: can not remove 'upper_test', no directory
[ 62.769577] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[ 62.769603] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G W 3.12.0-rc3+ #60
[ 62.769604] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[ 62.769606] Workqueue: events ovs_dp_notify_wq [openvswitch]
[ 62.769607]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
[ 62.769609]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b58
[ 62.769611]  0000000000000000 ffff880807ad3bd9 ffff8807f2634000 ffff880037038500
[ 62.769613] Call Trace:
[ 62.769615]  [] dump_stack+0x55/0x76
[ 62.769617]  [] warn_slowpath_common+0x8c/0xc0
[ 62.769619]  [] warn_slowpath_fmt+0x46/0x50
[ 62.769621]  [] sysfs_hash_and_remove+0xa9/0xb0
[ 62.769622]  [] sysfs_remove_link+0x26/0x30
[ 62.769624]  [] __netdev_adjacent_dev_remove+0x122/0x150
[ 62.769627]  [] __netdev_adjacent_dev_unlink_lists+0x27/0x50
[ 62.769629]  [] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
[ 62.769631]  [] netdev_upper_dev_unlink+0x3d/0x140
[ 62.769633]  [] netdev_destroy+0x4b/0x80 [openvswitch]
[ 62.769636]  [] ovs_vport_del+0x46/0x60 [openvswitch]
[ 62.769638]  [] ovs_dp_detach_port+0x44/0x60 [openvswitch]
[ 62.769640]  [] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
[ 62.769642]  [] process_one_work+0x1d8/0x6a0
[ 62.769644]  [] ? process_one_work+0x178/0x6a0
[ 62.769646]  [] worker_thread+0x11b/0x370
[ 62.769648]  [] ? rescuer_thread+0x350/0x350
[ 62.769650]  [] kthread+0xea/0xf0
[ 62.769652]  [] ? flush_kthread_worker+0x150/0x150
[ 62.769654]  [] ret_from_fork+0x7c/0xb0
[ 62.769656]  [] ? flush_kthread_worker+0x150/0x150
[ 62.769657] ---[ end trace 838756c62e156ffd ]---
[ 62.769724] device tap1 left promiscuous mode

This patch also affects moving devices between net namespaces.

OVS used to ignore netns move notifications which caused problems.
Like:

  ovs-dpctl add-if test tap1
  ip link set tap1 netns 3512

and then removing tap1 inside the namespace will cause hang on missing dev_put.

With this patch OVS will detach dev upon receiving netns move event.

Signed-off-by: Alexei Starovoitov
---
 net/openvswitch/dp_notify.c    |  7 +++++--
 net/openvswitch/vport-netdev.c | 16 +++++++++++++---
 net/openvswitch/vport-netdev.h |  1 +
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index c323567..5c2dab2 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -65,8 +65,7 @@ void ovs_dp_notify_wq(struct work_struct *work)
 					continue;
 
 				netdev_vport = netdev_vport_priv(vport);
-				if (netdev_vport->dev->reg_state == NETREG_UNREGISTERED ||
-				    netdev_vport->dev->reg_state == NETREG_UNREGISTERING)
+				if (!(netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH))
 					dp_detach_port_notify(vport);
 			}
 		}
@@ -88,6 +87,10 @@ static int dp_device_event(struct notifier_block *unused, unsigned long event,
 		return NOTIFY_DONE;
 
 	if (event == NETDEV_UNREGISTER) {
+		/* upper_dev_unlink and decrement promisc immediately */
+		ovs_netdev_detach_dev(vport);
+
+		/* schedule vport destroy, dev_put and genl notification */
 		ovs_net = net_generic(dev_net(dev), ovs_net_id);
 		queue_work(system_wq, &ovs_net->dp_notify_work);
 	}
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 09d93c1..d21f77d 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -150,15 +150,25 @@ static void free_port_rcu(struct rcu_head *rcu)
 	ovs_vport_free(vport_from_priv(netdev_vport));
 }
 
-static void netdev_destroy(struct vport *vport)
+void ovs_netdev_detach_dev(struct vport *vport)
 {
 	struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
 
-	rtnl_lock();
+	ASSERT_RTNL();
 	netdev_vport->dev->priv_flags &= ~IFF_OVS_DATAPATH;
 	netdev_rx_handler_unregister(netdev_vport->dev);
-	netdev_upper_dev_unlink(netdev_vport->dev, get_dpdev(vport->dp));
+	netdev_upper_dev_unlink(netdev_vport->dev,
+				netdev_master_upper_dev_get(netdev_vport->dev));
 	dev_set_promiscuity(netdev_vport->dev, -1);
+}
+
+static void netdev_destroy(struct vport *vport)
+{
+	struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
+
+	rtnl_lock();
+	if (netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH)
+		ovs_netdev_detach_dev(vport);
 	rtnl_unlock();
 
 	call_rcu(&netdev_vport->rcu, free_port_rcu);
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index dd298b5..8df01c11 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -39,5 +39,6 @@ netdev_vport_priv(const struct vport *vport)
 }
 
 const char *ovs_netdev_get_name(const struct vport *);
+void ovs_netdev_detach_dev(struct vport *);
 
 #endif /* vport_netdev.h */
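
For readers following the control flow, the detach-once behaviour in the diff can be modelled in plain userspace C. This is only a rough sketch, not kernel code: `struct model_dev`, `MODEL_IFF_OVS_DATAPATH` and every `model_*` function are hypothetical stand-ins for `struct net_device`, `IFF_OVS_DATAPATH`, `ovs_netdev_detach_dev()` and `netdev_destroy()`, and locking, RCU and the workqueue are left out entirely. It assumes the notifier runs first and the deferred destroy runs later, as described in the commit message.

```c
#include <assert.h>

/* Hypothetical stand-in for IFF_OVS_DATAPATH; the value is arbitrary. */
#define MODEL_IFF_OVS_DATAPATH 0x1u

/* Hypothetical, heavily reduced model of the state the patch touches. */
struct model_dev {
	unsigned int priv_flags; /* models net_device->priv_flags */
	int promiscuity;         /* models the promiscuity count */
	int has_upper;           /* 1 while linked to the datapath device */
};

/* Models adding the device to a datapath (ovs-dpctl add-if). */
static void model_attach(struct model_dev *dev)
{
	dev->priv_flags = MODEL_IFF_OVS_DATAPATH;
	dev->promiscuity = 1;
	dev->has_upper = 1;
}

/* Models ovs_netdev_detach_dev(): clear flag, unlink upper, drop promisc. */
static void model_detach_dev(struct model_dev *dev)
{
	dev->priv_flags &= ~MODEL_IFF_OVS_DATAPATH;
	dev->has_upper = 0;
	dev->promiscuity--;
}

/* Models netdev_destroy(): detach only if the flag is still set. */
static void model_destroy(struct model_dev *dev)
{
	if (dev->priv_flags & MODEL_IFF_OVS_DATAPATH)
		model_detach_dev(dev);
}

/* Unregister path: the notifier detaches at once, destroy runs later. */
int model_unregister_then_destroy(void)
{
	struct model_dev dev;

	model_attach(&dev);
	model_detach_dev(&dev); /* NETDEV_UNREGISTER notifier */
	model_destroy(&dev);    /* deferred work; must not detach again */
	return dev.promiscuity;
}

/* Normal removal path: destroy alone performs the single detach. */
int model_destroy_only(void)
{
	struct model_dev dev;

	model_attach(&dev);
	model_destroy(&dev);
	return dev.promiscuity;
}
```

In this model both paths leave the promiscuity count at 0; without the flag check in `model_destroy()`, the unregister path would decrement it twice.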