macvlan: add tap device backend

From: Arnd Bergmann <arnd@arndb.de>

This is a first prototype of a new interface into the network
stack, to eventually replace tun/tap and the bridge driver
in certain virtual machine setups.

Background
----------
The 'Edge Virtual Bridging' working group is discussing ways to overcome
the limitations of virtual bridges in hypervisors.  One important part
of this is the Virtual Ethernet Port Aggregator (VEPA), as described in
http://www.ieee802.org/1/files/public/docs2009/new-evb-congdon-vepa-modular-0709-v01.pdf

In short, the idea of VEPA is that virtual machines do not communicate
with each other through direct bridging in the hypervisor but only via
an external managed switch that is already well integrated into the data
center, including network filtering, accounting and monitoring. While
we can do most of that efficiently in the Linux bridge code, doing it
externally simplifies the overall setup.
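To make the difference concrete, here is a toy model (not part of this patch) of the one behaviour VEPA needs from the external switch. A standard 802.1D bridge never sends a frame back out of the port it arrived on. A VEPA-capable switch with "reflective relay" (hairpin mode) enabled does, so two guests behind the same hypervisor uplink can still reach each other through it. The function, macro and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define FWD_DROP (-1)	/* hypothetical: frame is filtered, not forwarded */

/*
 * Toy egress decision for a known-unicast frame.
 * 'ingress': port the frame arrived on.
 * 'egress':  port the forwarding database maps the destination MAC to.
 */
static int switch_forward(int ingress, int egress, bool reflective_relay)
{
	assert(ingress >= 0 && egress >= 0);

	/* Classic bridge: a frame is never sent back out its ingress port. */
	if (egress == ingress && !reflective_relay)
		return FWD_DROP;

	/* Otherwise forward, possibly back out the same (uplink) port. */
	return egress;
}
```

With reflective relay disabled, guest-to-guest frames that share one uplink port are dropped at the switch; enabling it is what makes the "only via an external switch" model work.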

Related work
------------
Patches to implement VEPA in the Linux bridge driver have been posted by
Anna Fischer in June, see http://patchwork.ozlabs.org/patch/28702/. Those
patches are good and will hopefully be merged in 2.6.32, but I think we can
take some shortcuts with an alternative approach:

The macvlan driver already forwards all traffic between guests and an
external interface but not between the guests themselves, which is
exactly what VEPA needs. VEPA also explicitly does not want or need the
advanced filtering that netfilter-bridge provides, so we can use macvlan
to replace the bridge code in this setup, shortening the code path through
the kernel.  This works fine with containers and network namespaces,
but not easily with kvm/qemu, because qemu cannot use a plain network
device such as a macvlan as a guest backend.

Or Gerlitz posted a "raw" packet socket backend for qemu to deal with this,
at http://marc.info/?l=qemu-devel&m=124653801212767, and at least three
other people have implemented similar functionality independently.

This driver
-----------
While the other approaches should work as well, doing it using a tap
interface should give additional benefits:

* We can keep using the optimizations for jumbo frames that we have put
into the tun/tap driver.

* No need for the root permissions that packet sockets require: just use
'ip link add link type macvtap' to create a new device and give it the
right permissions using udev (one tap per macvlan netdev).

* Support for multiqueue network adapters by opening the tap device
multiple times, using one file descriptor per guest CPU/network
queue/interrupt (if the adapter supports multiple queues on a single
MAC address).

* Support for zero-copy receive/transmit using async I/O on the tap device
(if the adapter supports per-MAC rx queues).

* The same framework in macvlan can be used to add a third backend
that plugs into a future kernel-based virtio-net implementation.
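As an illustration of the multiqueue item above, here is a sketch of how a user of the tap device might spread flows over several file descriptors, one per queue. It assumes the user opened the device once per queue and steers packets by a flow hash. The structure, the choice of FNV-1a as the hash and all helper names are assumptions for illustration only; as noted below, this version of the driver implements none of it.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_QUEUES 8	/* hypothetical upper bound */

/* Hypothetical: one fd per queue, from opening the tap device repeatedly. */
struct toy_tap_queues {
	int fds[TOY_MAX_QUEUES];
	unsigned int nqueues;
};

/* FNV-1a over the 4-tuple, standing in for whatever hash the NIC uses. */
static uint32_t toy_flow_hash(uint32_t saddr, uint32_t daddr,
			      uint16_t sport, uint16_t dport)
{
	const uint32_t words[3] = { saddr, daddr,
				    ((uint32_t)sport << 16) | dport };
	uint32_t h = 2166136261u;		/* FNV-1a offset basis */

	for (int i = 0; i < 3; i++) {
		for (int b = 0; b < 4; b++) {
			h ^= (words[i] >> (8 * b)) & 0xffu;
			h *= 16777619u;		/* FNV-1a prime */
		}
	}
	return h;
}

/* Map a flow hash to the fd of one queue; a given flow always gets the same fd. */
static int toy_pick_queue_fd(const struct toy_tap_queues *q, uint32_t hash)
{
	assert(q->nqueues > 0 && q->nqueues <= TOY_MAX_QUEUES);
	return q->fds[hash % q->nqueues];
}
```

Keeping each flow on one queue preserves per-connection packet ordering, while different flows can still be handled on different CPUs.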

This version of the driver does not support any of those features,
but they all appear possible to add ;).
The driver is currently called 'macvtap', but I'd be more than happy
to change that if anyone can suggest a better name. The code is
still at an early stage and I wish I had found more time to polish
it, but for now I'd first like to know whether people agree with the
basic concept at all.

Cc: Patrick McHardy <kaber@trash.net>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Or Gerlitz <ogerlitz@voltaire.com>
Cc: "Fischer, Anna" <anna.fischer@hp.com>
Cc: netdev@vger.kernel.org
Cc: bridge@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Cc: Edge Virtual Bridging <evb@yahoogroups.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

---

The evb mailing list eats Cc headers, please make sure to keep everybody
in your Cc list when replying there.
---
 drivers/net/Kconfig   |   12 ++
 drivers/net/Makefile  |    1 +
 drivers/net/macvlan.c |   39 +++-----
 drivers/net/macvlan.h |   37 +++++++
 drivers/net/macvtap.c |  276 +++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 341 insertions(+), 24 deletions(-)
 create mode 100644 drivers/net/macvlan.h
 create mode 100644 drivers/net/macvtap.c

Message ID	1249595428-21594-1-git-send-email-arnd@arndb.de
State	RFC, archived
Delegated to:	David Miller
Date	Thu, 6 Aug 2009 21:50:28 +0000
To	netdev@vger.kernel.org
