From patchwork Tue Jan 22 07:12:15 2013
X-Patchwork-Submitter: Luigi Rizzo
X-Patchwork-Id: 214353
Date: Tue, 22 Jan 2013 08:12:15 +0100
From: Luigi Rizzo
To: qemu-devel@nongnu.org
Message-ID: <20130122071215.GA37733@onelab2.iet.unipi.it>
Subject: [Qemu-devel] [PATCH v2] netmap backend (revised)

Reposting a version without the changes that implement bounded queues
in net/queue.c.

Hi,
the attached patch implements a qemu backend for the "netmap" API,
thus allowing machines to attach to the VALE software switch as well
as to netmap-supported cards (links below).

http://info.iet.unipi.it/~luigi/netmap/
http://info.iet.unipi.it/~luigi/vale/

This is a cleaned-up version of code written last summer.

Guest-guest speed using an e1000 frontend (with some modifications
related to interrupt moderation; I will repost an updated version later):
up to 700 Kpps using sockets, and up to 5 Mpps using netmap within
the guests. I have not tried with virtio.

cheers
luigi

Signed-off-by: Luigi Rizzo
---
 configure         |  31 +++++
 net/Makefile.objs |   1 +
 net/clients.h     |   4 +
 net/net.c         |   3 +
 net/qemu-netmap.c | 353 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json  |   8 +-

diff --git a/configure b/configure
index c6172ef..cfdf8a6 100755
--- a/configure
+++ b/configure
@@ -146,6 +146,7 @@ curl=""
 curses=""
 docs=""
 fdt=""
+netmap=""
 nptl=""
 pixman=""
 sdl=""
@@ -739,6 +740,10 @@ for opt do
   ;;
   --enable-vde) vde="yes"
   ;;
+  --disable-netmap) netmap="no"
+  ;;
+  --enable-netmap) netmap="yes"
+  ;;
   --disable-xen) xen="no"
   ;;
   --enable-xen) xen="yes"
@@ -1112,6 +1117,8 @@ echo " --disable-uuid disable uuid support"
 echo " --enable-uuid enable uuid support"
 echo " --disable-vde disable support for vde network"
 echo " --enable-vde enable support for vde network"
+echo " --disable-netmap disable support for netmap network"
+echo " --enable-netmap enable support for netmap network"
 echo " --disable-linux-aio disable Linux AIO support"
 echo " --enable-linux-aio enable Linux AIO support"
 echo " --disable-cap-ng disable libcap-ng support"
@@ -1914,6 +1921,26 @@ EOF
 fi
 
 ##########################################
+# netmap headers probe
+if test "$netmap" != "no" ; then
+  cat > $TMPC << EOF
+#include
+#include
+#include
+#include
+int main(void) { return 0; }
+EOF
+  if compile_prog "" "" ; then
+    netmap=yes
+  else
+    if test "$netmap" = "yes" ; then
+      feature_not_found "netmap"
+    fi
+    netmap=no
+  fi
+fi
+
+##########################################
 # libcap-ng library probe
 if test "$cap_ng" != "no" ; then
   cap_libs="-lcap-ng"
@@ -3314,6 +3341,7 @@ echo "NPTL support $nptl"
 echo "GUEST_BASE $guest_base"
 echo "PIE $pie"
 echo "vde support $vde"
+echo "netmap support $netmap"
 echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs $blobs"
@@ -3438,6 +3466,9 @@ fi
 if test "$vde" = "yes" ; then
   echo "CONFIG_VDE=y" >> $config_host_mak
 fi
+if test "$netmap" = "yes" ; then
+  echo "CONFIG_NETMAP=y" >> $config_host_mak
+fi
 if test "$cap_ng" = "yes" ; then
   echo "CONFIG_LIBCAP=y" >> $config_host_mak
 fi
diff --git a/net/Makefile.objs b/net/Makefile.objs
index a08cd14..068253f 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -10,3 +10,4 @@ common-obj-$(CONFIG_AIX) += tap-aix.o
 common-obj-$(CONFIG_HAIKU) += tap-haiku.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
+common-obj-$(CONFIG_NETMAP) += qemu-netmap.o
diff --git a/net/clients.h b/net/clients.h
index 7793294..952d076 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -52,4 +52,8 @@ int net_init_vde(const NetClientOptions *opts, const char *name,
                  NetClientState *peer);
 #endif
 
+#ifdef CONFIG_NETMAP
+int net_init_netmap(const NetClientOptions *opts, const char *name,
+                    NetClientState *peer);
+#endif
 #endif /* QEMU_NET_CLIENTS_H */
diff --git a/net/net.c b/net/net.c
index cdd9b04..816c987 100644
--- a/net/net.c
+++ b/net/net.c
@@ -618,6 +618,9 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
     [NET_CLIENT_OPTIONS_KIND_BRIDGE] = net_init_bridge,
 #endif
     [NET_CLIENT_OPTIONS_KIND_HUBPORT] = net_init_hubport,
+#ifdef CONFIG_NETMAP
+    [NET_CLIENT_OPTIONS_KIND_NETMAP] = net_init_netmap,
+#endif
 };
 
 
diff --git a/net/qemu-netmap.c b/net/qemu-netmap.c
new file mode 100644
index 0000000..79d7c09
--- /dev/null
+++ b/net/qemu-netmap.c
@@ -0,0 +1,353 @@
+/*
+ * netmap access for qemu
+ *
+ * Copyright (c) 2012-2013 Luigi Rizzo
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config-host.h"
+
+/* note paths are different for -head and 1.3 */
+#include "net/net.h"
+#include "clients.h"
+#include "sysemu/sysemu.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+
+#include
+#include
+#include
+#include
+#include
+
+#define ND(fd, ... ) // debugging
+#define D(format, ...) \
+    do { \
+        struct timeval __xxts; \
+        gettimeofday(&__xxts, NULL); \
+        printf("%03d.%06d %s [%d] " format "\n", \
+               (int)__xxts.tv_sec % 1000, (int)__xxts.tv_usec, \
+               __FUNCTION__, __LINE__, ##__VA_ARGS__); \
+    } while (0)
+
+/* rate limited, lps indicates how many per second */
+#define RD(lps, format, ...) \
+    do { \
+        static int t0, __cnt; \
+        struct timeval __xxts; \
+        gettimeofday(&__xxts, NULL); \
+        if (t0 != __xxts.tv_sec) { \
+            t0 = __xxts.tv_sec; \
+            __cnt = 0; \
+        } \
+        if (__cnt++ < lps) \
+            D(format, ##__VA_ARGS__); \
+    } while (0)
+
+
+
+/*
+ * private netmap device info
+ */
+struct netmap_state {
+    int fd;
+    int memsize;
+    void *mem;
+    struct netmap_if *nifp;
+    struct netmap_ring *rx;
+    struct netmap_ring *tx;
+    char fdname[128]; /* normally /dev/netmap */
+    char ifname[128]; /* maybe the nmreq here ? */
+};
+
+struct nm_state {
+    NetClientState nc;
+    struct netmap_state me;
+    unsigned int read_poll;
+    unsigned int write_poll;
+};
+
+// a fast copy routine only for multiples of 64 bytes, non overlapped.
+static inline void
+pkt_copy(const void *_src, void *_dst, int l)
+{
+    const uint64_t *src = _src;
+    uint64_t *dst = _dst;
+#define likely(x) __builtin_expect(!!(x), 1)
+#define unlikely(x) __builtin_expect(!!(x), 0)
+    if (unlikely(l >= 1024)) {
+        bcopy(src, dst, l);
+        return;
+    }
+    for (; l > 0; l -= 64) {
+        *dst++ = *src++;
+        *dst++ = *src++;
+        *dst++ = *src++;
+        *dst++ = *src++;
+        *dst++ = *src++;
+        *dst++ = *src++;
+        *dst++ = *src++;
+        *dst++ = *src++;
+    }
+}
+
+
+/*
+ * open a netmap device. We assume there is only one queue
+ * (which is the case for the VALE bridge).
+ */
+static int netmap_open(struct netmap_state *me)
+{
+    int fd, l, err;
+    struct nmreq req;
+
+    me->fd = fd = open(me->fdname, O_RDWR);
+    if (fd < 0) {
+        error_report("Unable to open netmap device '%s'", me->fdname);
+        return -1;
+    }
+    bzero(&req, sizeof(req));
+    pstrcpy(req.nr_name, sizeof(req.nr_name), me->ifname);
+    req.nr_ringid = 0;
+    req.nr_version = NETMAP_API;
+    err = ioctl(fd, NIOCGINFO, &req);
+    if (err) {
+        error_report("cannot get info on %s", me->ifname);
+        goto error;
+    }
+    l = me->memsize = req.nr_memsize;
+    err = ioctl(fd, NIOCREGIF, &req);
+    if (err) {
+        error_report("Unable to register %s", me->ifname);
+        goto error;
+    }
+
+    me->mem = mmap(0, l, PROT_WRITE | PROT_READ, MAP_SHARED, fd, 0);
+    if (me->mem == MAP_FAILED) {
+        error_report("Unable to mmap");
+        me->mem = NULL;
+        goto error;
+    }
+
+    me->nifp = NETMAP_IF(me->mem, req.nr_offset);
+    me->tx = NETMAP_TXRING(me->nifp, 0);
+    me->rx = NETMAP_RXRING(me->nifp, 0);
+    return 0;
+
+error:
+    close(me->fd);
+    return -1;
+}
+
+// XXX do we need the can-send routine ?
+static int netmap_can_send(void *opaque)
+{
+    struct nm_state *s = opaque;
+
+    return qemu_can_send_packet(&s->nc);
+}
+
+static void netmap_send(void *opaque);
+static void netmap_writable(void *opaque);
+
+/*
+ * set the handlers for the device
+ */
+static void netmap_update_fd_handler(struct nm_state *s)
+{
+#if 1
+    qemu_set_fd_handler2(s->me.fd,
+                         s->read_poll ? netmap_can_send : NULL,
+                         s->read_poll ? netmap_send : NULL,
+                         s->write_poll ? netmap_writable : NULL,
+                         s);
+#else
+    qemu_set_fd_handler(s->me.fd,
+                        s->read_poll ? netmap_send : NULL,
+                        s->write_poll ? netmap_writable : NULL,
+                        s);
+#endif
+}
+
+// update the read handler
+static void netmap_read_poll(struct nm_state *s, int enable)
+{
+    if (s->read_poll != enable) { /* do nothing if not changed */
+        s->read_poll = enable;
+        netmap_update_fd_handler(s);
+    }
+}
+
+// update the write handler
+static void netmap_write_poll(struct nm_state *s, int enable)
+{
+    if (s->write_poll != enable) {
+        s->write_poll = enable;
+        netmap_update_fd_handler(s);
+    }
+}
+
+/*
+ * the fd_write() callback, invoked if the fd is marked as
+ * writable after a poll. Reset the handler and flush any
+ * buffered packets.
+ */
+static void netmap_writable(void *opaque)
+{
+    struct nm_state *s = opaque;
+
+    netmap_write_poll(s, 0);
+    qemu_flush_queued_packets(&s->nc);
+}
+
+/*
+ * new data guest --> backend
+ */
+static ssize_t netmap_receive_raw(NetClientState *nc, const uint8_t *buf, size_t size)
+{
+    struct nm_state *s = DO_UPCAST(struct nm_state, nc, nc);
+    struct netmap_ring *ring = s->me.tx;
+
+    if (ring) {
+        /* request an early notification to avoid running dry */
+        if (ring->avail < ring->num_slots / 2 && s->write_poll == 0) {
+            netmap_write_poll(s, 1);
+        }
+        if (ring->avail == 0) { // cannot write
+            return 0;
+        }
+        uint32_t i = ring->cur;
+        uint32_t idx = ring->slot[i].buf_idx;
+        uint8_t *dst = (u_char *)NETMAP_BUF(ring, idx);
+
+        ring->slot[i].len = size;
+        pkt_copy(buf, dst, size);
+        ring->cur = NETMAP_RING_NEXT(ring, i);
+        ring->avail--;
+    }
+    return size;
+}
+
+// complete a previous send (backend --> guest), enable the fd_read callback
+static void netmap_send_completed(NetClientState *nc, ssize_t len)
+{
+    struct nm_state *s = DO_UPCAST(struct nm_state, nc, nc);
+
+    netmap_read_poll(s, 1);
+}
+
+/*
+ * netmap_send: backend -> guest
+ * there is traffic available from the network, try to send it up.
+ */
+static void netmap_send(void *opaque)
+{
+    struct nm_state *s = opaque;
+    int sent = 0;
+    struct netmap_ring *ring = s->me.rx;
+
+    /* only check ring->avail, let the packet be queued
+     * with qemu_send_packet_async() if needed
+     * XXX until we fix the propagation on the bridge we need to stop early
+     */
+    while (ring->avail > 0 && qemu_can_send_packet(&s->nc) ) {
+        uint32_t i = ring->cur;
+        uint32_t idx = ring->slot[i].buf_idx;
+        uint8_t *src = (u_char *)NETMAP_BUF(ring, idx);
+        int size = ring->slot[i].len;
+
+        ring->cur = NETMAP_RING_NEXT(ring, i);
+        ring->avail--;
+        sent++;
+        size = qemu_send_packet_async(&s->nc, src, size, netmap_send_completed);
+        if (size == 0) {
+            /* the guest does not receive anymore. Packet is queued, stop
+             * reading from the backend until netmap_send_completed()
+             */
+            netmap_read_poll(s, 0);
+            return;
+        }
+    }
+    netmap_read_poll(s, 1); // probably useless.
+}
+
+
+// flush and close
+static void netmap_cleanup(NetClientState *nc)
+{
+    struct nm_state *s = DO_UPCAST(struct nm_state, nc, nc);
+
+    qemu_purge_queued_packets(nc);
+
+    netmap_read_poll(s, 0);
+    netmap_write_poll(s, 0);
+    close(s->me.fd);
+
+    s->me.fd = -1;
+}
+
+static void netmap_poll(NetClientState *nc, bool enable)
+{
+    struct nm_state *s = DO_UPCAST(struct nm_state, nc, nc);
+
+    netmap_read_poll(s, enable);
+    netmap_write_poll(s, enable);
+}
+
+
+/* fd support */
+
+static NetClientInfo net_netmap_info = {
+    .type = NET_CLIENT_OPTIONS_KIND_NETMAP,
+    .size = sizeof(struct nm_state),
+    .receive = netmap_receive_raw,
+// .receive_raw = netmap_receive_raw,
+// .receive_iov = netmap_receive_iov,
+    .poll = netmap_poll,
+    .cleanup = netmap_cleanup,
+};
+
+/* the external calls */
+
+/*
+ * ... -net netmap,ifname="..."
+ */
+int net_init_netmap(const NetClientOptions *opts, const char *name, NetClientState *peer)
+{
+    const NetdevNetmapOptions *netmap_opts = opts->netmap;
+    NetClientState *nc;
+    struct netmap_state me;
+    struct nm_state *s;
+
+    pstrcpy(me.fdname, sizeof(me.fdname), name ? name : "/dev/netmap");
+    /* set default name for the port if not supplied */
+    pstrcpy(me.ifname, sizeof(me.ifname),
+        netmap_opts->has_ifname ? netmap_opts->ifname : "vale0");
+    if (netmap_open(&me))
+        return -1;
+
+    /* create the object -- XXX use name or ifname ? */
+    nc = qemu_new_net_client(&net_netmap_info, peer, "netmap", name);
+    s = DO_UPCAST(struct nm_state, nc, nc);
+    s->me = me;
+    netmap_read_poll(s, 1); // initially only poll for reads.
+
+    return 0;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index 6d7252b..f24b745 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2572,6 +2572,11 @@
   'data': {
     'hubid': 'int32' } }
 
+{ 'type': 'NetdevNetmapOptions',
+  'data': {
+    '*ifname': 'str' } }
+
+
 ##
 # @NetClientOptions
 #
@@ -2589,7 +2594,8 @@
     'vde': 'NetdevVdeOptions',
     'dump': 'NetdevDumpOptions',
     'bridge': 'NetdevBridgeOptions',
-    'hubport': 'NetdevHubPortOptions' } }
+    'hubport': 'NetdevHubPortOptions',
+    'netmap': 'NetdevNetmapOptions' } }
 
 ##
 # @NetLegacy
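
A note on pkt_copy() in net/qemu-netmap.c: for lengths below 1024 its loop
always moves whole 64-byte blocks, so the bytes it touches are the requested
length rounded up to the next multiple of 64. The standalone sketch below
illustrates that rounding. chunked_copy_span() and pkt_copy_sketch() are
hypothetical names that are not part of the patch, and memcpy() stands in
for both the unrolled 64-bit stores and bcopy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: number of bytes a
 * pkt_copy()-style routine touches for a requested length l.
 * Lengths >= 1024 take the exact-length path; shorter lengths are
 * rounded up to a multiple of 64. */
static size_t chunked_copy_span(size_t l)
{
    if (l >= 1024) {
        return l;
    }
    return (l + 63) / 64 * 64;
}

/* Illustrative stand-in for pkt_copy(): both buffers must be valid for
 * chunked_copy_span(l) bytes and must not overlap. */
static void pkt_copy_sketch(const void *src, void *dst, size_t l)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    size_t span = chunked_copy_span(l);
    size_t off;

    assert(src != NULL && dst != NULL);
    if (l >= 1024) {
        memcpy(d, s, l);    /* exact-length path for large packets */
        return;
    }
    for (off = 0; off < span; off += 64) {
        memcpy(d + off, s + off, 64);   /* one 64-byte block per pass */
    }
}
```

With this rounding a 65-byte request touches 128 bytes, so callers of such a
routine need the source readable and the destination writable up to the
rounded length, not just up to the packet length.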