Patchwork Initial commit for QDES - QEMU Distributed Ethernet Switch

Submitter Mike Lovell
Date June 25, 2012, 5:42 a.m.
Message ID <1340602924-3231-2-git-send-email-mike@dev-zero.net>
Permalink /patch/167005/
State New

Comments

Mike Lovell - June 25, 2012, 5:42 a.m.
This commit adds a new network backend to QEMU. It combines the basic behavior
of the unicast udp and multicast socket backends with some intelligence about
the source and destination of the packets. It also adds a header to the
packets to allow for creating multiple logical networks using the same
underlying network infrastructure. (Kind of like GRE Keys). This provides a
network backend that acts as a type of Ethernet switch.

During initialization, QDES creates two sockets. The first is used for
receiving unicast udp traffic and for sending both unicast and multicast udp
traffic. The second is used for receiving multicast udp traffic. A
GHashTable is also created to serve as an address table recording where
packets should be delivered, and a timer is configured to regularly clean
stale entries out of that table.

When a packet is received by either of the two sockets, the header is parsed
and checked to make sure the packet is a member of the correct logical
network. Then the ethernet frame that is received is inspected for the source
MAC address. The address table is then updated with the source MAC address,
the address the packet was received from, and the current time.
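
As an illustration of the inspection step above, here is a minimal sketch of
reading the MAC addresses out of a frame one byte at a time, so the result
does not depend on host byte order or alignment. It is not the code in this
patch (which uses a 64 bit load and mask); the function names are
hypothetical and chosen only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_ADDR_LEN 6

/* Pack the 6 bytes at p into the low 48 bits of a uint64_t, most
 * significant byte first. Works the same on any endianness. */
static uint64_t mac_to_u64(const uint8_t *p)
{
    uint64_t mac = 0;
    for (int i = 0; i < ETH_ADDR_LEN; i++) {
        mac = (mac << 8) | p[i];
    }
    return mac;
}

/* The destination MAC occupies bytes 0..5 of an Ethernet frame. */
static uint64_t frame_dst_mac(const uint8_t *frame)
{
    return mac_to_u64(frame);
}

/* The source MAC occupies bytes 6..11 of an Ethernet frame. */
static uint64_t frame_src_mac(const uint8_t *frame)
{
    return mac_to_u64(frame + ETH_ADDR_LEN);
}

/* Group (multicast or broadcast) addresses have the least significant
 * bit of the first octet set. Such addresses should not be learned. */
static bool mac_is_multicast(uint64_t mac)
{
    return ((mac >> 40) & 0x01) != 0;
}
```

With a broadcast frame sent from 52:54:00:12:34:56, frame_src_mac() yields
0x525400123456 and mac_is_multicast() is true only for the destination.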

When QEMU delivers a packet to be sent, the destination MAC address is looked
for in the address table. If it is found, the packet is sent to the remote
address stored in the table along with the appropriate header. If the address
is not found, the packet and header are sent to the configured multicast
address so all members of the network will receive the packet.
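
The learn, look up, and age-out behaviour described above can be sketched
roughly as follows. This is a simplified stand-in, not the patch's code: it
uses a small fixed array instead of a GHashTable, and the Endpoint, MacEntry
and Switch types, the TABLE_SLOTS capacity, and all function names are
assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SLOTS 64          /* hypothetical capacity for this sketch */

/* Stand-in for a remote UDP endpoint (IPv4 address and port). */
typedef struct Endpoint {
    uint32_t ip;
    uint16_t port;
} Endpoint;

typedef struct MacEntry {
    int      used;
    uint64_t mac;
    int64_t  last_seen_ms;
    Endpoint remote;
} MacEntry;

typedef struct Switch {
    MacEntry table[TABLE_SLOTS];
    Endpoint mcast;             /* flood target for unknown destinations */
} Switch;

static MacEntry *find_entry(Switch *sw, uint64_t mac)
{
    for (size_t i = 0; i < TABLE_SLOTS; i++) {
        if (sw->table[i].used && sw->table[i].mac == mac) {
            return &sw->table[i];
        }
    }
    return NULL;
}

/* Record that `mac` was last seen behind `remote` at time `now_ms`. */
static void learn(Switch *sw, uint64_t mac, Endpoint remote, int64_t now_ms)
{
    MacEntry *e = find_entry(sw, mac);
    for (size_t i = 0; e == NULL && i < TABLE_SLOTS; i++) {
        if (!sw->table[i].used) {
            e = &sw->table[i];  /* first free slot */
        }
    }
    if (e == NULL) {
        return;                 /* table full: traffic is simply flooded */
    }
    e->used = 1;
    e->mac = mac;
    e->remote = remote;
    e->last_seen_ms = now_ms;
}

/* Known destination: its unicast endpoint. Unknown: the multicast group. */
static Endpoint next_hop(Switch *sw, uint64_t dst_mac)
{
    MacEntry *e = find_entry(sw, dst_mac);
    return e ? e->remote : sw->mcast;
}

/* Drop entries not refreshed within max_age_ms. Returns how many. */
static int expire(Switch *sw, int64_t now_ms, int64_t max_age_ms)
{
    int removed = 0;
    for (size_t i = 0; i < TABLE_SLOTS; i++) {
        if (sw->table[i].used &&
            now_ms - sw->table[i].last_seen_ms > max_age_ms) {
            sw->table[i].used = 0;
            removed++;
        }
    }
    return removed;
}
```

Once an entry ages out, next_hop() falls back to the multicast address again
until the remote guest is heard from.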

Unfortunately, only IPv4 is currently supported. IPv6 is on the short list of
improvements to be made.

Signed-off-by: Mike Lovell <mike@dev-zero.net>
---
 hmp-commands.hx   |    2 +-
 net.c             |   31 +++-
 net.h             |    1 +
 net/Makefile.objs |    1 +
 net/qdes.c        |  453 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/qdes.h        |   38 +++++
 net/socket.c      |    3 +-
 net/socket.h      |    3 +
 qemu-options.hx   |    5 +
 9 files changed, 533 insertions(+), 4 deletions(-)
 create mode 100644 net/qdes.c
 create mode 100644 net/qdes.h
Mike Lovell - June 25, 2012, 7:01 a.m.
I'm not sure why but it looks like my intro email for this got eaten by 
something. Here it is again and sorry if it shows up twice. This is my 
first time posting to the list and submitting a patch and I guess 
something doesn't like the way I did it.

-----
Hi all,
Here is something I've been tinkering with for the past few weeks. It is now
in a state where the basic idea makes sense, it works, and it could use some
feedback from the community.

This is what I've been calling QDES, or QEMU Distributed Ethernet Switch. I
first had the idea when I was playing with the udp and mcast socket network
backends while exploring how to build a VM infrastructure. I liked the idea
of using the socket backends because they don't require elevated permissions
to configure and run, and because they can talk over IP networks.

But the built-in socket backends allow either only two guests talking
directly, or multiple guests where all traffic is sent to all of them. So one
can either have just two guests talking or waste bandwidth with multiple
guests. There wasn't something that could talk to multiple guests but also
use unicast traffic.

So I made a backend that can do this. It takes the basics of how the udp and
mcast socket backends work and combines them with some switching based on the
ethernet packets. The result is that multiple guests can talk to each other
without wasting bandwidth by delivering unicast traffic to all guests. The
backend also adds some header data to each packet. This header includes a
network identifier so multiple logical networks can be created using the same
multicast configuration but still have separation in the guests.
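
To make the header idea concrete, here is a minimal sketch of writing and
checking a 5 byte header (1 byte version, 2 byte network number, 2 byte frame
size) with the multi-byte fields in network byte order. It writes the bytes
explicitly rather than casting to a packed struct as the patch does, and it
adds a minimum length check. The names hdr_encode and hdr_check are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define QDES_HDR_LEN 5   /* 1 byte version + 2 bytes network + 2 bytes size */
#define QDES_VERSION 1

/* Write the header into out[0..4], multi-byte fields big-endian. */
static void hdr_encode(uint8_t *out, uint16_t network, uint16_t frame_len)
{
    out[0] = QDES_VERSION;
    out[1] = (uint8_t)(network >> 8);
    out[2] = (uint8_t)(network & 0xff);
    out[3] = (uint8_t)(frame_len >> 8);
    out[4] = (uint8_t)(frame_len & 0xff);
}

/* Validate a received datagram of pkt_len bytes. Returns the length of
 * the encapsulated frame, or -1 if the datagram is too short, has an
 * unknown version, belongs to another network, or has a size field that
 * does not match the number of bytes actually received. */
static int hdr_check(const uint8_t *pkt, size_t pkt_len, uint16_t my_network)
{
    if (pkt_len < QDES_HDR_LEN) {
        return -1;
    }
    if (pkt[0] != QDES_VERSION) {
        return -1;
    }
    uint16_t network = (uint16_t)((pkt[1] << 8) | pkt[2]);
    uint16_t size = (uint16_t)((pkt[3] << 8) | pkt[4]);
    if (network != my_network) {
        return -1;
    }
    if (size != pkt_len - QDES_HDR_LEN) {
        return -1;
    }
    return (int)size;
}
```

A datagram encoded for network 42 is accepted only by an instance configured
with network 42. Instances on other networks that share the multicast group
drop it.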

There are a few advantages that I see to this. It allows multiple guests in
multiple locations to talk to each other while keeping unicast traffic
between just the two hosts involved. It doesn't require root permissions to
run. It can operate over non-ethernet networks (like IPoIB). It doesn't
require changing network configuration on the host. It allows for a ton of
logical networks to be created (currently 65536 per multicast address and
port combination).

There are a few disadvantages as well. It does add some more processing to
the QEMU process, but not much (I saw it go as fast as the socket backends).
It encapsulates an Ethernet frame inside a UDP packet, so there is the
overhead of the IP and UDP headers as well as the transport medium headers
(most likely Ethernet again). Because there is additional header data, the
MTU of the guest could be limited, depending on the host's ability to send
larger multicast packets. (I haven't really looked closely at this last
one.) Nothing besides QEMU processes can communicate using this yet, though I
hope to build a utility to work with a tap device.
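
The per-packet overhead mentioned above can be put into rough numbers. The
sketch below is only back-of-the-envelope arithmetic under stated
assumptions: an IPv4 header without options (20 bytes), a UDP header
(8 bytes), the 5 byte QDES header, an untagged inner Ethernet header
(14 bytes), and no reliance on IP fragmentation. The function name is
hypothetical and none of this has been measured.

```c
#define IPV4_HDR_LEN 20   /* assumes no IP options */
#define UDP_HDR_LEN   8
#define QDES_HDR_LEN  5   /* version + network + size */
#define ETH_HDR_LEN  14   /* inner Ethernet header, no 802.1Q tag */

/* Largest guest MTU whose encapsulated frame still fits in a single
 * unfragmented IPv4 datagram on a host link with MTU host_mtu.
 * Returns 0 if there is no room at all. */
static int guest_mtu_for(int host_mtu)
{
    int mtu = host_mtu - IPV4_HDR_LEN - UDP_HDR_LEN
              - QDES_HDR_LEN - ETH_HDR_LEN;
    return mtu > 0 ? mtu : 0;
}
```

Under these assumptions a host MTU of 1500 leaves 1500 - 20 - 8 - 5 - 14 =
1453 bytes for the guest, and a host MTU of 9000 leaves 8953.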

Overall, I think this is something that's pretty cool. I don't know how much
thought people give to the socket backends for real world use, so I don't
know if this would be of much use to anyone. I am looking for feedback on
what the community thinks and for comments about the code. It's only my
second time writing more than 20 lines of C, so I'm sure I did some stupid
things. I have only tested on 64 bit x86 Linux systems so far.

Hopefully you all have good things to say. :)

mike

Patch

diff --git a/hmp-commands.hx b/hmp-commands.hx
index f5d9d91..042bb85 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1007,7 +1007,7 @@  ETEXI
     {
         .name       = "host_net_add",
         .args_type  = "device:s,opts:s?",
-        .params     = "tap|user|socket|vde|dump [options]",
+        .params     = "tap|user|socket|qdes|vde|dump [options]",
         .help       = "add host VLAN client",
         .mhandler.cmd = net_host_device_add,
     },
diff --git a/net.c b/net.c
index 4aa416c..dbb4a48 100644
--- a/net.c
+++ b/net.c
@@ -30,6 +30,7 @@ 
 #include "net/dump.h"
 #include "net/slirp.h"
 #include "net/vde.h"
+#include "net/qdes.h"
 #include "net/util.h"
 #include "monitor.h"
 #include "qemu-common.h"
@@ -1016,6 +1017,30 @@  static const struct {
             { /* end of list */ }
         },
     },
+    [NET_CLIENT_TYPE_QDES] = {
+        .type = "qdes",
+        .init = net_init_qdes,
+        .desc = {
+            NET_COMMON_PARAMS_DESC,
+            {
+                .name = "timer",
+                .type = QEMU_OPT_NUMBER,
+                .help = "Seconds between cleaning mac address table"
+            }, {
+                .name = "mcast",
+                .type = QEMU_OPT_STRING,
+                .help = "UDP multicast address and port number",
+            }, {
+                .name = "localaddr",
+                .type = QEMU_OPT_STRING,
+                .help = "source address for multicast and udp packets",
+            }, {
+                .name = "network",
+                .type = QEMU_OPT_NUMBER,
+                .help = "qdes network number",
+            },
+        },
+    },
 #ifdef CONFIG_VDE
     [NET_CLIENT_TYPE_VDE] = {
         .type = "vde",
@@ -1104,7 +1129,8 @@  int net_client_init(QemuOpts *opts, int is_netdev, Error **errp)
 #ifdef CONFIG_VDE
             strcmp(type, "vde") != 0 &&
 #endif
-            strcmp(type, "socket") != 0) {
+            strcmp(type, "socket") != 0 &&
+            strcmp(type, "qdes") != 0) {
             error_set(errp, QERR_INVALID_PARAMETER_VALUE, "type",
                       "a netdev backend type");
             return -1;
@@ -1170,7 +1196,7 @@  int net_client_init(QemuOpts *opts, int is_netdev, Error **errp)
 static int net_host_check_device(const char *device)
 {
     int i;
-    const char *valid_param_list[] = { "tap", "socket", "dump"
+    const char *valid_param_list[] = { "tap", "socket", "dump", "qdes"
 #ifdef CONFIG_NET_BRIDGE
                                        , "bridge"
 #endif
@@ -1405,6 +1431,7 @@  void net_check_clients(void)
             case NET_CLIENT_TYPE_USER:
             case NET_CLIENT_TYPE_TAP:
             case NET_CLIENT_TYPE_SOCKET:
+            case NET_CLIENT_TYPE_QDES:
             case NET_CLIENT_TYPE_VDE:
                 has_host_dev = 1;
                 break;
diff --git a/net.h b/net.h
index bdc2a06..bf932b0 100644
--- a/net.h
+++ b/net.h
@@ -38,6 +38,7 @@  typedef enum {
     NET_CLIENT_TYPE_VDE,
     NET_CLIENT_TYPE_DUMP,
     NET_CLIENT_TYPE_BRIDGE,
+    NET_CLIENT_TYPE_QDES,
 
     NET_CLIENT_TYPE_MAX
 } net_client_type;
diff --git a/net/Makefile.objs b/net/Makefile.objs
index 72f50bc..a959499 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -1,5 +1,6 @@ 
 common-obj-y = queue.o checksum.o util.o
 common-obj-y += socket.o
+common-obj-y += qdes.o
 common-obj-y += dump.o
 common-obj-$(CONFIG_POSIX) += tap.o
 common-obj-$(CONFIG_LINUX) += tap-linux.o
diff --git a/net/qdes.c b/net/qdes.c
new file mode 100644
index 0000000..b796018
--- /dev/null
+++ b/net/qdes.c
@@ -0,0 +1,453 @@ 
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/*
+ * QEMU Distributed Ethernet Switch  - qdes
+ * Copyright 2012 Mike Lovell
+ */
+
+#include <linux/if_ether.h>
+
+#include "config-host.h"
+#include "net.h"
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "qemu-option.h"
+#include "qemu-timer.h"
+#include "qemu_socket.h"
+
+#include "net/qdes.h"
+#include "net/socket.h"
+
+#define DEFAULT_TIMER_INTERVAL 5
+#define BUF_SIZE 4096
+
+/*
+ * struct for storing a mac address entry in the address table.
+ *
+ * mac      - 64 bit int with the ethernet address. 64 bits since ethernet
+ *            addresses are 48 bits and there isn't a uint48_t
+ * lastSeen - time that a packet from this mac address was last seen.
+ *            used for cleaning old entries from the table
+ * addr     - the remote address that the packet was received from
+ */
+typedef struct QDESMacEntry {
+    uint64_t mac;
+    time_t lastSeen;
+    struct sockaddr_in addr;
+} QDESMacEntry;
+
+/*
+ * struct for all state associated with QDES
+ *
+ * nc       - VLANClientState generic used by all network backends
+ * network  - int specifying which logical network this QDES instance is a
+ *            member of
+ * gcTimer  - QEMUTimer object used for cleaning the address table and any
+ *            other periodic work
+ * macTable - GHashTable storing the mappings of ethernet MAC address to
+ *            QDESMacEntry structs
+ * mcastAddr    - sockaddr storing the multicast address being used by QDES
+ * mcastFD  - file descriptor for the socket receiving multicast traffic
+ * localAddr    - sockaddr storing the local unicast addresses being used
+ * localFD  - file descriptor for the socket receiving unicast traffic
+ * buf      - buffer array for working with packets
+ */
+typedef struct QDESState {
+    VLANClientState nc;
+    unsigned int network;
+    int timerInterval;
+    QEMUTimer *gcTimer;
+    GHashTable *macTable;
+    struct sockaddr_in mcastAddr;
+    int mcastFD;
+    struct sockaddr_in localAddr;
+    int localFD;
+    uint8_t buf[BUF_SIZE];
+} QDESState;
+
+/*
+ * struct for working with the QDES packet headers
+ *
+ * setting #pragma pack(1) here to remove cpu architecture dependent padding
+ * in the struct.
+ *
+ * version  - version number of the QDES header. Always 1 for now
+ * net      - network number of the QDES packet
+ * size     - size of the ethernet frame in the QDES packet
+ */
+#pragma pack(push)
+#pragma pack(1)
+
+typedef struct QDESHeaderV1 {
+    unsigned char version;
+    uint16_t net;
+    uint16_t size;
+} QDESHeaderV1;
+
+#pragma pack(pop)
+
+/*
+ * Determine which address to send a packet to based on mac address.
+ */
+static struct sockaddr_in * qdes_dst_for_mac(QDESState *s, uint64_t mac)
+{
+    QDESMacEntry *entry;
+    entry = g_hash_table_lookup(s->macTable, &mac);
+    if (entry == NULL) {
+        return &s->mcastAddr;
+    } else {
+        return &entry->addr;
+    }
+}
+
+/*
+ * Update the mac table with information from recently received packet.
+ *
+ * If an entry already exists for the address, update that entry. If an entry
+ * doesn't already exist, allocate memory for a new entry, set its information
+ * and add it to the address table.
+ */
+static void qdes_update_table(QDESState *s, const uint8_t *buf,
+                             struct sockaddr_in *remote)
+{
+    uint64_t src;
+    /* determine the source address through some voodoo. */
+    /* this probably only works on x86 and similar endianness right now :( */
+    src = *((uint64_t *) (buf + 6));
+    src = src & be64toh(0xFFFFFFFFFFFF0000);
+    src = be64toh(src) >> 16;
+    /* if the address is an ethernet multicast address, ignore it. */
+    if ((src | 0xFEFFFFFFFFFF) == 0xFFFFFFFFFFFF) {
+        return;
+    }
+
+    QDESMacEntry *entry;
+    entry = g_hash_table_lookup(s->macTable, &src);
+    /* entry found. update existing data */
+    if (entry != NULL) {
+        entry->addr.sin_family = remote->sin_family;
+        entry->addr.sin_port = remote->sin_port;
+        entry->addr.sin_addr.s_addr = remote->sin_addr.s_addr;
+        entry->lastSeen = qemu_get_clock_ms(rt_clock);
+    /* entry not found. allocate new and set data */
+    } else {
+        entry = g_new0(QDESMacEntry, 1);
+        entry->mac = src;
+        entry->addr.sin_family = remote->sin_family;
+        entry->addr.sin_port = remote->sin_port;
+        entry->addr.sin_addr.s_addr = remote->sin_addr.s_addr;
+        entry->lastSeen = qemu_get_clock_ms(rt_clock);
+        g_hash_table_insert(s->macTable, &entry->mac, entry);
+    }
+}
+
+/*
+ * Process a packet being received from the socket fd.
+ */
+static void qdes_receive(QDESState *s, int fd)
+{
+    struct sockaddr_in addr;
+    int recv_size;
+    unsigned int sockaddr_size;
+    sockaddr_size = sizeof(addr);
+
+    /* use recvfrom instead of recv to know who sent the packet */
+    recv_size = recvfrom(fd, s->buf, sizeof(s->buf), 0,
+                         (struct sockaddr *) &addr, &sockaddr_size);
+    if (recv_size < 0) {
+        printf("error on recvfrom. %s\n", strerror(errno));
+        return;
+    }
+
+    /* packets sent to the multicast address end up back here as well */
+    /* if this instance sent the packet, ignore it */
+    if ((addr.sin_addr.s_addr == s->localAddr.sin_addr.s_addr) &&
+        (addr.sin_port == s->localAddr.sin_port)) {
+        return;
+    }
+
+    /* verify header data */
+    struct QDESHeaderV1 *header;
+    header = (QDESHeaderV1 *) s->buf;
+    if (header->version != 1) {
+        return;
+    }
+    if (header->net != s->network) {
+        return;
+    }
+    if (ntohs(header->size) != (recv_size - sizeof(QDESHeaderV1))) {
+        printf("received wrong number of bytes\n");
+        return;
+    }
+
+    qdes_update_table(s, s->buf + sizeof(QDESHeaderV1), &addr);
+
+    /* deliver the packet without the header to qemu */
+    qemu_send_packet(&s->nc, s->buf + sizeof(QDESHeaderV1),
+                     recv_size - sizeof(QDESHeaderV1));
+}
+
+/*
+ * Function called by QEMU when the multicast socket has data to be read.
+ */
+static void qdes_receive_mcast(void *opaque)
+{
+    QDESState *s = opaque;
+    qdes_receive(s, s->mcastFD);
+    return;
+}
+
+/*
+ * Function called by QEMU when the unicast socket has data to be read.
+ */
+static void qdes_receive_udp(void *opaque)
+{
+    QDESState *s = opaque;
+    qdes_receive(s, s->localFD);
+    return;
+}
+
+/*
+ * Function called by QEMU when a packet is sent from the guest.
+ */
+static ssize_t qdes_from_qemu(VLANClientState *nc, const uint8_t *buf,
+                             size_t size)
+{
+    QDESState *s = DO_UPCAST(QDESState, nc, nc);
+    QDESHeaderV1 *header;
+    uint64_t dst;
+    struct sockaddr_in *dstAddr;
+
+    /* error if the packet being sent is bigger than the buffer minus the
+       size of the QDES header */
+    if (size > (sizeof(s->buf) - sizeof(QDESHeaderV1))) {
+        return -1;
+    }
+
+    /* figure out the destination ethernet address through voodoo */
+    /* this probably only works on x86 and similar endianness right now :( */
+    dst = *((uint64_t *) (buf));
+    dst = dst & be64toh(0xFFFFFFFFFFFF0000);
+    dst = be64toh(dst) >> 16;
+
+    dstAddr = qdes_dst_for_mac(s, dst);
+
+    /* treat the buffer like a header and set the necessary fields */
+    header = (QDESHeaderV1 *) s->buf;
+    header->version = 1;
+    header->net = s->network;
+    header->size = htons(size);
+
+    /* copy the packet to the buffer and send it */
+    memcpy(s->buf + sizeof(QDESHeaderV1), buf, size);
+    int res;
+    res = sendto(s->localFD, (const void *)&s->buf, size + sizeof(QDESHeaderV1),
+                 0, (struct sockaddr *) dstAddr, sizeof(struct sockaddr_in));
+    /* don't report header as bytes sent */
+    return res - sizeof(QDESHeaderV1);
+}
+
+/*
+ * QDES Shutdown function. Used for normal shutdown and error cleanup.
+ *
+ * If sockets are open, close them. Then shutdown the gcTimer and free the
+ * address table.
+ */
+static void qdes_shutdown(VLANClientState *nc)
+{
+    QDESState *s = DO_UPCAST(QDESState, nc, nc);
+    if (s->localFD > 0) {
+        qemu_set_fd_handler(s->localFD, NULL, NULL, NULL);
+        close(s->localFD);
+    }
+    if (s->mcastFD > 0) {
+        qemu_set_fd_handler(s->mcastFD, NULL, NULL, NULL);
+        close(s->mcastFD);
+    }
+    qemu_del_timer(s->gcTimer);
+    qemu_free_timer(s->gcTimer);
+    g_hash_table_destroy(s->macTable);
+}
+
+/*
+ * Function used by g_hash_table_foreach_remove to clean the address table
+ *
+ * Returns true if the entry hasn't been seen in more than 30 seconds which
+ * triggers removal of the entry from the table. Since the GHashTable was
+ * made with destroy functions, the g_hash_table_foreach_remove will free the
+ * memory used.
+ */
+static gboolean qdes_entry_age_check(uint64_t *mac, QDESMacEntry *entry)
+{
+    uint64_t t = qemu_get_clock_ms(rt_clock) - 30000;
+    if (entry->lastSeen < t) {
+        return true;
+    }
+    return false;
+}
+
+/*
+ * Update the info_str
+ */
+static void qdes_update_infostr(QDESState *s)
+{
+    snprintf(s->nc.info_str, sizeof(s->nc.info_str),
+             "mcast=%s:%d,localaddr=%s:%d,network=%d,timer=%d "
+             "mcastFD=%d,localFD=%d,tableSize=%d",
+             inet_ntoa(s->mcastAddr.sin_addr), ntohs(s->mcastAddr.sin_port),
+             inet_ntoa(s->localAddr.sin_addr), ntohs(s->localAddr.sin_port),
+             ntohs(s->network), s->timerInterval, s->mcastFD, s->localFD,
+             g_hash_table_size(s->macTable));
+}
+
+/*
+ * Function called by the QDES gcTimer.
+ *
+ * Clean the macTable. Set the next timer event.
+ */
+static void qdes_gc_timer(void *opaque)
+{
+    QDESState *s = opaque;
+    g_hash_table_foreach_remove(s->macTable, (GHRFunc) qdes_entry_age_check,
+                                NULL);
+    qdes_update_infostr(s);
+    qemu_mod_timer(s->gcTimer,
+                   (s->timerInterval * SCALE_US) + qemu_get_clock_ms(rt_clock));
+}
+
+/*
+ * struct to tell QEMU which functions to call and how much memory is needed.
+ */
+static NetClientInfo net_qdes_info = {
+    .type = NET_CLIENT_TYPE_QDES,
+    .size = sizeof(QDESState),
+    .receive = qdes_from_qemu,
+    .cleanup = qdes_shutdown,
+};
+
+/*
+ * Function to initialize a QDES instance.
+ */
+int net_init_qdes(QemuOpts *opts, const char *name, VLANState *vlan)
+{
+    VLANClientState *nc;
+    QDESState *s;
+
+    nc = qemu_new_net_client(&net_qdes_info, vlan, NULL, "qdes", name);
+
+    s = DO_UPCAST(QDESState, nc, nc);
+    s->mcastFD = -1;
+    s->localFD = -1;
+
+    /* create the gc timer. uses rt_clock since this doesn't depend on guest */
+    /* state and should run when the guest is paused. */
+    int timerInterval;
+    timerInterval = qemu_opt_get_number(opts, "timer", DEFAULT_TIMER_INTERVAL);
+    s->gcTimer = qemu_new_timer_ms(rt_clock, qdes_gc_timer, s);
+    s->timerInterval = timerInterval;
+
+    /* create the macTable. uses the 'full' function to specify how the items */
+    /* in the table should be free'd when removed. */
+    s->macTable = g_hash_table_new_full(g_int_hash, g_int_equal, NULL, g_free);
+
+    /* get and set the network number to use. */
+    uint16_t network;
+    network = qemu_opt_get_number(opts, "network", 0);
+    if (network == 0) {
+        printf("using default network %d. not recommended\n", 0);
+    }
+    s->network = htons(network);
+
+    /* validate required options */
+    int fd, val, ret;
+    const char *mcastStr, *localStr;
+    mcastStr = qemu_opt_get(opts, "mcast");
+    if (mcastStr == NULL) {
+        error_report("mcast= is required option for qdes");
+        goto err;
+    }
+
+    localStr = qemu_opt_get(opts, "localaddr");
+    if (localStr == NULL) {
+        error_report("localaddr= is required option for qdes");
+        goto err;
+    }
+
+    if (parse_host_port(&s->mcastAddr, mcastStr) < 0) {
+        error_report("Error parsing mcast option");
+        goto err;
+    }
+
+    if (parse_host_port(&s->localAddr, localStr) < 0) {
+        error_report("Error parsing localaddr option");
+        goto err;
+    }
+
+    /* create multicast receiving socket */
+    fd = net_socket_mcast_create(&s->mcastAddr, NULL);
+    if (fd < 0) {
+        error_report("Error creating mcast socket");
+        goto err;
+    }
+    s->mcastFD = fd;
+    /* set the handler for when packets are received */
+    qemu_set_fd_handler(s->mcastFD, qdes_receive_mcast, NULL, s);
+
+    /* create unicast receiving socket */
+    fd = qemu_socket(PF_INET, SOCK_DGRAM, 0);
+    if (fd < 0) {
+        error_report("Error creating udp socket");
+        goto err;
+    }
+    s->localFD = fd;
+    val = 1;
+    ret = setsockopt(s->localFD, SOL_SOCKET, SO_REUSEADDR,
+                     (const char *)&val, sizeof(val));
+    if (ret < 0) {
+        error_report("Error setting SO_REUSEADDR");
+        goto err;
+    }
+    ret = bind(s->localFD, (struct sockaddr *)&(s->localAddr),
+               sizeof(s->localAddr));
+    if (ret < 0) {
+        error_report("Error binding localAddr");
+        goto err;
+    }
+    /* set the handler for when packets are received */
+    qemu_set_fd_handler(s->localFD, qdes_receive_udp, NULL, s);
+
+    /* start the gctimer */
+    qemu_mod_timer(s->gcTimer,
+                   (s->timerInterval * SCALE_US) + qemu_get_clock_ms(rt_clock));
+
+    /* set the info string */
+    qdes_update_infostr(s);
+    return 0;
+
+    /* on error just call shutdown and return -1 */
+    err:
+        qdes_shutdown(nc);
+        return -1;
+}
diff --git a/net/qdes.h b/net/qdes.h
new file mode 100644
index 0000000..a105ba7
--- /dev/null
+++ b/net/qdes.h
@@ -0,0 +1,38 @@ 
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/*
+ * QEMU Distributed Ethernet Switch  - qdes
+ * Copyright 2012 Mike Lovell
+ */
+
+#ifndef QEMU_NET_QDES_H
+#define QEMU_NET_QDES_H
+
+#include "net.h"
+#include "qemu-common.h"
+
+int net_init_qdes(QemuOpts *opts, const char *name, VLANState *vlan);
+
+#endif /* QEMU_NET_QDES_H */
diff --git a/net/socket.c b/net/socket.c
index fcd0a3c..ba46275 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -150,7 +150,8 @@  static void net_socket_send_dgram(void *opaque)
     qemu_send_packet(&s->nc, s->buf, size);
 }
 
-static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct in_addr *localaddr)
+int net_socket_mcast_create(struct sockaddr_in *mcastaddr,
+                            struct in_addr *localaddr)
 {
     struct ip_mreq imr;
     int fd;
diff --git a/net/socket.h b/net/socket.h
index e1fe959..af8b354 100644
--- a/net/socket.h
+++ b/net/socket.h
@@ -26,7 +26,10 @@ 
 
 #include "net.h"
 #include "qemu-common.h"
+#include "qemu_socket.h"
 
+int net_socket_mcast_create(struct sockaddr_in *mcastaddr,
+                            struct in_addr *localaddr);
 int net_init_socket(QemuOpts *opts, const char *name, VLANState *vlan);
 
 #endif /* QEMU_NET_SOCKET_H */
diff --git a/qemu-options.hx b/qemu-options.hx
index 8b66264..5f10f5f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1284,6 +1284,11 @@  DEF("net", HAS_ARG, QEMU_OPTION_net,
     "                Use group 'groupname' and mode 'octalmode' to change default\n"
     "                ownership and permissions for communication port.\n"
 #endif
+    "-net qdes,mcast=maddr:port,localaddr=addr:port[,vlan=n][,network=k][,timer=t]\n"
+    "                Join vlan 'n' to a QDES switch on multicast address maddr:port\n"
+    "                and local address of addr:port\n"
+    "                Use network 'k' as the QDES network number\n"
+    "                Use timer 't' as number of seconds for the table gc timer\n"
     "-net dump[,vlan=n][,file=f][,len=n]\n"
     "                dump traffic on vlan 'n' to file 'f' (max n bytes per packet)\n"
     "-net none       use it alone to have zero network devices. If no -net option\n"