Message ID | 1445794812-23826-1-git-send-email-victork@redhat.com |
---|---|
State | New |
On Sun, Oct 25, 2015 at 07:42:00PM +0200, Victor Kaplansky wrote: > QEMU is missing a good test for vhost-user feature, > so I've created a sample vhost-user application, which > called Vubr (mst coined the name, but better > suggestions will be appreciated). Short for Vhost-User Bridge. Not very pretty, hopefully someone can come up with a better name. Maybe just vhost-user-bridge, avoiding acronyms.
On Sun, Oct 25, 2015 at 07:42:00PM +0200, Victor Kaplansky wrote: > QEMU is missing a good test for vhost-user feature, The existing test is good actually. It does not, however, allow actual traffic, so at best it tests the management protocol. > so I've created a sample vhost-user application, which > called Vubr (mst coined the name, but better > suggestions will be appreciated). Vubr may later serve > the QEMU community as vhost-user QEMU internal test. > Essentially Vubr is a very basic vhost-user backend for QEMU, > It runs as a separate user-level process. For packet > processing Vubr uses an additional QEMU instance with a backend > configured by "-net socket" as a shared VLAN. This way another > QEMU virtual machine can effectively make a bus by means of > UDP communication. > > For a more simple setup, the another QEMU instance running > the SLiRP backed can be the same QEMU instance running vhost-user > client. > > The Vubr implementation is very preliminary. It is missing many > features. I has been studying vhost-user protocol internals, > so I've wrote Vubr bit by bit as I progressed through the > protocol. Most probably internal architecture will change > significantly. 
> > To run Vubr application: > > Build vubr with: > > $ cd qemu/tests/vubr; make > > Ensure the machine has hugepages enabled in kernel with command line > like: default_hugepagesz=2M hugepagesz=2M hugepages=2048 > > Run it with: > > $ ./vubr > > The above will run vhost-user server listening for connections > on UNIX domain socket /tmp/vubr.sock, and will try to connect > by UDP to VLAN bridge to localhost:5555, while listening on > localhost:4444 > > Run qemu with a virtio-net backed by vhost-user: > > $ qemu \ > -enable-kvm -m 512 -smp 2 \ > -object memory-backend-file,id=mem,size=512M,mem-path=/dev/hugepages,share=on \ > -numa node,memdev=mem -mem-prealloc \ > -chardev socket,id=char0,path=/tmp/vubr.sock \ > -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \ > -device virtio-net-pci,netdev=mynet1 \ > -net none \ > -net socket,vlan=0,udp=localhost:4444,localaddr=localhost:5555 \ > -net user,vlan=0 \ > disk.img > > Vubr tested very lightly: it's able to bringup a linux on client VM > with virtio-net driver, and execute transmits and receives to the > internet. I tested with "wget redhat.com", "dig redhat.com". > > PS. I've consulted DPDK's code for vhost-user during Vubr > implementation. > Signed-off-by: Victor Kaplansky <victork@redhat.com> Thanks for working on this. This needs a bit more work, especially from coding style perspective. Please read CODING_STYLE and follow the rules. I pointed out some violations but by no means all of them. checkpatch.pl might catch some violations, too. 
> --- > tests/vubr/dispatcher.h | 26 ++ > tests/vubr/vhost.h | 77 +++++ > tests/vubr/vhost_user.h | 70 +++++ > tests/vubr/virtio_net.h | 38 +++ > tests/vubr/virtio_ring.h | 103 +++++++ > tests/vubr/virtqueue.h | 17 ++ > tests/vubr/vubr_config.h | 7 + > tests/vubr/vubr_device.h | 41 +++ > tests/vubr/dispatcher.c | 77 +++++ > tests/vubr/main.c | 18 ++ > tests/vubr/vhost_user.c | 83 +++++ > tests/vubr/vubr_device.c | 773 +++++++++++++++++++++++++++++++++++++++++++++++ > tests/vubr/Makefile | 15 + > 13 files changed, 1345 insertions(+) > create mode 100644 tests/vubr/dispatcher.h > create mode 100644 tests/vubr/vhost.h > create mode 100644 tests/vubr/vhost_user.h > create mode 100644 tests/vubr/virtio_net.h > create mode 100644 tests/vubr/virtio_ring.h > create mode 100644 tests/vubr/virtqueue.h > create mode 100644 tests/vubr/vubr_config.h > create mode 100644 tests/vubr/vubr_device.h > create mode 100644 tests/vubr/dispatcher.c > create mode 100644 tests/vubr/main.c > create mode 100644 tests/vubr/vhost_user.c > create mode 100644 tests/vubr/vubr_device.c > create mode 100644 tests/vubr/Makefile > > diff --git a/tests/vubr/dispatcher.h b/tests/vubr/dispatcher.h > new file mode 100644 > index 0000000..cd02f07 > --- /dev/null > +++ b/tests/vubr/dispatcher.h > @@ -0,0 +1,26 @@ > +#ifndef __DISPATCHER__ > +#define __DISPATCHER__ > + > +#include <stddef.h> > +#include <stdint.h> > +#include <sys/select.h> > + > +typedef void (*callback_func)(int sock, void *ctx); > + > +struct event { > + void *ctx; > + callback_func callback; > +}; > + > +struct dispatcher { > + int max_sock; > + fd_set fdset; > + struct event events[FD_SETSIZE]; > +}; > + > +int dispatcher_init(struct dispatcher *d); > +int dispatcher_add(struct dispatcher *d, int sock, void *ctx, callback_func cb); > +int dispatcher_remove(struct dispatcher *d, int sock); > +int dispatcher_wait(struct dispatcher *d, uint32_t timeout); > + > +#endif /* __DISPATCHER__ */ > diff --git a/tests/vubr/vhost.h 
b/tests/vubr/vhost.h > new file mode 100644 > index 0000000..3960cc2 > --- /dev/null > +++ b/tests/vubr/vhost.h > @@ -0,0 +1,77 @@ > +#ifndef __VHOST_H__ > +#define __VHOST_H__ > + > +#include <inttypes.h> > + > +/* Most imported form qemu/linux-headers/linux/vhost.h > + * > + * Userspace interface for virtio structures. */ > + > +struct vhost_vring_state { > + unsigned int index; > + unsigned int num; > +}; > + > +struct vhost_vring_file { > + unsigned int index; > + int fd; /* Pass -1 to unbind from file. */ > +}; > + > +struct vhost_vring_addr { > + unsigned int index; > + /* Option flags. */ > + unsigned int flags; > + /* Flag values: */ > + /* Whether log address is valid. If set enables logging. */ > +#define VHOST_VRING_F_LOG 0 > + > + /* Start of array of descriptors (virtually contiguous) */ > + uint64_t desc_user_addr; > + /* Used structure address. Must be 32 bit aligned */ > + uint64_t used_user_addr; > + /* Available structure address. Must be 16 bit aligned */ > + uint64_t avail_user_addr; > + /* Logging support. */ > + /* Log writes to used structure, at offset calculated from specified > + * address. Address must be 32 bit aligned. */ > + uint64_t log_guest_addr; > +}; > + > +#define VHOST_MEMORY_MAX_NREGIONS (8) > + > +struct vhost_memory_region { > + uint64_t guest_phys_addr; > + uint64_t memory_size; /* bytes */ > + uint64_t userspace_addr; > + uint64_t mmap_offset; > +}; > + > +struct vhost_memory { > + uint32_t nregions; > + uint32_t padding; > + struct vhost_memory_region regions[VHOST_MEMORY_MAX_NREGIONS]; > +}; > + > +/* Feature bits */ > +/* Log all write descriptors. Can be changed while device is active. */ > +#define VHOST_F_LOG_ALL 26 > +/* vhost-net should add virtio_net_hdr for RX, and strip for TX packets. 
*/ > +#define VHOST_NET_F_VIRTIO_NET_HDR 27 > + > +struct virtio_net_hdr { > +#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 > + uint8_t flags; > +#define VIRTIO_NET_HDR_GSO_NONE 0 > +#define VIRTIO_NET_HDR_GSO_TCPV4 1 > +#define VIRTIO_NET_HDR_GSO_UDP 3 > +#define VIRTIO_NET_HDR_GSO_TCPV6 4 > +#define VIRTIO_NET_HDR_GSO_ECN 0x80 > + uint8_t gso_type; > + uint16_t hdr_len; > + uint16_t gso_size; > + uint16_t csum_start; > + uint16_t csum_offset; > + uint16_t num_buffers; > +}; Pls use this one from standard-headers. > + > +#endif /* __VHOST_H__ */ > diff --git a/tests/vubr/vhost_user.h b/tests/vubr/vhost_user.h > new file mode 100644 > index 0000000..44d82d0 > --- /dev/null > +++ b/tests/vubr/vhost_user.h > @@ -0,0 +1,70 @@ > +#ifndef __VHOST_USER_H__ > +#define __VHOST_USER_H__ > + > +/* Based on qemu/hw/virtio/vhost-user.c */ > + > +#include <stdint.h> > +#include <stddef.h> > +#include "vhost.h" > + > +#define VHOST_USER_F_PROTOCOL_FEATURES 30 > +#define VHOST_USER_PROTOCOL_FEATURE_MASK 0x1ULL > + > +#define VHOST_USER_PROTOCOL_F_MQ 0 > + > +#define VHOST_USER_REQUEST_LIST \ > + INFO(NONE, 0) \ > + INFO(GET_FEATURES, 1) \ > + INFO(SET_FEATURES, 2) \ > + INFO(SET_OWNER, 3) \ > + INFO(RESET_DEVICE, 4) \ > + INFO(SET_MEM_TABLE, 5) \ > + INFO(SET_LOG_BASE, 6) \ > + INFO(SET_LOG_FD, 7) \ > + INFO(SET_VRING_NUM, 8) \ > + INFO(SET_VRING_ADDR, 9) \ > + INFO(SET_VRING_BASE, 10) \ > + INFO(GET_VRING_BASE, 11) \ > + INFO(SET_VRING_KICK, 12) \ > + INFO(SET_VRING_CALL, 13) \ > + INFO(SET_VRING_ERR, 14) \ > + INFO(GET_PROTOCOL_FEATURES, 15) \ > + INFO(SET_PROTOCOL_FEATURES, 16) \ > + INFO(GET_QUEUE_NUM, 17) \ > + INFO(SET_VRING_ENABLE, 18) > + > +enum vhost_user_request { > +#define INFO(a, b) VHOST_USER_ ## a = b, I don't think these macro tricks are really justified, and they break tag searches for most editors. Let's keep it simple. 
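For illustration, the plain-enum alternative to the INFO() list could look roughly like the sketch below. The request names and numbers are copied from the list in the patch; the CamelCase typedef is an assumption based on QEMU's naming convention, and vubr_request_name() is a hypothetical helper standing in for the macro-generated string table, not something the patch contains.

```c
#include <stddef.h>

/* Every constant is spelled out, so tag searches and grep find it directly. */
typedef enum VhostUserRequest {
    VHOST_USER_NONE = 0,
    VHOST_USER_GET_FEATURES = 1,
    VHOST_USER_SET_FEATURES = 2,
    VHOST_USER_SET_OWNER = 3,
    VHOST_USER_RESET_DEVICE = 4,
    VHOST_USER_SET_MEM_TABLE = 5,
    VHOST_USER_SET_LOG_BASE = 6,
    VHOST_USER_SET_LOG_FD = 7,
    VHOST_USER_SET_VRING_NUM = 8,
    VHOST_USER_SET_VRING_ADDR = 9,
    VHOST_USER_SET_VRING_BASE = 10,
    VHOST_USER_GET_VRING_BASE = 11,
    VHOST_USER_SET_VRING_KICK = 12,
    VHOST_USER_SET_VRING_CALL = 13,
    VHOST_USER_SET_VRING_ERR = 14,
    VHOST_USER_GET_PROTOCOL_FEATURES = 15,
    VHOST_USER_SET_PROTOCOL_FEATURES = 16,
    VHOST_USER_GET_QUEUE_NUM = 17,
    VHOST_USER_SET_VRING_ENABLE = 18,
    VHOST_USER_MAX
} VhostUserRequest;

/* Hypothetical helper: printable name for a request, for debug output. */
static const char *vubr_request_name(VhostUserRequest req)
{
    static const char *const names[VHOST_USER_MAX] = {
        [VHOST_USER_NONE] = "NONE",
        [VHOST_USER_GET_FEATURES] = "GET_FEATURES",
        [VHOST_USER_SET_FEATURES] = "SET_FEATURES",
        [VHOST_USER_SET_OWNER] = "SET_OWNER",
        [VHOST_USER_RESET_DEVICE] = "RESET_DEVICE",
        [VHOST_USER_SET_MEM_TABLE] = "SET_MEM_TABLE",
        [VHOST_USER_SET_LOG_BASE] = "SET_LOG_BASE",
        [VHOST_USER_SET_LOG_FD] = "SET_LOG_FD",
        [VHOST_USER_SET_VRING_NUM] = "SET_VRING_NUM",
        [VHOST_USER_SET_VRING_ADDR] = "SET_VRING_ADDR",
        [VHOST_USER_SET_VRING_BASE] = "SET_VRING_BASE",
        [VHOST_USER_GET_VRING_BASE] = "GET_VRING_BASE",
        [VHOST_USER_SET_VRING_KICK] = "SET_VRING_KICK",
        [VHOST_USER_SET_VRING_CALL] = "SET_VRING_CALL",
        [VHOST_USER_SET_VRING_ERR] = "SET_VRING_ERR",
        [VHOST_USER_GET_PROTOCOL_FEATURES] = "GET_PROTOCOL_FEATURES",
        [VHOST_USER_SET_PROTOCOL_FEATURES] = "SET_PROTOCOL_FEATURES",
        [VHOST_USER_GET_QUEUE_NUM] = "GET_QUEUE_NUM",
        [VHOST_USER_SET_VRING_ENABLE] = "SET_VRING_ENABLE",
    };

    /* Anything outside the table (or a hole in it) is reported as unknown. */
    if ((unsigned int)req >= VHOST_USER_MAX || names[req] == NULL) {
        return "UNKNOWN";
    }
    return names[req];
}
```

The name table is longer than the macro version, but each identifier appears literally in the source, which is the point of the comment above.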
> + VHOST_USER_REQUEST_LIST > +#undef INFO > + VHOST_USER_MAX > +}; > + > +struct vhost_user_message { Pls fix names to adhere to QEMU coding style, here and elsewhere. > + enum vhost_user_request request; > + > +#define VHOST_USER_VERSION_MASK (0x3) > +#define VHOST_USER_REPLY_MASK (0x1<<2) > + uint32_t flags; > + uint32_t size; /* the following payload size */ > + union { > +#define VHOST_USER_VRING_IDX_MASK (0xff) > +#define VHOST_USER_VRING_NOFD_MASK (0x1<<8) > + uint64_t u64; > + struct vhost_vring_state state; > + struct vhost_vring_addr addr; > + struct vhost_memory memory; > + } payload; > + int fds[VHOST_MEMORY_MAX_NREGIONS]; > + int fd_num; > +} __attribute__((packed)); > + > +#define VHOST_USER_HDR_SIZE offsetof(struct vhost_user_message, payload.u64) > + > +/* The version of the protocol we support */ > +#define VHOST_USER_VERSION (0x1) > + > +void vhost_user_message_read(int conn_fd, struct vhost_user_message *vmsg); > +void vhost_user_message_write(int conn_fd, struct vhost_user_message *vmsg); > + > +#endif /* __VHOST_USER_H__ */ > diff --git a/tests/vubr/virtio_net.h b/tests/vubr/virtio_net.h > new file mode 100644 > index 0000000..f6f87b1 > --- /dev/null > +++ b/tests/vubr/virtio_net.h > @@ -0,0 +1,38 @@ > +#ifndef __VIRTIO_NET_H__ > +#define __VIRTIO_NET_H__ > + > +/* Form qemu/include/standard-headers/linux/virtio_net.h */ From? Pls just include that header. > + > +/* The feature bitmap for virtio net */ > +#define VIRTIO_NET_F_CSUM 0 /* Host handles pkts w/ partial csum */ > +#define VIRTIO_NET_F_GUEST_CSUM 1 /* Guest handles pkts w/ partial csum */ > +#define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS 2 /* Dynamic offload configuration. */ > +#define VIRTIO_NET_F_MAC 5 /* Host has given MAC address. */ > +#define VIRTIO_NET_F_GUEST_TSO4 7 /* Guest can handle TSOv4 in. */ > +#define VIRTIO_NET_F_GUEST_TSO6 8 /* Guest can handle TSOv6 in. */ > +#define VIRTIO_NET_F_GUEST_ECN 9 /* Guest can handle TSO[6] w/ ECN in. 
*/ > +#define VIRTIO_NET_F_GUEST_UFO 10 /* Guest can handle UFO in. */ > +#define VIRTIO_NET_F_HOST_TSO4 11 /* Host can handle TSOv4 in. */ > +#define VIRTIO_NET_F_HOST_TSO6 12 /* Host can handle TSOv6 in. */ > +#define VIRTIO_NET_F_HOST_ECN 13 /* Host can handle TSO[6] w/ ECN in. */ > +#define VIRTIO_NET_F_HOST_UFO 14 /* Host can handle UFO in. */ > +#define VIRTIO_NET_F_MRG_RXBUF 15 /* Host can merge receive buffers. */ > +#define VIRTIO_NET_F_STATUS 16 /* virtio_net_config.status available */ > +#define VIRTIO_NET_F_CTRL_VQ 17 /* Control channel available */ > +#define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */ > +#define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */ > +#define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */ > +#define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can announce device on the > + * network */ > +#define VIRTIO_NET_F_MQ 22 /* Device supports Receive Flow > + * Steering */ > +#define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > + > +#ifndef VIRTIO_NET_NO_LEGACY > +#define VIRTIO_NET_F_GSO 6 /* Host handles pkts w/ any GSO type */ > +#endif /* VIRTIO_NET_NO_LEGACY */ > + > +#define VIRTIO_NET_S_LINK_UP 1 /* Link is up */ > +#define VIRTIO_NET_S_ANNOUNCE 2 /* Announcement is needed */ > + > +#endif /* __VIRTIO_NET_H__ */ > diff --git a/tests/vubr/virtio_ring.h b/tests/vubr/virtio_ring.h > new file mode 100644 > index 0000000..e2d0adb > --- /dev/null > +++ b/tests/vubr/virtio_ring.h Pls pick the header we have under standard-headers > @@ -0,0 +1,103 @@ > +#ifndef VIRTQUEUE_H > +#define VIRTQUEUE_H > +/* > + * > + * Virtual I/O Device (VIRTIO) Version 1.0 > + * Committee Specification 03 > + * 02 August 2015 > + * Copyright (c) OASIS Open 2015. All Rights Reserved. 
> + * Source: http://docs.oasis-open.org/virtio/virtio/v1.0/cs03/listings/ > + * > + */ > +#include <stdint.h> > + > +typedef uint64_t le64; > +typedef uint32_t le32; > +typedef uint16_t le16; > + > +/* This marks a buffer as continuing via the next field. */ > +#define VIRTQ_DESC_F_NEXT 1 > +/* This marks a buffer as write-only (otherwise read-only). */ > +#define VIRTQ_DESC_F_WRITE 2 > +/* This means the buffer contains a list of buffer descriptors. */ > +#define VIRTQ_DESC_F_INDIRECT 4 > + > +/* The device uses this in used->flags to advise the driver: don't kick me > + * when you add a buffer. It's unreliable, so it's simply an > + * optimization. */ > +#define VIRTQ_USED_F_NO_NOTIFY 1 > +/* The driver uses this in avail->flags to advise the device: don't > + * interrupt me when you consume a buffer. It's unreliable, so it's > + * simply an optimization. */ > +#define VIRTQ_AVAIL_F_NO_INTERRUPT 1 > + > +/* Support for indirect descriptors */ > +#define VIRTIO_F_INDIRECT_DESC 28 > + > +/* Support for avail_event and used_event fields */ > +#define VIRTIO_F_EVENT_IDX 29 > + > +/* Arbitrary descriptor layouts. */ > +#define VIRTIO_F_ANY_LAYOUT 27 > + > +/* Virtqueue descriptors: 16 bytes. > + * These can chain together via "next". */ > +struct virtq_desc { > + /* Address (guest-physical). */ > + le64 addr; > + /* Length. */ > + le32 len; > + /* The flags as indicated above. */ > + le16 flags; > + /* We chain unused descriptors via this, too */ > + le16 next; > +}; > + > +struct virtq_avail { > + le16 flags; > + le16 idx; > + le16 ring[]; > + /* Only if VIRTIO_F_EVENT_IDX: le16 used_event; */ > +}; > + > +/* le32 is used here for ids for padding reasons. */ > +struct virtq_used_elem { > + /* Index of start of used descriptor chain. */ > + le32 id; > + /* Total length of the descriptor chain which was written to. 
*/ > + le32 len; > +}; > + > +struct virtq_used { > + le16 flags; > + le16 idx; > + struct virtq_used_elem ring[]; > + /* Only if VIRTIO_F_EVENT_IDX: le16 avail_event; */ > +}; > + > +struct virtq { > + unsigned int num; > + > + struct virtq_desc *desc; > + struct virtq_avail *avail; > + struct virtq_used *used; > +}; > + > +static inline int virtq_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx) > +{ > + return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx); > +} > + > +/* Get location of event indices (only with VIRTIO_F_EVENT_IDX) */ > +static inline le16 *virtq_used_event(struct virtq *vq) > +{ > + /* For backwards compat, used event index is at *end* of avail ring. */ > + return &vq->avail->ring[vq->num]; > +} > + > +static inline le16 *virtq_avail_event(struct virtq *vq) > +{ > + /* For backwards compat, avail event index is at *end* of used ring. */ > + return (le16 *)&vq->used->ring[vq->num]; > +} > +#endif /* VIRTQUEUE_H */ > diff --git a/tests/vubr/virtqueue.h b/tests/vubr/virtqueue.h > new file mode 100644 > index 0000000..b018cca > --- /dev/null > +++ b/tests/vubr/virtqueue.h > @@ -0,0 +1,17 @@ > +#ifndef __VIRTQUEUE__ > +#define __VIRTQUEUE__ > + > +#include "virtio_ring.h" > + > +struct virtqueue { > + int call_fd; > + int kick_fd; > + uint32_t size; > + uint16_t last_avail_index; > + uint16_t last_used_index; > + struct virtq_desc* desc; > + struct virtq_avail* avail; > + struct virtq_used* used; > +}; > + > +#endif /* __VIRTQUEUE__ */ > diff --git a/tests/vubr/vubr_config.h b/tests/vubr/vubr_config.h > new file mode 100644 > index 0000000..19681d0 > --- /dev/null > +++ b/tests/vubr/vubr_config.h > @@ -0,0 +1,7 @@ > +#ifndef __VHU_CONFIG__ > +#define __VHU_CONFIG__ > + > +#define VHOST_USER_SHOW_MGMT_TRAFFIC > +#define VHOST_USER_SHOW_NET_TRAFFIC I don't see why this needs a header for itself. Also, these ifdefs all over the code look inelegant. 
I'd just do #ifdef VHOST_USER_SHOW_MGMT_TRAFFIC #define DEBUG_VHOST_USER_BRIDGE_MGMT(...) printf(__VA_ARGS__) #else #define DEBUG_VHOST_USER_BRIDGE_MGMT(...) do {} while (0) #endif and then code has no ifdefs. > + > +#endif /* __VHU_CONFIG__ */ > diff --git a/tests/vubr/vubr_device.h b/tests/vubr/vubr_device.h > new file mode 100644 > index 0000000..04a0ecb > --- /dev/null > +++ b/tests/vubr/vubr_device.h > @@ -0,0 +1,41 @@ > +#ifndef __VHU_DEVICE__ > +#define __VHU_DEVICE__ > + > +#include <arpa/inet.h> > +#include <sys/socket.h> > + > +#include "vhost.h" > +#include "virtqueue.h" > +#include "dispatcher.h" > + > +#define MAX_NR_VIRTQUEUE (8) > + > +struct vubr_device_region { > + /* Guest Phhysical address. */ > + uint64_t gpa; You need to add a header to use these. > + /* Memory region size. */ > + uint64_t size; > + /* QEMU virtual address (userspace). */ > + uint64_t qva; > + /* Starting offset in our mmaped space. */ > + uint64_t mmap_offset; > + /* Start addrtess of mmaped space. */ > + uint64_t mmap_addr; > +}; > + > +struct vubr_device { > + int sock; > + struct dispatcher dispatcher; > + uint32_t nregions; > + struct vubr_device_region regions[VHOST_MEMORY_MAX_NREGIONS]; > + struct virtqueue virtqueue[MAX_NR_VIRTQUEUE]; > + int backend_udp_sock; > + struct sockaddr_in backend_udp_dest; > +}; > + > +struct vubr_device *vubr_device_new(char *path); > +void vubr_device_run(struct vubr_device * dev); > +void vubr_device_backend_udp_setup(struct vubr_device *dev, char *local_host, > + uint16_t local_port, char *dest_host, uint16_t dest_port); > + > +#endif /* __VHU_DEVICE__ */ > diff --git a/tests/vubr/dispatcher.c b/tests/vubr/dispatcher.c > new file mode 100644 > index 0000000..62d386a > --- /dev/null > +++ b/tests/vubr/dispatcher.c Pls add copyright info. GPLv2+ is preferred. 
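As a minimal sketch of the debug-macro suggestion made for vubr_config.h above: the macro name comes from that comment, while forwarding the arguments with __VA_ARGS__ and the vubr_set_vring_num() caller are assumptions added for illustration. The caller only mirrors what the SET_VRING_NUM handler in the patch does with its two printf() calls.

```c
#include <stdint.h>
#include <stdio.h>

/* Enable at build time, e.g. with -DVHOST_USER_SHOW_MGMT_TRAFFIC. */
#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC
#define DEBUG_VHOST_USER_BRIDGE_MGMT(...) printf(__VA_ARGS__)
#else
#define DEBUG_VHOST_USER_BRIDGE_MGMT(...) do {} while (0)
#endif

/*
 * Hypothetical caller: records the ring size for one queue.
 * The debug output compiles away when the macro is disabled,
 * so the function body itself carries no #ifdef.
 */
static void vubr_set_vring_num(uint32_t *sizes, unsigned int index,
                               unsigned int num)
{
    DEBUG_VHOST_USER_BRIDGE_MGMT("state.index: %u\n", index);
    DEBUG_VHOST_USER_BRIDGE_MGMT("state.num: %u\n", num);
    sizes[index] = num;
}
```

One trade-off of the empty variant is that the format string is no longer type-checked when debugging is off; whether that matters for a test program is a judgment call.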
> @@ -0,0 +1,77 @@ > +#include <stdio.h> > +#include <sys/select.h> > + > +#include "vubr_config.h" > +#include "dispatcher.h" > + > +int > +dispatcher_init(struct dispatcher *d) > +{ > + FD_ZERO(&d->fdset); > + d->max_sock = -1; > + return 0; > +} > + > +int > +dispatcher_add(struct dispatcher *d, int sock, void *ctx, callback_func cb) > +{ > + if (sock >= FD_SETSIZE) { > + fprintf(stderr, "Error: Failed to add new event. sock %d should be less than %d\n", > + sock, FD_SETSIZE); > + return -1; > + } > + > + d->events[sock].ctx = ctx; > + d->events[sock].callback = cb; > + > + FD_SET(sock, &d->fdset); > + if (sock > d->max_sock) > + d->max_sock = sock; > + printf("DEBUG: Added sock %d for watching. max_sock: %d\n", sock, d->max_sock); > + return 0; > +} > + > +int > +dispatcher_remove(struct dispatcher *d, int sock) > +{ > + if (sock >= FD_SETSIZE) { > + fprintf(stderr, "Error: Failed to remove event. sock %d should be less than %d\n", > + sock, FD_SETSIZE); > + return -1; > + } > + > + FD_CLR(sock, &d->fdset); > + return 0; > +} > + > +/* timeout in us */ > +int > +dispatcher_wait(struct dispatcher *d, uint32_t timeout) > +{ > + struct timeval tv; > + tv.tv_sec = timeout / 1000000; > + tv.tv_usec = timeout % 1000000; > + > + fd_set fdset = d->fdset; > + > + /* wait until some of sockets become readable. */ > + int rc = select(d->max_sock + 1, &fdset, 0, 0, &tv); > + > + if (rc == -1) > + perror("select"); > + > + /* Timeout */ > + if (rc == 0) > + return 0; > + > + /* Now call callback for every ready socket. 
*/ > + > + int sock; > + for (sock = 0; sock < d->max_sock + 1; sock++) > + if (FD_ISSET(sock, &fdset)) { > + struct event *e = &d->events[sock]; > + e->callback(sock, e->ctx); > + } > + > + return 0; > +} > diff --git a/tests/vubr/main.c b/tests/vubr/main.c > new file mode 100644 > index 0000000..a7a3e9d > --- /dev/null > +++ b/tests/vubr/main.c > @@ -0,0 +1,18 @@ > +#include "vubr_config.h" > +#include "vubr_device.h" > + > +int main(int argc, char* argv[]) > +{ > + struct vubr_device *dev; > + > + if((dev = vubr_device_new("/tmp/vubr.sock"))) { > + vubr_device_backend_udp_setup(dev, > + "127.0.0.1", 4444, > + "127.0.0.1", 5555); > + > + vubr_device_run(dev); > + return 0; > + } > + else > + return 1; > +} > diff --git a/tests/vubr/vhost_user.c b/tests/vubr/vhost_user.c > new file mode 100644 > index 0000000..91ab09e > --- /dev/null > +++ b/tests/vubr/vhost_user.c > @@ -0,0 +1,83 @@ > +#include <sys/types.h> > +#include <sys/socket.h> > +#include <string.h> > +#include <stdio.h> > +#include <stdlib.h> > +#include <assert.h> > +#include <unistd.h> > +#include <errno.h> > + > +#include "vubr_config.h" > +#include "vhost_user.h" > + > +void > +vhost_user_message_read(int conn_fd, struct vhost_user_message *vmsg) > +{ > + int rc; > + struct msghdr msg = {}; > + struct iovec iov; > + size_t fd_size = VHOST_MEMORY_MAX_NREGIONS * sizeof(int); > + char control[CMSG_SPACE(fd_size)]; empty line won't hurt here, after all declarations. 
> + memset(control, 0, sizeof(control)); > + > + iov.iov_base = (char *)vmsg; > + iov.iov_len = VHOST_USER_HDR_SIZE; > + > + msg.msg_iov = &iov; > + msg.msg_iovlen = 1; > + msg.msg_control = control; > + msg.msg_controllen = sizeof(control); > + > + rc = recvmsg(conn_fd, &msg, 0); > + > + if (rc <= 0) { > + perror("recvmsg"); > + exit(1); > + } > + > + vmsg->fd_num = 0; > + struct cmsghdr *cmsg; > + for (cmsg = CMSG_FIRSTHDR(&msg); > + cmsg != NULL; > + cmsg = CMSG_NXTHDR(&msg, cmsg)) > + { > + if ((cmsg->cmsg_level == SOL_SOCKET) && Don't use tabs for indents, and you don't need so many () around simple math. > + (cmsg->cmsg_type == SCM_RIGHTS)) > + { > + fd_size = cmsg->cmsg_len - CMSG_LEN(0); > + vmsg->fd_num = fd_size / sizeof(int); > + memcpy(vmsg->fds, CMSG_DATA(cmsg), fd_size); > + break; > + } > + } > + > + if (vmsg->size > sizeof(vmsg->payload)) { > + fprintf(stderr, "Error: too big message request: %d, size: vmsg->size: %u, while sizeof(vmsg->payload) = %lu\n", > + vmsg->request, vmsg->size, sizeof(vmsg->payload)); > + exit(1); > + } > + > + if (vmsg->size) { > + rc = read(conn_fd, &vmsg->payload, vmsg->size); > + if (rc <= 0) { > + perror("recvmsg"); > + exit(1); > + } > + > + assert(rc == vmsg->size); > + } > +} > + > +void > +vhost_user_message_write(int conn_fd, struct vhost_user_message *vmsg) > +{ > + int rc; > + do { > + rc = write(conn_fd, vmsg, VHOST_USER_HDR_SIZE + vmsg->size); > + } while (rc < 0 && errno == EINTR); > + > + if (rc < 0) { > + perror("write"); > + exit(1); > + } > +} > diff --git a/tests/vubr/vubr_device.c b/tests/vubr/vubr_device.c > new file mode 100644 > index 0000000..a296aef > --- /dev/null > +++ b/tests/vubr/vubr_device.c > @@ -0,0 +1,773 @@ > +#include <assert.h> > +#include <stdio.h> > +#include <stdlib.h> > +#include <inttypes.h> > +#include <string.h> > +#include <unistd.h> > +#include <errno.h> > +#include <sys/types.h> > +#include <sys/socket.h> > +#include <sys/un.h> > +#include <sys/unistd.h> > +#include 
<sys/mman.h> > +#include <sys/eventfd.h> > + > +#include "vubr_config.h" > +#include "vhost_user.h" > +#include "virtio_net.h" > +#include "vubr_device.h" > +#include "virtqueue.h" > + > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > +static char *vhost_user_request_str[] = { > +#define INFO(name,num) \ > + [num] = #name, > +VHOST_USER_REQUEST_LIST > +#undef INFO > +}; > +#endif /* VHOST_USER_SHOW_MGMT_TRAFFIC */ > + > +void > +die(char *s) Pls do prefix functions with "vubr" and pls don't prefix them with _. > +{ > + perror(s); > + exit(1); > +} > + > +static void > +print_buffer(uint8_t* buf, size_t len) > +{ > + int i; > + printf("raw buffer:\n"); > + for(i = 0; i < len; i++) { > + if ((i % 16) == 0) > + printf("\n"); > + if ((i % 4) == 0) > + printf(" "); > + printf("%02x ", buf[i]); > + } > + printf("\n............................................................\n"); > +} > + > +/* Translate guest physical address to our virtual address. */ > +static uint64_t __attribute__((unused)) > +gpa_to_va(struct vubr_device *dev, uint64_t guest_addr) > +{ > + int i; empty line won't hurt after declaration. > + /* Find matching memory region. */ > + > + for (i = 0; i < dev->nregions; i++) { > + struct vubr_device_region *r = &dev->regions[i]; > + > + if ((guest_addr >= r->gpa) && (guest_addr < (r->gpa + r->size))) > + return (guest_addr - r->gpa + r->mmap_addr + r->mmap_offset); > + } > + > + assert(!"address not found in regions"); > + return 0; > +} > + > +/* Translate qemu virtual address to our virtual address. */ > +static uint64_t > +qva_to_va(struct vubr_device *dev, uint64_t qemu_addr) > +{ > + int i; > + /* Find matching memory region. 
*/ > + > + for (i = 0; i < dev->nregions; i++) { > + struct vubr_device_region *r = &dev->regions[i]; > + > + if ((qemu_addr >= r->qva) && (qemu_addr < (r->qva + r->size))) > + return (qemu_addr - r->qva + r->mmap_addr + r->mmap_offset); > + } > + > + assert(!"address not found in regions"); > + return 0; > +} > + > +static void vubr_device_backend_udp_sendbuf(struct vubr_device *dev, uint8_t *buf, size_t len); Line way too long. And pls don't forward-declare static functions. Just sort them by use. > + > +static void > +_consume_raw_packet(struct vubr_device *dev, uint8_t *buf, uint32_t len) > +{ > + int hdrlen = sizeof(struct virtio_net_hdr); > + > +#ifdef VHOST_USER_SHOW_NET_TRAFFIC > + print_buffer(buf, len); > +#endif > + vubr_device_backend_udp_sendbuf(dev, buf + hdrlen, len - hdrlen); > +} > + > +/* Kick the guest if necessary. */ > +static void > +virtqueue_kick(struct virtqueue *vq) > +{ > + if (!(vq->avail->flags & VIRTQ_AVAIL_F_NO_INTERRUPT)) { > + printf("Kicking the guest...\n"); > + eventfd_write(vq->call_fd, 1); > + } > +} > + > +static void > +_post_buffer(struct vubr_device *dev, struct virtqueue *vq, uint8_t *buf, int32_t len) > +{ > + struct virtq_desc* desc = vq->desc; > + struct virtq_avail* avail = vq->avail; > + struct virtq_used* used = vq->used; > + > + unsigned int size = vq->size; > + > + uint16_t a_index = vq->last_avail_index % size; > + uint16_t u_index = vq->last_used_index % size; > + uint16_t d_index = avail->ring[a_index]; > + > + int i = d_index; > + > +#ifdef VHOST_USER_SHOW_NET_TRAFFIC > + printf("Posting the packet to guest on vq:\n"); > + printf(" size = %d\n", vq->size); > + printf(" last_avail_index = %d\n", vq->last_avail_index); > + printf(" last_used_index = %d\n", vq->last_used_index); > + printf(" a_index = %d\n", a_index); > + printf(" u_index = %d\n", u_index); > + printf(" d_index = %d\n", d_index); > + printf(" desc[%d].addr = 0x%016"PRIx64"\n", i, desc[i].addr); > + printf(" desc[%d].len = %d\n", i, desc[i].len); > 
+ printf(" desc[%d].flags = %d\n", i, desc[i].flags); > + printf(" avail->idx = %d\n", avail->idx); > + printf(" used->idx = %d\n", used->idx); > +#endif > + > + if (!(desc[i].flags & VIRTQ_DESC_F_WRITE)) { > + // FIXME: we should find writable descriptor > + fprintf(stderr, "descriptor is not writable. exiting.\n"); > + exit(1); > + } > + > + void *chunk_start = (void *)gpa_to_va(dev, desc[i].addr); > + uint32_t chunk_len = desc[i].len; > + > + if (len <= chunk_len) { > + memcpy(chunk_start, buf, len); > + } else { > + fprintf(stderr, "received too long packet from the backend. dropping...\n"); > + return; > + } > + > + /* Add descriptor to the used ring. */ > + used->ring[u_index].id = d_index; > + used->ring[u_index].len = len; > + > + vq->last_avail_index++; > + vq->last_used_index++; > + > + used->idx = vq->last_used_index; > + > + /* Kick the guest if necessary. */ > + virtqueue_kick(vq); > +} > + > +static int > +_process_desc(struct vubr_device *dev, struct virtqueue *vq) > +{ > + struct virtq_desc* desc = vq->desc; > + struct virtq_avail* avail = vq->avail; > + struct virtq_used* used = vq->used; > + > + unsigned int size = vq->size; > + > + uint16_t a_index = vq->last_avail_index % size; > + uint16_t u_index = vq->last_used_index % size; > + uint16_t d_index = avail->ring[a_index]; > + > + uint32_t i, len = 0; > + size_t buf_size = 4096; > + uint8_t buf[4096]; > + > +#ifdef VHOST_USER_SHOW_NET_TRAFFIC > + printf("chunks: "); > +#endif > + > + i = d_index; > + do { > + void *chunk_start = (void *)gpa_to_va(dev, desc[i].addr); > + uint32_t chunk_len = desc[i].len; > + > + if (len + chunk_len < buf_size) { > + memcpy(buf + len, chunk_start, chunk_len); > +#ifdef VHOST_USER_SHOW_NET_TRAFFIC > + printf("%d ", chunk_len); > +#endif > + } else { > + fprintf(stderr, "too long packet. 
dropping...\n"); > + break; > + } > + > + len += chunk_len; > + > + if (!(desc[i].flags & VIRTQ_DESC_F_NEXT)) > + break; > + > + i = desc[i].next; > + } while(1); > + > + if (!len) > + return -1; > + > + /* Add descriptor to the used ring. */ > + used->ring[u_index].id = d_index; > + used->ring[u_index].len = len; > + > +#ifdef VHOST_USER_SHOW_NET_TRAFFIC > + printf("\n"); > +#endif > + > + _consume_raw_packet(dev, buf, len); > + > + return 0; > +} > + > +static void > +_process_avail(struct vubr_device *dev, struct virtqueue *vq) > +{ > + struct virtq_avail *avail = vq->avail; > + struct virtq_used* used = vq->used; > + > + while(vq->last_avail_index != avail->idx) { > + _process_desc(dev, vq); There are no memory barriers anywhere, this is almost sure to be racy. > + vq->last_avail_index++; > + vq->last_used_index++; > + } > + > + used->idx = vq->last_used_index; > +} > + > +static int vubr_device_backend_udp_recvbuf(struct vubr_device *dev, uint8_t *buf, size_t buflen); > + > +static void > +_backend_recv_cb(int sock, void *ctx) > +{ > + printf("\n\n *** IN UDP RECEIVE CALLBACK ***\n\n"); > + struct vubr_device *dev = (struct vubr_device *) ctx; > + struct virtqueue *rx_vq = &dev->virtqueue[0]; > +#define BUFLEN 4096 > + uint8_t buf[BUFLEN]; > + int len; > + struct virtio_net_hdr *hdr = (struct virtio_net_hdr *)buf; > + int hdrlen = sizeof(struct virtio_net_hdr); > + > + *hdr = (struct virtio_net_hdr) {}; > + hdr->num_buffers = 1; > + > + len = vubr_device_backend_udp_recvbuf(dev, buf + hdrlen, BUFLEN - hdrlen); > + _post_buffer(dev, rx_vq, buf, len + hdrlen); > +#undef BUFLEN Pls don't play such preprocessor tricks. 
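On the missing memory barriers noted above for _process_avail(): the guest fills avail->ring[] and then bumps avail->idx, so the backend should read idx with acquire semantics before touching ring entries, and should publish used->idx with release semantics only after the used entries are written. The sketch below shows that ordering using the GCC/Clang __atomic builtins; this choice is an assumption, and code inside the QEMU tree would more likely use the barrier helpers from qemu/atomic.h. The struct names, the fixed VUBR_RING_SIZE and vubr_drain_avail() are hypothetical, simplified stand-ins for the patch's virtqueue code.

```c
#include <stdint.h>

#define VUBR_RING_SIZE 4 /* hypothetical fixed ring size for the sketch */

struct vubr_avail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[VUBR_RING_SIZE];
};

struct vubr_used_elem {
    uint32_t id;
    uint32_t len;
};

struct vubr_used {
    uint16_t flags;
    uint16_t idx;
    struct vubr_used_elem ring[VUBR_RING_SIZE];
};

struct vubr_vq {
    uint16_t last_avail_index;
    uint16_t last_used_index;
    struct vubr_avail *avail;
    struct vubr_used *used;
};

/* Consume every available entry; returns how many were processed. */
static unsigned int vubr_drain_avail(struct vubr_vq *vq)
{
    unsigned int n = 0;

    /*
     * Acquire load: ring[] slots written by the guest before it bumped
     * idx are guaranteed to be visible after this read.
     */
    uint16_t avail_idx = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);

    while (vq->last_avail_index != avail_idx) {
        uint16_t head = vq->avail->ring[vq->last_avail_index % VUBR_RING_SIZE];
        struct vubr_used_elem *e =
            &vq->used->ring[vq->last_used_index % VUBR_RING_SIZE];

        /* Packet handling would go here; the sketch only recycles the buffer. */
        e->id = head;
        e->len = 0; /* nothing was written into a transmit buffer */

        vq->last_avail_index++;
        vq->last_used_index++;
        n++;
    }

    /*
     * Release store: the used ring entries above become visible to the
     * guest before it can observe the new used->idx.
     */
    __atomic_store_n(&vq->used->idx, vq->last_used_index, __ATOMIC_RELEASE);
    return n;
}
```

This only covers ordering of the index updates; notification suppression and VIRTIO_F_EVENT_IDX need additional barriers that the sketch does not attempt.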
> +} > + > +static void > +_kick_cb(int sock, void *ctx) > +{ > + struct vubr_device *dev = (struct vubr_device *) ctx; > + eventfd_t kick_data; > + ssize_t rc; > + > + rc = eventfd_read(sock, &kick_data); > + > + if (rc == -1) { > + perror("read kick"); > + exit(1); > + } else { > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("Got kick_data: %016"PRIx64"\n", kick_data); > +#endif > + _process_avail(dev, &dev->virtqueue[1]); > + } > +} > + > +static int > +_execute_NONE(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + printf("function %s() not implemented yet.\n", __FUNCTION__); > + return 0; > +} > + > +static int > +_execute_GET_FEATURES(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + vmsg->payload.u64 = > + ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | > + (1ULL << VIRTIO_NET_F_CTRL_VQ) | > + (1ULL << VIRTIO_NET_F_CTRL_RX) | > + (1ULL << VHOST_F_LOG_ALL)); > + vmsg->size = sizeof(vmsg->payload.u64); > + > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("returing u64: 0x%016"PRIx64"\n", vmsg->payload.u64); > +#endif > + > + /* reply */ > + return 1; > +} > + > +static int > +_execute_SET_FEATURES(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); > +#endif > + return 0; > +} > + > +static int > +_execute_SET_OWNER(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + printf("function %s() not implemented yet.\n", __FUNCTION__); You don't have to do anything if you don't want to. > + return 0; > +} > + > +static int > +_execute_RESET_DEVICE(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + printf("function %s() not implemented yet.\n", __FUNCTION__); Basically just stop processing rings until KICK. 
> + return 0; > +} > + > +static int > +_execute_SET_MEM_TABLE(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + printf("Nregions: %d\n", vmsg->payload.memory.nregions); > + > + struct vhost_memory *memory = &vmsg->payload.memory; > + dev->nregions = memory->nregions; > + int i; > + for (i = 0; i < dev->nregions; i++) { > + struct vhost_memory_region *msg_region = &memory->regions[i]; > + struct vubr_device_region *dev_region = &dev->regions[i]; > + > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("Region %d\n", i); > + printf(" guest_phys_addr: 0x%016"PRIx64"\n", msg_region->guest_phys_addr); > + printf(" memory_size: 0x%016"PRIx64"\n", msg_region->memory_size); > + printf(" userspace_addr 0x%016"PRIx64"\n", msg_region->userspace_addr); > + printf(" mmap_offset 0x%016"PRIx64"\n", msg_region->mmap_offset); > +#endif > + > + dev_region->gpa = msg_region->guest_phys_addr; > + dev_region->size = msg_region->memory_size; > + dev_region->qva = msg_region->userspace_addr; > + dev_region->mmap_offset = msg_region->mmap_offset; > + > + void *mmap_addr; > + > + /* We don't use offset argument of mmap() since the > + * mapped address has to be page aligned, and we use huge > + * pages. 
*/ > + mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset, > + PROT_READ | PROT_WRITE, MAP_SHARED, > + vmsg->fds[i], 0); > + > + if (mmap_addr == MAP_FAILED) { > + perror("mmap"); > + exit (1); > + } > + > + dev_region->mmap_addr = (uint64_t) mmap_addr; > + > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf(" mmap_addr: 0x%016"PRIx64"\n", dev_region->mmap_addr); > +#endif > + } > + > + return 0; > +} > + > +static int > +_execute_SET_LOG_BASE(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + printf("function %s() not implemented yet.\n", __FUNCTION__); > + return 0; > +} > + > +static int > +_execute_SET_LOG_FD(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + printf("function %s() not implemented yet.\n", __FUNCTION__); > + return 0; > +} > + > +static int > +_execute_SET_VRING_NUM(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + unsigned int index = vmsg->payload.state.index; > + unsigned int num = vmsg->payload.state.num; > + > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("state.index: %d\n", index); > + printf("state.num: %d\n", num); > +#endif > + dev->virtqueue[index].size = num; > + return 0; > +} > + > +static int > +_execute_SET_VRING_ADDR(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + struct vhost_vring_addr *vra = &vmsg->payload.addr; > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("vhost_vring_addr:\n"); > + printf(" index: %d\n", vra->index); > + printf(" flags: %d\n", vra->flags); > + printf(" desc_user_addr: 0x%016"PRIx64"\n", vra->desc_user_addr); > + printf(" used_user_addr: 0x%016"PRIx64"\n", vra->used_user_addr); > + printf(" avail_user_addr: 0x%016"PRIx64"\n", vra->avail_user_addr); > + printf(" log_guest_addr: 0x%016"PRIx64"\n", vra->log_guest_addr); > +#endif > + > + unsigned int index = vra->index; > + struct virtqueue *vq = &dev->virtqueue[index]; > + > + vq->desc = (struct virtq_desc *)qva_to_va(dev, vra->desc_user_addr); > + vq->used = (struct 
virtq_used *)qva_to_va(dev, vra->used_user_addr); > + vq->avail = (struct virtq_avail *)qva_to_va(dev, vra->avail_user_addr); > + > + printf("Setting virtq addresses:\n"); > + printf(" virtq_desc at %p\n", vq->desc); > + printf(" virtq_used at %p\n", vq->used); > + printf(" virtq_avail at %p\n", vq->avail); > + > + vq->last_used_index = vq->used->idx; > + return 0; > +} > + > +static int > +_execute_SET_VRING_BASE(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + unsigned int index = vmsg->payload.state.index; > + unsigned int num = vmsg->payload.state.num; > + > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("state.index: %d\n", index); > + printf("state.num: %d\n", num); > +#endif > + dev->virtqueue[index].last_avail_index = num; > + > + return 0; > +} > + > +static int > +_execute_GET_VRING_BASE(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + printf("function %s() not implemented yet.\n", __FUNCTION__); > + return 0; > +} > + > +static int > +_execute_SET_VRING_KICK(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); > +#endif > + > + uint64_t u64_arg = vmsg->payload.u64; > + int index = u64_arg & VHOST_USER_VRING_IDX_MASK; > + > + assert((u64_arg & VHOST_USER_VRING_NOFD_MASK) == 0); > + assert(vmsg->fd_num == 1); > + > + dev->virtqueue[index].kick_fd = vmsg->fds[0]; > + printf("Got kick_fd: %d for vq: %d\n", vmsg->fds[0], index); > + > + if ((index % 2 == 1)) { > + /* TX queue. 
*/ > + dispatcher_add(&dev->dispatcher, dev->virtqueue[index].kick_fd, dev, _kick_cb); > + > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("Waiting for kicks on fd: %d for vq: %d\n", > + dev->virtqueue[index].kick_fd, index); > +#endif > + } > + return 0; > +} > + > +static int > +_execute_SET_VRING_CALL(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); > +#endif > + > + uint64_t u64_arg = vmsg->payload.u64; > + int index = u64_arg & VHOST_USER_VRING_IDX_MASK; > + > + assert((u64_arg & VHOST_USER_VRING_NOFD_MASK) == 0); > + assert(vmsg->fd_num == 1); > + > + dev->virtqueue[index].call_fd = vmsg->fds[0]; > + printf("Got call_fd: %d for vq: %d\n", vmsg->fds[0], index); > + > + return 0; > +} > + > +static int > +_execute_SET_VRING_ERR(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); > +#endif > + return 0; > +} > + > +static int > +_execute_GET_PROTOCOL_FEATURES(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + /* FIXME: unimplented */ > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); > +#endif > + return 0; > +} > + > +static int > +_execute_SET_PROTOCOL_FEATURES(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + /* FIXME: unimplented */ > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + printf("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); > +#endif > + return 0; > +} > + > +static int > +_execute_GET_QUEUE_NUM(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + printf("function %s() not implemented yet.\n", __FUNCTION__); > + return 0; > +} > + > +static int > +_execute_SET_VRING_ENABLE(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > + printf("function %s() not implemented yet.\n", __FUNCTION__); > + return 0; > +} > + > +static int > 
+vubr_device_execute_request(struct vubr_device *dev, struct vhost_user_message *vmsg) > +{ > +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC > + /* Print out generic part of the request. */ > + printf("======================= Vhost user message from QEMU =======================\n"); > + printf("Request: %s (%d)\n", vhost_user_request_str[vmsg->request], vmsg->request); > + printf("Flags: 0x%x\n", vmsg->flags); > + printf("Size: %d\n", vmsg->size); > + > + if (vmsg->fd_num) { > + int i; > + printf("Fds:"); > + for (i = 0; i < vmsg->fd_num; i++) > + printf(" %d", vmsg->fds[i]); > + printf("\n"); > + } > +#endif /* VHOST_USER_SHOW_MGMT_TRAFFIC */ > + > + switch (vmsg->request) { > + case VHOST_USER_NONE: > + return _execute_NONE(dev, vmsg); > + case VHOST_USER_GET_FEATURES: > + return _execute_GET_FEATURES(dev, vmsg); > + case VHOST_USER_SET_FEATURES: > + return _execute_SET_FEATURES(dev, vmsg); > + case VHOST_USER_SET_OWNER: > + return _execute_SET_OWNER(dev, vmsg); > + case VHOST_USER_RESET_DEVICE: > + return _execute_RESET_DEVICE(dev, vmsg); > + case VHOST_USER_SET_MEM_TABLE: > + return _execute_SET_MEM_TABLE(dev, vmsg); > + case VHOST_USER_SET_LOG_BASE: > + return _execute_SET_LOG_BASE(dev, vmsg); > + case VHOST_USER_SET_LOG_FD: > + return _execute_SET_LOG_FD(dev, vmsg); > + case VHOST_USER_SET_VRING_NUM: > + return _execute_SET_VRING_NUM(dev, vmsg); > + case VHOST_USER_SET_VRING_ADDR: > + return _execute_SET_VRING_ADDR(dev, vmsg); > + case VHOST_USER_SET_VRING_BASE: > + return _execute_SET_VRING_BASE(dev, vmsg); > + case VHOST_USER_GET_VRING_BASE: > + return _execute_GET_VRING_BASE(dev, vmsg); > + case VHOST_USER_SET_VRING_KICK: > + return _execute_SET_VRING_KICK(dev, vmsg); > + case VHOST_USER_SET_VRING_CALL: > + return _execute_SET_VRING_CALL(dev, vmsg); > + case VHOST_USER_SET_VRING_ERR: > + return _execute_SET_VRING_ERR(dev, vmsg); > + case VHOST_USER_GET_PROTOCOL_FEATURES: > + return _execute_GET_PROTOCOL_FEATURES(dev, vmsg); > + case VHOST_USER_SET_PROTOCOL_FEATURES: 
> + return _execute_SET_PROTOCOL_FEATURES(dev, vmsg); > + case VHOST_USER_GET_QUEUE_NUM: > + return _execute_GET_QUEUE_NUM(dev, vmsg); > + case VHOST_USER_SET_VRING_ENABLE: > + return _execute_SET_VRING_ENABLE(dev, vmsg); > + case VHOST_USER_MAX: > + assert(vmsg->request != VHOST_USER_MAX); > + } > + return 0; > +} > + > +static void > +vubr_device_receive_cb(int sock, void *ctx) > +{ > + struct vubr_device *dev = (struct vubr_device *) ctx; > + struct vhost_user_message vmsg; > + > + vhost_user_message_read(sock, &vmsg); > + > + int reply_requested = vubr_device_execute_request(dev, &vmsg); > + > + if (reply_requested) { > + /* Set the version in the flags when sending the reply */ > + vmsg.flags &= ~VHOST_USER_VERSION_MASK; > + vmsg.flags |= VHOST_USER_VERSION; > + vmsg.flags |= VHOST_USER_REPLY_MASK; > + vhost_user_message_write(sock, &vmsg); > + } > +} > + > +static void > +vubr_device_accept_cb(int sock, void *ctx) > +{ > + struct vubr_device *dev = (struct vubr_device *)ctx; > + int conn_fd; > + struct sockaddr_un un; > + socklen_t len = sizeof(un); > + > + if ((conn_fd = accept(sock, (struct sockaddr *) &un, &len)) == -1) { > + perror("accept"); > + exit(1); > + } > + > + printf("DEBUG: Got connection from remote peer on sock %d\n", conn_fd); above within ifdef as well? > + dispatcher_add(&dev->dispatcher, conn_fd, ctx, vubr_device_receive_cb); > +} > + > +struct vubr_device * > +vubr_device_new(char *path) > +{ > + struct vubr_device *dev = > + (struct vubr_device *) calloc(1, sizeof(struct vubr_device)); > + > + dev->nregions = 0; > + > + int i; > + for (i = 0; i < MAX_NR_VIRTQUEUE; i++) > + dev->virtqueue[i] = (struct virtqueue) { > + .call_fd = -1, .kick_fd = -1, > + .size = 0, > + .last_avail_index = 0, .last_used_index = 0, > + .desc = 0, .avail = 0, .used = 0, > + }; > + > + /* Get a UNIX socket. 
*/ > + if ((dev->sock = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) { > + perror("socket"); > + exit(1); > + } > + > + struct sockaddr_un un; > + un.sun_family = AF_UNIX; > + strcpy(un.sun_path, path); > + > + size_t len = sizeof(un.sun_family) + strlen(path); > + > + unlink(path); > + > + if (bind(dev->sock, (struct sockaddr *) &un, len) == -1) { > + perror("bind"); > + exit(1); > + } > + > + if (listen(dev->sock, 1) == -1) { > + perror("listen"); > + exit(1); > + } > + > + dispatcher_init(&dev->dispatcher); > + dispatcher_add(&dev->dispatcher, dev->sock, (void*) dev, vubr_device_accept_cb); > + > + printf("Waiting for connections on UNIX socket %s ...\n", path); > + return dev; > +} > + > +void > +vubr_device_backend_udp_setup(struct vubr_device *dev, > + char *local_host, > + uint16_t local_port, > + char *dest_host, > + uint16_t dest_port) > +{ > + > + struct sockaddr_in si_local; > + int sock; > + > + if ((sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) == -1) > + die("socket"); > + > + memset((char *) &si_local, 0, sizeof(struct sockaddr_in)); > + si_local.sin_family = AF_INET; > + si_local.sin_port = htons(local_port); > + if(inet_aton(local_host, &si_local.sin_addr) == 0) { > + fprintf(stderr, "inet_aton() failed.\n"); > + exit(1); > + } > + > + if( bind(sock, (struct sockaddr*)&si_local, sizeof(si_local) ) == -1) > + die("bind"); > + > + /* setup destination for sends */ > + struct sockaddr_in *si_remote = &dev->backend_udp_dest; > + memset((char *) si_remote, 0, sizeof(struct sockaddr_in)); > + si_remote->sin_family = AF_INET; > + si_remote->sin_port = htons(dest_port); > + if(inet_aton(dest_host, &si_remote->sin_addr) == 0) { > + fprintf(stderr, "inet_aton() failed.\n"); > + exit(1); > + } > + > + dev->backend_udp_sock = sock; > + dispatcher_add(&dev->dispatcher, sock, dev, _backend_recv_cb); > + printf("Waiting for data from udp backend on %s:%d...\n", local_host, local_port); > +} > + > +static void > +vubr_device_backend_udp_sendbuf(struct vubr_device 
*dev, uint8_t *buf, size_t len) > +{ > + int slen = sizeof(struct sockaddr_in); > + > + if (sendto(dev->backend_udp_sock, buf, len, 0, (struct sockaddr *) &dev->backend_udp_dest, slen) == -1) > + die("sendto()"); > +} > + > +static int > +vubr_device_backend_udp_recvbuf(struct vubr_device *dev, uint8_t *buf, size_t buflen) > +{ > + int slen = sizeof(struct sockaddr_in); > + int rc; > + > + if ((rc = recvfrom(dev->backend_udp_sock, buf, buflen, 0, > + (struct sockaddr *) &dev->backend_udp_dest, > + (socklen_t *)&slen)) == -1) > + die("recvfrom()"); > + > + return rc; > +} > + > +void > +vubr_device_run(struct vubr_device * dev) > +{ > + while (1) { > + /* timeout 200ms */ > + dispatcher_wait(&dev->dispatcher, 200000); > + /* Here one can try polling strategy. */ > + } > +} > diff --git a/tests/vubr/Makefile b/tests/vubr/Makefile > new file mode 100644 > index 0000000..c3400fb > --- /dev/null > +++ b/tests/vubr/Makefile > @@ -0,0 +1,15 @@ > +SRCS=dispatcher.c vhost_user.c vubr_device.c main.c > +INCLUDES+=vhost.h virtio_ring.h virtio_net.h > +INCLUDES+=vubr_config.h vhost_user.h virtqueue.h > +INCLUDES+=dispatcher.h vubr_device.h > + > +EXE=vubr > +CFLAGS += -m64 -Wall -Werror -g > + > +all: $(EXE) > + > +$(EXE): $(SRCS) $(INCLUDES) > + $(CC) $(CFLAGS) $(SRCS) -o $@ > + > +clean: > + rm -f $(EXE) > -- Probably just add this as part of tests/Makefile > --Victor
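Coming back to the missing barriers in _process_avail(): the ordering I have in mind looks roughly like the sketch below. It uses C11 atomics on a simplified single-producer/single-consumer ring; struct sketch_ring, sketch_publish and sketch_consume are invented names, and the layout is not the virtio one. In the real code the rings are plain fields in shared memory, so explicit barrier helpers would be needed instead (QEMU has smp_rmb()/smp_wmb() for this, as far as I remember).

```c
#include <stdatomic.h>
#include <stdint.h>

enum { SKETCH_RING_SIZE = 8 };

/* Simplified ring, not the virtio layout. */
struct sketch_ring {
    _Atomic uint16_t idx;               /* published by the producer */
    uint16_t ring[SKETCH_RING_SIZE];    /* slots filled before idx moves */
};

/* Producer: fill the slot first, then publish the new index. */
static void sketch_publish(struct sketch_ring *r, uint16_t value)
{
    uint16_t idx = atomic_load_explicit(&r->idx, memory_order_relaxed);

    r->ring[idx % SKETCH_RING_SIZE] = value;
    /* Release: the slot write becomes visible before the index update. */
    atomic_store_explicit(&r->idx, (uint16_t)(idx + 1), memory_order_release);
}

/*
 * Consumer: read the index with acquire semantics, then read the slots.
 * Returns the number of entries copied to out (at most max).
 */
static unsigned sketch_consume(struct sketch_ring *r, uint16_t *last_seen,
                               uint16_t *out, unsigned max)
{
    unsigned n = 0;
    /* Acquire: slot reads below cannot be reordered before this load. */
    uint16_t idx = atomic_load_explicit(&r->idx, memory_order_acquire);

    while (*last_seen != idx && n < max) {
        out[n++] = r->ring[*last_seen % SKETCH_RING_SIZE];
        (*last_seen)++;
    }
    return n;
}
```

The same pairing applies in the other direction: write the used ring entry first, then publish used->idx with release semantics.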
diff --git a/tests/vubr/dispatcher.h b/tests/vubr/dispatcher.h new file mode 100644 index 0000000..cd02f07 --- /dev/null +++ b/tests/vubr/dispatcher.h @@ -0,0 +1,26 @@ +#ifndef __DISPATCHER__ +#define __DISPATCHER__ + +#include <stddef.h> +#include <stdint.h> +#include <sys/select.h> + +typedef void (*callback_func)(int sock, void *ctx); + +struct event { + void *ctx; + callback_func callback; +}; + +struct dispatcher { + int max_sock; + fd_set fdset; + struct event events[FD_SETSIZE]; +}; + +int dispatcher_init(struct dispatcher *d); +int dispatcher_add(struct dispatcher *d, int sock, void *ctx, callback_func cb); +int dispatcher_remove(struct dispatcher *d, int sock); +int dispatcher_wait(struct dispatcher *d, uint32_t timeout); + +#endif /* __DISPATCHER__ */ diff --git a/tests/vubr/vhost.h b/tests/vubr/vhost.h new file mode 100644 index 0000000..3960cc2 --- /dev/null +++ b/tests/vubr/vhost.h @@ -0,0 +1,77 @@ +#ifndef __VHOST_H__ +#define __VHOST_H__ + +#include <inttypes.h> + +/* Most imported form qemu/linux-headers/linux/vhost.h + * + * Userspace interface for virtio structures. */ + +struct vhost_vring_state { + unsigned int index; + unsigned int num; +}; + +struct vhost_vring_file { + unsigned int index; + int fd; /* Pass -1 to unbind from file. */ +}; + +struct vhost_vring_addr { + unsigned int index; + /* Option flags. */ + unsigned int flags; + /* Flag values: */ + /* Whether log address is valid. If set enables logging. */ +#define VHOST_VRING_F_LOG 0 + + /* Start of array of descriptors (virtually contiguous) */ + uint64_t desc_user_addr; + /* Used structure address. Must be 32 bit aligned */ + uint64_t used_user_addr; + /* Available structure address. Must be 16 bit aligned */ + uint64_t avail_user_addr; + /* Logging support. */ + /* Log writes to used structure, at offset calculated from specified + * address. Address must be 32 bit aligned. 
*/ + uint64_t log_guest_addr; +}; + +#define VHOST_MEMORY_MAX_NREGIONS (8) + +struct vhost_memory_region { + uint64_t guest_phys_addr; + uint64_t memory_size; /* bytes */ + uint64_t userspace_addr; + uint64_t mmap_offset; +}; + +struct vhost_memory { + uint32_t nregions; + uint32_t padding; + struct vhost_memory_region regions[VHOST_MEMORY_MAX_NREGIONS]; +}; + +/* Feature bits */ +/* Log all write descriptors. Can be changed while device is active. */ +#define VHOST_F_LOG_ALL 26 +/* vhost-net should add virtio_net_hdr for RX, and strip for TX packets. */ +#define VHOST_NET_F_VIRTIO_NET_HDR 27 + +struct virtio_net_hdr { +#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 + uint8_t flags; +#define VIRTIO_NET_HDR_GSO_NONE 0 +#define VIRTIO_NET_HDR_GSO_TCPV4 1 +#define VIRTIO_NET_HDR_GSO_UDP 3 +#define VIRTIO_NET_HDR_GSO_TCPV6 4 +#define VIRTIO_NET_HDR_GSO_ECN 0x80 + uint8_t gso_type; + uint16_t hdr_len; + uint16_t gso_size; + uint16_t csum_start; + uint16_t csum_offset; + uint16_t num_buffers; +}; + +#endif /* __VHOST_H__ */ diff --git a/tests/vubr/vhost_user.h b/tests/vubr/vhost_user.h new file mode 100644 index 0000000..44d82d0 --- /dev/null +++ b/tests/vubr/vhost_user.h @@ -0,0 +1,70 @@ +#ifndef __VHOST_USER_H__ +#define __VHOST_USER_H__ + +/* Based on qemu/hw/virtio/vhost-user.c */ + +#include <stdint.h> +#include <stddef.h> +#include "vhost.h" + +#define VHOST_USER_F_PROTOCOL_FEATURES 30 +#define VHOST_USER_PROTOCOL_FEATURE_MASK 0x1ULL + +#define VHOST_USER_PROTOCOL_F_MQ 0 + +#define VHOST_USER_REQUEST_LIST \ + INFO(NONE, 0) \ + INFO(GET_FEATURES, 1) \ + INFO(SET_FEATURES, 2) \ + INFO(SET_OWNER, 3) \ + INFO(RESET_DEVICE, 4) \ + INFO(SET_MEM_TABLE, 5) \ + INFO(SET_LOG_BASE, 6) \ + INFO(SET_LOG_FD, 7) \ + INFO(SET_VRING_NUM, 8) \ + INFO(SET_VRING_ADDR, 9) \ + INFO(SET_VRING_BASE, 10) \ + INFO(GET_VRING_BASE, 11) \ + INFO(SET_VRING_KICK, 12) \ + INFO(SET_VRING_CALL, 13) \ + INFO(SET_VRING_ERR, 14) \ + INFO(GET_PROTOCOL_FEATURES, 15) \ + INFO(SET_PROTOCOL_FEATURES, 16) \ + 
INFO(GET_QUEUE_NUM, 17) \ + INFO(SET_VRING_ENABLE, 18) + +enum vhost_user_request { +#define INFO(a, b) VHOST_USER_ ## a = b, + VHOST_USER_REQUEST_LIST +#undef INFO + VHOST_USER_MAX +}; + +struct vhost_user_message { + enum vhost_user_request request; + +#define VHOST_USER_VERSION_MASK (0x3) +#define VHOST_USER_REPLY_MASK (0x1<<2) + uint32_t flags; + uint32_t size; /* the following payload size */ + union { +#define VHOST_USER_VRING_IDX_MASK (0xff) +#define VHOST_USER_VRING_NOFD_MASK (0x1<<8) + uint64_t u64; + struct vhost_vring_state state; + struct vhost_vring_addr addr; + struct vhost_memory memory; + } payload; + int fds[VHOST_MEMORY_MAX_NREGIONS]; + int fd_num; +} __attribute__((packed)); + +#define VHOST_USER_HDR_SIZE offsetof(struct vhost_user_message, payload.u64) + +/* The version of the protocol we support */ +#define VHOST_USER_VERSION (0x1) + +void vhost_user_message_read(int conn_fd, struct vhost_user_message *vmsg); +void vhost_user_message_write(int conn_fd, struct vhost_user_message *vmsg); + +#endif /* __VHOST_USER_H__ */ diff --git a/tests/vubr/virtio_net.h b/tests/vubr/virtio_net.h new file mode 100644 index 0000000..f6f87b1 --- /dev/null +++ b/tests/vubr/virtio_net.h @@ -0,0 +1,38 @@ +#ifndef __VIRTIO_NET_H__ +#define __VIRTIO_NET_H__ + +/* Form qemu/include/standard-headers/linux/virtio_net.h */ + +/* The feature bitmap for virtio net */ +#define VIRTIO_NET_F_CSUM 0 /* Host handles pkts w/ partial csum */ +#define VIRTIO_NET_F_GUEST_CSUM 1 /* Guest handles pkts w/ partial csum */ +#define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS 2 /* Dynamic offload configuration. */ +#define VIRTIO_NET_F_MAC 5 /* Host has given MAC address. */ +#define VIRTIO_NET_F_GUEST_TSO4 7 /* Guest can handle TSOv4 in. */ +#define VIRTIO_NET_F_GUEST_TSO6 8 /* Guest can handle TSOv6 in. */ +#define VIRTIO_NET_F_GUEST_ECN 9 /* Guest can handle TSO[6] w/ ECN in. */ +#define VIRTIO_NET_F_GUEST_UFO 10 /* Guest can handle UFO in. 
*/ +#define VIRTIO_NET_F_HOST_TSO4 11 /* Host can handle TSOv4 in. */ +#define VIRTIO_NET_F_HOST_TSO6 12 /* Host can handle TSOv6 in. */ +#define VIRTIO_NET_F_HOST_ECN 13 /* Host can handle TSO[6] w/ ECN in. */ +#define VIRTIO_NET_F_HOST_UFO 14 /* Host can handle UFO in. */ +#define VIRTIO_NET_F_MRG_RXBUF 15 /* Host can merge receive buffers. */ +#define VIRTIO_NET_F_STATUS 16 /* virtio_net_config.status available */ +#define VIRTIO_NET_F_CTRL_VQ 17 /* Control channel available */ +#define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */ +#define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */ +#define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */ +#define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can announce device on the + * network */ +#define VIRTIO_NET_F_MQ 22 /* Device supports Receive Flow + * Steering */ +#define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ + +#ifndef VIRTIO_NET_NO_LEGACY +#define VIRTIO_NET_F_GSO 6 /* Host handles pkts w/ any GSO type */ +#endif /* VIRTIO_NET_NO_LEGACY */ + +#define VIRTIO_NET_S_LINK_UP 1 /* Link is up */ +#define VIRTIO_NET_S_ANNOUNCE 2 /* Announcement is needed */ + +#endif /* __VIRTIO_NET_H__ */ diff --git a/tests/vubr/virtio_ring.h b/tests/vubr/virtio_ring.h new file mode 100644 index 0000000..e2d0adb --- /dev/null +++ b/tests/vubr/virtio_ring.h @@ -0,0 +1,103 @@ +#ifndef VIRTQUEUE_H +#define VIRTQUEUE_H +/* + * + * Virtual I/O Device (VIRTIO) Version 1.0 + * Committee Specification 03 + * 02 August 2015 + * Copyright (c) OASIS Open 2015. All Rights Reserved. + * Source: http://docs.oasis-open.org/virtio/virtio/v1.0/cs03/listings/ + * + */ +#include <stdint.h> + +typedef uint64_t le64; +typedef uint32_t le32; +typedef uint16_t le16; + +/* This marks a buffer as continuing via the next field. */ +#define VIRTQ_DESC_F_NEXT 1 +/* This marks a buffer as write-only (otherwise read-only). 
*/ +#define VIRTQ_DESC_F_WRITE 2 +/* This means the buffer contains a list of buffer descriptors. */ +#define VIRTQ_DESC_F_INDIRECT 4 + +/* The device uses this in used->flags to advise the driver: don't kick me + * when you add a buffer. It's unreliable, so it's simply an + * optimization. */ +#define VIRTQ_USED_F_NO_NOTIFY 1 +/* The driver uses this in avail->flags to advise the device: don't + * interrupt me when you consume a buffer. It's unreliable, so it's + * simply an optimization. */ +#define VIRTQ_AVAIL_F_NO_INTERRUPT 1 + +/* Support for indirect descriptors */ +#define VIRTIO_F_INDIRECT_DESC 28 + +/* Support for avail_event and used_event fields */ +#define VIRTIO_F_EVENT_IDX 29 + +/* Arbitrary descriptor layouts. */ +#define VIRTIO_F_ANY_LAYOUT 27 + +/* Virtqueue descriptors: 16 bytes. + * These can chain together via "next". */ +struct virtq_desc { + /* Address (guest-physical). */ + le64 addr; + /* Length. */ + le32 len; + /* The flags as indicated above. */ + le16 flags; + /* We chain unused descriptors via this, too */ + le16 next; +}; + +struct virtq_avail { + le16 flags; + le16 idx; + le16 ring[]; + /* Only if VIRTIO_F_EVENT_IDX: le16 used_event; */ +}; + +/* le32 is used here for ids for padding reasons. */ +struct virtq_used_elem { + /* Index of start of used descriptor chain. */ + le32 id; + /* Total length of the descriptor chain which was written to. 
*/ + le32 len; +}; + +struct virtq_used { + le16 flags; + le16 idx; + struct virtq_used_elem ring[]; + /* Only if VIRTIO_F_EVENT_IDX: le16 avail_event; */ +}; + +struct virtq { + unsigned int num; + + struct virtq_desc *desc; + struct virtq_avail *avail; + struct virtq_used *used; +}; + +static inline int virtq_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx) +{ + return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx); +} + +/* Get location of event indices (only with VIRTIO_F_EVENT_IDX) */ +static inline le16 *virtq_used_event(struct virtq *vq) +{ + /* For backwards compat, used event index is at *end* of avail ring. */ + return &vq->avail->ring[vq->num]; +} + +static inline le16 *virtq_avail_event(struct virtq *vq) +{ + /* For backwards compat, avail event index is at *end* of used ring. */ + return (le16 *)&vq->used->ring[vq->num]; +} +#endif /* VIRTQUEUE_H */ diff --git a/tests/vubr/virtqueue.h b/tests/vubr/virtqueue.h new file mode 100644 index 0000000..b018cca --- /dev/null +++ b/tests/vubr/virtqueue.h @@ -0,0 +1,17 @@ +#ifndef __VIRTQUEUE__ +#define __VIRTQUEUE__ + +#include "virtio_ring.h" + +struct virtqueue { + int call_fd; + int kick_fd; + uint32_t size; + uint16_t last_avail_index; + uint16_t last_used_index; + struct virtq_desc* desc; + struct virtq_avail* avail; + struct virtq_used* used; +}; + +#endif /* __VIRTQUEUE__ */ diff --git a/tests/vubr/vubr_config.h b/tests/vubr/vubr_config.h new file mode 100644 index 0000000..19681d0 --- /dev/null +++ b/tests/vubr/vubr_config.h @@ -0,0 +1,7 @@ +#ifndef __VHU_CONFIG__ +#define __VHU_CONFIG__ + +#define VHOST_USER_SHOW_MGMT_TRAFFIC +#define VHOST_USER_SHOW_NET_TRAFFIC + +#endif /* __VHU_CONFIG__ */ diff --git a/tests/vubr/vubr_device.h b/tests/vubr/vubr_device.h new file mode 100644 index 0000000..04a0ecb --- /dev/null +++ b/tests/vubr/vubr_device.h @@ -0,0 +1,41 @@ +#ifndef __VHU_DEVICE__ +#define __VHU_DEVICE__ + +#include <arpa/inet.h> +#include <sys/socket.h> + 
+#include "vhost.h" +#include "virtqueue.h" +#include "dispatcher.h" + +#define MAX_NR_VIRTQUEUE (8) + +struct vubr_device_region { + /* Guest Phhysical address. */ + uint64_t gpa; + /* Memory region size. */ + uint64_t size; + /* QEMU virtual address (userspace). */ + uint64_t qva; + /* Starting offset in our mmaped space. */ + uint64_t mmap_offset; + /* Start addrtess of mmaped space. */ + uint64_t mmap_addr; +}; + +struct vubr_device { + int sock; + struct dispatcher dispatcher; + uint32_t nregions; + struct vubr_device_region regions[VHOST_MEMORY_MAX_NREGIONS]; + struct virtqueue virtqueue[MAX_NR_VIRTQUEUE]; + int backend_udp_sock; + struct sockaddr_in backend_udp_dest; +}; + +struct vubr_device *vubr_device_new(char *path); +void vubr_device_run(struct vubr_device * dev); +void vubr_device_backend_udp_setup(struct vubr_device *dev, char *local_host, + uint16_t local_port, char *dest_host, uint16_t dest_port); + +#endif /* __VHU_DEVICE__ */ diff --git a/tests/vubr/dispatcher.c b/tests/vubr/dispatcher.c new file mode 100644 index 0000000..62d386a --- /dev/null +++ b/tests/vubr/dispatcher.c @@ -0,0 +1,77 @@ +#include <stdio.h> +#include <sys/select.h> + +#include "vubr_config.h" +#include "dispatcher.h" + +int +dispatcher_init(struct dispatcher *d) +{ + FD_ZERO(&d->fdset); + d->max_sock = -1; + return 0; +} + +int +dispatcher_add(struct dispatcher *d, int sock, void *ctx, callback_func cb) +{ + if (sock >= FD_SETSIZE) { + fprintf(stderr, "Error: Failed to add new event. sock %d should be less than %d\n", + sock, FD_SETSIZE); + return -1; + } + + d->events[sock].ctx = ctx; + d->events[sock].callback = cb; + + FD_SET(sock, &d->fdset); + if (sock > d->max_sock) + d->max_sock = sock; + printf("DEBUG: Added sock %d for watching. max_sock: %d\n", sock, d->max_sock); + return 0; +} + +int +dispatcher_remove(struct dispatcher *d, int sock) +{ + if (sock >= FD_SETSIZE) { + fprintf(stderr, "Error: Failed to remove event. 
sock %d should be less than %d\n", + sock, FD_SETSIZE); + return -1; + } + + FD_CLR(sock, &d->fdset); + return 0; +} + +/* timeout in us */ +int +dispatcher_wait(struct dispatcher *d, uint32_t timeout) +{ + struct timeval tv; + tv.tv_sec = timeout / 1000000; + tv.tv_usec = timeout % 1000000; + + fd_set fdset = d->fdset; + + /* wait until some of sockets become readable. */ + int rc = select(d->max_sock + 1, &fdset, 0, 0, &tv); + + if (rc == -1) + perror("select"); + + /* Timeout */ + if (rc == 0) + return 0; + + /* Now call callback for every ready socket. */ + + int sock; + for (sock = 0; sock < d->max_sock + 1; sock++) + if (FD_ISSET(sock, &fdset)) { + struct event *e = &d->events[sock]; + e->callback(sock, e->ctx); + } + + return 0; +} diff --git a/tests/vubr/main.c b/tests/vubr/main.c new file mode 100644 index 0000000..a7a3e9d --- /dev/null +++ b/tests/vubr/main.c @@ -0,0 +1,18 @@ +#include "vubr_config.h" +#include "vubr_device.h" + +int main(int argc, char* argv[]) +{ + struct vubr_device *dev; + + if((dev = vubr_device_new("/tmp/vubr.sock"))) { + vubr_device_backend_udp_setup(dev, + "127.0.0.1", 4444, + "127.0.0.1", 5555); + + vubr_device_run(dev); + return 0; + } + else + return 1; +} diff --git a/tests/vubr/vhost_user.c b/tests/vubr/vhost_user.c new file mode 100644 index 0000000..91ab09e --- /dev/null +++ b/tests/vubr/vhost_user.c @@ -0,0 +1,83 @@ +#include <sys/types.h> +#include <sys/socket.h> +#include <string.h> +#include <stdio.h> +#include <stdlib.h> +#include <assert.h> +#include <unistd.h> +#include <errno.h> + +#include "vubr_config.h" +#include "vhost_user.h" + +void +vhost_user_message_read(int conn_fd, struct vhost_user_message *vmsg) +{ + int rc; + struct msghdr msg = {}; + struct iovec iov; + size_t fd_size = VHOST_MEMORY_MAX_NREGIONS * sizeof(int); + char control[CMSG_SPACE(fd_size)]; + memset(control, 0, sizeof(control)); + + iov.iov_base = (char *)vmsg; + iov.iov_len = VHOST_USER_HDR_SIZE; + + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + 
msg.msg_control = control; + msg.msg_controllen = sizeof(control); + + rc = recvmsg(conn_fd, &msg, 0); + + if (rc <= 0) { + perror("recvmsg"); + exit(1); + } + + vmsg->fd_num = 0; + struct cmsghdr *cmsg; + for (cmsg = CMSG_FIRSTHDR(&msg); + cmsg != NULL; + cmsg = CMSG_NXTHDR(&msg, cmsg)) + { + if ((cmsg->cmsg_level == SOL_SOCKET) && + (cmsg->cmsg_type == SCM_RIGHTS)) + { + fd_size = cmsg->cmsg_len - CMSG_LEN(0); + vmsg->fd_num = fd_size / sizeof(int); + memcpy(vmsg->fds, CMSG_DATA(cmsg), fd_size); + break; + } + } + + if (vmsg->size > sizeof(vmsg->payload)) { + fprintf(stderr, "Error: too big message request: %d, size: vmsg->size: %u, while sizeof(vmsg->payload) = %lu\n", + vmsg->request, vmsg->size, sizeof(vmsg->payload)); + exit(1); + } + + if (vmsg->size) { + rc = read(conn_fd, &vmsg->payload, vmsg->size); + if (rc <= 0) { + perror("recvmsg"); + exit(1); + } + + assert(rc == vmsg->size); + } +} + +void +vhost_user_message_write(int conn_fd, struct vhost_user_message *vmsg) +{ + int rc; + do { + rc = write(conn_fd, vmsg, VHOST_USER_HDR_SIZE + vmsg->size); + } while (rc < 0 && errno == EINTR); + + if (rc < 0) { + perror("write"); + exit(1); + } +} diff --git a/tests/vubr/vubr_device.c b/tests/vubr/vubr_device.c new file mode 100644 index 0000000..a296aef --- /dev/null +++ b/tests/vubr/vubr_device.c @@ -0,0 +1,773 @@ +#include <assert.h> +#include <stdio.h> +#include <stdlib.h> +#include <inttypes.h> +#include <string.h> +#include <unistd.h> +#include <errno.h> +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/un.h> +#include <sys/unistd.h> +#include <sys/mman.h> +#include <sys/eventfd.h> + +#include "vubr_config.h" +#include "vhost_user.h" +#include "virtio_net.h" +#include "vubr_device.h" +#include "virtqueue.h" + +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC +static char *vhost_user_request_str[] = { +#define INFO(name,num) \ + [num] = #name, +VHOST_USER_REQUEST_LIST +#undef INFO +}; +#endif /* VHOST_USER_SHOW_MGMT_TRAFFIC */ + +void +die(char *s) +{ + 
perror(s); + exit(1); +} + +static void +print_buffer(uint8_t* buf, size_t len) +{ + int i; + printf("raw buffer:\n"); + for(i = 0; i < len; i++) { + if ((i % 16) == 0) + printf("\n"); + if ((i % 4) == 0) + printf(" "); + printf("%02x ", buf[i]); + } + printf("\n............................................................\n"); +} + +/* Translate guest physical address to our virtual address. */ +static uint64_t __attribute__((unused)) +gpa_to_va(struct vubr_device *dev, uint64_t guest_addr) +{ + int i; + /* Find matching memory region. */ + + for (i = 0; i < dev->nregions; i++) { + struct vubr_device_region *r = &dev->regions[i]; + + if ((guest_addr >= r->gpa) && (guest_addr < (r->gpa + r->size))) + return (guest_addr - r->gpa + r->mmap_addr + r->mmap_offset); + } + + assert(!"address not found in regions"); + return 0; +} + +/* Translate qemu virtual address to our virtual address. */ +static uint64_t +qva_to_va(struct vubr_device *dev, uint64_t qemu_addr) +{ + int i; + /* Find matching memory region. */ + + for (i = 0; i < dev->nregions; i++) { + struct vubr_device_region *r = &dev->regions[i]; + + if ((qemu_addr >= r->qva) && (qemu_addr < (r->qva + r->size))) + return (qemu_addr - r->qva + r->mmap_addr + r->mmap_offset); + } + + assert(!"address not found in regions"); + return 0; +} + +static void vubr_device_backend_udp_sendbuf(struct vubr_device *dev, uint8_t *buf, size_t len); + +static void +_consume_raw_packet(struct vubr_device *dev, uint8_t *buf, uint32_t len) +{ + int hdrlen = sizeof(struct virtio_net_hdr); + +#ifdef VHOST_USER_SHOW_NET_TRAFFIC + print_buffer(buf, len); +#endif + vubr_device_backend_udp_sendbuf(dev, buf + hdrlen, len - hdrlen); +} + +/* Kick the guest if necessary. 
*/ +static void +virtqueue_kick(struct virtqueue *vq) +{ + if (!(vq->avail->flags & VIRTQ_AVAIL_F_NO_INTERRUPT)) { + printf("Kicking the guest...\n"); + eventfd_write(vq->call_fd, 1); + } +} + +static void +_post_buffer(struct vubr_device *dev, struct virtqueue *vq, uint8_t *buf, int32_t len) +{ + struct virtq_desc* desc = vq->desc; + struct virtq_avail* avail = vq->avail; + struct virtq_used* used = vq->used; + + unsigned int size = vq->size; + + uint16_t a_index = vq->last_avail_index % size; + uint16_t u_index = vq->last_used_index % size; + uint16_t d_index = avail->ring[a_index]; + + int i = d_index; + +#ifdef VHOST_USER_SHOW_NET_TRAFFIC + printf("Posting the packet to guest on vq:\n"); + printf(" size = %d\n", vq->size); + printf(" last_avail_index = %d\n", vq->last_avail_index); + printf(" last_used_index = %d\n", vq->last_used_index); + printf(" a_index = %d\n", a_index); + printf(" u_index = %d\n", u_index); + printf(" d_index = %d\n", d_index); + printf(" desc[%d].addr = 0x%016"PRIx64"\n", i, desc[i].addr); + printf(" desc[%d].len = %d\n", i, desc[i].len); + printf(" desc[%d].flags = %d\n", i, desc[i].flags); + printf(" avail->idx = %d\n", avail->idx); + printf(" used->idx = %d\n", used->idx); +#endif + + if (!(desc[i].flags & VIRTQ_DESC_F_WRITE)) { + // FIXME: we should find writable descriptor + fprintf(stderr, "descriptor is not writable. exiting.\n"); + exit(1); + } + + void *chunk_start = (void *)gpa_to_va(dev, desc[i].addr); + uint32_t chunk_len = desc[i].len; + + if (len <= chunk_len) { + memcpy(chunk_start, buf, len); + } else { + fprintf(stderr, "received too long packet from the backend. dropping...\n"); + return; + } + + /* Add descriptor to the used ring. */ + used->ring[u_index].id = d_index; + used->ring[u_index].len = len; + + vq->last_avail_index++; + vq->last_used_index++; + + used->idx = vq->last_used_index; + + /* Kick the guest if necessary. 
*/ + virtqueue_kick(vq); +} + +static int +_process_desc(struct vubr_device *dev, struct virtqueue *vq) +{ + struct virtq_desc* desc = vq->desc; + struct virtq_avail* avail = vq->avail; + struct virtq_used* used = vq->used; + + unsigned int size = vq->size; + + uint16_t a_index = vq->last_avail_index % size; + uint16_t u_index = vq->last_used_index % size; + uint16_t d_index = avail->ring[a_index]; + + uint32_t i, len = 0; + size_t buf_size = 4096; + uint8_t buf[4096]; + +#ifdef VHOST_USER_SHOW_NET_TRAFFIC + printf("chunks: "); +#endif + + i = d_index; + do { + void *chunk_start = (void *)gpa_to_va(dev, desc[i].addr); + uint32_t chunk_len = desc[i].len; + + if (len + chunk_len < buf_size) { + memcpy(buf + len, chunk_start, chunk_len); +#ifdef VHOST_USER_SHOW_NET_TRAFFIC + printf("%d ", chunk_len); +#endif + } else { + fprintf(stderr, "too long packet. dropping...\n"); + break; + } + + len += chunk_len; + + if (!(desc[i].flags & VIRTQ_DESC_F_NEXT)) + break; + + i = desc[i].next; + } while(1); + + if (!len) + return -1; + + /* Add descriptor to the used ring. 
*/ + used->ring[u_index].id = d_index; + used->ring[u_index].len = len; + +#ifdef VHOST_USER_SHOW_NET_TRAFFIC + printf("\n"); +#endif + + _consume_raw_packet(dev, buf, len); + + return 0; +} + +static void +_process_avail(struct vubr_device *dev, struct virtqueue *vq) +{ + struct virtq_avail *avail = vq->avail; + struct virtq_used* used = vq->used; + + while(vq->last_avail_index != avail->idx) { + _process_desc(dev, vq); + vq->last_avail_index++; + vq->last_used_index++; + } + + used->idx = vq->last_used_index; +} + +static int vubr_device_backend_udp_recvbuf(struct vubr_device *dev, uint8_t *buf, size_t buflen); + +static void +_backend_recv_cb(int sock, void *ctx) +{ + printf("\n\n *** IN UDP RECEIVE CALLBACK ***\n\n"); + struct vubr_device *dev = (struct vubr_device *) ctx; + struct virtqueue *rx_vq = &dev->virtqueue[0]; +#define BUFLEN 4096 + uint8_t buf[BUFLEN]; + int len; + struct virtio_net_hdr *hdr = (struct virtio_net_hdr *)buf; + int hdrlen = sizeof(struct virtio_net_hdr); + + *hdr = (struct virtio_net_hdr) {}; + hdr->num_buffers = 1; + + len = vubr_device_backend_udp_recvbuf(dev, buf + hdrlen, BUFLEN - hdrlen); + _post_buffer(dev, rx_vq, buf, len + hdrlen); +#undef BUFLEN +} + +static void +_kick_cb(int sock, void *ctx) +{ + struct vubr_device *dev = (struct vubr_device *) ctx; + eventfd_t kick_data; + ssize_t rc; + + rc = eventfd_read(sock, &kick_data); + + if (rc == -1) { + perror("read kick"); + exit(1); + } else { +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("Got kick_data: %016"PRIx64"\n", kick_data); +#endif + _process_avail(dev, &dev->virtqueue[1]); + } +} + +static int +_execute_NONE(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + printf("function %s() not implemented yet.\n", __FUNCTION__); + return 0; +} + +static int +_execute_GET_FEATURES(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + vmsg->payload.u64 = + ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | + (1ULL << VIRTIO_NET_F_CTRL_VQ) | + (1ULL << VIRTIO_NET_F_CTRL_RX) | 
+ (1ULL << VHOST_F_LOG_ALL)); + vmsg->size = sizeof(vmsg->payload.u64); + +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("returing u64: 0x%016"PRIx64"\n", vmsg->payload.u64); +#endif + + /* reply */ + return 1; +} + +static int +_execute_SET_FEATURES(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); +#endif + return 0; +} + +static int +_execute_SET_OWNER(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + printf("function %s() not implemented yet.\n", __FUNCTION__); + return 0; +} + +static int +_execute_RESET_DEVICE(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + printf("function %s() not implemented yet.\n", __FUNCTION__); + return 0; +} + +static int +_execute_SET_MEM_TABLE(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + printf("Nregions: %d\n", vmsg->payload.memory.nregions); + + struct vhost_memory *memory = &vmsg->payload.memory; + dev->nregions = memory->nregions; + int i; + for (i = 0; i < dev->nregions; i++) { + struct vhost_memory_region *msg_region = &memory->regions[i]; + struct vubr_device_region *dev_region = &dev->regions[i]; + +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("Region %d\n", i); + printf(" guest_phys_addr: 0x%016"PRIx64"\n", msg_region->guest_phys_addr); + printf(" memory_size: 0x%016"PRIx64"\n", msg_region->memory_size); + printf(" userspace_addr 0x%016"PRIx64"\n", msg_region->userspace_addr); + printf(" mmap_offset 0x%016"PRIx64"\n", msg_region->mmap_offset); +#endif + + dev_region->gpa = msg_region->guest_phys_addr; + dev_region->size = msg_region->memory_size; + dev_region->qva = msg_region->userspace_addr; + dev_region->mmap_offset = msg_region->mmap_offset; + + void *mmap_addr; + + /* We don't use offset argument of mmap() since the + * mapped address has to be page aligned, and we use huge + * pages. 
*/ + mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset, + PROT_READ | PROT_WRITE, MAP_SHARED, + vmsg->fds[i], 0); + + if (mmap_addr == MAP_FAILED) { + perror("mmap"); + exit (1); + } + + dev_region->mmap_addr = (uint64_t) mmap_addr; + +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf(" mmap_addr: 0x%016"PRIx64"\n", dev_region->mmap_addr); +#endif + } + + return 0; +} + +static int +_execute_SET_LOG_BASE(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + printf("function %s() not implemented yet.\n", __FUNCTION__); + return 0; +} + +static int +_execute_SET_LOG_FD(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + printf("function %s() not implemented yet.\n", __FUNCTION__); + return 0; +} + +static int +_execute_SET_VRING_NUM(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + unsigned int index = vmsg->payload.state.index; + unsigned int num = vmsg->payload.state.num; + +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("state.index: %d\n", index); + printf("state.num: %d\n", num); +#endif + dev->virtqueue[index].size = num; + return 0; +} + +static int +_execute_SET_VRING_ADDR(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + struct vhost_vring_addr *vra = &vmsg->payload.addr; +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("vhost_vring_addr:\n"); + printf(" index: %d\n", vra->index); + printf(" flags: %d\n", vra->flags); + printf(" desc_user_addr: 0x%016"PRIx64"\n", vra->desc_user_addr); + printf(" used_user_addr: 0x%016"PRIx64"\n", vra->used_user_addr); + printf(" avail_user_addr: 0x%016"PRIx64"\n", vra->avail_user_addr); + printf(" log_guest_addr: 0x%016"PRIx64"\n", vra->log_guest_addr); +#endif + + unsigned int index = vra->index; + struct virtqueue *vq = &dev->virtqueue[index]; + + vq->desc = (struct virtq_desc *)qva_to_va(dev, vra->desc_user_addr); + vq->used = (struct virtq_used *)qva_to_va(dev, vra->used_user_addr); + vq->avail = (struct virtq_avail *)qva_to_va(dev, vra->avail_user_addr); + + 
printf("Setting virtq addresses:\n"); + printf(" virtq_desc at %p\n", vq->desc); + printf(" virtq_used at %p\n", vq->used); + printf(" virtq_avail at %p\n", vq->avail); + + vq->last_used_index = vq->used->idx; + return 0; +} + +static int +_execute_SET_VRING_BASE(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + unsigned int index = vmsg->payload.state.index; + unsigned int num = vmsg->payload.state.num; + +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("state.index: %d\n", index); + printf("state.num: %d\n", num); +#endif + dev->virtqueue[index].last_avail_index = num; + + return 0; +} + +static int +_execute_GET_VRING_BASE(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + printf("function %s() not implemented yet.\n", __FUNCTION__); + return 0; +} + +static int +_execute_SET_VRING_KICK(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); +#endif + + uint64_t u64_arg = vmsg->payload.u64; + int index = u64_arg & VHOST_USER_VRING_IDX_MASK; + + assert((u64_arg & VHOST_USER_VRING_NOFD_MASK) == 0); + assert(vmsg->fd_num == 1); + + dev->virtqueue[index].kick_fd = vmsg->fds[0]; + printf("Got kick_fd: %d for vq: %d\n", vmsg->fds[0], index); + + if ((index % 2 == 1)) { + /* TX queue. 
*/ + dispatcher_add(&dev->dispatcher, dev->virtqueue[index].kick_fd, dev, _kick_cb); + +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("Waiting for kicks on fd: %d for vq: %d\n", + dev->virtqueue[index].kick_fd, index); +#endif + } + return 0; +} + +static int +_execute_SET_VRING_CALL(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); +#endif + + uint64_t u64_arg = vmsg->payload.u64; + int index = u64_arg & VHOST_USER_VRING_IDX_MASK; + + assert((u64_arg & VHOST_USER_VRING_NOFD_MASK) == 0); + assert(vmsg->fd_num == 1); + + dev->virtqueue[index].call_fd = vmsg->fds[0]; + printf("Got call_fd: %d for vq: %d\n", vmsg->fds[0], index); + + return 0; +} + +static int +_execute_SET_VRING_ERR(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); +#endif + return 0; +} + +static int +_execute_GET_PROTOCOL_FEATURES(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + /* FIXME: unimplented */ +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); +#endif + return 0; +} + +static int +_execute_SET_PROTOCOL_FEATURES(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + /* FIXME: unimplented */ +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + printf("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); +#endif + return 0; +} + +static int +_execute_GET_QUEUE_NUM(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + printf("function %s() not implemented yet.\n", __FUNCTION__); + return 0; +} + +static int +_execute_SET_VRING_ENABLE(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ + printf("function %s() not implemented yet.\n", __FUNCTION__); + return 0; +} + +static int +vubr_device_execute_request(struct vubr_device *dev, struct vhost_user_message *vmsg) +{ +#ifdef VHOST_USER_SHOW_MGMT_TRAFFIC + /* Print out generic part of the 
request. */ + printf("======================= Vhost user message from QEMU =======================\n"); + printf("Request: %s (%d)\n", vhost_user_request_str[vmsg->request], vmsg->request); + printf("Flags: 0x%x\n", vmsg->flags); + printf("Size: %d\n", vmsg->size); + + if (vmsg->fd_num) { + int i; + printf("Fds:"); + for (i = 0; i < vmsg->fd_num; i++) + printf(" %d", vmsg->fds[i]); + printf("\n"); + } +#endif /* VHOST_USER_SHOW_MGMT_TRAFFIC */ + + switch (vmsg->request) { + case VHOST_USER_NONE: + return _execute_NONE(dev, vmsg); + case VHOST_USER_GET_FEATURES: + return _execute_GET_FEATURES(dev, vmsg); + case VHOST_USER_SET_FEATURES: + return _execute_SET_FEATURES(dev, vmsg); + case VHOST_USER_SET_OWNER: + return _execute_SET_OWNER(dev, vmsg); + case VHOST_USER_RESET_DEVICE: + return _execute_RESET_DEVICE(dev, vmsg); + case VHOST_USER_SET_MEM_TABLE: + return _execute_SET_MEM_TABLE(dev, vmsg); + case VHOST_USER_SET_LOG_BASE: + return _execute_SET_LOG_BASE(dev, vmsg); + case VHOST_USER_SET_LOG_FD: + return _execute_SET_LOG_FD(dev, vmsg); + case VHOST_USER_SET_VRING_NUM: + return _execute_SET_VRING_NUM(dev, vmsg); + case VHOST_USER_SET_VRING_ADDR: + return _execute_SET_VRING_ADDR(dev, vmsg); + case VHOST_USER_SET_VRING_BASE: + return _execute_SET_VRING_BASE(dev, vmsg); + case VHOST_USER_GET_VRING_BASE: + return _execute_GET_VRING_BASE(dev, vmsg); + case VHOST_USER_SET_VRING_KICK: + return _execute_SET_VRING_KICK(dev, vmsg); + case VHOST_USER_SET_VRING_CALL: + return _execute_SET_VRING_CALL(dev, vmsg); + case VHOST_USER_SET_VRING_ERR: + return _execute_SET_VRING_ERR(dev, vmsg); + case VHOST_USER_GET_PROTOCOL_FEATURES: + return _execute_GET_PROTOCOL_FEATURES(dev, vmsg); + case VHOST_USER_SET_PROTOCOL_FEATURES: + return _execute_SET_PROTOCOL_FEATURES(dev, vmsg); + case VHOST_USER_GET_QUEUE_NUM: + return _execute_GET_QUEUE_NUM(dev, vmsg); + case VHOST_USER_SET_VRING_ENABLE: + return _execute_SET_VRING_ENABLE(dev, vmsg); + case VHOST_USER_MAX: + assert(vmsg->request != 
VHOST_USER_MAX); + } + return 0; +} + +static void +vubr_device_receive_cb(int sock, void *ctx) +{ + struct vubr_device *dev = (struct vubr_device *) ctx; + struct vhost_user_message vmsg; + + vhost_user_message_read(sock, &vmsg); + + int reply_requested = vubr_device_execute_request(dev, &vmsg); + + if (reply_requested) { + /* Set the version in the flags when sending the reply */ + vmsg.flags &= ~VHOST_USER_VERSION_MASK; + vmsg.flags |= VHOST_USER_VERSION; + vmsg.flags |= VHOST_USER_REPLY_MASK; + vhost_user_message_write(sock, &vmsg); + } +} + +static void +vubr_device_accept_cb(int sock, void *ctx) +{ + struct vubr_device *dev = (struct vubr_device *)ctx; + int conn_fd; + struct sockaddr_un un; + socklen_t len = sizeof(un); + + if ((conn_fd = accept(sock, (struct sockaddr *) &un, &len)) == -1) { + perror("accept"); + exit(1); + } + + printf("DEBUG: Got connection from remote peer on sock %d\n", conn_fd); + dispatcher_add(&dev->dispatcher, conn_fd, ctx, vubr_device_receive_cb); +} + +struct vubr_device * +vubr_device_new(char *path) +{ + struct vubr_device *dev = + (struct vubr_device *) calloc(1, sizeof(struct vubr_device)); + + dev->nregions = 0; + + int i; + for (i = 0; i < MAX_NR_VIRTQUEUE; i++) + dev->virtqueue[i] = (struct virtqueue) { + .call_fd = -1, .kick_fd = -1, + .size = 0, + .last_avail_index = 0, .last_used_index = 0, + .desc = 0, .avail = 0, .used = 0, + }; + + /* Get a UNIX socket. 
*/ + if ((dev->sock = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) { + perror("socket"); + exit(1); + } + + struct sockaddr_un un; + un.sun_family = AF_UNIX; + strcpy(un.sun_path, path); + + size_t len = sizeof(un.sun_family) + strlen(path); + + unlink(path); + + if (bind(dev->sock, (struct sockaddr *) &un, len) == -1) { + perror("bind"); + exit(1); + } + + if (listen(dev->sock, 1) == -1) { + perror("listen"); + exit(1); + } + + dispatcher_init(&dev->dispatcher); + dispatcher_add(&dev->dispatcher, dev->sock, (void*) dev, vubr_device_accept_cb); + + printf("Waiting for connections on UNIX socket %s ...\n", path); + return dev; +} + +void +vubr_device_backend_udp_setup(struct vubr_device *dev, + char *local_host, + uint16_t local_port, + char *dest_host, + uint16_t dest_port) +{ + + struct sockaddr_in si_local; + int sock; + + if ((sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) == -1) + die("socket"); + + memset((char *) &si_local, 0, sizeof(struct sockaddr_in)); + si_local.sin_family = AF_INET; + si_local.sin_port = htons(local_port); + if(inet_aton(local_host, &si_local.sin_addr) == 0) { + fprintf(stderr, "inet_aton() failed.\n"); + exit(1); + } + + if( bind(sock, (struct sockaddr*)&si_local, sizeof(si_local) ) == -1) + die("bind"); + + /* setup destination for sends */ + struct sockaddr_in *si_remote = &dev->backend_udp_dest; + memset((char *) si_remote, 0, sizeof(struct sockaddr_in)); + si_remote->sin_family = AF_INET; + si_remote->sin_port = htons(dest_port); + if(inet_aton(dest_host, &si_remote->sin_addr) == 0) { + fprintf(stderr, "inet_aton() failed.\n"); + exit(1); + } + + dev->backend_udp_sock = sock; + dispatcher_add(&dev->dispatcher, sock, dev, _backend_recv_cb); + printf("Waiting for data from udp backend on %s:%d...\n", local_host, local_port); +} + +static void +vubr_device_backend_udp_sendbuf(struct vubr_device *dev, uint8_t *buf, size_t len) +{ + int slen = sizeof(struct sockaddr_in); + + if (sendto(dev->backend_udp_sock, buf, len, 0, (struct sockaddr *) 
&dev->backend_udp_dest, slen) == -1) + die("sendto()"); +} + +static int +vubr_device_backend_udp_recvbuf(struct vubr_device *dev, uint8_t *buf, size_t buflen) +{ + int slen = sizeof(struct sockaddr_in); + int rc; + + if ((rc = recvfrom(dev->backend_udp_sock, buf, buflen, 0, + (struct sockaddr *) &dev->backend_udp_dest, + (socklen_t *)&slen)) == -1) + die("recvfrom()"); + + return rc; +} + +void +vubr_device_run(struct vubr_device * dev) +{ + while (1) { + /* timeout 200ms */ + dispatcher_wait(&dev->dispatcher, 200000); + /* Here one can try polling strategy. */ + } +} diff --git a/tests/vubr/Makefile b/tests/vubr/Makefile new file mode 100644 index 0000000..c3400fb --- /dev/null +++ b/tests/vubr/Makefile @@ -0,0 +1,15 @@ +SRCS=dispatcher.c vhost_user.c vubr_device.c main.c +INCLUDES+=vhost.h virtio_ring.h virtio_net.h +INCLUDES+=vubr_config.h vhost_user.h virtqueue.h +INCLUDES+=dispatcher.h vubr_device.h + +EXE=vubr +CFLAGS += -m64 -Wall -Werror -g + +all: $(EXE) + +$(EXE): $(SRCS) $(INCLUDES) + $(CC) $(CFLAGS) $(SRCS) -o $@ + +clean: + rm -f $(EXE)
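A note on the address translation in gpa_to_va() and qva_to_va() above: both walk dev->regions and assert when nothing matches, so a bad descriptor address from the guest takes the whole process down. The sketch below shows the same lookup returning an error instead. It is only an illustration under assumptions: struct region and gpa_lookup() are hypothetical names that do not appear in the patch, and the struct keeps just the four fields the lookup needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down region record; field names follow the patch. */
struct region {
    uint64_t gpa;         /* guest physical base address */
    uint64_t size;        /* region length in bytes */
    uint64_t mmap_addr;   /* where this process mapped the region fd */
    uint64_t mmap_offset; /* offset of the region inside that mapping */
};

/*
 * Translate a guest physical address to a local virtual address.
 * Returns 0 and stores the result in *va, or -1 if no region matches.
 */
static int gpa_lookup(const struct region *regs, size_t n,
                      uint64_t gpa, uint64_t *va)
{
    size_t i;

    assert(va != NULL);
    for (i = 0; i < n; i++) {
        const struct region *r = &regs[i];

        /* Comparing the offset with size avoids overflow in gpa + size. */
        if (gpa >= r->gpa && gpa - r->gpa < r->size) {
            *va = gpa - r->gpa + r->mmap_addr + r->mmap_offset;
            return 0;
        }
    }
    return -1;
}
```

With this shape a caller such as _process_desc() could drop the offending descriptor and keep running when the lookup returns -1, rather than relying on assert().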
QEMU is missing a good test for the vhost-user feature, so I've created
a sample vhost-user application called Vubr (mst coined the name, but
better suggestions will be appreciated). Vubr may later serve the QEMU
community as an internal vhost-user test.

Essentially, Vubr is a very basic vhost-user backend for QEMU. It runs
as a separate user-level process. For packet processing, Vubr uses an
additional QEMU instance with a backend configured by "-net socket" as
a shared VLAN. This way another QEMU virtual machine can effectively
act as a bus by means of UDP communication.

For a simpler setup, the other QEMU instance running the SLiRP backend
can be the same QEMU instance that runs the vhost-user client.

The Vubr implementation is very preliminary and is missing many
features. I have been studying the vhost-user protocol internals, so I
wrote Vubr bit by bit as I progressed through the protocol. Most
probably the internal architecture will change significantly.

To run the Vubr application:

Build vubr with:

 $ cd qemu/tests/vubr; make

Ensure the machine has hugepages enabled in the kernel, with a kernel
command line like:

 default_hugepagesz=2M hugepagesz=2M hugepages=2048

Run it with:

 $ ./vubr

The above runs a vhost-user server listening for connections on the
UNIX domain socket /tmp/vubr.sock. It sends UDP traffic to the VLAN
bridge at localhost:5555 while listening on localhost:4444.

Run qemu with a virtio-net device backed by vhost-user:

 $ qemu \
     -enable-kvm -m 512 -smp 2 \
     -object memory-backend-file,id=mem,size=512M,mem-path=/dev/hugepages,share=on \
     -numa node,memdev=mem -mem-prealloc \
     -chardev socket,id=char0,path=/tmp/vubr.sock \
     -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
     -device virtio-net-pci,netdev=mynet1 \
     -net none \
     -net socket,vlan=0,udp=localhost:4444,localaddr=localhost:5555 \
     -net user,vlan=0 \
     disk.img

Vubr has been tested very lightly: it is able to bring up Linux on a
client VM with the virtio-net driver and to transmit and receive
traffic to and from the internet. I tested with "wget redhat.com" and
"dig redhat.com".

PS. I consulted DPDK's vhost-user code while implementing Vubr.

Signed-off-by: Victor Kaplansky <victork@redhat.com>
---
 tests/vubr/dispatcher.h  |  26 ++
 tests/vubr/vhost.h       |  77 +++++
 tests/vubr/vhost_user.h  |  70 +++++
 tests/vubr/virtio_net.h  |  38 +++
 tests/vubr/virtio_ring.h | 103 +++++++
 tests/vubr/virtqueue.h   |  17 ++
 tests/vubr/vubr_config.h |   7 +
 tests/vubr/vubr_device.h |  41 +++
 tests/vubr/dispatcher.c  |  77 +++++
 tests/vubr/main.c        |  18 ++
 tests/vubr/vhost_user.c  |  83 +++++
 tests/vubr/vubr_device.c | 773 +++++++++++++++++++++++++++++++++++++++++++++++
 tests/vubr/Makefile      |  15 +
 13 files changed, 1345 insertions(+)
 create mode 100644 tests/vubr/dispatcher.h
 create mode 100644 tests/vubr/vhost.h
 create mode 100644 tests/vubr/vhost_user.h
 create mode 100644 tests/vubr/virtio_net.h
 create mode 100644 tests/vubr/virtio_ring.h
 create mode 100644 tests/vubr/virtqueue.h
 create mode 100644 tests/vubr/vubr_config.h
 create mode 100644 tests/vubr/vubr_device.h
 create mode 100644 tests/vubr/dispatcher.c
 create mode 100644 tests/vubr/main.c
 create mode 100644 tests/vubr/vhost_user.c
 create mode 100644 tests/vubr/vubr_device.c
 create mode 100644 tests/vubr/Makefile
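One follow-up on tests/vubr/vhost_user.c from the diffstat above: vhost_user_message_read() fetches the payload with a single read() and then asserts that it returned vmsg->size bytes. On a stream socket a read may legitimately return fewer bytes, or fail with EINTR. A loop along the following lines could replace that call. This is a minimal sketch, not code from the patch: read_full() is a hypothetical helper name, and reporting premature EOF as ECONNRESET is an arbitrary choice made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: read exactly len bytes.
 * Returns 0 on success and -1 on error or premature EOF (errno is set).
 */
static int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t rc = read(fd, p, len);

        if (rc < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal, retry */
            }
            return -1;
        }
        if (rc == 0) {
            errno = ECONNRESET; /* peer closed before the full payload */
            return -1;
        }
        p += rc;
        len -= (size_t)rc;
    }
    return 0;
}
```

The payload read would then become something like "if (read_full(conn_fd, &vmsg->payload, vmsg->size) < 0) { perror("read"); exit(1); }", which also fixes the misleading "recvmsg" label printed on that error path.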