[RFCv4,2/2] net: Allow protocols to provide an unlocked_recvmsg socket method

Message ID 20090916170745.GD7699@ghostprotocols.net
State RFC, archived
Delegated to: David Miller

Commit Message

Arnaldo Carvalho de Melo Sept. 16, 2009, 5:07 p.m. UTC
So that recvmmsg can use it. With this patch recvmmsg actually _requires_ that
socket->ops->unlocked_recvmsg exists, and that socket->sk->sk_prot->unlocked_recvmsg
is non-NULL.

We may well switch back to the previous scheme where sys_recvmmsg checks if
the underlying protocol provides an unlocked version and uses it, falling
back to the locked version if there is none.
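
The fallback scheme described above can be sketched in plain userspace C.
This is only an illustration of the dispatch decision, not the kernel code:
struct demo_ops, demo_recv_one() and the two toy receive functions are
hypothetical names, and the real proto_ops methods take different arguments.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct proto_ops. */
struct demo_ops {
	int (*recvmsg)(void *sock, void *msg, size_t len, int flags);
	int (*unlocked_recvmsg)(void *sock, void *msg, size_t len, int flags);
};

/* Toy receive implementations that only report which path ran. */
static int demo_locked(void *sock, void *msg, size_t len, int flags)
{
	(void)sock; (void)msg; (void)len; (void)flags;
	return 1;	/* 1: the regular, locking path was taken */
}

static int demo_unlocked(void *sock, void *msg, size_t len, int flags)
{
	(void)sock; (void)msg; (void)len; (void)flags;
	return 2;	/* 2: the unlocked path was taken */
}

/*
 * Use the unlocked receive method when the protocol provides one,
 * otherwise fall back to the regular (locking) recvmsg.
 */
static int demo_recv_one(const struct demo_ops *ops, void *sock,
			 void *msg, size_t len, int flags)
{
	if (ops->unlocked_recvmsg != NULL)
		return ops->unlocked_recvmsg(sock, msg, len, flags);
	return ops->recvmsg(sock, msg, len, flags);
}
```

A protocol with an unlocked method (UDP in this series) would fill in both
pointers; one without would leave unlocked_recvmsg NULL and still work.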

But first let's see if this works with recvmmsg alone and what kinds of gains we
get with the unlocked_recvmsg implementation in UDP. Followup patches can
restore that behaviour if we want to use it with, say, DCCP and SCTP without a
specific unlocked version.

This should address the concerns raised by Rémi about the MSG_UNLOCKED problem.

Cc: Caitlin Bestler <caitlin.bestler@gmail.com>
Cc: Chris Van Hoof <vanhoof@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Nir Tzachar <nir.tzachar@gmail.com>
Cc: Nivedita Singhvi <niv@us.ibm.com>
Cc: Paul Moore <paul.moore@hp.com>
Cc: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Cc: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 drivers/isdn/mISDN/socket.c    |    2 +
 drivers/net/pppoe.c            |    1 +
 drivers/net/pppol2tp.c         |    1 +
 include/linux/net.h            |    7 +++
 include/net/sock.h             |   13 +++++
 net/appletalk/ddp.c            |    1 +
 net/atm/pvc.c                  |    1 +
 net/atm/svc.c                  |    1 +
 net/ax25/af_ax25.c             |    1 +
 net/bluetooth/bnep/sock.c      |    1 +
 net/bluetooth/cmtp/sock.c      |    1 +
 net/bluetooth/hci_sock.c       |    1 +
 net/bluetooth/hidp/sock.c      |    1 +
 net/bluetooth/l2cap.c          |    1 +
 net/bluetooth/rfcomm/sock.c    |    1 +
 net/bluetooth/sco.c            |    1 +
 net/can/bcm.c                  |    1 +
 net/can/raw.c                  |    1 +
 net/core/sock.c                |   26 +++++++++
 net/dccp/ipv4.c                |    1 +
 net/dccp/ipv6.c                |    1 +
 net/decnet/af_decnet.c         |    1 +
 net/econet/af_econet.c         |    1 +
 net/ieee802154/af_ieee802154.c |    2 +
 net/ipv4/af_inet.c             |    3 +
 net/ipv4/udp.c                 |   52 +++++++++++++++---
 net/ipv6/af_inet6.c            |    2 +
 net/ipv6/raw.c                 |    1 +
 net/ipx/af_ipx.c               |    1 +
 net/irda/af_irda.c             |    4 ++
 net/iucv/af_iucv.c             |    1 +
 net/key/af_key.c               |    1 +
 net/llc/af_llc.c               |    1 +
 net/netlink/af_netlink.c       |    1 +
 net/netrom/af_netrom.c         |    1 +
 net/packet/af_packet.c         |    2 +
 net/phonet/socket.c            |    2 +
 net/rds/af_rds.c               |    1 +
 net/rose/af_rose.c             |    1 +
 net/rxrpc/af_rxrpc.c           |    1 +
 net/sctp/ipv6.c                |    1 +
 net/sctp/protocol.c            |    1 +
 net/socket.c                   |  112 +++++++++++++++++++++++++++++++++++----
 net/tipc/socket.c              |    3 +
 net/unix/af_unix.c             |    3 +
 net/x25/af_x25.c               |    1 +
 46 files changed, 244 insertions(+), 21 deletions(-)

Comments

Nir Tzachar Sept. 17, 2009, 2:09 p.m. UTC | #1
Hello.

Below are some test results with the patch (only part 1, as I did not
manage to apply part 2).
The test application is attached below, and works as follows:

I set out to measure the latency which can be saved by this patch, and
the application is designed accordingly. It is composed of three
parts: a producer, which time-stamps packets and sends them as fast as
possible, a mirror, which receives messages and bounces them to a
remote destination and finally, a consumer, which receives messages as
fast as possible and measures latency and throughput.
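
The consumer keeps its latency statistics incrementally, one sample at a
time, so no samples need to be stored. A minimal sketch of that running
mean/variance update (Welford's online algorithm) follows; struct lat_stats
and the function names are made up for illustration, and the standard
deviation printed by the test program is the square root of this variance.

```c
/* Running statistics over latency samples (Welford's online algorithm). */
struct lat_stats {
	unsigned long n;	/* number of samples seen so far */
	double mean;		/* running mean */
	double m2;		/* sum of squared deviations from the mean */
	double max;		/* largest sample seen so far */
};

/* Fold one latency sample (e.g. in microseconds) into the statistics. */
static void lat_stats_add(struct lat_stats *s, double sample)
{
	double delta = sample - s->mean;

	s->n++;
	s->mean += delta / s->n;
	s->m2 += delta * (sample - s->mean);
	if (sample > s->max)
		s->max = sample;
}

/* Sample variance; 0 when fewer than two samples were seen. */
static double lat_stats_variance(const struct lat_stats *s)
{
	return s->n > 1 ? s->m2 / (s->n - 1) : 0.0;
}
```

Start from a zero-initialised struct lat_stats and reset it to zero at the
end of every reporting interval.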

Both the producer and consumer are executed on the same host and the
mirror on a remote host. Both hosts are running Linux 2.6.31 with v4
of the patch (but, as I said before, only part 1, with the unlocked_*
stuff). All processes are executed under SCHED_FIFO. Both hosts are
connected by a switched 1G Ethernet network. The mirror is executed on
an 8-core Nehalem beast, and the producer and consumer on my desktop,
which is a quad-core. /proc/cpuinfo, lspci output and .configs can be
supplied if needed. Network cards are Intel Corporation 82566DM-2 Gigabit
Network and Broadcom Corporation NetXtreme II BCM5709 Gigabit
Ethernet.

The results (which follow below) clearly show the advantages of using
recvmmsg over recvmsg, both latency-wise and throughput-wise. The
addition of a sendmmsg would also have a huge impact, IMO.
Receiving batches of 30 packets, each of 1024 bytes, results in no
latency improvement, but in a ~55% throughput improvement, from 72
megabytes per second to 111. Repeating the same test, but with
batches of 3000, displays the same behaviour. The more interesting
result (to me, at least :) is when using small packets. Sending
packets of size 100 and receiving in batches of 30 gives 470
microseconds of latency and 244669 packets per second. On the other
hand, without recvmmsg we get 750 microseconds of latency and 210818
packets per second. A huge improvement here.
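
The gains quoted above can be rechecked from the raw figures in the
listing. The helpers below are a small sketch with made-up names; the
inputs in the usage note are samples copied from the results. With those,
the packets-per-second figures give a gain of roughly 52% for 1024-byte
packets (close to the ~55% read off the rounded megabyte figures) and
roughly 16% for 100-byte packets.

```c
#include <stdio.h>

/* Relative gain of `with` over `without`, in percent. */
static double gain_percent(double with, double without)
{
	return (with / without - 1.0) * 100.0;
}

/* Relative reduction from `before` down to `after`, in percent. */
static double reduction_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Print one comparison line for a pair of packets-per-second figures. */
static void report_gain(const char *label, double with, double without)
{
	printf("%s: %.1f%% more packets per second with recvmmsg\n",
	       label, gain_percent(with, without));
}
```

For example, report_gain("1024 bytes", 113839.96, 74691.80) and
report_gain("100 bytes", 244669.28, 210818.74) use one sample line from
each run, and reduction_percent(750.0, 470.0) gives the latency reduction
for the small-packet case.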

I think that with a bit more tinkering we can even stretch these results a bit.
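
For reference, the batched receive that these tests exercise boils down to
one system call filling several message headers. The sketch below invokes
the syscall directly, as the attached test program does, so it does not
depend on a libc wrapper; struct demo_mmsghdr, DEMO_BATCH, DEMO_PKT_SIZE
and demo_recv_batch() are names invented here. It assumes a kernel that has
recvmmsg (mainline since 2.6.33). Current glibc also exposes recvmmsg() and
struct mmsghdr directly when _GNU_SOURCE is defined.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#define DEMO_BATCH	8
#define DEMO_PKT_SIZE	256

/* Same layout as the kernel's struct mmsghdr; renamed to avoid clashes. */
struct demo_mmsghdr {
	struct msghdr	msg_hdr;
	unsigned int	msg_len;
};

/*
 * Receive up to DEMO_BATCH datagrams from fd with a single system call.
 * Returns the number of datagrams received, or -1 with errno set.
 * lens[i] is set to the length of the i-th received datagram.
 */
static int demo_recv_batch(int fd, char bufs[DEMO_BATCH][DEMO_PKT_SIZE],
			   unsigned int lens[DEMO_BATCH])
{
	struct demo_mmsghdr msgs[DEMO_BATCH];
	struct iovec iov[DEMO_BATCH];
	int i, n;

	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < DEMO_BATCH; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len = DEMO_PKT_SIZE;
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}

	/* MSG_DONTWAIT: take what is queued right now instead of blocking. */
	n = (int)syscall(SYS_recvmmsg, fd, msgs, (unsigned int)DEMO_BATCH,
			 MSG_DONTWAIT, NULL);
	for (i = 0; i < n; i++)
		lens[i] = msgs[i].msg_len;
	return n;
}
```

A caller would typically poll() for readability first and then drain the
socket with one such call per wakeup.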

Cheers.

The results:

(a sample execution)
Usage:
	       	-n do not use recvmmsg
	       	-r producer/consumer/mirror [producer]
	       	-b recv_batch_size          [8]
	       	-l master_listen_port       [5001]
	       	-t send_to_host              [localhost]
	       	-p slave_listen_port        [5002]
	       	-s packet_size                [256]
	       	-f run in sched fifo
	       	-m use mlockall
10.0.0.1:
sudo ./recvmmsg -r consumer -b 3000 -f -s 1024
sudo ./recvmmsg -r producer -t 10.0.0.2 -f -s 1024 -b 1

10.0.0.2
sudo ./recvmmsg -t 10.0.0.1 -r mirror -b 3000 -f -s 1024

-f -s 1024 -b 30

packets num: 569203, mean: 942.69, max: 1551, stddev: 128.48
packets per second: 113839.96, bytes per second: 116572774
packets num: 569214, mean: 942.61, max: 1385, stddev: 126.55
packets per second: 113841.62, bytes per second: 116575027
packets num: 569210, mean: 943.76, max: 1443, stddev: 127.41
packets per second: 113840.36, bytes per second: 116574208
packets num: 569209, mean: 942.34, max: 1363, stddev: 126.72
packets per second: 113840.18, bytes per second: 116574003
packets num: 569202, mean: 943.43, max: 1495, stddev: 127.88
packets per second: 113839.85, bytes per second: 116572569

-f -s 1024 -b 30 -n

packets num: 373461, mean: 950.15, max: 1351, stddev: 122.45
packets per second: 74691.80, bytes per second: 76484812
packets num: 373494, mean: 954.28, max: 1538, stddev: 125.60
packets per second: 74697.81, bytes per second: 76491571
packets num: 373786, mean: 952.16, max: 1505, stddev: 124.79
packets per second: 74756.24, bytes per second: 76551372
packets num: 373564, mean: 953.37, max: 1500, stddev: 125.18
packets per second: 74712.34, bytes per second: 76505907

-f -s 100 -b 30

packets num: 1208114, mean: 474.45, max: 1849, stddev: 117.70
packets per second: 241616.37, bytes per second: 24162280
packets num: 1223365, mean: 475.24, max: 2273, stddev: 117.12
packets per second: 244669.28, bytes per second: 24467300
packets num: 1231103, mean: 470.03, max: 2509, stddev: 107.01
packets per second: 246219.42, bytes per second: 24622060
packets num: 1242466, mean: 467.69, max: 2753, stddev: 114.55
packets per second: 248488.53, bytes per second: 24849320


-f -s 100 -b 30 -n

packets num: 1044677, mean: 785.11, max: 3635, stddev: 417.51
packets per second: 208933.60, bytes per second: 20893540
packets num: 1054100, mean: 765.59, max: 3259, stddev: 399.20
packets per second: 210818.74, bytes per second: 21082000
packets num: 1051835, mean: 726.04, max: 3403, stddev: 369.04
packets per second: 210365.91, bytes per second: 21036700
packets num: 1048108, mean: 743.42, max: 3440, stddev: 390.79
packets per second: 209620.38, bytes per second: 20962160



-b 3000 -f -s 1024

packets num: 569200, mean: 948.99, max: 1507, stddev: 130.52
packets per second: 113838.77, bytes per second: 116572160
packets num: 569204, mean: 940.57, max: 1307, stddev: 125.34
packets per second: 113840.28, bytes per second: 116572979
packets num: 569193, mean: 957.70, max: 1545, stddev: 138.00
packets per second: 113836.62, bytes per second: 116570726
packets num: 569205, mean: 947.59, max: 1505, stddev: 130.55
packets per second: 113839.91, bytes per second: 116573184
packets num: 569205, mean: 943.81, max: 1395, stddev: 126.93
packets per second: 113840.36, bytes per second: 116573184


-b 3000 -f -s 1024 -n:

packets num: 373661, mean: 952.37, max: 1509, stddev: 131.57
packets per second: 74731.71, bytes per second: 76525772
packets num: 373678, mean: 951.38, max: 1525, stddev: 130.43
packets per second: 74734.52, bytes per second: 76529254
packets num: 373717, mean: 947.87, max: 1499, stddev: 127.31
packets per second: 74742.53, bytes per second: 76537241
packets num: 373727, mean: 944.58, max: 1491, stddev: 125.06
packets per second: 74744.29, bytes per second: 76539289


-f -s 100 -b 3000

packets num: 1380128, mean: 1345.93, max: 4422, stddev: 164.48
packets per second: 276023.28, bytes per second: 27602560
packets num: 1430723, mean: 1379.40, max: 2498, stddev: 45.08
packets per second: 286138.19, bytes per second: 28614460
packets num: 1450128, mean: 1353.45, max: 2589, stddev: 52.73
packets per second: 290024.56, bytes per second: 29002560
packets num: 1422040, mean: 1392.20, max: 2539, stddev: 50.48
packets per second: 284404.25, bytes per second: 28440800
packets num: 1391757, mean: 1422.72, max: 2604, stddev: 50.74
packets per second: 278349.79, bytes per second: 27835140


-f -s 100 -n -b 3000

packets num: 1088358, mean: 828.90, max: 20103, stddev: 660.55
packets per second: 217668.99, bytes per second: 21767160
packets num: 1225010, mean: 1018.98, max: 10186, stddev: 538.93
packets per second: 245000.28, bytes per second: 24500200
packets num: 1090276, mean: 899.01, max: 5032, stddev: 562.04
packets per second: 218001.96, bytes per second: 21805520


recvmmsg.c:

#include "linux/arch/x86/include/asm/unistd.h"

#include <stdlib.h>
#include <syscall.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sched.h>
#include <fcntl.h>
#include <sys/mman.h>


struct mmsghdr {
	struct msghdr	msg_hdr;
	unsigned	msg_len;
};


#ifndef NSEC_PER_MSEC
#define NSEC_PER_MSEC	1000000UL
#endif

/* Set a fd into nonblocking mode. */
int set_nonblocking(int fd)
{
	int val;

	if ((val = fcntl(fd, F_GETFL)) == -1)
		return -1;
	if (!(val & O_NONBLOCK)) {
		val |= O_NONBLOCK;
		return fcntl(fd, F_SETFL, val);
	}
	return 0;
}


static int recvmmsg(int fd, struct mmsghdr *mmsg,
			   unsigned vlen, unsigned flags,
			   struct timespec *timeout)
{
	return syscall(__NR_recvmmsg, fd, mmsg, vlen, flags, timeout);
}

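static int reg_recvmsg(int fd, struct mmsghdr *mmsg,
			   unsigned vlen, unsigned flags,
			   struct timespec *timeout)
{
	unsigned i;

	for (i = 0; i < vlen; i++) {
		int tmp = recvmsg(fd, &mmsg[i].msg_hdr, flags);
		if (tmp < 0)
			break;
		mmsg[i].msg_len = tmp;
	}
	/*
	 * Return the number of messages received, or -1 if even the first
	 * recvmsg() failed. (Starting the counter at -1 and incrementing it
	 * per message under-reported every batch by one.)
	 */
	return i ? (int)i : -1;
}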

static int reg_sendmsg(int fd, struct mmsghdr *mmsg,
			   unsigned vlen, unsigned flags,
			   struct timespec *timeout)
{
	int i;
	int ret = 0;

	for (i=0; i<vlen; i++){
		int tmp = sendmsg(fd, &mmsg[i].msg_hdr, flags);
		if (tmp <= 0){
			ret = tmp;
			break;
		}
		mmsg[i].msg_len = tmp;
		ret++;
	}
	return ret;
}

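static unsigned long long micro_time(void)
{
	struct timeval tv;
	gettimeofday(&tv, NULL);
	/* microseconds since the epoch; widen first so 32-bit longs cannot overflow */
	return (unsigned long long)tv.tv_sec * 1000000ULL + tv.tv_usec;
}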

typedef int (*send_packets_f)(int fd, struct mmsghdr *mmsg, unsigned vlen,
			    unsigned flags, struct timespec *timeout);
typedef int (*recv_packets_f)(int fd, struct mmsghdr *mmsg, unsigned vlen,
			    unsigned flags, struct timespec *timeout);

//sockets must be bound/connected
static void producer(const int batch_size,
		     const int packet_size,
		     int send_sock,
		     send_packets_f send_f)
{
	char buf[batch_size][packet_size];
	struct iovec iovec[batch_size];
	struct mmsghdr datagrams[batch_size];
	int i;

	for (i = 0; i < batch_size; ++i) {
		memset(&datagrams[i].msg_hdr, 0, sizeof(datagrams[i].msg_hdr));
		iovec[i].iov_base = buf[i];
		iovec[i].iov_len  = sizeof(buf[i]);
		datagrams[i].msg_hdr.msg_iov	 = &iovec[i];
		datagrams[i].msg_hdr.msg_iovlen	 = 1;
	}

	while (1){
		//generate batch_size packets of packet_size, stamp them, and send
		int send_num = 0;
		for (i = 0; i < batch_size; ++i) {
			unsigned long long *stamp =
				(unsigned long long *) &buf[i][0];
			*stamp = micro_time();
		}
		
		send_num = send_f(send_sock, datagrams, batch_size, 0, 0);
		if (send_num < batch_size){
			printf("could not send entire batch: %d %m\n", send_num);
			continue;
		}
	}
}

//sockets must be bound/connected
static void consumer(const int batch_size,
		     const int packet_size,
		     int recv_sock,
		     recv_packets_f recv_f)
{
	char buf[batch_size][packet_size];
	struct iovec iovec[batch_size];
	struct mmsghdr datagrams[batch_size];
	int i;
	unsigned long long start_time = micro_time();
	unsigned long long max = 0;
	double mean = 0;
	double m2 = 0;
	int n = 0;

	if (set_nonblocking(recv_sock) != 0)
		printf("recv socket is in blocking mode\n");
	else
		printf("recv socket is in non-blocking mode\n");

	for (i = 0; i < batch_size; ++i) {
		memset(&datagrams[i].msg_hdr, 0, sizeof(datagrams[i].msg_hdr));
		iovec[i].iov_base = buf[i];
		iovec[i].iov_len  = sizeof(buf[i]);
		datagrams[i].msg_hdr.msg_iov	 = &iovec[i];
		datagrams[i].msg_hdr.msg_iovlen	 = 1;
	}

	struct pollfd pfds[1] = {
		[0] = {
			.fd = recv_sock,
			.events = POLLIN,
		},
	};

	while (1){
		unsigned long long now;

		if (poll(pfds, 1, -1) < 0) {
			perror("poll");
			exit(EXIT_FAILURE);
		}

		int ret = recv_f(recv_sock, &datagrams[0], batch_size, 0, 0);
		if (ret < 0){
			perror("consumer recv");
			exit(EXIT_FAILURE);
		}

		//go over all received packets, and count latency:
		now = micro_time();
		for (i = 0; i < ret; ++i) {
			double delta;
			unsigned long long *stamp =
				(unsigned long long *) &buf[i];
			unsigned long long sample =
				now - *stamp;
			n++;
			delta = sample - mean;
			mean += delta/n;
			m2 += delta*(sample-mean);

			if (max < sample)
				max = sample;
		}

		if (micro_time() - start_time >= 5000000){
			printf("packets num: %d, mean: %.2f, max: %llu, stddev: %.2f\n",
			       n, mean, max, sqrt(m2/(n-1)));
			printf("packets per second: %.2f, bytes per second: %lld\n",
			       n / ((micro_time() - start_time)/1000000.0),
			       n*packet_size / ((micro_time() - start_time)/1000000));

			start_time = micro_time();
			n = 0;
			mean = 0;
			m2 = 0;
			max = 0;
		}
	}
}

//sockets must be bound/connected
static void mirror(const int batch_size,
	   const int packet_size,
	   int send_sock,
	   int recv_sock,
	   send_packets_f send_f,
	   recv_packets_f recv_f)
{
	char buf[batch_size][packet_size];
	struct iovec iovec[batch_size];
	struct mmsghdr datagrams[batch_size];
	int i;

	if (set_nonblocking(recv_sock) != 0)
		printf("recv socket is in blocking mode\n");
	else
		printf("recv socket is in non-blocking mode\n");

	for (i = 0; i < batch_size; ++i) {
		memset(&datagrams[i].msg_hdr, 0, sizeof(datagrams[i].msg_hdr));
		iovec[i].iov_base = buf[i];
		iovec[i].iov_len  = sizeof(buf[i]);
		datagrams[i].msg_hdr.msg_iov	 = &iovec[i];
		datagrams[i].msg_hdr.msg_iovlen	 = 1;
		datagrams[i].msg_hdr.msg_name	 = NULL;
	}

	while (1){
		int send_num = 0;
		int recv_num = 0;

		struct pollfd pfds[1] = {
			[0] = {
				.fd = recv_sock,
				.events = POLLIN,
			},
		};

		if (poll(pfds, 1, -1) < 0) {
			perror("poll");
			exit(EXIT_FAILURE);
		}


		//printf("slave recv...\n");
		recv_num = recv_f(recv_sock, &datagrams[recv_num],
				 batch_size-recv_num, 0, 0);
		if (recv_num < 0) {
			perror("mirror recv");
			exit(EXIT_FAILURE);
		}
		//printf("recv %d packets\n", recv_num);

		while (send_num < recv_num){
			int ret = send_f(send_sock, &datagrams[send_num], recv_num-send_num, 0, 0);
			if (ret < 0){
				perror("mirror send");
				exit(EXIT_FAILURE);
			}
			send_num += ret;
			//printf("sent %d packets\n", ret);
		}
	}
}

static void usage(char *app)
{
	printf("Usage: %s\n"
	       "	-n do not use recvmmsg\n"
	       "	-r producer/consumer/mirror [producer]\n"
	       "	-b recv_batch_size          [8]\n"
	       "	-l master_listen_port       [5001]\n"
	       "	-t send_to_host             [localhost]\n"
	       "	-p slave_listen_port        [5002]\n"
	       "	-s packet_size              [256]\n"
	       "	-f run in sched fifo\n"
	       "	-m use mlockall\n"
	       "	-h this help\n",
	       app);
}

int create_recv_sock(const char *port)
{
	struct addrinfo *host;
	struct addrinfo hints;
	int fd = -1;
	int err;

	memset(&hints, 0, sizeof(struct addrinfo));
	hints.ai_flags = AI_PASSIVE;    /* For wildcard IP address */
	hints.ai_family = AF_INET;
	hints.ai_socktype = SOCK_DGRAM; /* Datagram socket */
	hints.ai_protocol = 0;          /* Any protocol */
	hints.ai_canonname = NULL;
	hints.ai_addr = NULL;
	hints.ai_next = NULL;

	err = getaddrinfo(NULL, port, &hints, &host);
	if (err != 0) {
		fprintf(stderr, "error using getaddrinfo: %s\n",
			gai_strerror(err));
		goto out;
	}
	
	fd = socket(host->ai_family, host->ai_socktype, host->ai_protocol);
	if (fd < 0) {
		perror("recv_sock: ");
		goto out_freeaddrinfo;
	}
	
	if (bind(fd, host->ai_addr, host->ai_addrlen) < 0) {
		perror("recv_sock bind");
		close(fd);
		fd = -1;	/* do not hand a closed descriptor back to the caller */
	}

out_freeaddrinfo:
	freeaddrinfo(host);
out:
	return fd;
}

int create_send_sock(const char *host, const char *port)
{
	int fd = -1;
	struct addrinfo *send_host;
	struct addrinfo hints;
	int err;

	memset(&hints, 0, sizeof(struct addrinfo));
	hints.ai_flags = AI_PASSIVE;    /* For wildcard IP address */
	hints.ai_family = AF_INET;
	hints.ai_socktype = SOCK_DGRAM; /* Datagram socket */
	hints.ai_protocol = 0;          /* Any protocol */
	hints.ai_canonname = NULL;
	hints.ai_addr = NULL;
	hints.ai_next = NULL;

	err = getaddrinfo(host, port, &hints, &send_host);
	if (err != 0) {
		fprintf(stderr, "error using getaddrinfo: %s\n",
			gai_strerror(err));
		goto out;
	}
	
	fd = socket(send_host->ai_family, send_host->ai_socktype,
		    send_host->ai_protocol);
	if (fd < 0) {
		perror("send_sock");
		goto out_freeaddrinfo;
	}

	if (connect(fd, send_host->ai_addr, send_host->ai_addrlen) != 0) {
		perror("send_sock connect");
		close(fd);
		fd = -1;	/* do not hand a closed descriptor back to the caller */
	}

out_freeaddrinfo:
	freeaddrinfo(send_host);
out:
	return fd;
}

int main(int argc, char *argv[])
{
	const char *master_listen_port = "5001";
	const char *slave_listen_port = "5002";
	const char *listen_port;
	const char *send_port;
	const char *target_host = "localhost";
	const char *role = "producer";
	int batch_size = 8;
	int packet_size = 256;
	int use_mmsg = 1;
	int s_fifo = 0;
	int lock_mem = 0;
	int c;	/* getopt() returns int; a plain char may never compare equal to -1 */

	while ( (c=getopt(argc, argv, "mfhr:b:nl:t:p:s:")) != -1){
		switch(c){
		case 'r':
			role = optarg;
			break;
		case 'b':
			batch_size = atoi(optarg);
			break;
		case 'l':
			master_listen_port = optarg;
			break;
		case 't':
			target_host = optarg;
			break;
		case 'p':
			slave_listen_port = optarg;
			break;
		case 'n':
			use_mmsg = 0;
			break;
		case 'm':
			lock_mem = 1;
			break;
		case 'f':
			s_fifo = 1;
			break;
		case 's':
			packet_size = atoi(optarg);
			break;
		case 'h':
		default:
			usage(argv[0]);
			exit(0);
		}
	}
	//set scheduling to SCHED_FIFO
	struct sched_param params;
	params.sched_priority = 99;
	
	if (s_fifo && sched_setscheduler(getpid(), SCHED_FIFO, &params) != 0){
		perror("sched_setscheduler");
	}

	if (sched_getscheduler(getpid()) != SCHED_FIFO)
		printf("not running in SCHED_FIFO\n");
	else
		printf("running in SCHED_FIFO\n");

	if (lock_mem){
		if (mlockall(MCL_CURRENT|MCL_FUTURE) != 0)
			perror("mlockall failed");
		else
			printf("memory is locked\n");
	}

	if (strcmp(role, "producer") == 0 ||
	    strcmp(role, "consumer") == 0){
		listen_port = master_listen_port;
		send_port = slave_listen_port;
	} else {
		listen_port = slave_listen_port;
		send_port = master_listen_port;
	}

	if (strcmp(role, "producer") == 0){
		int send_sock = create_send_sock(target_host, send_port);
		if (send_sock < 0){
			perror("send sock");
			goto out;
		}
		if (use_mmsg){
			printf("starting producer with mmsg\n");
			producer(batch_size,
			       packet_size,
			       send_sock,
			       reg_sendmsg);
		} else {
			printf("starting producer without mmsg\n");
			producer(batch_size,
			       packet_size,
			       send_sock,
			       reg_sendmsg);
		}
	} else if (strcmp(role, "consumer") == 0){
		int recv_sock = create_recv_sock(listen_port);
		if (recv_sock < 0){
			perror("recv_sock ");
			goto out;
		}
		if (use_mmsg){
			printf("starting consumer with mmsg\n");
			consumer(batch_size,
			       packet_size,
			       recv_sock,
			       recvmmsg);
		} else {
			printf("starting consumer without mmsg\n");
			consumer(batch_size,
			       packet_size,
			       recv_sock,
			       reg_recvmsg);
		}
	} else if (strcmp(role, "mirror") == 0){
		int recv_sock = create_recv_sock(listen_port);
		int send_sock = create_send_sock(target_host, send_port);
		if (send_sock < 0){
			perror("send sock");
			goto out;
		}
		if (recv_sock < 0){
			perror("recv_sock ");
			goto out;
		}
		if (use_mmsg){
			printf("starting mirror with mmsg\n");
			mirror(batch_size,
			       packet_size,
			       send_sock,
			       recv_sock,
			       reg_sendmsg,
			       recvmmsg);

		} else {
			printf("starting mirror without mmsg\n");
			mirror(batch_size,
			       packet_size,
			       send_sock,
			       recv_sock,
			       reg_sendmsg,
			       reg_recvmsg);
		}

	} else {
		printf("please specify role as one of producer, consumer or mirror\n");
	}

out:
	return 0;
}
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnaldo Carvalho de Melo Sept. 17, 2009, 9:21 p.m. UTC | #2
Em Thu, Sep 17, 2009 at 05:09:19PM +0300, Nir Tzachar escreveu:
> Hello.
> 
> Below are some test results with the patch (only part 1, as I did not
> manage to apply part 2).

I forgot to mention that the patches were made against DaveM's
net-next-2.6 tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6

If you have a linux-2.6 git tree, just do:

cd linux-2.6
git remote add net-next git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
git fetch net-next
git checkout -b net-next-recvmmsg net-next/master

And you should be able to apply the two patches cleanly.

> The test application is attached below, and works as follows:
> 
> I set out to measure the latency which can be saved by this patch, and
> the application is designed accordingly. It is composed of three
> parts: a producer, which time-stamps packets and sends them as fast as
> possible, a mirror, which receives messages and bounces them to a
> remote destination and finally, a consumer, which receives messages as
> fast as possible and measures latency and throughput.
> 
> Both the producer and consumer are executed on the same host and the
> mirror on a remote host. Both hosts are running linux 2.6.31 with v4
> of the patch (but, as I said before, only part 1, with the unlocked_*
> stuff). All processes are executed under SCHED_FIFO. Both hosts are

Here is the problem: the patch, as mentioned above, was made against
net-next-2.6.

I'll rework the 2nd patch so that you can test with both.

> connected by a switched 1G Ethernet network. The mirror is executed on
> a 8-core nahelem beast, and the producer and consumer on my desktop,
> which is a quad. /proc/cpuinfo and lspcis and .configs can be supplied
> if needed. Network cards are Intel Corporation 82566DM-2 Gigabit
> Network and Broadcom Corporation NetXtreme II BCM5709 Gigabit
> Ethernet.
> 
> The results (which follow below) clearly show the advantages of using
> recvmmsg over recvmsg both latency wise and throughput wise. The
> addition of a sendmmsg would also have a huge impact, IMO.

Yeah, there are even some smarts that can be done in the sendmmsg case,
like avoiding passing the same payload again for each of multiple
destinations: marking the mmsghdr size as zero would then mean "use the
latest non-zero sized payload".
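
The zero-size rule floated above can be sketched as a lookup over the
message array. This only illustrates the idea from this thread and is not
an existing sendmmsg interface; struct demo_msg and demo_resolve_payload()
are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical per-message descriptor: just a payload and its length. */
struct demo_msg {
	const void *payload;
	size_t len;	/* 0 means "reuse the latest non-zero sized payload" */
};

/*
 * Resolve which payload message i would carry under the proposed rule.
 * Returns the index of the entry whose payload is used, or -1 if there
 * is no non-zero sized entry at or before i.
 */
static int demo_resolve_payload(const struct demo_msg *msgs, int i)
{
	for (; i >= 0; i--)
		if (msgs[i].len != 0)
			return i;
	return -1;
}
```

With entries of length 1, 0, 0, 2, 0, messages 0 to 2 would all carry
the first payload and messages 3 and 4 the fourth one, so each distinct
payload is handed to the kernel only once.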

> Receiving batches of 30 packets, each of 1024 bytes, results with no
> latency improvements, but with a ~55% throughput improvement, from 72
> megabytes per second to  111. Repeating the same test, but with
> batches of 3000, displays the same behaviour. The more interesting
> result (to me, at least :) is when using small packets. Sending
> packets of size 100 and receiving in batches of 30  gives 470 micro
> latency and 244669 packets per second. On the other hand, without
> recvmmsg we get 750 micro latency and 210818 packets per second. A
> huge improvement here.
> 
> I think that with a bit more tinkering we can even stretch these results a bit.

I guess so too. With luck I'll be able to test this over a 10 Gbit/s
link today, using both my test cases and yours.

Thanks a lot!
 
- Arnaldo

Patch

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index c36f521..6da3a71 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -590,6 +590,7 @@  static const struct proto_ops data_sock_ops = {
 	.getname	= data_sock_getname,
 	.sendmsg	= mISDN_sock_sendmsg,
 	.recvmsg	= mISDN_sock_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.poll		= datagram_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
@@ -743,6 +744,7 @@  static const struct proto_ops base_sock_ops = {
 	.getname	= sock_no_getname,
 	.sendmsg	= sock_no_sendmsg,
 	.recvmsg	= sock_no_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.poll		= sock_no_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 7cbf6f9..cbcd3d5 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -1121,6 +1121,7 @@  static const struct proto_ops pppoe_ops = {
 	.getsockopt	= sock_no_getsockopt,
 	.sendmsg	= pppoe_sendmsg,
 	.recvmsg	= pppoe_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap		= sock_no_mmap,
 	.ioctl		= pppox_ioctl,
 };
diff --git a/drivers/net/pppol2tp.c b/drivers/net/pppol2tp.c
index e0f9219..af6160c 100644
--- a/drivers/net/pppol2tp.c
+++ b/drivers/net/pppol2tp.c
@@ -2590,6 +2590,7 @@  static struct proto_ops pppol2tp_ops = {
 	.getsockopt	= pppol2tp_getsockopt,
 	.sendmsg	= pppol2tp_sendmsg,
 	.recvmsg	= pppol2tp_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap		= sock_no_mmap,
 	.ioctl		= pppox_ioctl,
 };
diff --git a/include/linux/net.h b/include/linux/net.h
index d67587a..8b852de 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -186,6 +186,10 @@  struct proto_ops {
 	int		(*recvmsg)   (struct kiocb *iocb, struct socket *sock,
 				      struct msghdr *m, size_t total_len,
 				      int flags);
+	int		(*unlocked_recvmsg)(struct kiocb *iocb,
+					    struct socket *sock,
+					    struct msghdr *m,
+					    size_t total_len, int flags);
 	int		(*mmap)	     (struct file *file, struct socket *sock,
 				      struct vm_area_struct * vma);
 	ssize_t		(*sendpage)  (struct socket *sock, struct page *page,
@@ -316,6 +320,8 @@  SOCKCALL_WRAP(name, sendmsg, (struct kiocb *iocb, struct socket *sock, struct ms
 	      (iocb, sock, m, len)) \
 SOCKCALL_WRAP(name, recvmsg, (struct kiocb *iocb, struct socket *sock, struct msghdr *m, size_t len, int flags), \
 	      (iocb, sock, m, len, flags)) \
+SOCKCALL_WRAP(name, unlocked_recvmsg, (struct kiocb *iocb, struct socket *sock, struct msghdr *m, size_t len, int flags), \
+	      (iocb, sock, m, len, flags)) \
 SOCKCALL_WRAP(name, mmap, (struct file *file, struct socket *sock, struct vm_area_struct *vma), \
 	      (file, sock, vma)) \
 	      \
@@ -337,6 +343,7 @@  static const struct proto_ops name##_ops = {			\
 	.getsockopt	= __lock_##name##_getsockopt,	\
 	.sendmsg	= __lock_##name##_sendmsg,	\
 	.recvmsg	= __lock_##name##_recvmsg,	\
+	.unlocked_recvmsg = __lock_##name##_unlocked_recvmsg,	\
 	.mmap		= __lock_##name##_mmap,		\
 };
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 950409d..7c62428 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -644,6 +644,11 @@  struct proto {
 					   struct msghdr *msg,
 					size_t len, int noblock, int flags, 
 					int *addr_len);
+	int			(*unlocked_recvmsg)(struct kiocb *iocb,
+						    struct sock *sk,
+						    struct msghdr *msg,
+						    size_t len, int noblock,
+						    int flags, int *addr_len);
 	int			(*sendpage)(struct sock *sk, struct page *page,
 					int offset, size_t size, int flags);
 	int			(*bind)(struct sock *sk, 
@@ -998,6 +1003,11 @@  extern int                      sock_no_sendmsg(struct kiocb *, struct socket *,
 						struct msghdr *, size_t);
 extern int                      sock_no_recvmsg(struct kiocb *, struct socket *,
 						struct msghdr *, size_t, int);
+extern int			sock_no_unlocked_recvmsg(struct kiocb *iocb,
+							 struct socket *sock,
+							 struct msghdr *msg,
+							 size_t size,
+							 int flags);
 extern int			sock_no_mmap(struct file *file,
 					     struct socket *sock,
 					     struct vm_area_struct *vma);
@@ -1014,6 +1024,9 @@  extern int sock_common_getsockopt(struct socket *sock, int level, int optname,
 				  char __user *optval, int __user *optlen);
 extern int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock,
 			       struct msghdr *msg, size_t size, int flags);
+extern int sock_common_unlocked_recvmsg(struct kiocb *iocb, struct socket *sock,
+					struct msghdr *msg, size_t size,
+					int flags);
 extern int sock_common_setsockopt(struct socket *sock, int level, int optname,
 				  char __user *optval, int optlen);
 extern int compat_sock_common_getsockopt(struct socket *sock, int level,
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 4a6ff2b..bb2e1bb 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1847,6 +1847,7 @@  static const struct proto_ops SOCKOPS_WRAPPED(atalk_dgram_ops) = {
 	.getsockopt	= sock_no_getsockopt,
 	.sendmsg	= atalk_sendmsg,
 	.recvmsg	= atalk_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap		= sock_no_mmap,
 	.sendpage	= sock_no_sendpage,
 };
diff --git a/net/atm/pvc.c b/net/atm/pvc.c
index e1d22d9..5c03749 100644
--- a/net/atm/pvc.c
+++ b/net/atm/pvc.c
@@ -122,6 +122,7 @@  static const struct proto_ops pvc_proto_ops = {
 	.getsockopt =	pvc_getsockopt,
 	.sendmsg =	vcc_sendmsg,
 	.recvmsg =	vcc_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
diff --git a/net/atm/svc.c b/net/atm/svc.c
index 7b831b5..6c66ae9 100644
--- a/net/atm/svc.c
+++ b/net/atm/svc.c
@@ -644,6 +644,7 @@  static const struct proto_ops svc_proto_ops = {
 	.setsockopt =	svc_setsockopt,
 	.getsockopt =	svc_getsockopt,
 	.sendmsg =	vcc_sendmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.recvmsg =	vcc_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index da0f64f..43f4f57 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1976,6 +1976,7 @@  static const struct proto_ops ax25_proto_ops = {
 	.getsockopt	= ax25_getsockopt,
 	.sendmsg	= ax25_sendmsg,
 	.recvmsg	= ax25_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap		= sock_no_mmap,
 	.sendpage	= sock_no_sendpage,
 };
diff --git a/net/bluetooth/bnep/sock.c b/net/bluetooth/bnep/sock.c
index e857628..0b26b3c 100644
--- a/net/bluetooth/bnep/sock.c
+++ b/net/bluetooth/bnep/sock.c
@@ -178,6 +178,7 @@  static const struct proto_ops bnep_sock_ops = {
 	.getname	= sock_no_getname,
 	.sendmsg	= sock_no_sendmsg,
 	.recvmsg	= sock_no_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.poll		= sock_no_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
diff --git a/net/bluetooth/cmtp/sock.c b/net/bluetooth/cmtp/sock.c
index 16b0fad..72a4b5d 100644
--- a/net/bluetooth/cmtp/sock.c
+++ b/net/bluetooth/cmtp/sock.c
@@ -173,6 +173,7 @@  static const struct proto_ops cmtp_sock_ops = {
 	.getname	= sock_no_getname,
 	.sendmsg	= sock_no_sendmsg,
 	.recvmsg	= sock_no_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.poll		= sock_no_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 4f9621f..bd0aace 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -603,6 +603,7 @@  static const struct proto_ops hci_sock_ops = {
 	.getname	= hci_sock_getname,
 	.sendmsg	= hci_sock_sendmsg,
 	.recvmsg	= hci_sock_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.ioctl		= hci_sock_ioctl,
 	.poll		= datagram_poll,
 	.listen		= sock_no_listen,
diff --git a/net/bluetooth/hidp/sock.c b/net/bluetooth/hidp/sock.c
index 37c9d7d..90b40e2 100644
--- a/net/bluetooth/hidp/sock.c
+++ b/net/bluetooth/hidp/sock.c
@@ -224,6 +224,7 @@  static const struct proto_ops hidp_sock_ops = {
 	.getname	= sock_no_getname,
 	.sendmsg	= sock_no_sendmsg,
 	.recvmsg	= sock_no_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.poll		= sock_no_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
index b030125..dc73bd4 100644
--- a/net/bluetooth/l2cap.c
+++ b/net/bluetooth/l2cap.c
@@ -3907,6 +3907,7 @@  static const struct proto_ops l2cap_sock_ops = {
 	.getname	= l2cap_sock_getname,
 	.sendmsg	= l2cap_sock_sendmsg,
 	.recvmsg	= l2cap_sock_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.poll		= bt_sock_poll,
 	.ioctl		= bt_sock_ioctl,
 	.mmap		= sock_no_mmap,
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 0b85e81..00b1a41 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -1092,6 +1092,7 @@  static const struct proto_ops rfcomm_sock_ops = {
 	.getname	= rfcomm_sock_getname,
 	.sendmsg	= rfcomm_sock_sendmsg,
 	.recvmsg	= rfcomm_sock_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.shutdown	= rfcomm_sock_shutdown,
 	.setsockopt	= rfcomm_sock_setsockopt,
 	.getsockopt	= rfcomm_sock_getsockopt,
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index 13c27f1..fda79b8 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -984,6 +984,7 @@  static const struct proto_ops sco_sock_ops = {
 	.getname	= sco_sock_getname,
 	.sendmsg	= sco_sock_sendmsg,
 	.recvmsg	= bt_sock_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.poll		= bt_sock_poll,
 	.ioctl		= bt_sock_ioctl,
 	.mmap		= sock_no_mmap,
diff --git a/net/can/bcm.c b/net/can/bcm.c
index 597da4f..e0aff9e 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1562,6 +1562,7 @@  static struct proto_ops bcm_ops __read_mostly = {
 	.getsockopt    = sock_no_getsockopt,
 	.sendmsg       = bcm_sendmsg,
 	.recvmsg       = bcm_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap          = sock_no_mmap,
 	.sendpage      = sock_no_sendpage,
 };
diff --git a/net/can/raw.c b/net/can/raw.c
index db3152d..b8fa610 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -730,6 +730,7 @@  static struct proto_ops raw_ops __read_mostly = {
 	.getsockopt    = raw_getsockopt,
 	.sendmsg       = raw_sendmsg,
 	.recvmsg       = raw_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap          = sock_no_mmap,
 	.sendpage      = sock_no_sendpage,
 };
diff --git a/net/core/sock.c b/net/core/sock.c
index 30d5446..6ac86d4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1653,6 +1653,13 @@  int sock_no_connect(struct socket *sock, struct sockaddr *saddr,
 }
 EXPORT_SYMBOL(sock_no_connect);
 
+int sock_no_unlocked_recvmsg(struct kiocb *iocb, struct socket *sock,
+			     struct msghdr *msg, size_t size, int flags)
+{
+	return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL(sock_no_unlocked_recvmsg);
+
 int sock_no_socketpair(struct socket *sock1, struct socket *sock2)
 {
 	return -EOPNOTSUPP;
@@ -2014,6 +2021,25 @@  int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock,
 }
 EXPORT_SYMBOL(sock_common_recvmsg);
 
+int sock_common_unlocked_recvmsg(struct kiocb *iocb, struct socket *sock,
+				 struct msghdr *msg, size_t size, int flags)
+{
+	struct sock *sk = sock->sk;
+	int addr_len = 0;
+	int err;
+
+	if (sk->sk_prot->unlocked_recvmsg == NULL)
+		return -EOPNOTSUPP;
+
+	err = sk->sk_prot->unlocked_recvmsg(iocb, sk, msg, size,
+					    flags & MSG_DONTWAIT,
+					    flags & ~MSG_DONTWAIT, &addr_len);
+	if (err >= 0)
+		msg->msg_namelen = addr_len;
+	return err;
+}
+EXPORT_SYMBOL(sock_common_unlocked_recvmsg);
+
 /*
  *	Set socket options on an inet socket.
  */
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index d01c00d..e781f01 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -974,6 +974,7 @@  static const struct proto_ops inet_dccp_ops = {
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
+	.unlocked_recvmsg  = sock_no_unlocked_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = sock_no_sendpage,
 #ifdef CONFIG_COMPAT
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 64f011c..f530e37 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -1175,6 +1175,7 @@  static struct proto_ops inet6_dccp_ops = {
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
+	.unlocked_recvmsg  = sock_no_unlocked_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = sock_no_sendpage,
 #ifdef CONFIG_COMPAT
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 77d4028..aa1af0b 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -2348,6 +2348,7 @@  static const struct proto_ops dn_proto_ops = {
 	.getsockopt =	dn_getsockopt,
 	.sendmsg =	dn_sendmsg,
 	.recvmsg =	dn_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
diff --git a/net/econet/af_econet.c b/net/econet/af_econet.c
index 0e0254f..7891aad 100644
--- a/net/econet/af_econet.c
+++ b/net/econet/af_econet.c
@@ -765,6 +765,7 @@  static const struct proto_ops econet_ops = {
 	.getsockopt =	sock_no_getsockopt,
 	.sendmsg =	econet_sendmsg,
 	.recvmsg =	econet_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
diff --git a/net/ieee802154/af_ieee802154.c b/net/ieee802154/af_ieee802154.c
index cd949d5..98cf2be 100644
--- a/net/ieee802154/af_ieee802154.c
+++ b/net/ieee802154/af_ieee802154.c
@@ -195,6 +195,7 @@  static const struct proto_ops ieee802154_raw_ops = {
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = ieee802154_sock_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
+	.unlocked_recvmsg  = sock_no_unlocked_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = sock_no_sendpage,
 #ifdef CONFIG_COMPAT
@@ -220,6 +221,7 @@  static const struct proto_ops ieee802154_dgram_ops = {
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = ieee802154_sock_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
+	.unlocked_recvmsg  = sock_no_unlocked_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = sock_no_sendpage,
 #ifdef CONFIG_COMPAT
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 6c30a73..4981d8e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -866,6 +866,7 @@  const struct proto_ops inet_stream_ops = {
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = tcp_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
+	.unlocked_recvmsg  = sock_no_unlocked_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = tcp_sendpage,
 	.splice_read	   = tcp_splice_read,
@@ -893,6 +894,7 @@  const struct proto_ops inet_dgram_ops = {
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
+	.unlocked_recvmsg  = sock_common_unlocked_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = inet_sendpage,
 #ifdef CONFIG_COMPAT
@@ -923,6 +925,7 @@  static const struct proto_ops inet_sockraw_ops = {
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
+	.unlocked_recvmsg  = sock_no_unlocked_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = inet_sendpage,
 #ifdef CONFIG_COMPAT
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ebaaa7f..fcb34bd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -882,13 +882,34 @@  int udp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 }
 EXPORT_SYMBOL(udp_ioctl);
 
+static void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
+{
+	lock_sock(sk);
+	skb_free_datagram(sk, skb);
+	release_sock(sk);
+}
+
+static int skb_kill_datagram_locked(struct sock *sk, struct sk_buff *skb,
+				    unsigned int flags)
+{
+	int ret;
+	lock_sock(sk);
+	ret = skb_kill_datagram(sk, skb, flags);
+	release_sock(sk);
+	return ret;
+}
+
 /*
  * 	This should be easy, if there is something there we
  * 	return it, otherwise we block.
  */
-
-int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int noblock, int flags, int *addr_len)
+static int __udp_recvmsg(struct kiocb *iocb, struct sock *sk,
+			 struct msghdr *msg, size_t len, int noblock,
+			 int flags, int *addr_len,
+			 void (*free_datagram)(struct sock *,
+					       struct sk_buff *),
+			 int  (*kill_datagram)(struct sock *,
+					       struct sk_buff *, unsigned int))
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct sockaddr_in *sin = (struct sockaddr_in *)msg->msg_name;
@@ -967,23 +988,35 @@  try_again:
 		err = ulen;
 
 out_free:
-	lock_sock(sk);
-	skb_free_datagram(sk, skb);
-	release_sock(sk);
+	free_datagram(sk, skb);
 out:
 	return err;
 
 csum_copy_err:
-	lock_sock(sk);
-	if (!skb_kill_datagram(sk, skb, flags))
+	if (!kill_datagram(sk, skb, flags))
 		UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
-	release_sock(sk);
 
 	if (noblock)
 		return -EAGAIN;
 	goto try_again;
 }
 
+int udp_recvmsg(struct kiocb *iocb, struct sock *sk,
+		struct msghdr *msg, size_t len, int noblock,
+		int flags, int *addr_len)
+{
+	return __udp_recvmsg(iocb, sk, msg, len, noblock, flags, addr_len,
+			     skb_free_datagram_locked,
+			     skb_kill_datagram_locked);
+}
+
+int udp_unlocked_recvmsg(struct kiocb *iocb, struct sock *sk,
+			 struct msghdr *msg, size_t len, int noblock,
+			 int flags, int *addr_len)
+{
+	return __udp_recvmsg(iocb, sk, msg, len, noblock, flags, addr_len,
+			     skb_free_datagram, skb_kill_datagram);
+}
 
 int udp_disconnect(struct sock *sk, int flags)
 {
@@ -1580,6 +1613,7 @@  struct proto udp_prot = {
 	.getsockopt	   = udp_getsockopt,
 	.sendmsg	   = udp_sendmsg,
 	.recvmsg	   = udp_recvmsg,
+	.unlocked_recvmsg  = udp_unlocked_recvmsg,
 	.sendpage	   = udp_sendpage,
 	.backlog_rcv	   = __udp_queue_rcv_skb,
 	.hash		   = udp_lib_hash,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index a123a32..b72c518 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -518,6 +518,7 @@  const struct proto_ops inet6_stream_ops = {
 	.getsockopt	   = sock_common_getsockopt,	/* ok		*/
 	.sendmsg	   = tcp_sendmsg,		/* ok		*/
 	.recvmsg	   = sock_common_recvmsg,	/* ok		*/
+	.unlocked_recvmsg  = sock_no_unlocked_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = tcp_sendpage,
 	.splice_read	   = tcp_splice_read,
@@ -544,6 +545,7 @@  const struct proto_ops inet6_dgram_ops = {
 	.getsockopt	   = sock_common_getsockopt,	/* ok		*/
 	.sendmsg	   = inet_sendmsg,		/* ok		*/
 	.recvmsg	   = sock_common_recvmsg,	/* ok		*/
+	.unlocked_recvmsg  = sock_common_unlocked_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = sock_no_sendpage,
 #ifdef CONFIG_COMPAT
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 7d675b8..d17db28 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1325,6 +1325,7 @@  static const struct proto_ops inet6_sockraw_ops = {
 	.getsockopt	   = sock_common_getsockopt,	/* ok		*/
 	.sendmsg	   = inet_sendmsg,		/* ok		*/
 	.recvmsg	   = sock_common_recvmsg,	/* ok		*/
+	.unlocked_recvmsg  = sock_no_unlocked_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = sock_no_sendpage,
 #ifdef CONFIG_COMPAT
diff --git a/net/ipx/af_ipx.c b/net/ipx/af_ipx.c
index f1118d9..45048a0 100644
--- a/net/ipx/af_ipx.c
+++ b/net/ipx/af_ipx.c
@@ -1953,6 +1953,7 @@  static const struct proto_ops SOCKOPS_WRAPPED(ipx_dgram_ops) = {
 	.getsockopt	= ipx_getsockopt,
 	.sendmsg	= ipx_sendmsg,
 	.recvmsg	= ipx_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap		= sock_no_mmap,
 	.sendpage	= sock_no_sendpage,
 };
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 50b43c5..7e97581 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -2489,6 +2489,7 @@  static const struct proto_ops SOCKOPS_WRAPPED(irda_stream_ops) = {
 	.getsockopt =	irda_getsockopt,
 	.sendmsg =	irda_sendmsg,
 	.recvmsg =	irda_recvmsg_stream,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
@@ -2513,6 +2514,7 @@  static const struct proto_ops SOCKOPS_WRAPPED(irda_seqpacket_ops) = {
 	.getsockopt =	irda_getsockopt,
 	.sendmsg =	irda_sendmsg,
 	.recvmsg =	irda_recvmsg_dgram,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
@@ -2537,6 +2539,7 @@  static const struct proto_ops SOCKOPS_WRAPPED(irda_dgram_ops) = {
 	.getsockopt =	irda_getsockopt,
 	.sendmsg =	irda_sendmsg_dgram,
 	.recvmsg =	irda_recvmsg_dgram,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
@@ -2562,6 +2565,7 @@  static const struct proto_ops SOCKOPS_WRAPPED(irda_ultra_ops) = {
 	.getsockopt =	irda_getsockopt,
 	.sendmsg =	irda_sendmsg_ultra,
 	.recvmsg =	irda_recvmsg_dgram,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 49c15b4..c208622 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1693,6 +1693,7 @@  static struct proto_ops iucv_sock_ops = {
 	.getname	= iucv_sock_getname,
 	.sendmsg	= iucv_sock_sendmsg,
 	.recvmsg	= iucv_sock_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.poll		= iucv_sock_poll,
 	.ioctl		= sock_no_ioctl,
 	.mmap		= sock_no_mmap,
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 4e98193..3ef1f26 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3636,6 +3636,7 @@  static const struct proto_ops pfkey_ops = {
 	.getsockopt	=	sock_no_getsockopt,
 	.mmap		=	sock_no_mmap,
 	.sendpage	=	sock_no_sendpage,
+	.unlocked_recvmsg =	sock_no_unlocked_recvmsg,
 
 	/* Now the operations that really occur. */
 	.release	=	pfkey_release,
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index c45eee1..d948caf 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -1115,6 +1115,7 @@  static const struct proto_ops llc_ui_ops = {
 	.getsockopt  = llc_ui_getsockopt,
 	.sendmsg     = llc_ui_sendmsg,
 	.recvmsg     = llc_ui_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap	     = sock_no_mmap,
 	.sendpage    = sock_no_sendpage,
 };
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index d0ff382..0d1b446 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2036,6 +2036,7 @@  static const struct proto_ops netlink_ops = {
 	.getsockopt =	netlink_getsockopt,
 	.sendmsg =	netlink_sendmsg,
 	.recvmsg =	netlink_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index ce1a34b..3550d34 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1395,6 +1395,7 @@  static const struct proto_ops nr_proto_ops = {
 	.getsockopt	=	nr_getsockopt,
 	.sendmsg	=	nr_sendmsg,
 	.recvmsg	=	nr_recvmsg,
+	.unlocked_recvmsg =	sock_no_unlocked_recvmsg,
 	.mmap		=	sock_no_mmap,
 	.sendpage	=	sock_no_sendpage,
 };
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d3d52c6..d987e23 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2338,6 +2338,7 @@  static const struct proto_ops packet_ops_spkt = {
 	.getsockopt =	sock_no_getsockopt,
 	.sendmsg =	packet_sendmsg_spkt,
 	.recvmsg =	packet_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
@@ -2359,6 +2360,7 @@  static const struct proto_ops packet_ops = {
 	.getsockopt =	packet_getsockopt,
 	.sendmsg =	packet_sendmsg,
 	.recvmsg =	packet_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		packet_mmap,
 	.sendpage =	sock_no_sendpage,
 };
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 7a4ee39..248e8b2 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -327,6 +327,7 @@  const struct proto_ops phonet_dgram_ops = {
 #endif
 	.sendmsg	= pn_socket_sendmsg,
 	.recvmsg	= sock_common_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap		= sock_no_mmap,
 	.sendpage	= sock_no_sendpage,
 };
@@ -352,6 +353,7 @@  const struct proto_ops phonet_stream_ops = {
 #endif
 	.sendmsg	= pn_socket_sendmsg,
 	.recvmsg	= sock_common_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap		= sock_no_mmap,
 	.sendpage	= sock_no_sendpage,
 };
diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 108ed2e..1f2e8db 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -376,6 +376,7 @@  static struct proto_ops rds_proto_ops = {
 	.getsockopt =	rds_getsockopt,
 	.sendmsg =	rds_sendmsg,
 	.recvmsg =	rds_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index e5f478c..a64c623 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1532,6 +1532,7 @@  static struct proto_ops rose_proto_ops = {
 	.getsockopt	=	rose_getsockopt,
 	.sendmsg	=	rose_sendmsg,
 	.recvmsg	=	rose_recvmsg,
+	.unlocked_recvmsg =	sock_no_unlocked_recvmsg,
 	.mmap		=	sock_no_mmap,
 	.sendpage	=	sock_no_sendpage,
 };
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index bfe493e..bf4c38a 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -766,6 +766,7 @@  static const struct proto_ops rxrpc_rpc_ops = {
 	.getsockopt	= sock_no_getsockopt,
 	.sendmsg	= rxrpc_sendmsg,
 	.recvmsg	= rxrpc_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap		= sock_no_mmap,
 	.sendpage	= sock_no_sendpage,
 };
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 6a4b190..b68d9f9 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -918,6 +918,7 @@  static const struct proto_ops inet6_seqpacket_ops = {
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
+	.unlocked_recvmsg  = sock_no_unlocked_recvmsg,
 	.mmap		   = sock_no_mmap,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_sock_common_setsockopt,
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 60093be..8caedcb 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -895,6 +895,7 @@  static const struct proto_ops inet_seqpacket_ops = {
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
+	.unlocked_recvmsg  = sock_no_unlocked_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = sock_no_sendpage,
 #ifdef CONFIG_COMPAT
diff --git a/net/socket.c b/net/socket.c
index 32db56a..dc5b976 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -690,6 +690,32 @@  static inline int __sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	return err ?: __sock_recvmsg_nosec(iocb, sock, msg, size, flags);
 }
 
+static inline int __sock_unlocked_recvmsg_nosec(struct kiocb *iocb,
+						struct socket *sock,
+						struct msghdr *msg,
+						size_t size, int flags)
+{
+	struct sock_iocb *si = kiocb_to_siocb(iocb);
+
+	si->sock = sock;
+	si->scm = NULL;
+	si->msg = msg;
+	si->size = size;
+	si->flags = flags;
+
+	return sock->ops->unlocked_recvmsg(iocb, sock, msg, size, flags);
+}
+
+static inline int __sock_unlocked_recvmsg(struct kiocb *iocb,
+					  struct socket *sock,
+					  struct msghdr *msg, size_t size,
+					  int flags)
+{
+	int err = security_socket_recvmsg(sock, msg, size, flags);
+
+	return err ?: __sock_unlocked_recvmsg_nosec(iocb, sock, msg, size, flags);
+}
+
 int sock_recvmsg(struct socket *sock, struct msghdr *msg,
 		 size_t size, int flags)
 {
@@ -720,6 +746,58 @@  static int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
 	return ret;
 }
 
+static int sock_unlocked_recvmsg(struct socket *sock, struct msghdr *msg,
+				 size_t size, int flags)
+{
+	struct kiocb iocb;
+	struct sock_iocb siocb;
+	int ret;
+
+	init_sync_kiocb(&iocb, NULL);
+	iocb.private = &siocb;
+	ret = __sock_unlocked_recvmsg(&iocb, sock, msg, size, flags);
+	if (-EIOCBQUEUED == ret)
+		ret = wait_on_sync_kiocb(&iocb);
+	return ret;
+}
+
+static int sock_unlocked_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
+				       size_t size, int flags)
+{
+	struct kiocb iocb;
+	struct sock_iocb siocb;
+	int ret;
+
+	init_sync_kiocb(&iocb, NULL);
+	iocb.private = &siocb;
+	ret = __sock_unlocked_recvmsg_nosec(&iocb, sock, msg, size, flags);
+	if (-EIOCBQUEUED == ret)
+		ret = wait_on_sync_kiocb(&iocb);
+	return ret;
+}
+
+enum sock_recvmsg_security {
+	SOCK_RECVMSG_SEC = 0,
+	SOCK_RECVMSG_NOSEC,
+};
+
+enum sock_recvmsg_locking {
+	SOCK_LOCKED_RECVMSG = 0,
+	SOCK_UNLOCKED_RECVMSG,
+};
+
+static int (*sock_recvmsg_table[2][2])(struct socket *sock, struct msghdr *msg,
+				       size_t size, int flags) = {
+	[SOCK_RECVMSG_SEC] = {
+		[SOCK_LOCKED_RECVMSG]	= sock_recvmsg, /* The old one */
+		[SOCK_UNLOCKED_RECVMSG] = sock_unlocked_recvmsg,
+	},
+	[SOCK_RECVMSG_NOSEC] = {
+		[SOCK_LOCKED_RECVMSG]	= sock_recvmsg_nosec,
+		[SOCK_UNLOCKED_RECVMSG] = sock_unlocked_recvmsg_nosec,
+	},
+};
+
 int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
 		   struct kvec *vec, size_t num, size_t size, int flags)
 {
@@ -1984,7 +2062,9 @@  out:
 }
 
 static int __sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
-			 struct msghdr *msg_sys, unsigned flags, int nosec)
+			 struct msghdr *msg_sys, unsigned flags,
+			 enum sock_recvmsg_security security,
+			 enum sock_recvmsg_locking locking)
 {
 	struct compat_msghdr __user *msg_compat =
 	    (struct compat_msghdr __user *)msg;
@@ -2044,8 +2124,8 @@  static int __sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
 
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
-	err = (nosec ? sock_recvmsg_nosec : sock_recvmsg)(sock, msg_sys,
-							  total_len, flags);
+	err = sock_recvmsg_table[security][locking](sock, msg_sys,
+						    total_len, flags);
 	if (err < 0)
 		goto out_freeiov;
 	len = err;
@@ -2092,7 +2172,8 @@  SYSCALL_DEFINE3(recvmsg, int, fd, struct msghdr __user *, msg,
 	if (!sock)
 		goto out;
 
-	err = __sys_recvmsg(sock, msg, &msg_sys, flags, 0);
+	err = __sys_recvmsg(sock, msg, &msg_sys, flags,
+			    SOCK_RECVMSG_SEC, SOCK_LOCKED_RECVMSG);
 
 	fput_light(sock->file, fput_needed);
 out:
@@ -2111,6 +2192,7 @@  int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 	struct mmsghdr __user *entry;
 	struct msghdr msg_sys;
 	struct timespec end_time;
+	enum sock_recvmsg_security security;
 
 	if (timeout &&
 	    poll_select_set_timeout(&end_time, timeout->tv_sec,
@@ -2123,20 +2205,25 @@  int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 	if (!sock)
 		return err;
 
+	lock_sock(sock->sk);
+
 	err = sock_error(sock->sk);
 	if (err)
 		goto out_put;
 
 	entry = mmsg;
 
+	security = SOCK_RECVMSG_SEC;
 	while (datagrams < vlen) {
-		/*
-		 * No need to ask LSM for more than the first datagram.
-		 */
 		err = __sys_recvmsg(sock, (struct msghdr __user *)entry,
-				    &msg_sys, flags, datagrams);
+				    &msg_sys, flags, security,
+				    SOCK_UNLOCKED_RECVMSG);
 		if (err < 0)
 			break;
+		/*
+		 * No need to ask LSM for more than the first datagram.
+		 */
+		security = SOCK_RECVMSG_NOSEC;
 		err = put_user(err, &entry->msg_len);
 		if (err)
 			break;
@@ -2165,9 +2252,8 @@  out_put:
 	fput_light(sock->file, fput_needed);
 
 	if (err == 0)
-		return datagrams;
-
-	if (datagrams != 0) {
+		err = datagrams;
+	else if (datagrams != 0) {
 		/*
 		 * We may return less entries than requested (vlen) if the
 		 * sock is non block and there aren't enough datagrams...
@@ -2182,9 +2268,11 @@  out_put:
 			sock->sk->sk_err = -err;
 		}
 
-		return datagrams;
+		err = datagrams;
 	}
 
+	release_sock(sock->sk);
+
 	return err;
 }
 
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index e8254e8..97b3f05 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1797,6 +1797,7 @@  static const struct proto_ops msg_ops = {
 	.getsockopt	= getsockopt,
 	.sendmsg	= send_msg,
 	.recvmsg	= recv_msg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap		= sock_no_mmap,
 	.sendpage	= sock_no_sendpage
 };
@@ -1818,6 +1819,7 @@  static const struct proto_ops packet_ops = {
 	.getsockopt	= getsockopt,
 	.sendmsg	= send_packet,
 	.recvmsg	= recv_msg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap		= sock_no_mmap,
 	.sendpage	= sock_no_sendpage
 };
@@ -1839,6 +1841,7 @@  static const struct proto_ops stream_ops = {
 	.getsockopt	= getsockopt,
 	.sendmsg	= send_stream,
 	.recvmsg	= recv_stream,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap		= sock_no_mmap,
 	.sendpage	= sock_no_sendpage
 };
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 51ab497..9e7aa9a 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -521,6 +521,7 @@  static const struct proto_ops unix_stream_ops = {
 	.getsockopt =	sock_no_getsockopt,
 	.sendmsg =	unix_stream_sendmsg,
 	.recvmsg =	unix_stream_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
@@ -542,6 +543,7 @@  static const struct proto_ops unix_dgram_ops = {
 	.getsockopt =	sock_no_getsockopt,
 	.sendmsg =	unix_dgram_sendmsg,
 	.recvmsg =	unix_dgram_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
@@ -563,6 +565,7 @@  static const struct proto_ops unix_seqpacket_ops = {
 	.getsockopt =	sock_no_getsockopt,
 	.sendmsg =	unix_seqpacket_sendmsg,
 	.recvmsg =	unix_dgram_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 5e6c072..7c20b26 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1620,6 +1620,7 @@  static const struct proto_ops SOCKOPS_WRAPPED(x25_proto_ops) = {
 	.getsockopt =	x25_getsockopt,
 	.sendmsg =	x25_sendmsg,
 	.recvmsg =	x25_recvmsg,
+	.unlocked_recvmsg = sock_no_unlocked_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
 };
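
Note on the dispatch scheme used in net/socket.c above: the table indexed by
security and locking mode, together with the "ask the LSM only for the first
datagram" loop in __sys_recvmmsg, can be modelled in plain userspace C. The
sketch below is only an illustration under stated assumptions: every name in
it (model_sock, recv_table, model_recvmmsg and so on) is hypothetical and
does not exist in the kernel, the "lock" is just a counter, and errors are
reduced to a bare -1. It is meant to show the control flow, not the real
locking or LSM semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum recv_security { RECV_SEC = 0, RECV_NOSEC };
enum recv_locking  { RECV_LOCKED = 0, RECV_UNLOCKED };

struct model_sock {
	int lock_acquisitions;	/* how many times the "lock" was taken */
	int sec_checks;		/* how many times the security hook ran */
	int queued;		/* datagrams waiting in the receive queue */
};

/* Stand-in for the LSM hook: always allows, but counts invocations. */
static int model_security_check(struct model_sock *s)
{
	s->sec_checks++;
	return 0;
}

/* Take one datagram off the queue, or fail if the queue is empty. */
static int model_dequeue(struct model_sock *s)
{
	if (s->queued == 0)
		return -1;	/* stands in for -EAGAIN */
	s->queued--;
	return 1;
}

/* Locked variants take the per-call lock; unlocked ones rely on the caller. */
static int recv_locked_nosec(struct model_sock *s)
{
	s->lock_acquisitions++;
	return model_dequeue(s);
}

static int recv_unlocked_nosec(struct model_sock *s)
{
	return model_dequeue(s);
}

static int recv_locked_sec(struct model_sock *s)
{
	int err = model_security_check(s);

	return err ? err : recv_locked_nosec(s);
}

static int recv_unlocked_sec(struct model_sock *s)
{
	int err = model_security_check(s);

	return err ? err : recv_unlocked_nosec(s);
}

/* Same shape as the table in the patch: [security][locking]. */
static int (*const recv_table[2][2])(struct model_sock *) = {
	[RECV_SEC] = {
		[RECV_LOCKED]	= recv_locked_sec,
		[RECV_UNLOCKED]	= recv_unlocked_sec,
	},
	[RECV_NOSEC] = {
		[RECV_LOCKED]	= recv_locked_nosec,
		[RECV_UNLOCKED]	= recv_unlocked_nosec,
	},
};

/* Batch receive: lock once, run the security check for the first datagram only. */
int model_recvmmsg(struct model_sock *s, int vlen)
{
	enum recv_security security = RECV_SEC;
	int datagrams = 0;

	assert(vlen >= 0);

	s->lock_acquisitions++;	/* one lock for the whole batch */
	while (datagrams < vlen) {
		if (recv_table[security][RECV_UNLOCKED](s) < 0)
			break;
		security = RECV_NOSEC;	/* later datagrams skip the hook */
		datagrams++;
	}
	return datagrams;
}
```

In this model, calling model_recvmmsg() on a socket with three queued
datagrams takes the lock once and runs the security check once, whereas three
separate calls through the RECV_LOCKED column would do each three times. That
difference is the gain the unlocked path is trying to measure.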