[v9,net-next,5/7] net: simple poll/select low latency socket poll

Message ID 20130605103452.11172.31453.stgit@ladj378.jer.intel.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Eliezer Tamir June 5, 2013, 10:34 a.m. UTC
A very naive select/poll busy-poll support.
Add busy-polling to sock_poll().
When poll/select have nothing to report, call the low-level
sock_poll() again until we are out of time or we find something.
Right now we poll every socket once; this is suboptimal,
but it improves latency when the number of sockets polled is not large.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 fs/select.c  |    7 +++++++
 net/socket.c |   10 +++++++++-
 2 files changed, 16 insertions(+), 1 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet June 5, 2013, 1:30 p.m. UTC | #1
On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
> A very naive select/poll busy-poll support.
> Add busy-polling to sock_poll().
> When poll/select have nothing to report, call the low-level
> sock_poll() again until we are out of time or we find something.
> Right now we poll every socket once; this is suboptimal,
> but it improves latency when the number of sockets polled is not large.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Tested-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> ---

I am a bit uneasy with this one, because an application polling one
thousand file descriptors using select()/poll() will call sk_poll_ll()
one thousand times.



Eliezer Tamir June 5, 2013, 1:41 p.m. UTC | #2
On 05/06/2013 16:30, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
>> A very naive select/poll busy-poll support.
>> Add busy-polling to sock_poll().
>> When poll/select have nothing to report, call the low-level
>> sock_poll() again until we are out of time or we find something.
>> Right now we poll every socket once; this is suboptimal,
>> but it improves latency when the number of sockets polled is not large.
>>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> Tested-by: Willem de Bruijn <willemb@google.com>
>> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
>> ---
>
> I am a bit uneasy with this one, because an application polling one
> thousand file descriptors using select()/poll() will call sk_poll_ll()
> one thousand times.

But we call sk_poll_ll() with nonblock set, so it will only test once
for each socket and not loop.

I think this is not as bad as it sounds.
We still honor the time limit on how long to poll.

When we busy-wait on a single socket we call sk_poll_ll() repeatedly
until we timeout or we have something to report.

Here, on the other hand, we call sk_poll_ll() once for each file, so we
loop over the files. We moved the loop from inside sk_poll_ll() to
select/poll.

I also plan on improving this in the next stage.

The plan is to give select/poll/epoll control over whether sk_poll_ll()
is called, so the caller has even more control.

-Eliezer


David Laight June 5, 2013, 1:49 p.m. UTC | #3
> I am a bit uneasy with this one, because an application polling one
> thousand file descriptors using select()/poll() will call sk_poll_ll()
> one thousand times.

Anything calling poll() on 1000 fds probably has performance
issues already! That is why kevent schemes have been added.

At least the Linux code doesn't use a linked list for
the fd -> 'struct file' map, which made poll() O(n^2),
and getting to that number of open fds O(n^3), on
some versions of SVR4.

	David
Eric Dumazet June 5, 2013, 1:56 p.m. UTC | #4
On Wed, 2013-06-05 at 16:41 +0300, Eliezer Tamir wrote:
> On 05/06/2013 16:30, Eric Dumazet wrote:
> > On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
> >> A very naive select/poll busy-poll support.
> >> Add busy-polling to sock_poll().
> >> When poll/select have nothing to report, call the low-level
> >> sock_poll() again until we are out of time or we find something.
> >> Right now we poll every socket once; this is suboptimal,
> >> but it improves latency when the number of sockets polled is not large.
> >>
> >> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> >> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> >> Tested-by: Willem de Bruijn <willemb@google.com>
> >> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> >> ---
> >
> > I am a bit uneasy with this one, because an application polling one
> > thousand file descriptors using select()/poll() will call sk_poll_ll()
> > one thousand times.
> 
> But we call sk_poll_ll() with nonblock set, so it will only test once
> for each socket and not loop.
> 
> I think this is not as bad as it sounds.
> We still honor the time limit on how long to poll.
> 
> When we busy-wait on a single socket we call sk_poll_ll() repeatedly
> until we timeout or we have something to report.
> 
> Here, on the other hand, we call sk_poll_ll() once for each file, so we
> loop over the files. We moved the loop from inside sk_poll_ll() to
> select/poll.
> 
> I also plan on improving this in the next stage.
> 
> The plan is to give select/poll/epoll control over whether sk_poll_ll()
> is called, so the caller has even more control.

This looks quite easy, by adding in include/uapi/asm-generic/poll.h

#define POLL_LL 0x8000

And do the sk_poll_ll() call only if flag is set.

I do not think we have to support select(), as it's a legacy interface, and
people wanting ll should really use epoll() or poll().



Eric Dumazet June 5, 2013, 2 p.m. UTC | #5
On Wed, 2013-06-05 at 14:49 +0100, David Laight wrote:
> > I am a bit uneasy with this one, because an application polling one
> > thousand file descriptors using select()/poll() will call sk_poll_ll()
> > one thousand times.
> 
> Anything calling poll() on 1000 fds probably has performance
> issues already! Which is why kevent schemes have been added.
> 

You'll be surprised, but many applications still use poll() and not
epoll() or whatever OS-specific interface, because those are non-portable
or buggy. (I played with FreeBSD and kevent crashed easily at 64,000 fds,
while the epoll() version reached 4,000,000 fds with no problems.)




Eric Dumazet June 5, 2013, 2:17 p.m. UTC | #6
On Wed, 2013-06-05 at 06:56 -0700, Eric Dumazet wrote:

> This looks quite easy, by adding in include/uapi/asm-generic/poll.h
> 
> #define POLL_LL 0x8000
> 
> And do the sk_poll_ll() call only if flag is set.
> 
> I do not think we have to support select(), as it's a legacy interface, and
> people wanting ll should really use epoll() or poll().

Alternatively, add a per-socket flag to enable/disable ll.

This global enable assumes the application owns the host anyway.



Eliezer Tamir June 5, 2013, 2:56 p.m. UTC | #7
On 05/06/2013 17:17, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 06:56 -0700, Eric Dumazet wrote:
>
>> This looks quite easy, by adding in include/uapi/asm-generic/poll.h
>>
>> #define POLL_LL 0x8000
>>
>> And do the sk_poll_ll() call only if flag is set.
>>
>> I do not think we have to support select(), as it's a legacy interface, and
>> people wanting ll should really use epoll() or poll().
>
> Alternatively, add a per-socket flag to enable/disable ll.
>
> This global enable assumes the application owns the host anyway.
>

I plan on adding a socket option in the next stage.
I'm also testing a patch much like you described with a poll flag.
Select/poll set it to indicate that they want to busy poll.
Sock_poll sets it to indicate that this socket can (at the moment)
busy-poll.

If you think the way things are done right now is unacceptable, even as
an experimental feature, I would much prefer to drop this patch and have
the rest applied rather than bring in new code that is not fully tested
at this stage.

-Eliezer

Eric Dumazet June 5, 2013, 3:20 p.m. UTC | #8
On Wed, 2013-06-05 at 16:41 +0300, Eliezer Tamir wrote:
> On 05/06/2013 16:30, Eric Dumazet wrote:

> > I am a bit uneasy with this one, because an application polling one
> > thousand file descriptors using select()/poll() will call sk_poll_ll()
> > one thousand times.
> 
> But we call sk_poll_ll() with nonblock set, so it will only test once
> for each socket and not loop.
> 
> I think this is not as bad as it sounds.
> We still honor the time limit on how long to poll.

We still call ndo_ll_poll() a thousand times, and probably do a
spinlock/unlock a thousand times in the driver.

I would definitely be convinced if you give us some performance numbers
of a poll() on a thousand tcp sockets for example.

See my following mail about sk_poll_ll()

Eliezer Tamir June 5, 2013, 3:47 p.m. UTC | #9
On 05/06/2013 18:20, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 16:41 +0300, Eliezer Tamir wrote:
>> On 05/06/2013 16:30, Eric Dumazet wrote:
>
>>> I am a bit uneasy with this one, because an application polling one
>>> thousand file descriptors using select()/poll() will call sk_poll_ll()
>>> one thousand times.
>>
>> But we call sk_poll_ll() with nonblock set, so it will only test once
>> for each socket and not loop.
>>
>> I think this is not as bad as it sounds.
>> We still honor the time limit on how long to poll.
>
> We still call ndo_ll_poll() a thousand times, and probably do a
> spinlock/unlock a thousand times in the driver.
>
> I would definitely be convinced if you give us some performance numbers
> of a poll() on a thousand tcp sockets for example.

So with 1000 sockets this is definitely not a win.
sockperf with 1000 UDP sockets, latency with the busy-poll sysctl at 50 vs. 0:

          sysctl=50   sysctl=0
select    178.5 us    130.0 us
poll      188.6 us    130.0 us


Patch

diff --git a/fs/select.c b/fs/select.c
index 8c1c96c..f116bf0 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -27,6 +27,7 @@ 
 #include <linux/rcupdate.h>
 #include <linux/hrtimer.h>
 #include <linux/sched/rt.h>
+#include <net/ll_poll.h>
 
 #include <asm/uaccess.h>
 
@@ -400,6 +401,7 @@  int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 	poll_table *wait;
 	int retval, i, timed_out = 0;
 	unsigned long slack = 0;
+	cycles_t ll_time = ll_end_time();
 
 	rcu_read_lock();
 	retval = max_select_fd(n, fds);
@@ -486,6 +488,8 @@  int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 			break;
 		}
 
+		if (can_poll_ll(ll_time))
+			continue;
 		/*
 		 * If this is the first loop and we have a timeout
 		 * given, then we convert to ktime_t and set the to
@@ -750,6 +754,7 @@  static int do_poll(unsigned int nfds,  struct poll_list *list,
 	ktime_t expire, *to = NULL;
 	int timed_out = 0, count = 0;
 	unsigned long slack = 0;
+	cycles_t ll_time = ll_end_time();
 
 	/* Optimise the no-wait case */
 	if (end_time && !end_time->tv_sec && !end_time->tv_nsec) {
@@ -795,6 +800,8 @@  static int do_poll(unsigned int nfds,  struct poll_list *list,
 		if (count || timed_out)
 			break;
 
+		if (can_poll_ll(ll_time))
+			continue;
 		/*
 		 * If this is the first loop and we have a timeout
 		 * given, then we convert to ktime_t and set the to
diff --git a/net/socket.c b/net/socket.c
index 721f4e7..c34dad0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1148,13 +1148,21 @@  EXPORT_SYMBOL(sock_create_lite);
 /* No kernel lock held - perfect */
 static unsigned int sock_poll(struct file *file, poll_table *wait)
 {
+	unsigned int poll_result;
 	struct socket *sock;
 
 	/*
 	 *      We can't return errors to poll, so it's either yes or no.
 	 */
 	sock = file->private_data;
-	return sock->ops->poll(file, sock, wait);
+
+	poll_result = sock->ops->poll(file, sock, wait);
+
+	if (wait && !(poll_result & wait->_key) &&
+		sk_valid_ll(sock->sk) && sk_poll_ll(sock->sk, 1))
+			poll_result = sock->ops->poll(file, sock, NULL);
+
+	return poll_result;
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)