
af_unix: unix_write_space() use keyed wakeups

Message ID 1288421084.2680.717.camel@edumazet-laptop
State Accepted, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Oct. 30, 2010, 6:44 a.m. UTC
On Friday, October 29, 2010 at 21:27 +0200, Eric Dumazet wrote:
> On Friday, October 29, 2010 at 19:18 +0100, Alban Crequy wrote:
> > Hi,
> > 
> > When a process calls poll or select, the kernel calls (struct
> > file_operations)->poll on every file descriptor and returns a mask of
> > events which are ready. If the process is only interested in POLLIN
> > events, the mask is still computed for POLLOUT, and that can be
> > expensive. For example, on Unix datagram sockets, a process running
> > poll() with POLLIN will wake up when the remote end calls read(). This
> > is a performance regression introduced when fixing another bug by
> > 3c73419c09a5ef73d56472dbfdade9e311496e9b and
> > ec0d215f9420564fc8286dcf93d2d068bb53a07e.
> > 

unix_write_space() doesn't yet use wake_up_interruptible_sync_poll() to
restrict wakeups to sleepers interested in POLLOUT | POLLWRNORM |
POLLWRBAND. The same goes for unix_dgram_recvmsg().

In your pathological case, each time the other process calls
unix_dgram_recvmsg(), it loops on 800 pollwake() /
default_wake_function() / try_to_wake_up(), which are obviously
expensive, as you pointed out with your test program, carefully designed
to show the false sharing and O(N^2) effect :)

Once the do_select() thread can _really_ block, the false sharing problem
disappears for good.

We still loop on 800 items, on each wake_up_interruptible_sync_poll()
call, so maybe we want to optimize this later, adding a global key,
ORing all items' keys. I don't think it's worth the added complexity, given
the biased usage of your program (800 'listeners' to one event). Is it a
real-life scenario?

Thanks

[PATCH] af_unix: use keyed wakeups

Instead of waking up all sleepers, use wake_up_interruptible_sync_poll()
to wake up only those interested in writing to the socket.

This patch is a specialization of commit 37e5540b3c9d (epoll keyed
wakeups: make sockets use keyed wakeups).

On a test program provided by Alban Crequy:

Before:
real    0m3.101s
user    0m0.000s
sys     0m6.104s

After:

real	0m0.211s
user	0m0.000s
sys	0m0.208s

Reported-by: Alban Crequy <alban.crequy@collabora.co.uk>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
---
 net/unix/af_unix.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Davide Libenzi Oct. 30, 2010, 3:03 p.m. UTC | #1
On Sat, 30 Oct 2010, Eric Dumazet wrote:

> This patch is a specialization of commit 37e5540b3c9d (epoll keyed
> wakeups: make sockets use keyed wakeups).

Whoops, I must have missed AF_UNIX :)
Looks good to me.


- Davide


Alban Crequy Oct. 30, 2010, 9:36 p.m. UTC | #2
On Sat, 30 Oct 2010 08:44:44 +0200,
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> We still loop on 800 items, on each wake_up_interruptible_sync_poll()
> call, so maybe we want to optimize this later, adding a global key,
> ORing all items' keys. I don't think it's worth the added complexity,
> given the biased usage of your program (800 'listeners' to one
> event). Is it a real-life scenario?

Pauli Nieminen told me about his performance problem in select(), so I
wrote the test program, but I don't know exactly what the real-life
scenario is.

> [PATCH] af_unix: use keyed wakeups
> [PATCH] af_unix: optimize unix_dgram_poll()

Your two patches are good for me. In my opinion the performance
improvement from your two patches is good enough, so there is no need to
add more complexity unless we discover new problems.

I am preparing patches to implement multicast features on Unix
datagram+seqpacket sockets, and my patches could potentially make things
worse in unix_dgram_poll() because it would need to check the receive
queues of all multicast members. So I want unix_dgram_poll() to be fast
in the first place before proposing other changes for multicast.
David Miller Nov. 8, 2010, 9:44 p.m. UTC | #3
From: Davide Libenzi <davidel@xmailserver.org>
Date: Sat, 30 Oct 2010 08:03:15 -0700 (PDT)

> On Sat, 30 Oct 2010, Eric Dumazet wrote:
> 
>> This patch is a specialization of commit 37e5540b3c9d (epoll keyed
>> wakeups: make sockets use keyed wakeups).
> 
> Whoops, I must have missed AF_UNIX :)
> Looks good to me.

Applied.
Alban Crequy Nov. 24, 2010, 12:20 a.m. UTC | #4
On Wed, 24 Nov 2010 01:27:56 +0200,
Pauli Nieminen <pauli.nieminen@collabora.co.uk> wrote:

> ----- Original message -----
> > On Sat, 30 Oct 2010 08:44:44 +0200,
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > 
> > > We still loop on 800 items, on each
> > > wake_up_interruptible_sync_poll() call, so maybe we want to
> > > optimize this later, adding a global key, ORing all items' keys. I
> > > don't think it's worth the added complexity, given the biased usage
> > > of your program (800 'listeners' to one event). Is it a real-life
> > > scenario?
> > 
> > Pauli Nieminen told me about his performance problem in select() so
> > I wrote the test program but I don't know what exactly is the real
> > life scenario.
> > 
> 
> The real-world scenario is an X server that has tens of client
> connections to manage. When the X server is happily sleeping in a
> select call, some client reading events/replies from the server
> triggers in-kernel looping over all of the X server's fds. The X
> server isn't waiting for a socket to become writable in the usual
> case, so the kernel schedules back to the client.

But are they SOCK_STREAM or SOCK_DGRAM sockets? The patches fix
performance with SOCK_DGRAM. If the xserver scenario is with
SOCK_STREAM sockets, your problem is probably still unfixed.

Eric Dumazet Nov. 24, 2010, 12:28 a.m. UTC | #5
On Wednesday, November 24, 2010 at 00:20 +0000, Alban Crequy wrote:
> On Wed, 24 Nov 2010 01:27:56 +0200,
> Pauli Nieminen <pauli.nieminen@collabora.co.uk> wrote:
> 
> > ----- Original message -----
> > > On Sat, 30 Oct 2010 08:44:44 +0200,
> > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > 
> > > > We still loop on 800 items, on each
> > > > wake_up_interruptible_sync_poll() call, so maybe we want to
> > > > optimize this later, adding a global key, ORing all items' keys. I
> > > > don't think it's worth the added complexity, given the biased usage
> > > > of your program (800 'listeners' to one event). Is it a real-life
> > > > scenario?
> > > 
> > > Pauli Nieminen told me about his performance problem in select(),
> > > so I wrote the test program, but I don't know exactly what the
> > > real-life scenario is.
> > > 
> > 
> > The real-world scenario is an X server that has tens of client
> > connections to manage. When the X server is happily sleeping in a
> > select call, some client reading events/replies from the server
> > triggers in-kernel looping over all of the X server's fds. The X
> > server isn't waiting for a socket to become writable in the usual
> > case, so the kernel schedules back to the client.
> 
> But are they SOCK_STREAM or SOCK_DGRAM sockets? The patches fix
> performance with SOCK_DGRAM. If the xserver scenario is with
> SOCK_STREAM sockets, your problem is probably still unfixed.
> 


It should not matter?

commit 67426b756c4d52c51 (af_unix: use keyed wakeups) makes
unix_write_space() call wake_up_interruptible_sync_poll() instead of
wake_up_interruptible_sync().

So it should be fixed for both STREAM and DGRAM sockets?




Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 3c95304..f33c595 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -316,7 +316,8 @@  static void unix_write_space(struct sock *sk)
 	if (unix_writable(sk)) {
 		wq = rcu_dereference(sk->sk_wq);
 		if (wq_has_sleeper(wq))
-			wake_up_interruptible_sync(&wq->wait);
+			wake_up_interruptible_sync_poll(&wq->wait,
+				POLLOUT | POLLWRNORM | POLLWRBAND);
 		sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
 	}
 	rcu_read_unlock();
@@ -1710,7 +1711,8 @@  static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto out_unlock;
 	}
 
-	wake_up_interruptible_sync(&u->peer_wait);
+	wake_up_interruptible_sync_poll(&u->peer_wait,
+					POLLOUT | POLLWRNORM | POLLWRBAND);
 
 	if (msg->msg_name)
 		unix_copy_addr(msg, skb->sk);