From patchwork Sat Oct 30 06:44:44 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 69648 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id E173CB70E0 for ; Sat, 30 Oct 2010 17:45:05 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751797Ab0J3Gow (ORCPT ); Sat, 30 Oct 2010 02:44:52 -0400 Received: from mail-wy0-f174.google.com ([74.125.82.174]:59092 "EHLO mail-wy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751570Ab0J3Gou (ORCPT ); Sat, 30 Oct 2010 02:44:50 -0400 Received: by wyf28 with SMTP id 28so3830537wyf.19 for ; Fri, 29 Oct 2010 23:44:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:subject:from:to:cc :in-reply-to:references:content-type:date:message-id:mime-version :x-mailer:content-transfer-encoding; bh=lkW2FK6n8Uo61UDPRnewQ1pMJ4kTbIZKD1r3euA4EE4=; b=pEZovMzBottmBPqKIbNYJRJGEQ8+WzbA/kcPyGZMiyRJoA344k4s5CN8fUwl49JjbD I7siUGvxu3nsBGbRVU95ILfvA2t8NRLFrLYmViQ/leM/Xnu+bsQpprtQTcnwX0LapJJR ppKYwR0nU1QclqMsWYFHfYNH4J6WzGvCKnUoE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=subject:from:to:cc:in-reply-to:references:content-type:date :message-id:mime-version:x-mailer:content-transfer-encoding; b=pu59dpA1arfhQbpOyjihkbYUHwxD+GB8+t9kkmG0O2bMC5XTi/bicHEOr1m8bwylez GsR0QkDyVmxeI5M3YsqySx2SgHgoIebibWrDdV7YVtCiwwaFGNk2Q3hERPY+KbQtNKuT qbPK1oPPXUZ0xg22kG4XAPqCO85LJVA8nKG64= Received: by 10.216.231.227 with SMTP id l77mr162383weq.104.1288421089341; Fri, 29 Oct 2010 23:44:49 -0700 (PDT) Received: from [192.168.1.21] (162.144.72-86.rev.gaoland.net [86.72.144.162]) by mx.google.com with ESMTPS id p4sm2147268wej.4.2010.10.29.23.44.47 (version=SSLv3 cipher=RC4-MD5); Fri, 29 Oct 2010 23:44:48 -0700 (PDT) Subject: [PATCH] af_unix: unix_write_space() use keyed wakeups From: Eric Dumazet To: Alban Crequy , Davide Libenzi Cc: "David S. Miller" , Stephen Hemminger , Cyrill Gorcunov , Alexey Dobriyan , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Pauli Nieminen , Rainer Weikusat In-Reply-To: <1288380431.2680.3.camel@edumazet-laptop> References: <20101029191857.5f789d56@chocolatine.cbg.collabora.co.uk> <1288380431.2680.3.camel@edumazet-laptop> Date: Sat, 30 Oct 2010 08:44:44 +0200 Message-ID: <1288421084.2680.717.camel@edumazet-laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.30.3 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Le vendredi 29 octobre 2010 à 21:27 +0200, Eric Dumazet a écrit : > Le vendredi 29 octobre 2010 à 19:18 +0100, Alban Crequy a écrit : > > Hi, > > > > When a process calls the poll or select, the kernel calls (struct > > file_operations)->poll on every file descriptor and returns a mask of > > events which are ready. If the process is only interested by POLLIN > > events, the mask is still computed for POLLOUT and it can be expensive. > > For example, on Unix datagram sockets, a process running poll() with > > POLLIN will wakes-up when the remote end call read(). This is a > > performance regression introduced when fixing another bug by > > 3c73419c09a5ef73d56472dbfdade9e311496e9b and > > ec0d215f9420564fc8286dcf93d2d068bb53a07e. > > unix_write_space() doesn’t yet use the wake_up_interruptible_sync_poll() to restrict wakeups to only POLLOUT | POLLWRNORM | POLLWRBAND interested sleepers. Same for unix_dgram_recvmsg() In your pathological case, each time the other process calls unix_dgram_recvmsg(), it loops on 800 pollwake() / default_wake_function() / try_to_wake_up(), which are obviously expensive, as you pointed out with your test program, carefully designed to show the false sharing and O(N^2) effect :) Once do_select() thread can _really_ block, the false sharing problem disappears for good. We still loop on 800 items, on each wake_up_interruptible_sync_poll() call, so maybe we want to optimize this later, adding a global key, ORing all items keys. I dont think its worth the added complexity, given the biased usage of your program (800 'listeners' to one event). Is it a real life scenario ? Thanks [PATCH] af_unix: use keyed wakeups Instead of wakeup all sleepers, use wake_up_interruptible_sync_poll() to wakeup only ones interested into writing the socket. This patch is a specialization of commit 37e5540b3c9d (epoll keyed wakeups: make sockets use keyed wakeups). On a test program provided by Alan Crequy : Before: real 0m3.101s user 0m0.000s sys 0m6.104s After: real 0m0.211s user 0m0.000s sys 0m0.208s Reported-by: Alban Crequy Signed-off-by: Eric Dumazet Cc: Davide Libenzi --- net/unix/af_unix.c | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 3c95304..f33c595 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -316,7 +316,8 @@ static void unix_write_space(struct sock *sk) if (unix_writable(sk)) { wq = rcu_dereference(sk->sk_wq); if (wq_has_sleeper(wq)) - wake_up_interruptible_sync(&wq->wait); + wake_up_interruptible_sync_poll(&wq->wait, + POLLOUT | POLLWRNORM | POLLWRBAND); sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT); } rcu_read_unlock(); @@ -1710,7 +1711,8 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock, goto out_unlock; } - wake_up_interruptible_sync(&u->peer_wait); + wake_up_interruptible_sync_poll(&u->peer_wait, + POLLOUT | POLLWRNORM | POLLWRBAND); if (msg->msg_name) unix_copy_addr(msg, skb->sk);