From patchwork Sat May 9 21:24:58 2015
X-Patchwork-Submitter: Willem de Bruijn
X-Patchwork-Id: 470386
X-Patchwork-Delegate: davem@davemloft.net
From: Willem de Bruijn
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, eric.dumazet@gmail.com, david.laight@aculab.com,
 Willem de Bruijn
Subject: [PATCH net-next v2 3/6] packet: rollover only to socket with headroom
Date: Sat, 9 May 2015 17:24:58 -0400
Message-Id: <1431206701-5019-4-git-send-email-willemb@google.com>
In-Reply-To: <1431206701-5019-1-git-send-email-willemb@google.com>
References: <1431206701-5019-1-git-send-email-willemb@google.com>
X-Mailing-List: netdev@vger.kernel.org

From: Willem de Bruijn

Only migrate flows to sockets that have sufficient headroom, where
sufficient is defined as having at least 25% empty space.

The kernel has three different buffer types: a regular socket, a ring
with frames (TPACKET_V[12]) or a ring with blocks (TPACKET_V3). The
latter two do not expose a read pointer to the kernel, so headroom is
not computed easily. All three need a different implementation to
estimate free space.

Tested:
  Ran bench_rollover for 10 sec with 1.5 Mpps of single flow input.

  bench_rollover has as many sockets as there are NIC receive queues
  in the system.
  Each socket is owned by a process that is pinned to one of the
  receive cpus. RFS is disabled. RPS is enabled with an identity
  mapping (cpu x -> cpu x), to count drops with softnettop.

    lpbb5:/export/hda3/willemb# ./bench_rollover -r -l 1000 -s
    Press [Enter] to exit

    cpu         rx       rx.k     drop.k   rollover     r.huge   r.failed
      0         16         16          0          0          0          0
      1         21         21          0          0          0          0
      2    5227502    5227502          0          0          0          0
      3         18         18          0          0          0          0
      4    6083289    6083289          0    5227496          0          0
      5         22         22          0          0          0          0
      6         21         21          0          0          0          0
      7          9          9          0          0          0          0

Signed-off-by: Willem de Bruijn
---
 net/packet/af_packet.c | 74 +++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 58 insertions(+), 16 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index f8ec909..fb421a8 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1234,27 +1234,68 @@ static void packet_free_pending(struct packet_sock *po)
 	free_percpu(po->tx_ring.pending_refcnt);
 }
 
-static bool packet_rcv_has_room(struct packet_sock *po, struct sk_buff *skb)
+#define ROOM_POW_OFF	2
+#define ROOM_NONE	0x0
+#define ROOM_LOW	0x1
+#define ROOM_NORMAL	0x2
+
+static bool __tpacket_has_room(struct packet_sock *po, int pow_off)
+{
+	int idx, len;
+
+	len = po->rx_ring.frame_max + 1;
+	idx = po->rx_ring.head;
+	if (pow_off)
+		idx += len >> pow_off;
+	if (idx >= len)
+		idx -= len;
+	return packet_lookup_frame(po, &po->rx_ring, idx, TP_STATUS_KERNEL);
+}
+
+static bool __tpacket_v3_has_room(struct packet_sock *po, int pow_off)
+{
+	int idx, len;
+
+	len = po->rx_ring.prb_bdqc.knum_blocks;
+	idx = po->rx_ring.prb_bdqc.kactive_blk_num;
+	if (pow_off)
+		idx += len >> pow_off;
+	if (idx >= len)
+		idx -= len;
+	return prb_lookup_block(po, &po->rx_ring, idx, TP_STATUS_KERNEL);
+}
+
+static int packet_rcv_has_room(struct packet_sock *po, struct sk_buff *skb)
 {
 	struct sock *sk = &po->sk;
-	bool has_room;
+	int ret = ROOM_NONE;
 
-	if (po->prot_hook.func != tpacket_rcv)
-		return (atomic_read(&sk->sk_rmem_alloc) + skb->truesize)
-			<= sk->sk_rcvbuf;
+	if (po->prot_hook.func != tpacket_rcv) {
+		int avail = sk->sk_rcvbuf - atomic_read(&sk->sk_rmem_alloc)
+					  - skb->truesize;
+		if (avail > (sk->sk_rcvbuf >> ROOM_POW_OFF))
+			return ROOM_NORMAL;
+		else if (avail > 0)
+			return ROOM_LOW;
+		else
+			return ROOM_NONE;
+	}
 
 	spin_lock(&sk->sk_receive_queue.lock);
-	if (po->tp_version == TPACKET_V3)
-		has_room = prb_lookup_block(po, &po->rx_ring,
-					    po->rx_ring.prb_bdqc.kactive_blk_num,
-					    TP_STATUS_KERNEL);
-	else
-		has_room = packet_lookup_frame(po, &po->rx_ring,
-					       po->rx_ring.head,
-					       TP_STATUS_KERNEL);
+	if (po->tp_version == TPACKET_V3) {
+		if (__tpacket_v3_has_room(po, ROOM_POW_OFF))
+			ret = ROOM_NORMAL;
+		else if (__tpacket_v3_has_room(po, 0))
+			ret = ROOM_LOW;
+	} else {
+		if (__tpacket_has_room(po, ROOM_POW_OFF))
+			ret = ROOM_NORMAL;
+		else if (__tpacket_has_room(po, 0))
+			ret = ROOM_LOW;
+	}
 	spin_unlock(&sk->sk_receive_queue.lock);
 
-	return has_room;
+	return ret;
 }
 
 static void packet_sock_destruct(struct sock *sk)
@@ -1325,12 +1366,13 @@ static unsigned int fanout_demux_rollover(struct packet_fanout *f,
 	unsigned int i, j;
 
 	po = pkt_sk(f->arr[idx]);
-	if (try_self && packet_rcv_has_room(po, skb))
+	if (try_self && packet_rcv_has_room(po, skb) != ROOM_NONE)
 		return idx;
 
 	i = j = min_t(int, po->rollover->sock, num - 1);
 	do {
-		if (i != idx && packet_rcv_has_room(pkt_sk(f->arr[i]), skb)) {
+		if (i != idx &&
+		    packet_rcv_has_room(pkt_sk(f->arr[i]), skb) == ROOM_NORMAL) {
 			if (i != j)
 				po->rollover->sock = i;
 			return i;