[v5] rps: Receive Packet Steering

From: Eric Dumazet <eric.dumazet@gmail.com>

This patch implements software receive side packet steering (RPS).  RPS
distributes the load of received packet processing across multiple CPUs.

Problem statement: Protocol processing done in the NAPI context for received
packets is serialized per device queue and becomes a bottleneck under high
packet load.  This substantially limits pps that can be achieved on a single
queue NIC and provides no scaling with multiple cores.

This solution queues packets early on in the receive path on the backlog queues
of other CPUs.   This allows protocol processing (e.g. IP and TCP) to be
performed on packets in parallel.   For each device (or NAPI instance for
a multi-queue device) a mask of CPUs is set to indicate the CPUs that can
process packets for the device. A CPU is selected on a per packet basis by
hashing contents of the packet header (the TCP or UDP 4-tuple) and using the
result to index into the CPU mask.  The IPI mechanism is used to raise
networking receive softirqs between CPUs.  This effectively emulates in
software what a multi-queue NIC can provide, but is generic requiring no device
support.

Many devices now provide a hash over the 4-tuple on a per packet basis
(Toeplitz is popular).  This patch allow drivers to set the HW reported hash
in an skb field, and that value in turn is used to index into the RPS maps.
Using the HW generated hash can avoid cache misses on the packet when
steering the packet to a remote CPU.

The CPU masks is set on a per device basis in the sysfs variable
/sys/class/net/<device>/rps_cpus.  This is a set of canonical bit maps for
each NAPI nstance of the device.  For example:

echo "0b 0b0 0b00 0b000" > /sys/class/net/eth0/rps_cpus

would set maps for four NAPI instances on eth0.

Generally, we have found this technique increases pps capabilities of a single
queue device with good CPU utilization.  Optimal settings for the CPU mask
seems to depend on architectures and cache hierarcy.  Below are some results
running 500 instances of netperf TCP_RR test with 1 byte req. and resp.
Results show cumulative transaction rate and system CPU utilization.

e1000e on 8 core Intel
    Without RPS: 90K tps at 33% CPU
    With RPS:    239K tps at 60% CPU

foredeth on 16 core AMD
    Without RPS: 103K tps at 15% CPU
    With RPS:    285K tps at 49% CPU

Caveats:
- The benefits of this patch are dependent on architecture and cache hierarchy.
Tuning the masks to get best performance is probably necessary.
- This patch adds overhead in the path for processing a single packet.  In
a lightly loaded server this overhead may eliminate the advantages of
increased parallelism, and possibly cause some relative performance degradation.
We have found that RPS masks that are cache aware (share same caches with
the interrupting CPU) mitigate much of this.
- The RPS masks can be changed dynamically, however whenever the mask is changed
this introduces the possbility of generating out of order packets.  It's
probably best not change the masks too frequently.

Signed-off-by: Tom Herbert <therbert@google.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Message ID	alpine.DEB.1.00.1001141353140.19018@pokey.mtv.corp.google.com
State	Not Applicable, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 49F3CB7C7E for <patchwork-incoming@ozlabs.org>; Fri, 15 Jan 2010 08:56:44 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756574Ab0ANV4e (ORCPT <rfc822;patchwork-incoming@ozlabs.org>); Thu, 14 Jan 2010 16:56:34 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756007Ab0ANV4d (ORCPT <rfc822; netdev-outgoing>); Thu, 14 Jan 2010 16:56:33 -0500 Received: from smtp-out.google.com ([216.239.33.17]:26529 "EHLO smtp-out.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752312Ab0ANV4c (ORCPT <rfc822;netdev@vger.kernel.org>); Thu, 14 Jan 2010 16:56:32 -0500 Received: from wpaz17.hot.corp.google.com (wpaz17.hot.corp.google.com [172.24.198.81]) by smtp-out.google.com with ESMTP id o0ELuPZa003351; Thu, 14 Jan 2010 21:56:25 GMT DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=google.com; s=beta; t=1263506186; bh=s39Mv+DRmYRF7TibJWRWaRhJD+o=; h=Date:From:To:Subject:Message-ID:MIME-Version:Content-Type; b=CRuCFm02xh2TK/YMU3m20B84ad/Rw0DBSXvah3oer4fmQbzLG1MzF8eejSF04yd+V rwv24ALNDC/LQ8Kf/NeqQ== DomainKey-Signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns; h=date:from:to:subject:message-id:user-agent:mime-version:content-type; b=ukV4qbBme0abJPSO+NOxc50s9zP6lonKi4gXoMVQM3KULS/FFZG6c7nK5L6MypYJE qjgKzLZ/fBFGres9D3jgQ== Received: from pokey.mtv.corp.google.com (pokey.mtv.corp.google.com [172.22.64.87]) by wpaz17.hot.corp.google.com with ESMTP id o0ELuNF2030122; Thu, 14 Jan 2010 13:56:24 -0800 Received: by pokey.mtv.corp.google.com (Postfix, from userid 60832) id 5966C1C48E1; Thu, 14 Jan 2010 13:56:23 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by pokey.mtv.corp.google.com (Postfix) with ESMTP id 468DC1C48E0; Thu, 14 Jan 2010 13:56:23 -0800 (PST) Date: Thu, 14 Jan 2010 13:56:23 -0800 (PST) From: Tom Herbert <therbert@google.com> To: davem@davemloft.net, netdev@vger.kernel.org Subject: [PATCH v5] rps: Receive Packet Steering Message-ID: <alpine.DEB.1.00.1001141353140.19018@pokey.mtv.corp.google.com> User-Agent: Alpine 1.00 (DEB 882 2007-12-20) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: <netdev.vger.kernel.org> X-Mailing-List: netdev@vger.kernel.org

[v5] rps: Receive Packet Steering

Commit Message

Comments

Patch