[net-next,0/2] sunvnet: Packet processing in non-interrupt context.

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>

On (10/01/14 16:25), David Miller wrote:
> 
> The limit is by default 64 packets, it won't matter.
> 
> I think you're overplaying the things that block use of NAPI, how
> about implementing it properly, using netif_gso_receive() and proper
> RCU accesses, and coming back with some real performance measurements?

I hope I did not give the impression that I've cut some corners and
did not do adequate testing to get here, because that is not the case.

I dont know what s/netif_skb_receive/napi_gro_receive has to do with it -  
but I resurrected my napi prototype, caught up with Jumbo MTU patc,
and replaced netif_receive_skb with napi_gro_receive.

The patch is attached to the end of this email. "Real performance
measurements" are below.

Afaict, the patch is quite "proper" - I was following
   http://www.linuxfoundation.org/collaborate/workgroups/networking/napi -
and the patch even goes to a lot of trouble to avoid sending needless
ldc messages arising from some napi-imposed budget. Here's the perf
data. Remember that packet size is 1500 bytes, so, e.g., 2 Gbps is approx
167k pps. Also the baseline perf today (without napi) is 1 - 1.3 Gbps.

     budget      iperf throughput
      64           336 Mbps
     128           556 Mbps
     512           398 Mbps

If I over-budget to 2048, and force my vnet_poll() to lie by returning
`budget', I can get an iperf throughput of approx 2.2 - 2.3 Gbps
for 1500 byte packets i.e., 167k pps. Yes, I violate NAPI rules in doing
this, and from reading the code, this forces me to a non-polling, 
pure-softirq mode. But this is also the best number I can get.

And as for mpstat output it comes out wit 100% of the softirqs on
2 cpus- something like this:

CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
all    0.00    0.00    0.57    0.06    0.00   12.67    0.00    0.00   86.70
0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
8    0.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   99.00
9    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
10    0.00    0.00    1.98    0.00    0.00    0.00    0.00    0.00   98.02
11    0.00    0.00    0.99    0.00    0.00    0.00    0.00    0.00   99.01
12    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
13    0.00    0.00    3.00    0.00    0.00    0.00    0.00    0.00   97.00
14    0.00    0.00    2.56    0.00    0.00    0.00    0.00    0.00   97.44
15    0.00    0.00    1.00    1.00    0.00    0.00    0.00    0.00   98.00

Whereas with the workq, the load was spread nicely across multiple cpus.
I can share "Real performance data" for that as well, if you are curious.

Some differences between sunvnet-ldc and the typical network driver
that might be causing the perf drop here:

- The biggest benefit of NAPI is that it permits the reading of multiple
  packets in the context of a single interrupt, but the ldc/sunvnet infra
  already does that anyway. So the extra polling offered by NAPI does
  not have a significant benefit for my test- I can just as easily
  achieve load-spreading and fare-share in a non-interrupt context with
  a workq/kthread?

- But the VDC driver is also different from the typical driver in the
  "STOPPED" message- usually drivers only get signalled when the producer
  publishes data, the consumer does not send any signal back to producer,
  though the VDC driver does the latter. I've had to add more state-tracking
  code to get around this. 

Note that I am *not* attempting to fix the vnet race condition here-
that one is a pre-existing condition that I caught by merely reading
the code (I can easily look the other way), and my patch does not make
it worse.  Let's discuss that one later.  

NAPI Patch follows. Please tell me what's improper about it.

---------------------------------------------------------------------

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Message ID	20141002201203.GA6001@oracle.com
State	RFC, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 84E0414017B for <patchwork-incoming@ozlabs.org>; Fri, 3 Oct 2014 06:12:20 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752784AbaJBUMO (ORCPT <rfc822;patchwork-incoming@ozlabs.org>); Thu, 2 Oct 2014 16:12:14 -0400 Received: from aserp1040.oracle.com ([141.146.126.69]:44244 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751971AbaJBUMM (ORCPT <rfc822;netdev@vger.kernel.org>); Thu, 2 Oct 2014 16:12:12 -0400 Received: from acsinet22.oracle.com (acsinet22.oracle.com [141.146.126.238]) by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id s92KC95K021165 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 2 Oct 2014 20:12:10 GMT Received: from aserz7021.oracle.com (aserz7021.oracle.com [141.146.126.230]) by acsinet22.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id s92KC9qb000715 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 2 Oct 2014 20:12:09 GMT Received: from abhmp0004.oracle.com (abhmp0004.oracle.com [141.146.116.10]) by aserz7021.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id s92KC93Y001091; Thu, 2 Oct 2014 20:12:09 GMT Received: from oracle.com (/10.154.159.41) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 02 Oct 2014 13:12:08 -0700 Date: Thu, 2 Oct 2014 16:12:03 -0400 From: Sowmini Varadhan <sowmini.varadhan@oracle.com> To: David Miller <davem@davemloft.net> Cc: raghuram.kothakota@oracle.com, netdev@vger.kernel.org Subject: Re: [PATCH net-next 0/2] sunvnet: Packet processing in non-interrupt context. Message-ID: <20141002201203.GA6001@oracle.com> References: <542C5C37.1040409@oracle.com> <20141001.161508.1823792090990427608.davem@davemloft.net> <20141001202315.GN17706@oracle.com> <20141001.162529.2246298941833907545.davem@davemloft.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141001.162529.2246298941833907545.davem@davemloft.net> User-Agent: Mutt/1.5.21 (2010-09-15) X-Source-IP: acsinet22.oracle.com [141.146.126.238] Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: <netdev.vger.kernel.org> X-Mailing-List: netdev@vger.kernel.org

[net-next,0/2] sunvnet: Packet processing in non-interrupt context.

Commit Message

Comments

Patch