[ovs-dev,v6,0/7] Output packet batching.

Message ID 1512143073-22347-1-git-send-email-i.maximets@samsung.com

Message

Ilya Maximets Dec. 1, 2017, 3:44 p.m. UTC
This patch set was inspired by [1] from Bhanuprakash Bodireddy.
The implementation in [1] looks very complex and introduces many pitfalls [2]
for later code modifications, such as the possibility of stuck packets.

This version aims to implement simple and flexible output packet batching at a
higher level, without complicating the netdev layer and even simplifying it.
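
To illustrate the idea only (a self-contained toy model, not the actual patch
code; names such as 'tx_port' and 'output_pkts' are made up for the example):
packets are buffered per output port while an input burst is processed and are
then sent with a single send call per port.

    /* Toy model of per-port output batching; illustrative only. */
    #include <stdio.h>

    #define BATCH_SIZE 32

    struct tx_port {
        int port_id;
        int output_pkts[BATCH_SIZE];   /* Stand-in for buffered packets. */
        int n_output;                  /* Number of buffered packets. */
    };

    /* Flush all buffered packets for a port with one "send" call. */
    static void
    flush_output_port(struct tx_port *p)
    {
        if (p->n_output) {
            printf("send(port %d): %d packets in one call\n",
                   p->port_id, p->n_output);
            p->n_output = 0;
        }
    }

    /* Queue a packet on its output port instead of sending it right away. */
    static void
    output_packet(struct tx_port *p, int pkt)
    {
        if (p->n_output >= BATCH_SIZE) {
            flush_output_port(p);
        }
        p->output_pkts[p->n_output++] = pkt;
    }

    int
    main(void)
    {
        struct tx_port port[2] = { { .port_id = 0 }, { .port_id = 1 } };
        int pkt;

        /* One input burst of 32 packets split across two output ports. */
        for (pkt = 0; pkt < 32; pkt++) {
            output_packet(&port[pkt % 2], pkt);
        }
        /* Flush once per output port after the whole burst is processed. */
        flush_output_port(&port[0]);
        flush_output_port(&port[1]);
        return 0;
    }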

Basic testing of the 'PVP with OVS bonding on phy ports' scenario shows a
significant performance improvement.

Test results for time-based batching for v3:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/338247.html

Test results for v4:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339624.html

[1] [PATCH v4 0/5] netdev-dpdk: Use intermediate queue during packet transmission.
    https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337019.html

[2] For example:
    https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337133.html

Version 6:
	* Rebased on current master:
	  - Added new patch to refactor dp_netdev_pmd_thread structure
	    according to the following suggestion:
	    https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341230.html

	  NOTE: I still prefer reverting the padding-related patch.
	        The rebase was done so as not to block acceptance of this series.
	        Revert patch and discussion here:
	        https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html

	* Added comment about pmd_thread_ctx_time_update() usage.

Version 5:
	* pmd_thread_ctx_time_update() calls moved so that they are made only
	  from dp_netdev_process_rxq_port() and the main polling functions:
	  	pmd_thread_main, dpif_netdev_run and dpif_netdev_execute.
	  All other functions should use the cached time from pmd->ctx.now,
	  which is guaranteed to be updated at least once per polling cycle.
	  (A sketch of this pattern follows the Version 5 notes below.)
	* 'may_steal' patch returned to the version from v3 because
	  'may_steal' in qos is a completely different variable. This
	  patch only removes 'may_steal' from the netdev API.
	* 2 more usec functions added to timeval to provide a complete public API.
	* Checking of 'output_cnt' turned into an assertion.
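
A rough sketch of the cached-time pattern mentioned above (names are made up for
the example; the real definitions live in lib/dpif-netdev.c and differ in detail):

    #include <time.h>

    /* Per-PMD context holding a cached timestamp in microseconds. */
    struct pmd_ctx {
        long long now;
    };

    /* Called only from the main polling paths, at least once per cycle. */
    static void
    pmd_ctx_time_update(struct pmd_ctx *ctx)
    {
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        ctx->now = (long long) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
    }

    /* Everything else reads the cached value instead of querying the clock. */
    static long long
    pmd_ctx_now(const struct pmd_ctx *ctx)
    {
        return ctx->now;
    }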

Version 4:
	* Rebased on current master.
	* Rebased on top of "Keep latest measured time for PMD thread."
	  (Jan Scheurich)
	* Microsecond resolution related patches integrated.
	* Time-based batching without RFC tag.
	* 'output_time' renamed to 'flush_time'. (Jan Scheurich)
	* 'flush_time' update moved to 'dp_netdev_pmd_flush_output_on_port'.
	  (Jan Scheurich)
	* 'output-max-latency' renamed to 'tx-flush-interval'.
	* Added patch for output batching statistics.

Version 3:

	* Rebased on current master.
	* Time based RFC: fixed assert on n_output_batches <= 0.

Version 2:

	* Rebased on current master.
	* Added time based batching RFC patch.
	* Fixed mixing packets with different sources in same batch.


Ilya Maximets (7):
  dpif-netdev: Refactor PMD thread structure for further extension.
  dpif-netdev: Keep latest measured time for PMD thread.
  dpif-netdev: Output packet batching.
  netdev: Remove unused may_steal.
  netdev: Remove useless cutlen.
  dpif-netdev: Time based output batching.
  dpif-netdev: Count sent packets and batches.

 lib/dpif-netdev.c     | 412 +++++++++++++++++++++++++++++++++++++-------------
 lib/netdev-bsd.c      |   6 +-
 lib/netdev-dpdk.c     |  30 ++--
 lib/netdev-dummy.c    |   6 +-
 lib/netdev-linux.c    |   8 +-
 lib/netdev-provider.h |   7 +-
 lib/netdev.c          |  12 +-
 lib/netdev.h          |   2 +-
 vswitchd/vswitch.xml  |  16 ++
 9 files changed, 349 insertions(+), 150 deletions(-)

Comments

Stokes, Ian Dec. 1, 2017, 3:56 p.m. UTC | #1
> This patch set was inspired by [1] from Bhanuprakash Bodireddy.
> The implementation in [1] looks very complex and introduces many pitfalls [2]
> for later code modifications, such as the possibility of stuck packets.
> 
> This version aims to implement simple and flexible output packet batching
> at a higher level, without complicating the netdev layer and even
> simplifying it.
> 
> Basic testing of the 'PVP with OVS bonding on phy ports' scenario shows a
> significant performance improvement.
> 

Thanks for the rebase of this series, Ilya. We identified this feature as a priority for the 2.9 release, so I'll begin review/validation of it next week.

Thanks
Ian

Jan Scheurich Dec. 4, 2017, 11:21 p.m. UTC | #2
Hi Ilya,

I have retested your "Output packet batching" v6 in our standard PVP L3-VPN/VXLAN benchmark setup [1]. The configuration is a single PMD serving a physical 10G port and a VM running DPDK testpmd as an IP reflector with 4 equally loaded vhostuser ports. The tests are run with 64-byte packets. Below are Mpps values averaged over four 10-second runs:

        master  patch                patch
Flows   Mpps    tx-flush-interval=0  tx-flush-interval=50
8       4.419   4.342   -1.7%        4.749    7.5%
100     4.026   3.956   -1.7%        4.281    6.3%
1000    3.630   3.632    0.1%        3.760    3.6%
2000    3.394   3.390   -0.1%        3.490    2.8%
5000    2.989   2.938   -1.7%        2.994    0.2%
10000   2.756   2.711   -1.6%        2.746   -0.4%
20000   2.641   2.598   -1.6%        2.622   -0.7%
50000   2.604   2.558   -1.8%        2.579   -1.0%
100000  2.598   2.552   -1.8%        2.572   -1.0%
500000  2.598   2.550   -1.8%        2.571   -1.0%

As expected, output batching within rx bursts (tx-flush-interval=0) provides little or no benefit in this scenario. The test results reflect roughly a 1.7% performance penalty due to the tx batching overhead. This overhead is measurable, but should in my eyes not be a blocker for merging this patch series.

Interestingly, tests with time-based tx batching and a minimum flush interval of 50 microseconds show a consistent and significant performance increase for small numbers of flows (in the regime where the EMC is effective) and a reduced penalty of 1% for many flows. I don't have a good explanation yet for this phenomenon. I would be interested to see whether other benchmark results support the general positive impact of time-based tx batching on throughput also for synthetic DPDK applications in the VM. The average ping RTT increases by 20-30 us as expected.

We will also retest the performance improvement of time-based tx batching on interrupt-driven Linux kernel applications (such as iperf3).

BR, Jan

Bodireddy, Bhanuprakash Dec. 5, 2017, 1:36 p.m. UTC | #3
>I have retested your "Output patches batching" v6 in our standard PVP L3-
>VPN/VXLAN benchmark setup [1]. The configuration is a single PMD serving a
>physical 10G port and a VM running DPDK testpmd as IP reflector with 4
>equally loaded vhostuser ports. The tests are run with 64 byte packets. Below
>are Mpps values averaged over four 10 second runs:
>
>        master  patch                patch
>Flows   Mpps    tx-flush-interval=0  tx-flush-interval=50
>8       4.419   4.342   -1.7%        4.749    7.5%
>100     4.026   3.956   -1.7%        4.281    6.3%
>1000    3.630   3.632    0.1%        3.760    3.6%
>2000    3.394   3.390   -0.1%        3.490    2.8%
>5000    2.989   2.938   -1.7%        2.994    0.2%
>10000   2.756   2.711   -1.6%        2.746   -0.4%
>20000   2.641   2.598   -1.6%        2.622   -0.7%
>50000   2.604   2.558   -1.8%        2.579   -1.0%
>100000  2.598   2.552   -1.8%        2.572   -1.0%
>500000  2.598   2.550   -1.8%        2.571   -1.0%
>
>As expected output batching within rx bursts (tx-flush-interval=0) provides
>little or no benefit in this scenario. The test results reflect roughly a 1.7%
>performance penalty due to the tx batching overhead. This overhead is
>measurable, but should in my eyes not be a blocker for merging this patch
>series.

I had a similar observation when I was testing for regressions in the non-batching scenario.
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339719.html

As tx-flush-interval is 0 by default (instant send enabled) and this causes a performance
degradation, I recommend documenting it in one of the commits and linking to these
performance numbers (adding a Tested-at tag) so that users can tune tx-flush-interval
accordingly.
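
For reference, assuming the knob ends up in the Open_vSwitch table's other_config
column (as the vswitch.xml change in this series suggests), the tuning would look
something like:

    ovs-vsctl set Open_vSwitch . other_config:tx-flush-interval=50

where the value is the flush interval in microseconds and 0 keeps the current
instant-send behaviour.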

>
>Interestingly, tests with time-based tx batching and a minimum flush interval
>of 50 microseconds show a consistent and significant performance increase
>for small number of flows (in the regime where EMC is effective) and a
>reduced penalty of 1% for many flows. I don't have a good explanation yet for
>this phenomenon. I would be interested to see if other benchmark results
>support the general positive impact of time-based tx batching on throughput
>also for synthetic DPDK applications in the VM. The average Ping RTT increases
>by 20-30 us as expected.

I think this depends on tx-flush-interval and also should be documented.

- Bhanuprakash.

Jan Scheurich Dec. 5, 2017, 5:26 p.m. UTC | #4
We have now repeated our earlier iperf3 tests for this patch series.
https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/338247.html

We use an iperf3 server as a representative of a typical IO-intensive kernel application. The iperf3 server executes in a VM with 2 vCPUs where both the virtio interrupts and the iperf process are pinned to the same vCPU for best performance. We run two iperf3 clients in parallel on a different server to avoid the client becoming the bottleneck when tx batching is enabled.

OVS       tx-flush-     iperf3      Avg. PMD    PMD       Iperf       ping -f
version   interval      Gbps        cycles/pkt  util      CPU load    avg rtt
----------------------------------------------------------------------------------
master        -         7.24        1778        46.5%      99.7%      23 us
Patch v6      0         7.18        1873        47.7%     100.0%      29 us
Patch v6     50         8.99        1108        36.3%      99.7%      38 us
Patch v6    100         ----        ----        ----      -----       88 us

In all cases the vCPU capacity of the server VM handling the virtio interrupts and the iperf3 server thread is the bottleneck. The TCP throughput is throttled by packets being dropped on Tx to the vhostuser port of the server VM. The Linux kernel is not fast enough to handle the interrupts and poll the incoming packets.

As expected, the tx batching patch alone with tx-flush-interval=0 does not provide any benefit, as it doesn't reduce the virtio interrupt rate.

Setting the tx-flush-interval to 50 microseconds immediately improves the throughput: The PMD utilization drops from 47% to 36% due to the reduced rate of write calls to the virtio kick fd. (I believe the more pronounced drop in processing cycles/pkt is an artifact of the patch. The cycles used for delayed tx to vhostuser are no longer counted as packet processing cost. To be checked in the individual patch review.)
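
Roughly speaking, the mechanism behind that reduced kick rate is that a port is only flushed once its per-port deadline has expired, and the deadline is pushed out by the configured interval after each flush. A simplified model (not the patch code; the names below are made up for illustration):

    #include <stdbool.h>

    struct txq_model {
        long long flush_time;   /* Earliest time (us) the port may be flushed. */
        int n_output;           /* Packets currently buffered for the port. */
    };

    /* Flush if something is buffered and the deadline has passed (or if
     * forced, e.g. when tx-flush-interval=0). */
    static bool
    should_flush(const struct txq_model *q, long long now_us, bool force)
    {
        return q->n_output && (force || now_us >= q->flush_time);
    }

    /* After flushing, delay the next allowed flush by the interval, which
     * bounds the rate of send calls (and virtio kicks) per port. */
    static void
    arm_flush_timer(struct txq_model *q, long long now_us,
                    long long tx_flush_interval_us)
    {
        q->flush_time = now_us + tx_flush_interval_us;
    }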

More importantly, the iperf3 server VM can now receive 8.99 instead of 7.24 Gbit/s, an increase by 24%. I am sure that 10G line rate could be reached with vhost multi-queue in the server VM.

Compared to the v4 version of the patches, the impact on latency is now reduced a lot. Packets with an inter-arrival time larger than the configured tx-flush-interval are not affected at all. For a 50 us tx-flush-interval this means packet flows with a packet rate of up to 20 Kpps!

Hence the average RTT reported by "ping -f" experiences only a small increase, from 23 us on master to 38 us with tx-flush-interval=50. Only when the tx-flush-interval is increased well beyond the intrinsic average inter-arrival time does it translate directly into increased latency.

Conclusion: Time-based tx batching fulfills the expectations for interrupt-driven kernel workloads, while avoiding a latency impact even on moderately loaded ports.

BR, Jan


Ilya Maximets Dec. 7, 2017, 10:28 a.m. UTC | #5
Thanks a lot for testing the series.
From my side, I want to share test results for my primary target of this
patch set: PVP with a DPDK guest and bonded physical ports in OVS.

Setup looks like:
----
netdev@ovs-netdev:
	br-virt:
		vhuost-user-1 (dpdkvhostuserclient, 1 rx and 1 tx queue)
		bond-patch (peer=virt-patch)
	br-bond:
		ens1f0 (dpdk)
		ens1f1 (dpdk)
		virt-patch (peer=bond-patch)
----
pmd thread numa_id 0 core_id 1:
        isolated : true
        port: ens1f0    queue-id: 0
        port: ens1f1    queue-id: 0
pmd thread numa_id 0 core_id 2:
        isolated : true
        port: vhuost-user-1    queue-id: 0
----
Bridge "br-bond"
	Port "bond"
		Interface "ens1f1"
		Interface "ens1f0"

Port "bond" is the balance-tcp OVS bond port.
----
testpmd DPDK application works in guest in macswap mode.
----
Packet size: 512B
----
Results are in Mpps averaged over three 20 second runs.
----

Results:

         master  patch                patch  
Flows    Mpps    tx-flush-interval=0  tx-flush-interval=50   

8        3.891   4.236   + 8.8%       4.210    + 8.2%
256      2.612   3.268   +25.1%       3.077    +17.8%
1024     2.509   3.144   +25.3%       2.935    +16.9%
8192     2.031   2.384   +17.3%       2.379    +17.1%
1048576  1.950   2.292   +17.5%       2.287    +17.2%


The patch set provides a consistent and significant improvement for all numbers of flows
in this test scenario, up to 25% for medium flow counts. Time-based batching shows a
smaller performance improvement (significantly smaller for medium flow counts)
and is not preferred in this case.

The significant performance improvement is achieved due to the reduced number of send operations.
With the patches applied, OVS makes only ~2 calls to netdev_send() per input batch (one call
per port in the bond) instead of sending almost every packet separately.

Best regards, Ilya Maximets.

Eelco Chaudron Dec. 7, 2017, 12:54 p.m. UTC | #6
Hi All,

I reviewed the code for this v6; however, I did not do the testing as
before since my setup is torn down at the moment.
I have some small comments on "[PATCH v6 1/7] dpif-netdev: Refactor PMD
thread structure for further extension."; see my reply to that email.

Other than that I would like to ack the series:

Acked-by: Eelco Chaudron <echaudro@redhat.com>


I guess the only thing missing is proper documentation for this change.

Thanks,

Eelco
