Message ID | 20200924020516.256444-1-yang_y_yi@163.com |
---|---|
State | Deferred |
Series | [ovs-dev,v3] userspace: fix bad UDP performance issue of veth |
Bleep bloop. Greetings yang_y_yi, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting. See the details below.

checkpatch:
WARNING: Line is 80 characters long (recommended limit is 79)
#203 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:48:
iperf3: OUT OF ORDER - incoming packet = 14 and received packet = 123 AND SP = 5

WARNING: Line is 80 characters long (recommended limit is 79)
#204 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:49:
iperf3: OUT OF ORDER - incoming packet = 15 and received packet = 125 AND SP = 5

WARNING: Line is 80 characters long (recommended limit is 79)
#205 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:50:
iperf3: OUT OF ORDER - incoming packet = 78 and received packet = 137 AND SP = 5

WARNING: Line is 80 characters long (recommended limit is 79)
#206 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:51:
iperf3: OUT OF ORDER - incoming packet = 79 and received packet = 137 AND SP = 5

WARNING: Line is 80 characters long (recommended limit is 79)
#207 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:52:
iperf3: OUT OF ORDER - incoming packet = 80 and received packet = 139 AND SP = 5

WARNING: Line is 80 characters long (recommended limit is 79)
#208 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:53:
iperf3: OUT OF ORDER - incoming packet = 82 and received packet = 172 AND SP = 5

WARNING: Line is 80 characters long (recommended limit is 79)
#209 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:54:
iperf3: OUT OF ORDER - incoming packet = 83 and received packet = 173 AND SP = 5

WARNING: Line is 86 characters long (recommended limit is 79)
#222 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:67:
$ sudo ip netns exec ns03 iperf3 -t 10 -i 1 -u -b 10G -c 10.15.2.3 --get-server-output

WARNING: Line is 84 characters long (recommended limit is 79)
#237 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:82:
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams

WARNING: Line is 84 characters long (recommended limit is 79)
#245 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:90:
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams

WARNING: Line is 85 characters long (recommended limit is 79)
#277 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:122:
$ sudo ip netns exec ns03 iperf3 -t 10 -i 1 -u -b 3G -c 10.15.2.3 --get-server-output

WARNING: Line is 84 characters long (recommended limit is 79)
#292 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:137:
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams

WARNING: Line is 84 characters long (recommended limit is 79)
#300 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:145:
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams

WARNING: Line is 90 characters long (recommended limit is 79)
#349 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:194:
$ sudo ovs-vsctl set Open_vSwitch . other_config:userspace-sock-buf-size=1073741823

WARNING: Line is 80 characters long (recommended limit is 79)
#355 FILE: Documentation/howto/userspace-udp-performance-tunning.rst:200:
for OVS system interface is two times minimum one of wmem_max and this value.

Lines checked: 607, Warnings: 15, Errors: 0

build:
reading sources... [ 87%] topics/language-bindings
reading sources... [ 88%] topics/networking-namespaces
reading sources... [ 89%] topics/openflow
reading sources... [ 90%] topics/ovs-extensions
reading sources... [ 91%] topics/ovsdb-replication
reading sources... [ 92%] topics/porting
reading sources... [ 93%] topics/testing
reading sources... [ 93%] topics/tracing
reading sources... [ 94%] topics/userspace-tso
reading sources... [ 95%] topics/windows
reading sources... [ 96%] tutorials/faucet
reading sources... [ 97%] tutorials/index
reading sources... [ 98%] tutorials/ipsec
reading sources... [ 99%] tutorials/ovs-advanced
reading sources... [100%] tutorials/ovs-conntrack
deprecation warning: io.FileInput() argument `handle_io_errors` is ignored since "Docutils 0.10 (2012-12-16)" and will soon be removed.

Warning, treated as error:
/var/lib/jenkins/jobs/0day_robot_upstream_build_from_pw/workspace/Documentation/howto/userspace-udp-performance-tunning.rst:108: WARNING: Block quote ends without a blank line; unexpected unindent.
make[2]: *** [docs-check] Error 1
make[2]: Leaving directory `/var/lib/jenkins/jobs/0day_robot_upstream_build_from_pw/workspace'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/var/lib/jenkins/jobs/0day_robot_upstream_build_from_pw/workspace'
make: *** [all] Error 2

Please check this out. If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
On Thu, Sep 24, 2020 at 10:05:16AM +0800, yang_y_yi@163.com wrote:
> From: Yi Yang <yangyi01@inspur.com>
>
> iperf3 UDP performance of veth to veth case is
> very very bad because of too many packet loss,
> the root cause is rmem_default and wmem_default
> are just 212992, but iperf3 UDP test used 8K
> UDP size which resulted in many UDP fragment in
> case that MTU size is 1500, one 8K UDP send would
> enqueue 6 UDP fragments to socket receive queue,
> the default small socket buffer size can't cache
> so many packets that many packets are lost.
>
> This commit fixed packet loss issue, it allows
> users to set socket receive and send buffer size
> per their own system environment to proper value,
> therefore there will not be packet loss.
>
> Users can set system interface socket buffer size
> by command lines:
>
> $ sudo sh -c "echo 1073741823 > /proc/sys/net/core/wmem_max"
> $ sudo sh -c "echo 1073741823 > /proc/sys/net/core/rmem_max"
>
> or
>
> $ sudo ovs-vsctl set Open_vSwitch . \
>       other_config:userspace-sock-buf-size=1073741823
>
> But final socket buffer size is minimum one among of them.
> Possible value range is 212992 to 1073741823. Users must
> explicitly set other_config:userspace-sock-buf-size to the
> value they expect, otherwise OVS won't set socket send
> and receive buffer size. More details about it is in the
> document
> Documentation/howto/userspace-udp-performance-tunning.rst.
>
> By the way, big socket buffer doesn't mean it will
> allocate big buffer on creating socket, actually
> it won't allocate any extra buffer compared to default
> socket buffer size, it just means more skbuffs can
> be enqueued to socket receive queue and send queue,
> therefore there will not be packet loss.
>
> The below is for your reference.
>
> The result before apply this commit
> ===================================
> $ ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 100M -c 10.15.2.6 --get-server-output -A 5
> Connecting to host 10.15.2.6, port 5201
> [ 4] local 10.15.2.2 port 59053 connected to 10.15.2.6 port 5201
> [ ID] Interval Transfer Bandwidth Total Datagrams
> [ 4] 0.00-1.00 sec 10.8 MBytes 90.3 Mbits/sec 1378
> [ 4] 1.00-2.00 sec 11.9 MBytes 100 Mbits/sec 1526
> [ 4] 2.00-3.00 sec 11.9 MBytes 100 Mbits/sec 1526
> [ 4] 3.00-4.00 sec 11.9 MBytes 100 Mbits/sec 1526
> [ 4] 4.00-5.00 sec 11.9 MBytes 100 Mbits/sec 1526
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
> [ 4] 0.00-5.00 sec 58.5 MBytes 98.1 Mbits/sec 0.047 ms 357/531 (67%)
> [ 4] Sent 531 datagrams
>
> Server output:
> -----------------------------------------------------------
> Accepted connection from 10.15.2.2, port 60314
> [ 5] local 10.15.2.6 port 5201 connected to 10.15.2.2 port 59053
> [ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
> [ 5] 0.00-1.00 sec 1.36 MBytes 11.4 Mbits/sec 0.047 ms 357/531 (67%)
> [ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0.047 ms 0/0 (-nan%)
> [ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0.047 ms 0/0 (-nan%)
> [ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 0.047 ms 0/0 (-nan%)
> [ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 0.047 ms 0/0 (-nan%)
>
> iperf Done.
>
> The result after apply this commit
> ===================================
> $ sudo ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 4G -c 10.15.2.6 --get-server-output -A 5
> Connecting to host 10.15.2.6, port 5201
> [ 4] local 10.15.2.2 port 48547 connected to 10.15.2.6 port 5201
> [ ID] Interval Transfer Bandwidth Total Datagrams
> [ 4] 0.00-1.00 sec 440 MBytes 3.69 Gbits/sec 56276
> [ 4] 1.00-2.00 sec 481 MBytes 4.04 Gbits/sec 61579
> [ 4] 2.00-3.00 sec 474 MBytes 3.98 Gbits/sec 60678
> [ 4] 3.00-4.00 sec 480 MBytes 4.03 Gbits/sec 61452
> [ 4] 4.00-5.00 sec 480 MBytes 4.03 Gbits/sec 61441
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
> [ 4] 0.00-5.00 sec 2.30 GBytes 3.95 Gbits/sec 0.024 ms 0/301426 (0%)
> [ 4] Sent 301426 datagrams
>
> Server output:
> -----------------------------------------------------------
> Accepted connection from 10.15.2.2, port 60320
> [ 5] local 10.15.2.6 port 5201 connected to 10.15.2.2 port 48547
> [ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
> [ 5] 0.00-1.00 sec 209 MBytes 1.75 Gbits/sec 0.021 ms 0/26704 (0%)
> [ 5] 1.00-2.00 sec 258 MBytes 2.16 Gbits/sec 0.025 ms 0/32967 (0%)
> [ 5] 2.00-3.00 sec 258 MBytes 2.16 Gbits/sec 0.022 ms 0/32987 (0%)
> [ 5] 3.00-4.00 sec 257 MBytes 2.16 Gbits/sec 0.023 ms 0/32954 (0%)
> [ 5] 4.00-5.00 sec 257 MBytes 2.16 Gbits/sec 0.021 ms 0/32937 (0%)
> [ 5] 5.00-6.00 sec 255 MBytes 2.14 Gbits/sec 0.026 ms 0/32685 (0%)
> [ 5] 6.00-7.00 sec 254 MBytes 2.13 Gbits/sec 0.025 ms 0/32453 (0%)
> [ 5] 7.00-8.00 sec 255 MBytes 2.14 Gbits/sec 0.026 ms 0/32679 (0%)
> [ 5] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec 0.022 ms 0/32669 (0%)
>
> iperf Done.
>
> Signed-off-by: Yi Yang <yangyi01@inspur.com>

Hi Yi Yang,

This patch appears to have gone stale in patchwork, for one reason or
another. If it is still relevant then I think it needs to be revisited,
by being reposted after appropriate preparation.

As such I'm marking this patch as "Deferred" in patchwork.

No action is required unless there is a desire to revisit this patch.
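For readers checking the commit message's claim that one 8K UDP send becomes 6 fragments at MTU 1500, here is a minimal arithmetic sketch. It is not part of the patch. The helper name `udp_ipv4_fragment_count` is hypothetical, and it assumes IPv4 with a 20-byte header (no options) and an 8-byte UDP header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: number of IPv4 fragments
 * needed to carry one UDP datagram.  Assumes a 20-byte IPv4 header
 * (no options) and an 8-byte UDP header. */
uint32_t
udp_ipv4_fragment_count(uint32_t udp_payload_len, uint32_t mtu)
{
    const uint32_t ip_hdr_len = 20;
    const uint32_t udp_hdr_len = 8;

    /* The MTU must leave room for at least one 8-byte unit of data. */
    assert(mtu >= ip_hdr_len + 8);

    /* Every fragment except the last carries a multiple of 8 bytes. */
    uint32_t per_fragment = ((mtu - ip_hdr_len) / 8) * 8;

    /* The UDP header travels in the first fragment's data. */
    uint32_t total = udp_payload_len + udp_hdr_len;

    /* Round up: a partially filled last fragment still counts. */
    return (total + per_fragment - 1) / per_fragment;
}
```

With iperf3's default 8192-byte UDP payload, `udp_ipv4_fragment_count(8192, 1500)` evaluates to 6: 8200 bytes split into 1480-byte pieces. Losing any one of the six loses the whole datagram, which is why a receive queue that overflows under bursts hurts fragmented UDP so badly.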
diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index f85c432..4431097 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -71,6 +71,7 @@ DOC_SOURCE = \
 	Documentation/howto/sflow.rst \
 	Documentation/howto/tunneling.png \
 	Documentation/howto/tunneling.rst \
+	Documentation/howto/userspace-udp-performance-tunning.rst \
 	Documentation/howto/userspace-tunneling.rst \
 	Documentation/howto/vlan.png \
 	Documentation/howto/vlan.rst \
diff --git a/Documentation/howto/index.rst b/Documentation/howto/index.rst
index 60fb8a7..d5271f0 100644
--- a/Documentation/howto/index.rst
+++ b/Documentation/howto/index.rst
@@ -44,6 +44,7 @@ OVS
    lisp
    tunneling
    userspace-tunneling
+   userspace-udp-performance-tunning
    vlan
    qos
    vtep
diff --git a/Documentation/howto/userspace-udp-performance-tunning.rst b/Documentation/howto/userspace-udp-performance-tunning.rst
new file mode 100644
index 0000000..a6a9d0b
--- /dev/null
+++ b/Documentation/howto/userspace-udp-performance-tunning.rst
@@ -0,0 +1,221 @@
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+=================================
+Userspace UDP performance tunning
+=================================
+
+This document describes how to tune UDP performance for Open vSwitch
+userspace. In Open vSwitch userspace case, if you run iperf3 to test UDP
+performance, you will see bigger packet loss rate, sometimes, you also
+will see iperf3 outputs some information as below.
+
+[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0.018 ms 0/0 (-nan%)
+[ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0.018 ms 0/0 (-nan%)
+[ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 0.018 ms 0/0 (-nan%)
+[ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 0.018 ms 0/0 (-nan%)
+[ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec 0.018 ms 0/0 (-nan%)
+[ 5] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 0.018 ms 0/0 (-nan%)
+[ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec 0.018 ms 0/0 (-nan%)
+[ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec 0.018 ms 0/0 (-nan%)
+[ 5] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 0.018 ms 0/0 (-nan%)
+
+or
+
+iperf3: OUT OF ORDER - incoming packet = 70 and received packet = 97 AND SP = 5
+iperf3: OUT OF ORDER - incoming packet = 71 and received packet = 97 AND SP = 5
+iperf3: OUT OF ORDER - incoming packet = 72 and received packet = 99 AND SP = 5
+iperf3: OUT OF ORDER - incoming packet = 14 and received packet = 123 AND SP = 5
+iperf3: OUT OF ORDER - incoming packet = 15 and received packet = 125 AND SP = 5
+iperf3: OUT OF ORDER - incoming packet = 78 and received packet = 137 AND SP = 5
+iperf3: OUT OF ORDER - incoming packet = 79 and received packet = 137 AND SP = 5
+iperf3: OUT OF ORDER - incoming packet = 80 and received packet = 139 AND SP = 5
+iperf3: OUT OF ORDER - incoming packet = 82 and received packet = 172 AND SP = 5
+iperf3: OUT OF ORDER - incoming packet = 83 and received packet = 173 AND SP = 5
+
+There are many reasons resulting in such issues, for example, you don't use
+-b to limit bandwidth, big packet(UDP packet data size is 8192 by default if
+you don't use -l to specify UDP payload size) means many IP fragments if your
+MTU is 1500/1450, any one of them is lost, that means the whole UDP packet
+is lost because TCP/IP protocol stack can't reassemble original UDP packet, so
+big packet isn't always good for performance. But among of them, the most
+important reason is socket buffer size of UDP send side and receive side.
+
+Here is iperf3 output if system interface added to OVS use default buffer size
+(which is 212992 by default).
+
+$ sudo ip netns exec ns03 iperf3 -t 10 -i 1 -u -b 10G -c 10.15.2.3 --get-server-output
+Connecting to host 10.15.2.3, port 5201
+[ 4] local 10.15.2.7 port 39415 connected to 10.15.2.3 port 5201
+[ ID] Interval Transfer Bandwidth Total Datagrams
+[ 4] 0.00-1.00 sec 572 MBytes 4.79 Gbits/sec 73154
+[ 4] 1.00-2.00 sec 611 MBytes 5.12 Gbits/sec 78196
+[ 4] 2.00-3.00 sec 588 MBytes 4.93 Gbits/sec 75248
+[ 4] 3.00-4.00 sec 619 MBytes 5.19 Gbits/sec 79200
+[ 4] 4.00-5.00 sec 625 MBytes 5.24 Gbits/sec 79937
+[ 4] 5.00-6.00 sec 664 MBytes 5.57 Gbits/sec 85043
+[ 4] 6.00-7.00 sec 636 MBytes 5.34 Gbits/sec 81417
+[ 4] 7.00-8.00 sec 629 MBytes 5.27 Gbits/sec 80461
+[ 4] 8.00-9.00 sec 635 MBytes 5.33 Gbits/sec 81326
+[ 4] 9.00-10.00 sec 627 MBytes 5.26 Gbits/sec 80270
+- - - - - - - - - - - - - - - - - - - - - - - - -
+[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
+[ 4] 0.00-10.00 sec 6.06 GBytes 5.21 Gbits/sec 0.067 ms 3793/5791 (65%)
+[ 4] Sent 5791 datagrams
+
+Server output:
+- - - - - - - -
+Accepted connection from 10.15.2.7, port 54090
+[ 5] local 10.15.2.3 port 5201 connected to 10.15.2.7 port 39415
+[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
+[ 5] 0.00-1.00 sec 15.6 MBytes 131 Mbits/sec 0.067 ms 3793/5791 (65%)
+[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0.067 ms 0/0 (-nan%)
+[ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0.067 ms 0/0 (-nan%)
+[ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 0.067 ms 0/0 (-nan%)
+[ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 0.067 ms 0/0 (-nan%)
+[ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec 0.067 ms 0/0 (-nan%)
+[ 5] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 0.067 ms 0/0 (-nan%)
+[ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec 0.067 ms 0/0 (-nan%)
+[ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec 0.067 ms 0/0 (-nan%)
+[ 5] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 0.067 ms 0/0 (-nan%)
+
+
+iperf Done.
+
+Test setup is below:
+
+  netns ns02                           netns ns03
++------------+                       +------------+
+|10.15.2.3/24|                       |10.15.2.7/24|
+|            |                       |            |
+|   veth02   |                       |   veth03   |
++------|-----+  +-----------------+  +-----|------+
+       |        |                 |        |
+       +--------|       br0       |--------+
+                |(datapath=netdev)|
+                +-----------------+
+
+
+But what if you increase socket buffer size? Let us increase it to 1073741823
+and check it again.
+
+$ sudo ip netns exec ns03 iperf3 -t 10 -i 1 -u -b 3G -c 10.15.2.3 --get-server-output
+Connecting to host 10.15.2.3, port 5201
+[ 4] local 10.15.2.7 port 52686 connected to 10.15.2.3 port 5201
+[ ID] Interval Transfer Bandwidth Total Datagrams
+[ 4] 0.00-1.00 sec 343 MBytes 2.88 Gbits/sec 43945
+[ 4] 1.00-2.00 sec 357 MBytes 3.00 Gbits/sec 45742
+[ 4] 2.00-3.00 sec 357 MBytes 3.00 Gbits/sec 45759
+[ 4] 3.00-4.00 sec 357 MBytes 3.00 Gbits/sec 45716
+[ 4] 4.00-5.00 sec 358 MBytes 3.01 Gbits/sec 45882
+[ 4] 5.00-6.00 sec 360 MBytes 3.02 Gbits/sec 46046
+[ 4] 6.00-7.00 sec 368 MBytes 3.09 Gbits/sec 47163
+[ 4] 7.00-8.00 sec 357 MBytes 3.00 Gbits/sec 45734
+[ 4] 8.00-9.00 sec 353 MBytes 2.97 Gbits/sec 45246
+[ 4] 9.00-10.00 sec 356 MBytes 2.99 Gbits/sec 45630
+- - - - - - - - - - - - - - - - - - - - - - - - -
+[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
+[ 4] 0.00-10.00 sec 3.49 GBytes 2.99 Gbits/sec 0.027 ms 0/456861 (0%)
+[ 4] Sent 456861 datagrams
+
+Server output:
+- - - - - - - -
+Accepted connection from 10.15.2.7, port 54096
+[ 5] local 10.15.2.3 port 5201 connected to 10.15.2.7 port 52686
+[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
+[ 5] 0.00-1.00 sec 190 MBytes 1.59 Gbits/sec 0.031 ms 0/24303 (0%)
+[ 5] 1.00-2.00 sec 219 MBytes 1.84 Gbits/sec 0.023 ms 0/28025 (0%)
+[ 5] 2.00-3.00 sec 219 MBytes 1.84 Gbits/sec 0.029 ms 0/28006 (0%)
+[ 5] 3.00-4.00 sec 219 MBytes 1.83 Gbits/sec 0.030 ms 0/27990 (0%)
+[ 5] 4.00-5.00 sec 218 MBytes 1.83 Gbits/sec 0.031 ms 0/27920 (0%)
+[ 5] 5.00-6.00 sec 209 MBytes 1.76 Gbits/sec 0.094 ms 0/26807 (0%)
+[ 5] 6.00-7.00 sec 185 MBytes 1.55 Gbits/sec 0.032 ms 0/23673 (0%)
+[ 5] 7.00-8.00 sec 217 MBytes 1.82 Gbits/sec 0.030 ms 0/27721 (0%)
+[ 5] 8.00-9.00 sec 208 MBytes 1.75 Gbits/sec 0.029 ms 0/26646 (0%)
+[ 5] 9.00-10.00 sec 219 MBytes 1.84 Gbits/sec 0.029 ms 0/28007 (0%)
+[ 5] 10.00-11.00 sec 217 MBytes 1.82 Gbits/sec 0.026 ms 0/27816 (0%)
+[ 5] 11.00-12.00 sec 218 MBytes 1.83 Gbits/sec 0.024 ms 0/27936 (0%)
+[ 5] 12.00-13.00 sec 213 MBytes 1.79 Gbits/sec 0.036 ms 0/27282 (0%)
+[ 5] 13.00-14.00 sec 211 MBytes 1.77 Gbits/sec 0.035 ms 0/27018 (0%)
+[ 5] 14.00-15.00 sec 212 MBytes 1.78 Gbits/sec 0.029 ms 0/27162 (0%)
+[ 5] 15.00-16.00 sec 216 MBytes 1.81 Gbits/sec 0.025 ms 0/27605 (0%)
+
+
+iperf Done.
+
+You can see the performance number has huge improvement, packet loss rate
+is 0.
+
+.. note::
+
+   This howto covers the steps required to tune UDP performance. The same
+   approach can be used for iperf3 client and iperf3 server in VMs or network
+   namespaces.
+
+Tunning Steps
+-------------
+
+Perform the following steps on OVS node to tune socket buffer for OVS system
+interface.
+
+#. Change Linux system maximum socket buffer size for send and receive sides
+
+       $ sudo sh -c "echo 1073741823 > /proc/sys/net/core/wmem_max"
+       $ sudo sh -c "echo 1073741823 > /proc/sys/net/core/rmem_max"
+
+   In order to ensure they are still set to the above value after your system
+   is rebooted, you also need change systctl config to persist these values.
+
+       $ sudo sh -c "echo net.core.rmem_max=1073741823 >> /etc/sysctl.conf"
+       $ sudo sh -c "echo net.core.wmem_max=1073741823 >> /etc/sysctl.conf"
+
+#. Change socket buffer size for OVS system interface
+
+       $ sudo ovs-vsctl set Open_vSwitch . other_config:userspace-sock-buf-size=1073741823
+
+   Note: other_config:userspace-sock-buf-size is both for receive socket buffer
+   size and send socket buffer size, its possible value range is 212992 to
+   1073741823, final receive socket buffer size for OVS system interface is two
+   times minimum one of rmem_max and this value, final send socket buffer size
+   for OVS system interface is two times minimum one of wmem_max and this value.
+   So you can change it to the value you want just by changing
+   other_config:userspace-sock-buf-size, you also can set
+   other_config:userspace-sock-buf-size to 1073741823 then just change
+   /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max to the value you
+   want. The changed value will take effect only after restart ovs-vswitchd.
+
+#. Restart ovs-vswitchd
+
+   Note: you have to restart ovs-vswitchd to make sure the changed value takes
+   effect.
+
+#. You need repeat the above steps on all the OVS nodes to make sure
+   cross-node veth-to-veth, veth-to-tap, or tap-to-tap UDP performance
+   can get improved.
+
+Potential Impact
+----------------
+
+Although this tunning can improve UDP performance, it possibly also
+impacts on TCP performance, please reset the above values to default
+values in your system if you see it hurts your TCP performance.
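The sizing rule stated in the howto's note (the configured value is limited to 212992..1073741823, and the final buffer is twice the smaller of that value and rmem_max or wmem_max) can be modelled as below. This is a sketch only, not code from the patch: the function names are hypothetical, the two constants are copied from the patch, and the kernel's own lower bound on socket buffers is ignored.

```c
#include <stdint.h>

/* Limits copied from the patch (lib/userspace-sock-buf-size.c). */
#define MIN_SOCK_BUF_SIZE 212992u
#define MAX_SOCK_BUF_SIZE 1073741823u

/* Hypothetical helper: clamp a configured value the way the patch does.
 * 0 means "not configured" and is passed through unchanged. */
uint32_t
clamp_sock_buf_size(uint32_t configured)
{
    if (configured == 0) {
        return 0;
    }
    if (configured < MIN_SOCK_BUF_SIZE) {
        return MIN_SOCK_BUF_SIZE;
    }
    if (configured > MAX_SOCK_BUF_SIZE) {
        return MAX_SOCK_BUF_SIZE;
    }
    return configured;
}

/* Hypothetical helper: buffer size that getsockopt() is expected to
 * report, following the howto's "two times the minimum" rule.
 * sysctl_max stands for rmem_max or wmem_max.  Returns 0 when nothing
 * is configured, i.e. the socket keeps its default size. */
uint64_t
expected_final_buf_size(uint32_t configured, uint32_t sysctl_max)
{
    uint32_t requested = clamp_sock_buf_size(configured);

    if (requested == 0) {
        return 0;
    }
    if (requested > sysctl_max) {
        requested = sysctl_max;  /* the kernel caps at the sysctl limit */
    }
    return (uint64_t) requested * 2;  /* the kernel doubles the value */
}
```

Under these assumptions, `expected_final_buf_size(1073741823, 1073741823)` gives 2147483646, the figure quoted in the comments of the netdev-linux.c hunk. With rmem_max left at 212992, the same configured value gives only 425984, which is why the howto changes both the sysctl limits and the OVS option.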
diff --git a/lib/automake.mk b/lib/automake.mk
index 380a672..ffbc3e3 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -343,6 +343,8 @@ lib_libopenvswitch_la_SOURCES = \
 	lib/unicode.h \
 	lib/unixctl.c \
 	lib/unixctl.h \
+	lib/userspace-sock-buf-size.c \
+	lib/userspace-sock-buf-size.h \
 	lib/userspace-tso.c \
 	lib/userspace-tso.h \
 	lib/util.c \
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index fe7fb9b..c196266 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -78,6 +78,7 @@
 #include "timer.h"
 #include "unaligned.h"
 #include "openvswitch/vlog.h"
+#include "userspace-sock-buf-size.h"
 #include "userspace-tso.h"
 #include "util.h"

@@ -1103,6 +1104,20 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
             ARRAY_SIZE(filt), (struct sock_filter *) filt
         };

+        /* sock_buf_size must be less than 1G, so maximum value is
+         * (1 << 30) - 1, i.e. 1073741823, this doesn't mean this
+         * socket will allocate so big buffer, it just means the
+         * packets client sends won't be dropped because of small
+         * default socket buffer, the result is we can get the best
+         * possible throughtput, no packet loss, this can improve
+         * UDP and TCP performance significantly, especially for
+         * fragmented UDP.
+         */
+        static uint32_t last_rcv_sock_buf_size;
+        static uint32_t last_snd_sock_buf_size;
+        uint32_t sock_buf_size = userspace_get_sock_buf_size();
+        uint32_t sock_opt_len = sizeof(sock_buf_size);
+
         /* Create file descriptor. */
         rx->fd = socket(PF_PACKET, SOCK_RAW, 0);
         if (rx->fd < 0) {
@@ -1161,6 +1176,58 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
                      netdev_get_name(netdev_), ovs_strerror(error));
             goto error;
         }
+
+        if (sock_buf_size) {
+            /* Set send socket buffer size */
+            error = setsockopt(rx->fd, SOL_SOCKET, SO_SNDBUF, &sock_buf_size,
+                               sock_opt_len);
+            if (error && (errno == EBADF || errno == ENOTSOCK)) {
+                error = errno;
+                VLOG_ERR("%s: failed to set send socket buffer size (%s)",
+                         netdev_get_name(netdev_), ovs_strerror(error));
+                goto error;
+            }
+
+            /* Set recv socket buffer size */
+            error = setsockopt(rx->fd, SOL_SOCKET, SO_RCVBUF, &sock_buf_size,
+                               sock_opt_len);
+            if (error && (errno == EBADF || errno == ENOTSOCK)) {
+                error = errno;
+                VLOG_ERR("%s: failed to set recv socket buffer size (%s)",
+                         netdev_get_name(netdev_), ovs_strerror(error));
+                goto error;
+            }
+        }
+
+        /* Get final recv socket buffer size, it should be
+         * 2 * ((1 << 30) - 1) (i.e. 2147483646) if successfully.
+         * Don't doubt it is wrong, Linux kernel does so, i.e.
+         * final sk_rcvbuf = val * 2.
+         */
+        error= getsockopt(rx->fd, SOL_SOCKET, SO_RCVBUF, &sock_buf_size,
+                          &sock_opt_len);
+        if (!error) {
+            if (last_rcv_sock_buf_size != sock_buf_size) {
+                VLOG_INFO("Current socket recv buffer size: %d",
+                          sock_buf_size);
+                last_rcv_sock_buf_size = sock_buf_size;
+            }
+        }
+
+        /* Get final send socket buffer size, it should be
+         * 2 * ((1 << 30) - 1) (i.e. 2147483646) if successfully.
+         * Don't doubt it is wrong, Linux kernel does so, i.e.
+         * final sk_sndbuf = val * 2.
+         */
+        error = getsockopt(rx->fd, SOL_SOCKET, SO_SNDBUF, &sock_buf_size,
+                           &sock_opt_len);
+        if (!error) {
+            if (last_snd_sock_buf_size != sock_buf_size) {
+                VLOG_INFO("Current socket send buffer size: %d",
+                          sock_buf_size);
+                last_snd_sock_buf_size = sock_buf_size;
+            }
+        }
     }
     ovs_mutex_unlock(&netdev->mutex);

diff --git a/lib/userspace-sock-buf-size.c b/lib/userspace-sock-buf-size.c
new file mode 100644
index 0000000..24500a4
--- /dev/null
+++ b/lib/userspace-sock-buf-size.c
@@ -0,0 +1,68 @@
+/*
+ * Copyright (c) 2020 Inspur, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+
+#include "smap.h"
+#include "openvswitch/vlog.h"
+#include "ovs-thread.h"
+#include "userspace-sock-buf-size.h"
+
+VLOG_DEFINE_THIS_MODULE(userspace_sock_buf_size);
+
+/* Minimum socket buffer size, it is Linux default size */
+#define MIN_SOCK_BUF_SIZE 212992
+
+/* Maximum possible socket buffer size */
+#define MAX_SOCK_BUF_SIZE 1073741823
+
+static uint32_t userspace_sock_buf_size;
+
+void
+userspace_sock_buf_size_init(const struct smap *ovs_other_config)
+{
+    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+
+    if (ovsthread_once_start(&once)) {
+        uint32_t sock_buf_size;
+
+        sock_buf_size = smap_get_int(ovs_other_config,
+                                     "userspace-sock-buf-size",
+                                     0);
+
+        if (sock_buf_size == 0) {
+            goto tail;
+        }
+
+        if (sock_buf_size < MIN_SOCK_BUF_SIZE) {
+            sock_buf_size = MIN_SOCK_BUF_SIZE;
+        } else if (sock_buf_size > MAX_SOCK_BUF_SIZE) {
+            sock_buf_size = MAX_SOCK_BUF_SIZE;
+        }
+
+        userspace_sock_buf_size = sock_buf_size;
+        VLOG_INFO("Userspace socket buffer size for system interface: %d",
+                  userspace_sock_buf_size);
+tail:
+        ovsthread_once_done(&once);
+    }
+}
+
+uint32_t
+userspace_get_sock_buf_size(void)
+{
+    return userspace_sock_buf_size;
+}
diff --git a/lib/userspace-sock-buf-size.h b/lib/userspace-sock-buf-size.h
new file mode 100644
index 0000000..80385ba
--- /dev/null
+++ b/lib/userspace-sock-buf-size.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2020 Inspur Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef USERSPACE_SOCK_SIZE_H
+#define USERSPACE_SOCK_SIZE_H 1
+
+void userspace_sock_buf_size_init(const struct smap *ovs_other_config);
+uint32_t userspace_get_sock_buf_size(void);
+
+#endif /* userspace-sock-buf-size.h */
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index a3e7fac..8ab33ee 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -65,6 +65,7 @@
 #include "system-stats.h"
 #include "timeval.h"
 #include "tnl-ports.h"
+#include "userspace-sock-buf-size.h"
 #include "userspace-tso.h"
 #include "util.h"
 #include "unixctl.h"
@@ -3291,6 +3292,7 @@ bridge_run(void)
         netdev_set_flow_api_enabled(&cfg->other_config);
         dpdk_init(&cfg->other_config);
         userspace_tso_init(&cfg->other_config);
+        userspace_sock_buf_size_init(&cfg->other_config);
     }

     /* Initialize the ofproto library. This only needs to run once, but