Message ID | 7f2873367602497fb9f00c1df6b30b8f@inspur.com |
---|---|
State | Superseded |
Series | [ovs-dev] Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath |
On Fri, Mar 13, 2020 at 01:04:07AM +0000, Yi Yang (杨燚)-云服务集团 wrote: > Per my understanding, Ben meant a build system (which isn't Linux > probably, it doesn't have include/linux/if_packet.h) should be able to > build tpacket_v3 code in order that built-out binary can work on Linux > system with tpacket_v3 feature, this is Ben's point, that is why he > wanted me to add include/linux/if_packet.h in ovs repo. > > Ben, can you help double confirm if include/linux/if_packet.h in ovs > is necessary? I think my meaning was misunderstood. Linux always has if_packet.h. Only recent enough Linux has TPACKET_V3 in if_packet.h. If the system is Linux but the TPACKET_V3 types and constants are not defined in if_packet.h, then the build system should define them.
On Fri, Mar 13, 2020 at 8:57 AM Ben Pfaff <blp@ovn.org> wrote: > > On Fri, Mar 13, 2020 at 01:04:07AM +0000, Yi Yang (杨燚)-云服务集团 wrote: > > Per my understanding, Ben meant a build system (which isn't Linux > > probably, it doesn't have include/linux/if_packet.h) should be able to > > build tpacket_v3 code in order that built-out binary can work on Linux > > system with tpacket_v3 feature, this is Ben's point, that is why he > > wanted me to add include/linux/if_packet.h in ovs repo. > > > > Ben, can you help double confirm if include/linux/if_packet.h in ovs > > is necessary? > > I think my meaning was misunderstood. Linux always has if_packet.h. > Only recent enough Linux has TPACKET_V3 in if_packet.h. If the system > is Linux but the TPACKET_V3 types and constants are not defined in > if_packet.h, then the build system should define them. Thanks! My suggestion is that if the system is Linux but the TPACKET_V3 types and constants are not defined in if_packet.h, then just skip using TPACKET_V3 and use the current recvmmsg approach. Because when we start TPACKET_V3 patch, the af_packet on veth performance is about 200Mbps, so tpacket_v3 has huge performance benefits. With YiYang's patch "Use batch process recv for tap and raw socket in netdev datapath" the af_packet on veth improves to 1.47Gbps. And tpacket_v3 shows similar or 7% better performance. So there isn't a huge benefits now. William
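For readers unfamiliar with the recvmmsg approach that William refers to, here is a minimal sketch of batched receive, assuming a plain datagram-style socket. The names `recv_batch`, `BATCH_MAX`, and `PKT_MAX` are hypothetical and are not taken from the OVS patch; only `recvmmsg()` itself is the real Linux call.

```c
#define _GNU_SOURCE            /* recvmmsg() is a GNU/Linux extension */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH_MAX 32           /* hypothetical batch size */
#define PKT_MAX   2048         /* hypothetical per-packet buffer size */

/* Receive up to 'n' datagrams from 'fd' with a single system call.
 * Returns the number of datagrams received, or -1 on error.
 * lens[i] is set to the length of the i-th received datagram. */
static int
recv_batch(int fd, char bufs[][PKT_MAX], unsigned int lens[], unsigned int n)
{
    struct mmsghdr msgs[BATCH_MAX];
    struct iovec iovs[BATCH_MAX];

    assert(n <= BATCH_MAX);
    memset(msgs, 0, sizeof msgs);
    for (unsigned int i = 0; i < n; i++) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len = PKT_MAX;
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: take whatever is queued right now, do not block. */
    int got = recvmmsg(fd, msgs, n, MSG_DONTWAIT, NULL);
    for (int i = 0; i < got; i++) {
        lens[i] = msgs[i].msg_len;
    }
    return got;
}
```

The point of the batch is that one system call can return many packets, which is presumably where the gain over one-packet-per-call receive comes from; each packet is still copied into user memory.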
On 3/13/20 5:22 PM, William Tu wrote: > On Fri, Mar 13, 2020 at 8:57 AM Ben Pfaff <blp@ovn.org> wrote: >> >> On Fri, Mar 13, 2020 at 01:04:07AM +0000, Yi Yang (杨燚)-云服务集团 wrote: >>> Per my understanding, Ben meant a build system (which isn't Linux >>> probably, it doesn't have include/linux/if_packet.h) should be able to >>> build tpacket_v3 code in order that built-out binary can work on Linux >>> system with tpacket_v3 feature, this is Ben's point, that is why he >>> wanted me to add include/linux/if_packet.h in ovs repo. >>> >>> Ben, can you help double confirm if include/linux/if_packet.h in ovs >>> is necessary? >> >> I think my meaning was misunderstood. Linux always has if_packet.h. >> Only recent enough Linux has TPACKET_V3 in if_packet.h. If the system >> is Linux but the TPACKET_V3 types and constants are not defined in >> if_packet.h, then the build system should define them. > > Thanks! > > My suggestion is that if the system is Linux but the TPACKET_V3 types > and constants are not defined in if_packet.h, then just skip using > TPACKET_V3 and > use the current recvmmsg approach. Because when we start TPACKET_V3 patch, > the af_packet on veth performance is about 200Mbps, so tpacket_v3 has huge > performance benefits. > > With YiYang's patch > "Use batch process recv for tap and raw socket in netdev datapath" > the af_packet on veth improves to 1.47Gbps. And tpacket_v3 shows > similar or 7% better performance. So there isn't a huge benefits now. With such a small performance benefit does it make sense to have these 700 lines of code that is so hard to read and maintain? Another point is that hopefully segmentation offloading in userspace datapath will evolve so we could enable it by default and all this code will become almost useless. If you're looking for poll mode/async -like solutions we could try and check io_uring way for calling same recvmsg/sendmsg. That might have more benefits and it will support all the functionality supported by these calls. 
Even better, we could also make io_uring support as an internal library and reuse it for other OVS subsystems like making async poll/timers/logging/etc in the future. Best regards, Ilya Maximets.
On Fri, Mar 13, 2020 at 9:48 AM Ilya Maximets <i.maximets@ovn.org> wrote: > > On 3/13/20 5:22 PM, William Tu wrote: > > On Fri, Mar 13, 2020 at 8:57 AM Ben Pfaff <blp@ovn.org> wrote: > >> > >> On Fri, Mar 13, 2020 at 01:04:07AM +0000, Yi Yang (杨燚)-云服务集团 wrote: > >>> Per my understanding, Ben meant a build system (which isn't Linux > >>> probably, it doesn't have include/linux/if_packet.h) should be able to > >>> build tpacket_v3 code in order that built-out binary can work on Linux > >>> system with tpacket_v3 feature, this is Ben's point, that is why he > >>> wanted me to add include/linux/if_packet.h in ovs repo. > >>> > >>> Ben, can you help double confirm if include/linux/if_packet.h in ovs > >>> is necessary? > >> > >> I think my meaning was misunderstood. Linux always has if_packet.h. > >> Only recent enough Linux has TPACKET_V3 in if_packet.h. If the system > >> is Linux but the TPACKET_V3 types and constants are not defined in > >> if_packet.h, then the build system should define them. > > > > Thanks! > > > > My suggestion is that if the system is Linux but the TPACKET_V3 types > > and constants are not defined in if_packet.h, then just skip using > > TPACKET_V3 and > > use the current recvmmsg approach. Because when we start TPACKET_V3 patch, > > the af_packet on veth performance is about 200Mbps, so tpacket_v3 has huge > > performance benefits. > > > > With YiYang's patch > > "Use batch process recv for tap and raw socket in netdev datapath" > > the af_packet on veth improves to 1.47Gbps. And tpacket_v3 shows > > similar or 7% better performance. So there isn't a huge benefits now. > > With such a small performance benefit does it make sense to have > these 700 lines of code that is so hard to read and maintain? Agree. I was hoping that using "tpacket_v3 + is_pmd=true + TSO" can show much better performance. But TSO has some issue and this patch is not there yet. 
> > Another point is that hopefully segmentation offloading in userspace > datapath will evolve so we could enable it by default and all this > code will become almost useless. > > If you're looking for poll mode/async -like solutions we could try and > check io_uring way for calling same recvmsg/sendmsg. That might > have more benefits and it will support all the functionality supported > by these calls. Even better, we could also make io_uring support as > an internal library and reuse it for other OVS subsystems like making > async poll/timers/logging/etc in the future. Thanks! I will take a look. William
On Fri, Mar 13, 2020 at 05:47:54PM +0100, Ilya Maximets wrote: > On 3/13/20 5:22 PM, William Tu wrote: > > On Fri, Mar 13, 2020 at 8:57 AM Ben Pfaff <blp@ovn.org> wrote: > >> > >> On Fri, Mar 13, 2020 at 01:04:07AM +0000, Yi Yang (杨燚)-云服务集团 wrote: > >>> Per my understanding, Ben meant a build system (which isn't Linux > >>> probably, it doesn't have include/linux/if_packet.h) should be able to > >>> build tpacket_v3 code in order that built-out binary can work on Linux > >>> system with tpacket_v3 feature, this is Ben's point, that is why he > >>> wanted me to add include/linux/if_packet.h in ovs repo. > >>> > >>> Ben, can you help double confirm if include/linux/if_packet.h in ovs > >>> is necessary? > >> > >> I think my meaning was misunderstood. Linux always has if_packet.h. > >> Only recent enough Linux has TPACKET_V3 in if_packet.h. If the system > >> is Linux but the TPACKET_V3 types and constants are not defined in > >> if_packet.h, then the build system should define them. > > > > Thanks! > > > > My suggestion is that if the system is Linux but the TPACKET_V3 types > > and constants are not defined in if_packet.h, then just skip using > > TPACKET_V3 and > > use the current recvmmsg approach. Because when we start TPACKET_V3 patch, > > the af_packet on veth performance is about 200Mbps, so tpacket_v3 has huge > > performance benefits. > > > > With YiYang's patch > > "Use batch process recv for tap and raw socket in netdev datapath" > > the af_packet on veth improves to 1.47Gbps. And tpacket_v3 shows > > similar or 7% better performance. So there isn't a huge benefits now. > > With such a small performance benefit does it make sense to have > these 700 lines of code that is so hard to read and maintain? Rarely used code with minimal benefit is a burden, so I'd skip it for now. If we figure out some way to a bigger benefit later, we can revisit it.
Got it. Then we can safely remove include/linux/if_packet.h from the ovs repo, because the oldest Linux version that OVS supports already provides tpacket_v3. Thanks, Ben, for the clarification.

-----Original Message-----
From: Ben Pfaff [mailto:blp@ovn.org]
Sent: March 13, 2020 23:57
To: Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com>
Cc: u9012063@gmail.com; yang_y_yi@163.com; ovs-dev@openvswitch.org
Subject: Re: Re: [ovs-dev] [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

On Fri, Mar 13, 2020 at 01:04:07AM +0000, Yi Yang (杨燚)-云服务集团 wrote:
> Per my understanding, Ben meant a build system (which isn't Linux
> probably, it doesn't have include/linux/if_packet.h) should be able to
> build tpacket_v3 code in order that built-out binary can work on Linux
> system with tpacket_v3 feature, this is Ben's point, that is why he
> wanted me to add include/linux/if_packet.h in ovs repo.
>
> Ben, can you help double confirm if include/linux/if_packet.h in ovs
> is necessary?

I think my meaning was misunderstood. Linux always has if_packet.h. Only recent enough Linux has TPACKET_V3 in if_packet.h. If the system is Linux but the TPACKET_V3 types and constants are not defined in if_packet.h, then the build system should define them.
io_uring was introduced in Linux kernel 5.1, so it can't be used on Linux systems with a kernel version < 5.1. tpacket_v3 is the only way to avoid system calls on almost all Linux kernel versions, so it is unique from this perspective. Maybe you will miss it if someone fixes the kernel-side issue :-)

In addition, according to what Flavio said, TSO can't support VXLAN currently, but in most cloud scenarios VXLAN is the only choice, so for such cases TSO can be ignored.

My point is that we can provide an option for such use cases. Once the kernel-side issue is fixed, all the Linux distributions can apply the fix and users can get immediate benefits without any change. So maybe adding a switch userspace-use-tpacket-v3 in other-config (set to False by default) is an acceptable way to handle this.

-----Original Message-----
From: dev [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Ilya Maximets
Sent: March 14, 2020 0:48
To: William Tu <u9012063@gmail.com>; Ben Pfaff <blp@ovn.org>
Cc: yang_y_yi@163.com; ovs-dev@openvswitch.org; i.maximets@ovn.org
Subject: Re: [ovs-dev] Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

On 3/13/20 5:22 PM, William Tu wrote:
> On Fri, Mar 13, 2020 at 8:57 AM Ben Pfaff <blp@ovn.org> wrote:
>>
>> On Fri, Mar 13, 2020 at 01:04:07AM +0000, Yi Yang (杨燚)-云服务集团 wrote:
>>> Per my understanding, Ben meant a build system (which isn't Linux
>>> probably, it doesn't have include/linux/if_packet.h) should be able
>>> to build tpacket_v3 code in order that built-out binary can work on
>>> Linux system with tpacket_v3 feature, this is Ben's point, that is
>>> why he wanted me to add include/linux/if_packet.h in ovs repo.
>>>
>>> Ben, can you help double confirm if include/linux/if_packet.h in ovs
>>> is necessary?
>>
>> I think my meaning was misunderstood. Linux always has if_packet.h.
>> Only recent enough Linux has TPACKET_V3 in if_packet.h. If the
>> system is Linux but the TPACKET_V3 types and constants are not
>> defined in if_packet.h, then the build system should define them.
>
> Thanks!
>
> My suggestion is that if the system is Linux but the TPACKET_V3 types
> and constants are not defined in if_packet.h, then just skip using
> TPACKET_V3 and
> use the current recvmmsg approach. Because when we start TPACKET_V3
> patch, the af_packet on veth performance is about 200Mbps, so
> tpacket_v3 has huge performance benefits.
>
> With YiYang's patch
> "Use batch process recv for tap and raw socket in netdev datapath"
> the af_packet on veth improves to 1.47Gbps. And tpacket_v3 shows
> similar or 7% better performance. So there isn't a huge benefits now.

With such a small performance benefit does it make sense to have these 700 lines of code that is so hard to read and maintain?

Another point is that hopefully segmentation offloading in userspace datapath will evolve so we could enable it by default and all this code will become almost useless.

If you're looking for poll mode/async -like solutions we could try and check io_uring way for calling same recvmsg/sendmsg. That might have more benefits and it will support all the functionality supported by these calls. Even better, we could also make io_uring support as an internal library and reuse it for other OVS subsystems like making async poll/timers/logging/etc in the future.

Best regards, Ilya Maximets.
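The kernel-version constraint mentioned above (io_uring needs Linux 5.1 or later) could be checked at run time roughly as follows. This is only a sketch under stated assumptions: `release_at_least` and `running_kernel_may_have_io_uring` are hypothetical helper names, and a version comparison is a heuristic, since distribution kernels may backport or disable features.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Returns 1 if 'release' (for example "5.4.0-42-generic") is at least
 * major.minor, 0 if it is older, and -1 if it cannot be parsed. */
static int
release_at_least(const char *release, int major, int minor)
{
    int maj, min;

    assert(release != NULL);
    if (sscanf(release, "%d.%d", &maj, &min) != 2) {
        return -1;
    }
    return maj > major || (maj == major && min >= minor);
}

/* Heuristic only: says whether the running kernel is 5.1 or newer,
 * the version the thread gives for the introduction of io_uring. */
static int
running_kernel_may_have_io_uring(void)
{
    struct utsname u;

    if (uname(&u) != 0) {
        return -1;
    }
    return release_at_least(u.release, 5, 1);
}
```

A more reliable check would attempt the feature itself and fall back on failure; the version test only illustrates why the two mechanisms cover different ranges of kernels.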
On Fri, Mar 13, 2020 at 9:45 PM Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com> wrote: > > Io_uring is a feature brought in by Linux kernel 5.1, so it can't be used on Linux system with kernel version < 5.1. tpacket_v3 is only one way to avoid system call on almost all the Linux kernel versions, it is unique from this perspective. Maybe you will miss it if someone fixes kernel side issue :-) > > In addition, according to what Flavio said, TSO can't support VXLAN currently, but in most cloud scenarios, VXLAN is only one choice, so for such cases, TSO can be ignored. > > My point is we can provide one option for such use cases, once kernel side issue is fixed, all the Linux distributions can apply this fix, users can get immediate benefits without change. So maybe adding a switch userspace-use-tpacket-v3 in other-config (set to False by default) is an acceptable way to handle this. > The tpacket_v3 patch now shows very little performance improvement. So there is little incentive to merge and maintain this code. Do you know if kernel side is fixed, will tpacket_v3 have better performance improvement? Or another way is to study io_uring and compare its performance with tpacket_v3. William
There might still be a misunderstanding.

There can be a difference between the kernel that OVS runs on (version A) and the kernel headers against which it is built (version B). Often, the latter are supplied by the distribution and they are not usually kept as up to date, so B < A is common.

I don't know whether this is likely to be a problem in this particular case.

On Sat, Mar 14, 2020 at 03:35:46AM +0000, Yi Yang (杨燚)-云服务集团 wrote:
> Got it, then we can safely remove include/linux/if_packet.h in ovs
> because the minimal Linux version OVS supports has supported
> tpacket_v3. Thanks Ben for clarification.
>
> -----Original Message-----
> From: Ben Pfaff [mailto:blp@ovn.org]
> Sent: March 13, 2020 23:57
> To: Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com>
> Cc: u9012063@gmail.com; yang_y_yi@163.com; ovs-dev@openvswitch.org
> Subject: Re: Re: [ovs-dev] [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath
>
> On Fri, Mar 13, 2020 at 01:04:07AM +0000, Yi Yang (杨燚)-云服务集团 wrote:
> > Per my understanding, Ben meant a build system (which isn't Linux
> > probably, it doesn't have include/linux/if_packet.h) should be able to
> > build tpacket_v3 code in order that built-out binary can work on Linux
> > system with tpacket_v3 feature, this is Ben's point, that is why he
> > wanted me to add include/linux/if_packet.h in ovs repo.
> >
> > Ben, can you help double confirm if include/linux/if_packet.h in ovs
> > is necessary?
>
> I think my meaning was misunderstood. Linux always has if_packet.h.
> Only recent enough Linux has TPACKET_V3 in if_packet.h. If the system is Linux but the TPACKET_V3 types and constants are not defined in if_packet.h, then the build system should define them.
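Ben's distinction between the build-time headers (version B) and the running kernel (version A) suggests that a build-time check alone cannot say what the running kernel accepts. A minimal sketch of a run-time probe follows, assuming the build headers already define TPACKET_V3; `probe_tpacket_v3` is a hypothetical name, not a function from the patch.

```c
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>

#ifndef SOL_PACKET
#define SOL_PACKET 263         /* value used by Linux; normally in <sys/socket.h> */
#endif

/* Returns 1 if the running kernel accepts TPACKET_V3 on an AF_PACKET
 * socket, 0 if it rejects it, and -1 if the probe itself could not run
 * (for example, the process lacks CAP_NET_RAW). */
static int
probe_tpacket_v3(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) {
        return -1;
    }

    int version = TPACKET_V3;
    int ok = setsockopt(fd, SOL_PACKET, PACKET_VERSION,
                        &version, sizeof version) == 0;
    close(fd);
    return ok;
}
```

With such a probe, a binary built against newer headers could still fall back to another receive path when the kernel it runs on refuses the option.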
All the definitions and macros have been in include/linux/if_packet.h since 3.10.0, so that case will not occur.

-----Original Message-----
From: Ben Pfaff [mailto:blp@ovn.org]
Sent: March 15, 2020 4:04
To: Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com>
Cc: u9012063@gmail.com; yang_y_yi@163.com; ovs-dev@openvswitch.org
Subject: Re: Re: Re: [ovs-dev] [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

There might still be a misunderstanding.

There can be a difference between the kernel that OVS runs on (version A) and the kernel headers against which it is built (version B). Often, the latter are supplied by the distribution and they are not usually kept as up to date, so B < A is common.

I don't know whether this is likely to be a problem in this particular case.

On Sat, Mar 14, 2020 at 03:35:46AM +0000, Yi Yang (杨燚)-云服务集团 wrote:
> Got it, then we can safely remove include/linux/if_packet.h in ovs
> because the minimal Linux version OVS supports has supported
> tpacket_v3. Thanks Ben for clarification.
>
> -----Original Message-----
> From: Ben Pfaff [mailto:blp@ovn.org]
> Sent: March 13, 2020 23:57
> To: Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com>
> Cc: u9012063@gmail.com; yang_y_yi@163.com; ovs-dev@openvswitch.org
> Subject: Re: Re: [ovs-dev] [PATCH v6] Use TPACKET_V3 to accelerate veth for
> userspace datapath
>
> On Fri, Mar 13, 2020 at 01:04:07AM +0000, Yi Yang (杨燚)-云服务集团 wrote:
> > Per my understanding, Ben meant a build system (which isn't Linux
> > probably, it doesn't have include/linux/if_packet.h) should be able
> > to build tpacket_v3 code in order that built-out binary can work on
> > Linux system with tpacket_v3 feature, this is Ben's point, that is
> > why he wanted me to add include/linux/if_packet.h in ovs repo.
> >
> > Ben, can you help double confirm if include/linux/if_packet.h in ovs
> > is necessary?
>
> I think my meaning was misunderstood. Linux always has if_packet.h.
> Only recent enough Linux has TPACKET_V3 in if_packet.h. If the system is Linux but the TPACKET_V3 types and constants are not defined in if_packet.h, then the build system should define them.
Hi, William

My high-end server is finally available, so I could run the performance comparison again. tpacket_v3 clearly gives a big performance improvement; here is my data. By the way, to get stable performance data, please use taskset to pin ovs-vswitchd to a physical core (don't schedule other tasks on its logical sibling core), and run the iperf3 client and the iperf3 server on different cores. In my case, ovs-vswitchd is pinned to core 1, the iperf3 server to core 4, and the iperf3 client to core 5.

According to my test, tpacket_v3 gives about a 55% improvement (from 1.34 to 2.08 Gbits/sec, (2.08-1.34)/1.34 = 0.55). With my further optimization (zero copy on the receive side), the improvement is larger (from 1.34 to 2.21 Gbits/sec, (2.21-1.34)/1.34 = 0.65), so I still think the performance improvement is big. Please reconsider it.

William, I can help you do a performance check on your servers if you'd like. From this data and the previous data, we can conclude that the performance data is very platform sensitive. You can schedule a meeting for further discussion if needed.
No zero copy and no tpacket_v3 (recvmmsg, sendmmsg)
===================================================
eipadmin@eip01:~$ sudo ./run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[ 4] local 10.15.1.2 port 43194 connected to 10.15.1.3 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-10.00 sec 1.58 GBytes 1.35 Gbits/sec 13851 103 KBytes
[ 4] 10.00-20.00 sec 1.56 GBytes 1.34 Gbits/sec 14018 94.7 KBytes
[ 4] 20.00-30.00 sec 1.56 GBytes 1.34 Gbits/sec 13942 94.7 KBytes
[ 4] 30.00-40.00 sec 1.56 GBytes 1.34 Gbits/sec 13565 106 KBytes
[ 4] 40.00-50.00 sec 1.54 GBytes 1.32 Gbits/sec 14567 106 KBytes
[ 4] 50.00-60.00 sec 1.56 GBytes 1.34 Gbits/sec 13738 84.8 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-60.00 sec 9.35 GBytes 1.34 Gbits/sec 83681 sender
[ 4] 0.00-60.00 sec 9.35 GBytes 1.34 Gbits/sec receiver
Server output:
Accepted connection from 10.15.1.2, port 43192
[ 5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 43194
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.00 sec 1.57 GBytes 1.35 Gbits/sec
[ 5] 10.00-20.00 sec 1.56 GBytes 1.34 Gbits/sec
[ 5] 20.00-30.00 sec 1.56 GBytes 1.34 Gbits/sec
[ 5] 30.00-40.00 sec 1.56 GBytes 1.34 Gbits/sec
[ 5] 40.00-50.00 sec 1.54 GBytes 1.32 Gbits/sec
[ 5] 50.00-60.00 sec 1.56 GBytes 1.34 Gbits/sec
iperf Done.
eipadmin@eip01:~$

No zero copy but with tpacket_v3
================================
eipadmin@eip01:~$ sudo ./run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[ 4] local 10.15.1.2 port 43174 connected to 10.15.1.3 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-10.00 sec 2.36 GBytes 2.02 Gbits/sec 0 3.04 MBytes
[ 4] 10.00-20.00 sec 2.43 GBytes 2.09 Gbits/sec 0 3.04 MBytes
[ 4] 20.00-30.00 sec 2.44 GBytes 2.09 Gbits/sec 0 3.04 MBytes
[ 4] 30.00-40.00 sec 2.43 GBytes 2.09 Gbits/sec 0 3.04 MBytes
[ 4] 40.00-50.00 sec 2.43 GBytes 2.09 Gbits/sec 0 3.04 MBytes
[ 4] 50.00-60.00 sec 2.44 GBytes 2.10 Gbits/sec 0 3.04 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-60.00 sec 14.5 GBytes 2.08 Gbits/sec 0 sender
[ 4] 0.00-60.00 sec 14.5 GBytes 2.08 Gbits/sec receiver
Server output:
Accepted connection from 10.15.1.2, port 43172
[ 5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 43174
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.00 sec 2.35 GBytes 2.02 Gbits/sec
[ 5] 10.00-20.00 sec 2.43 GBytes 2.09 Gbits/sec
[ 5] 20.00-30.00 sec 2.44 GBytes 2.09 Gbits/sec
[ 5] 30.00-40.00 sec 2.43 GBytes 2.09 Gbits/sec
[ 5] 40.00-50.00 sec 2.43 GBytes 2.09 Gbits/sec
[ 5] 50.00-60.00 sec 2.44 GBytes 2.10 Gbits/sec
iperf Done.
eipadmin@eip01:~$

Have zero copy patch and tpacket_v3
===================================
eipadmin@eip01:~$ sudo ./run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[ 4] local 10.15.1.2 port 43182 connected to 10.15.1.3 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-10.00 sec 2.54 GBytes 2.18 Gbits/sec 0 3.03 MBytes
[ 4] 10.00-20.00 sec 2.58 GBytes 2.22 Gbits/sec 0 3.03 MBytes
[ 4] 20.00-30.00 sec 2.58 GBytes 2.22 Gbits/sec 0 3.03 MBytes
[ 4] 30.00-40.00 sec 2.59 GBytes 2.22 Gbits/sec 0 3.03 MBytes
[ 4] 40.00-50.00 sec 2.57 GBytes 2.21 Gbits/sec 0 3.03 MBytes
[ 4] 50.00-60.00 sec 2.57 GBytes 2.21 Gbits/sec 0 3.03 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-60.00 sec 15.4 GBytes 2.21 Gbits/sec 0 sender
[ 4] 0.00-60.00 sec 15.4 GBytes 2.21 Gbits/sec receiver
Server output:
Accepted connection from 10.15.1.2, port 43180
[ 5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 43182
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.00 sec 2.53 GBytes 2.17 Gbits/sec
[ 5] 10.00-20.00 sec 2.58 GBytes 2.22 Gbits/sec
[ 5] 20.00-30.00 sec 2.58 GBytes 2.22 Gbits/sec
[ 5] 30.00-40.00 sec 2.59 GBytes 2.22 Gbits/sec
[ 5] 40.00-50.00 sec 2.57 GBytes 2.21 Gbits/sec
[ 5] 50.00-60.00 sec 2.57 GBytes 2.21 Gbits/sec
iperf Done.
eipadmin@eip01:~$

-----Original Message-----
From: William Tu [mailto:u9012063@gmail.com]
Sent: March 14, 2020 22:18
To: Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com>
Cc: i.maximets@ovn.org; blp@ovn.org; yang_y_yi@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

On Fri, Mar 13, 2020 at 9:45 PM Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com> wrote:
>
> Io_uring is a feature brought in by Linux kernel 5.1, so it can't be
> used on Linux system with kernel version < 5.1. tpacket_v3 is only one
> way to avoid system call on almost all the Linux kernel versions, it
> is unique from this perspective. Maybe you will miss it if someone
> fixes kernel side issue :-)
>
> In addition, according to what Flavio said, TSO can't support VXLAN currently, but in most cloud scenarios, VXLAN is only one choice, so for such cases, TSO can be ignored.
>
> My point is we can provide one option for such use cases, once kernel side issue is fixed, all the Linux distributions can apply this fix, users can get immediate benefits without change. So maybe adding a switch userspace-use-tpacket-v3 in other-config (set to False by default) is an acceptable way to handle this.
>

The tpacket_v3 patch now shows very little performance improvement. So there is little incentive to merge and maintain this code. Do you know if kernel side is fixed, will tpacket_v3 have better performance improvement?

Or another way is to study io_uring and compare its performance with tpacket_v3.

William
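The benchmark message above pins processes with taskset. For completeness, the same effect can be sketched programmatically for the calling thread with `sched_setaffinity()`. This is an illustration only: the helper names `pin_self_to_core` and `first_allowed_core` are hypothetical, and the thread itself used the taskset command rather than code like this.

```c
#define _GNU_SOURCE            /* for sched_setaffinity() and the CPU_* macros */
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to CPU 'core'.
 * Returns 0 on success and -1 on error (errno is set by the kernel). */
static int
pin_self_to_core(int core)
{
    cpu_set_t set;

    assert(core >= 0 && core < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Return the lowest-numbered CPU the calling thread is currently
 * allowed to run on, or -1 if the affinity mask cannot be read. */
static int
first_allowed_core(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof set, &set) != 0) {
        return -1;
    }
    for (int i = 0; i < CPU_SETSIZE; i++) {
        if (CPU_ISSET(i, &set)) {
            return i;
        }
    }
    return -1;
}
```

Note that this only restricts where the thread may run; keeping other tasks off the sibling hyperthread, as the message recommends, has to be arranged separately.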
On Tue, Mar 17, 2020 at 2:08 AM Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com> wrote: > > Hi, William > > Finally, my highend server is available and so I can do performance comparison again, tpacket_v3 obviously has big performance improvement, here is my data. By the way, in order to get stable performance data, please use taskset to pin ovs-vswitchd to a physical core (you shouldn't schedule other task to its logical sibling core for stable performance data), iperf3 client an iperf3 use different cores, for my case, ovs-vswitchd is pinned to core 1, iperf3 server is pinned to core 4, iperf3 client is pinned to core 5. > > According to my test, tpacket_v3 can get about 55% improvement (from 1.34 to 2.08, (2.08-1.34)/1.34 = 0.55) , with my further optimization (use zero copy for receive side), it can have more improvement (from 1.34 to 2.21, (2.21-1.34)/1.34 = 0.65), so I still think performance improvement is big, please reconsider it again. > That's great improvement. What is your optimization "zero copy for receive side"? Does it include in the patch? Regards William
Great. Then we do not need any special case for if_packet.h.

On Mon, Mar 16, 2020 at 12:48:20AM +0000, Yi Yang (杨燚)-云服务集团 wrote:
> All the definitions/macros have been in include/linux/if_packet.h since 3.10.0, so there will not be that case existing.
>
> -----Original Message-----
> From: Ben Pfaff [mailto:blp@ovn.org]
> Sent: March 15, 2020 4:04
> To: Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com>
> Cc: u9012063@gmail.com; yang_y_yi@163.com; ovs-dev@openvswitch.org
> Subject: Re: Re: Re: [ovs-dev] [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath
>
> There might still be a misunderstanding.
>
> There can be a difference between the kernel that OVS runs on (version
> A) and the kernel headers against which it is built (version B). Often, the latter are supplied by the distribution and they are not usually kept as up to date, so B < A is common.
>
> I don't know whether this is likely to be a problem in this particular case.
>
> On Sat, Mar 14, 2020 at 03:35:46AM +0000, Yi Yang (杨燚)-云服务集团 wrote:
> > Got it, then we can safely remove include/linux/if_packet.h in ovs
> > because the minimal Linux version OVS supports has supported
> > tpacket_v3. Thanks Ben for clarification.
> >
> > -----Original Message-----
> > From: Ben Pfaff [mailto:blp@ovn.org]
> > Sent: March 13, 2020 23:57
> > To: Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com>
> > Cc: u9012063@gmail.com; yang_y_yi@163.com; ovs-dev@openvswitch.org
> > Subject: Re: Re: [ovs-dev] [PATCH v6] Use TPACKET_V3 to accelerate veth for
> > userspace datapath
> >
> > On Fri, Mar 13, 2020 at 01:04:07AM +0000, Yi Yang (杨燚)-云服务集团 wrote:
> > > Per my understanding, Ben meant a build system (which isn't Linux
> > > probably, it doesn't have include/linux/if_packet.h) should be able
> > > to build tpacket_v3 code in order that built-out binary can work on
> > > Linux system with tpacket_v3 feature, this is Ben's point, that is
> > > why he wanted me to add include/linux/if_packet.h in ovs repo.
> > >
> > > Ben, can you help double confirm if include/linux/if_packet.h in ovs
> > > is necessary?
> >
> > I think my meaning was misunderstood. Linux always has if_packet.h.
> > Only recent enough Linux has TPACKET_V3 in if_packet.h. If the system is Linux but the TPACKET_V3 types and constants are not defined in if_packet.h, then the build system should define them.
William, do you want to try my zero-copy patch? I can send it to you to try on your platform. From your af_xdp change, I found that dp_packet can use a pre-allocated buffer, so I took the same approach: tpacket_v3 has already set up an rx ring, so dp_packet can use those rx ring buffers directly.

-----Original Message-----
From: William Tu [mailto:u9012063@gmail.com]
Sent: March 17, 2020 22:58
To: Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com>
Cc: i.maximets@ovn.org; blp@ovn.org; yang_y_yi@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

On Tue, Mar 17, 2020 at 2:08 AM Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com> wrote:
>
> Hi, William
>
> Finally, my highend server is available and so I can do performance comparison again, tpacket_v3 obviously has big performance improvement, here is my data. By the way, in order to get stable performance data, please use taskset to pin ovs-vswitchd to a physical core (you shouldn't schedule other task to its logical sibling core for stable performance data), iperf3 client an iperf3 use different cores, for my case, ovs-vswitchd is pinned to core 1, iperf3 server is pinned to core 4, iperf3 client is pinned to core 5.
>
> According to my test, tpacket_v3 can get about 55% improvement (from 1.34 to 2.08, (2.08-1.34)/1.34 = 0.55) , with my further optimization (use zero copy for receive side), it can have more improvement (from 1.34 to 2.21, (2.21-1.34)/1.34 = 0.65), so I still think performance improvement is big, please reconsider it again.
>

That's great improvement. What is your optimization "zero copy for receive side"? Does it include in the patch?

Regards
William
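The zero-copy idea described above, where packet objects point straight into the rx ring instead of copying out of it, can be sketched as a walk over one TPACKET_V3 block. The kernel structures (`struct tpacket_block_desc`, `struct tpacket3_hdr`) are the real ones from linux/if_packet.h; `struct pkt_ref`, `block_collect`, and `block_release` are hypothetical stand-ins and do not reflect how the actual patch wires this into dp_packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/if_packet.h>

/* Hypothetical stand-in for a packet object that borrows ring memory. */
struct pkt_ref {
    const uint8_t *data;   /* points into the ring block, nothing is copied */
    uint32_t len;          /* captured length of the frame */
};

/* Collect references to the frames in one TPACKET_V3 block.
 * Returns the number of references stored in 'out' (at most 'max'),
 * or 0 if the block is still owned by the kernel. */
static size_t
block_collect(struct tpacket_block_desc *bd, struct pkt_ref *out, size_t max)
{
    assert(bd != NULL && out != NULL);
    if (!(bd->hdr.bh1.block_status & TP_STATUS_USER)) {
        return 0;
    }

    size_t n = 0;
    uint8_t *p = (uint8_t *) bd + bd->hdr.bh1.offset_to_first_pkt;
    for (uint32_t i = 0; i < bd->hdr.bh1.num_pkts && n < max; i++) {
        struct tpacket3_hdr *h = (struct tpacket3_hdr *) p;

        out[n].data = p + h->tp_mac;    /* start of the link-layer frame */
        out[n].len = h->tp_snaplen;
        n++;
        p += h->tp_next_offset;         /* advance to the next frame header */
    }
    return n;
}

/* Hand the block back to the kernel. This must only happen once no
 * pkt_ref still points into the block. */
static void
block_release(struct tpacket_block_desc *bd)
{
    bd->hdr.bh1.block_status = TP_STATUS_KERNEL;
}
```

The design cost of this approach is lifetime management: a block cannot be returned to the kernel while any packet that references it is still being processed.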
By the way, with tpacket_v3, the zero copy optimization, and is_pmd=true, the performance is much better: 3.77 Gbps, (3.77-1.34)/1.34 = 1.81, i.e. a 181% improvement. Here is the performance data.

is_pmd = true
=============
eipadmin@eip01:~$ sudo ./run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[ 4] local 10.15.1.2 port 43210 connected to 10.15.1.3 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-10.00 sec 4.34 GBytes 3.73 Gbits/sec 0 3.03 MBytes
[ 4] 10.00-20.00 sec 4.40 GBytes 3.78 Gbits/sec 0 3.03 MBytes
[ 4] 20.00-30.00 sec 4.40 GBytes 3.78 Gbits/sec 0 3.03 MBytes
[ 4] 30.00-40.00 sec 4.40 GBytes 3.78 Gbits/sec 0 3.03 MBytes
[ 4] 40.00-50.00 sec 4.40 GBytes 3.78 Gbits/sec 0 3.03 MBytes
[ 4] 50.00-60.00 sec 4.40 GBytes 3.78 Gbits/sec 0 3.03 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-60.00 sec 26.3 GBytes 3.77 Gbits/sec 0 sender
[ 4] 0.00-60.00 sec 26.3 GBytes 3.77 Gbits/sec receiver
Server output:
Accepted connection from 10.15.1.2, port 43208
[ 5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 43210
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.00 sec 4.32 GBytes 3.71 Gbits/sec
[ 5] 10.00-20.00 sec 4.40 GBytes 3.78 Gbits/sec
[ 5] 20.00-30.00 sec 4.40 GBytes 3.78 Gbits/sec
[ 5] 30.00-40.00 sec 4.40 GBytes 3.78 Gbits/sec
[ 5] 40.00-50.00 sec 4.40 GBytes 3.78 Gbits/sec
[ 5] 50.00-60.00 sec 4.40 GBytes 3.78 Gbits/sec
iperf Done.
eipadmin@eip01:~$

-----Original Message-----
From: William Tu [mailto:u9012063@gmail.com]
Sent: March 17, 2020 22:58
To: Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com>
Cc: i.maximets@ovn.org; blp@ovn.org; yang_y_yi@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Re: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

On Tue, Mar 17, 2020 at 2:08 AM Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com> wrote:
>
> Hi, William
>
> Finally, my highend server is available and so I can do performance comparison again, tpacket_v3 obviously has big performance improvement, here is my data. By the way, in order to get stable performance data, please use taskset to pin ovs-vswitchd to a physical core (you shouldn't schedule other task to its logical sibling core for stable performance data), iperf3 client an iperf3 use different cores, for my case, ovs-vswitchd is pinned to core 1, iperf3 server is pinned to core 4, iperf3 client is pinned to core 5.
>
> According to my test, tpacket_v3 can get about 55% improvement (from 1.34 to 2.08, (2.08-1.34)/1.34 = 0.55) , with my further optimization (use zero copy for receive side), it can have more improvement (from 1.34 to 2.21, (2.21-1.34)/1.34 = 0.65), so I still think performance improvement is big, please reconsider it again.
>

That's great improvement. What is your optimization "zero copy for receive side"? Does it include in the patch?

Regards
William
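The improvement figures quoted in this thread all use the same formula, (measured - baseline) / baseline, with the 1.34 Gbits/sec recvmmsg result as the baseline. A small sketch that reproduces them follows; `relative_gain` is a hypothetical helper name.

```c
#include <assert.h>

/* Relative improvement of 'measured' over 'baseline', as a fraction
 * (0.55 means 55% faster than the baseline). */
static double
relative_gain(double baseline, double measured)
{
    assert(baseline > 0.0);
    return (measured - baseline) / baseline;
}
```

With the reported throughputs, `relative_gain(1.34, 2.08)` is about 0.55, `relative_gain(1.34, 2.21)` is about 0.65, and `relative_gain(1.34, 3.77)` is about 1.81, matching the 55%, 65%, and 181% figures given in the messages.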
On Tue, Mar 17, 2020 at 7:00 PM Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com> wrote:
>
> By the way, with tpacket_v3, zero copy optimization and is_pmd=true, the performance is much better, 3.77Gbps, (3.77-1.34)/1.34 = 1.81 , i.e. 181% improvement, here is the performance data.
>

Can you send out the tpacket_v3 patch together with these optimizations to the mailing list?

Thanks
William
Ok, I will send out v7 with these changes.

-----Original Message-----
From: William Tu [mailto:u9012063@gmail.com]
Sent: March 18, 2020 11:56
To: Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com>
Cc: i.maximets@ovn.org; blp@ovn.org; yang_y_yi@163.com; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] 答复: [PATCH v6] Use TPACKET_V3 to accelerate veth for userspace datapath

On Tue, Mar 17, 2020 at 7:00 PM Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com> wrote:
>
> By the way, with tpacket_v3, zero copy optimization and is_pmd=true, the performance is much better, 3.77Gbps, (3.77-1.34)/1.34 = 1.81 , i.e. 181% improvement, here is the performance data.
>

Can you send out the tpacket_v3 patch together with these optimizations to the mailing list?

Thanks
William
diff --git a/acinclude.m4 b/acinclude.m4
index 1488deda0371..4b11085ab190 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -1086,12 +1086,14 @@ dnl OVS_CHECK_LINUX_TPACKET
 dnl
 dnl Configure Linux TPACKET.
 AC_DEFUN([OVS_CHECK_LINUX_TPACKET], [
-  AC_COMPILE_IFELSE([
-    AC_LANG_PROGRAM([#include <linux/if_packet.h>], [
-        struct tpacket3_hdr x = { 0 };
-    ])],
-    [AC_DEFINE([HAVE_TPACKET_V3], [1],
-    [Define to 1 if struct tpacket3_hdr is available.])])
+  AC_CHECK_HEADER([linux/if_packet.h],
+    [AC_COMPILE_IFELSE([
+      AC_LANG_PROGRAM([#include <linux/if_packet.h>], [
+          struct tpacket3_hdr x = { 0 };
+      ])],
+      [AC_DEFINE([HAVE_TPACKET_V3], [1],
+      [Define to 1 if struct tpacket3_hdr is available.])])],
+    [])
 ])
 
 dnl Checks for buggy strtok_r.
diff --git a/include/linux/automake.mk b/include/linux/automake.mk
index a659e65abe27..8f063f482e15 100644
--- a/include/linux/automake.mk
+++ b/include/linux/automake.mk
@@ -1,5 +1,4 @@
 noinst_HEADERS += \
-	include/linux/if_packet.h \
 	include/linux/netlink.h \
 	include/linux/netfilter/nf_conntrack_sctp.h \
 	include/linux/pkt_cls.h \
diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h
deleted file mode 100644
index e20aaccb1e32..000000000000
--- a/include/linux/if_packet.h
+++ /dev/null
@@ -1,128 +0,0 @@
-#ifndef __LINUX_IF_PACKET_WRAPPER_H
-#define __LINUX_IF_PACKET_WRAPPER_H 1
-
-#ifdef HAVE_TPACKET_V3
-#include_next <linux/if_packet.h>
-#else
-#define HAVE_TPACKET_V3 1
-
-struct sockaddr_pkt {
-    unsigned short  spkt_family;
-    unsigned char   spkt_device[14];
-    uint16_t        spkt_protocol;
-};
-
-struct sockaddr_ll {
-    unsigned short  sll_family;
-    uint16_t        sll_protocol;
-    int             sll_ifindex;
-    unsigned short  sll_hatype;
-    unsigned char   sll_pkttype;
-    unsigned char   sll_halen;
-    unsigned char   sll_addr[8];
-};
-
-/* Packet types */
-#define PACKET_HOST                     0 /* To us                */
-#define PACKET_OTHERHOST                3 /* To someone else      */
-#define PACKET_LOOPBACK                 5 /* MC/BRD frame looped back */
-
-/* Packet socket options */
-#define PACKET_RX_RING                  5
-#define PACKET_VERSION                 10
-#define PACKET_TX_RING                 13
-#define PACKET_VNET_HDR                15
-
-/* Rx ring - header status */
-#define TP_STATUS_KERNEL                0
-#define TP_STATUS_USER            (1 << 0)
-#define TP_STATUS_VLAN_VALID      (1 << 4) /* auxdata has valid tp_vlan_tci */
-#define TP_STATUS_VLAN_TPID_VALID (1 << 6) /* auxdata has valid tp_vlan_tpid */
-
-/* Tx ring - header status */
-#define TP_STATUS_SEND_REQUEST    (1 << 0)
-#define TP_STATUS_SENDING         (1 << 1)
-
-struct tpacket_hdr {
-    unsigned long tp_status;
-    unsigned int tp_len;
-    unsigned int tp_snaplen;
-    unsigned short tp_mac;
-    unsigned short tp_net;
-    unsigned int tp_sec;
-    unsigned int tp_usec;
-};
-
-#define TPACKET_ALIGNMENT 16
-#define TPACKET_ALIGN(x) (((x)+TPACKET_ALIGNMENT-1)&~(TPACKET_ALIGNMENT-1))
-
-struct tpacket_hdr_variant1 {
-    uint32_t tp_rxhash;
-    uint32_t tp_vlan_tci;
-    uint16_t tp_vlan_tpid;
-    uint16_t tp_padding;
-};
-
-struct tpacket3_hdr {
-    uint32_t tp_next_offset;
-    uint32_t tp_sec;
-    uint32_t tp_nsec;
-    uint32_t tp_snaplen;
-    uint32_t tp_len;
-    uint32_t tp_status;
-    uint16_t tp_mac;
-    uint16_t tp_net;
-    /* pkt_hdr variants */
-    union {
-        struct tpacket_hdr_variant1 hv1;
-    };
-    uint8_t tp_padding[8];
-};
-
-struct tpacket_bd_ts {
-    unsigned int ts_sec;
-    union {
-        unsigned int ts_usec;
-        unsigned int ts_nsec;
-    };
-};
-
-struct tpacket_hdr_v1 {
-    uint32_t block_status;
-    uint32_t num_pkts;
-    uint32_t offset_to_first_pkt;
-    uint32_t blk_len;
-    uint64_t __attribute__((aligned(8))) seq_num;
-    struct tpacket_bd_ts ts_first_pkt, ts_last_pkt;
-};
-
-union tpacket_bd_header_u {
-    struct tpacket_hdr_v1 bh1;
-};
-
-struct tpacket_block_desc {
-    uint32_t version;
-    uint32_t offset_to_priv;
-    union tpacket_bd_header_u hdr;
-};
-
-#define TPACKET3_HDRLEN \
-    (TPACKET_ALIGN(sizeof(struct tpacket3_hdr)) + sizeof(struct sockaddr_ll))
-
-enum tpacket_versions {
-    TPACKET_V1,
-    TPACKET_V2,
-    TPACKET_V3
-};
-
-struct tpacket_req3 {
-    unsigned int tp_block_size; /* Minimal size of contiguous block */
-    unsigned int tp_block_nr; /* Number of blocks */
-    unsigned int tp_frame_size; /* Size of frame */
-    unsigned int tp_frame_nr; /* Total number of frames */
-    unsigned int tp_retire_blk_tov; /* Timeout in msecs */
-    unsigned int tp_sizeof_priv; /* Offset to private data area */
-    unsigned int tp_feature_req_word;
-};
-#endif /* HAVE_TPACKET_V3 */
-#endif /* __LINUX_IF_PACKET_WRAPPER_H */
diff --git a/include/sparse/linux/if_packet.h b/include/sparse/linux/if_packet.h
index 0ac3fcefc895..3813892a0788 100644
--- a/include/sparse/linux/if_packet.h
+++ b/include/sparse/linux/if_packet.h
@@ -28,114 +28,4 @@ struct sockaddr_ll {
     unsigned char sll_addr[8];
 };
 
-/* Packet types */
-#define PACKET_HOST                     0 /* To us                */
-#define PACKET_OTHERHOST                3 /* To someone else      */
-#define PACKET_LOOPBACK                 5 /* MC/BRD frame looped back */
-
-/* Packet socket options */
-#define PACKET_RX_RING                  5
-#define PACKET_VERSION                 10
-#define PACKET_TX_RING                 13
-#define PACKET_VNET_HDR                15
-
-/* Rx ring - header status */
-#define TP_STATUS_KERNEL                0
-#define TP_STATUS_USER            (1 << 0)
-#define TP_STATUS_VLAN_VALID      (1 << 4) /* auxdata has valid tp_vlan_tci */
-#define TP_STATUS_VLAN_TPID_VALID (1 << 6) /* auxdata has valid tp_vlan_tpid */
-
-/* Tx ring - header status */
-#define TP_STATUS_SEND_REQUEST    (1 << 0)
-#define TP_STATUS_SENDING         (1 << 1)
-
-#define tpacket_hdr rpl_tpacket_hdr
-struct tpacket_hdr {
-    unsigned long tp_status;
-    unsigned int tp_len;
-    unsigned int tp_snaplen;
-    unsigned short tp_mac;
-    unsigned short tp_net;
-    unsigned int tp_sec;
-    unsigned int tp_usec;
-};
-
-#define TPACKET_ALIGNMENT 16
-#define TPACKET_ALIGN(x) (((x)+TPACKET_ALIGNMENT-1)&~(TPACKET_ALIGNMENT-1))
-
-#define tpacket_hdr_variant1 rpl_tpacket_hdr_variant1
-struct tpacket_hdr_variant1 {
-    uint32_t tp_rxhash;
-    uint32_t tp_vlan_tci;
-    uint16_t tp_vlan_tpid;
-    uint16_t tp_padding;
-};
-
-#define tpacket3_hdr rpl_tpacket3_hdr
-struct tpacket3_hdr {
-    uint32_t tp_next_offset;
-    uint32_t tp_sec;
-    uint32_t tp_nsec;
-    uint32_t tp_snaplen;
-    uint32_t tp_len;
-    uint32_t tp_status;
-    uint16_t tp_mac;
-    uint16_t tp_net;
-    /* pkt_hdr variants */
-    union {
-        struct tpacket_hdr_variant1 hv1;
-    };
-    uint8_t tp_padding[8];
-};
-
-#define tpacket_bd_ts rpl_tpacket_bd_ts
-struct tpacket_bd_ts {
-    unsigned int ts_sec;
-    union {
-        unsigned int ts_usec;
-        unsigned int ts_nsec;
-    };
-};
-
-#define tpacket_hdr_v1 rpl_tpacket_hdr_v1
-struct tpacket_hdr_v1 {
-    uint32_t block_status;
-    uint32_t num_pkts;
-    uint32_t offset_to_first_pkt;
-    uint32_t blk_len;
-    uint64_t __attribute__((aligned(8))) seq_num;
-    struct tpacket_bd_ts ts_first_pkt, ts_last_pkt;
-};
-
-#define tpacket_bd_header_u rpl_tpacket_bd_header_u
-union tpacket_bd_header_u {
-    struct tpacket_hdr_v1 bh1;
-};
-
-#define tpacket_block_desc rpl_tpacket_block_desc
-struct tpacket_block_desc {
-    uint32_t version;
-    uint32_t offset_to_priv;
-    union tpacket_bd_header_u hdr;
-};
-
-#define TPACKET3_HDRLEN \
-    (TPACKET_ALIGN(sizeof(struct tpacket3_hdr)) + sizeof(struct sockaddr_ll))
-
-enum rpl_tpacket_versions {
-    TPACKET_V1,
-    TPACKET_V2,
-    TPACKET_V3
-};
-
-#define tpacket_req3 rpl_tpacket_req3
-struct tpacket_req3 {
-    unsigned int tp_block_size; /* Minimal size of contiguous block */
-    unsigned int tp_block_nr; /* Number of blocks */
-    unsigned int tp_frame_size; /* Size of frame */
-    unsigned int tp_frame_nr; /* Total number of frames */
-    unsigned int tp_retire_blk_tov; /* Timeout in msecs */
-    unsigned int tp_sizeof_priv; /* Offset to private data area */
-    unsigned int tp_feature_req_word;
-};
 #endif
_______________________________________________
dev mailing list
dev@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
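The diff above removes the bundled include/linux/if_packet.h, so HAVE_TPACKET_V3 is defined only when the system header itself provides struct tpacket3_hdr. Code that receives packets then needs a compile-time fallback to the existing recvmmsg path, as suggested earlier in the thread. The C sketch below shows one way that gating could look. It is not taken from the patch: `enum rx_method`, `choose_rx_method` and `rx_method_name` are hypothetical names, and HAVE_TPACKET_V3 is assumed to arrive via config.h from the OVS_CHECK_LINUX_TPACKET check.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: HAVE_TPACKET_V3 is set by configure (config.h) only when
 * <linux/if_packet.h> exists and declares struct tpacket3_hdr. */
#ifdef HAVE_TPACKET_V3
#include <linux/if_packet.h>
#endif

/* Hypothetical enum, for illustration only; not an OVS type. */
enum rx_method {
    RX_RECVMMSG,    /* Batched recvmmsg() on the raw socket. */
    RX_TPACKET_V3   /* Memory-mapped TPACKET_V3 rx ring. */
};

/* Picks the receive path at compile time.  Without HAVE_TPACKET_V3 no
 * TPACKET_V3 type or constant is referenced, so the build still works
 * against older kernel headers. */
static enum rx_method
choose_rx_method(void)
{
#ifdef HAVE_TPACKET_V3
    return RX_TPACKET_V3;
#else
    return RX_RECVMMSG;
#endif
}

/* Human-readable name for logging. */
static const char *
rx_method_name(enum rx_method m)
{
    switch (m) {
    case RX_TPACKET_V3:
        return "tpacket_v3";
    case RX_RECVMMSG:
        return "recvmmsg";
    }
    assert(0 && "unknown rx method");
    return NULL;
}
```

Compiled without `-DHAVE_TPACKET_V3`, `choose_rx_method()` returns `RX_RECVMMSG`; with it, on a system whose header has TPACKET_V3, it returns `RX_TPACKET_V3`.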