| Message ID | 1272463605.2267.70.camel@edumazet-laptop |
|---|---|
| State | Accepted, archived |
| Delegated to | David Miller |
On Wed, 2010-04-28 at 16:06 +0200, Eric Dumazet wrote:
> On Wed, 2010-04-28 at 08:36 -0400, jamal wrote:
> > On Wed, 2010-04-28 at 14:33 +0200, Eric Dumazet wrote:
> >
> > > If you wait a bit, I have another patch to speedup udp receive path ;)
> >
> > Shoot whenever you are ready ;-> I will test with and without your
> > patch..
> >
>
> Here it is ;)
>
> Thanks

I forgot to say that with my previous DDOS test/bench (16 cpus trying to
feed one udp socket), my receiver can now process 420.000 pps instead of
200.000 ;)
On Wed, 2010-04-28 at 16:19 +0200, Eric Dumazet wrote:
> I forgot to say that with my previous DDOS test/bench (16 cpus trying to
> feed one udp socket), my receiver can now process 420.000 pps instead of
> 200.000 ;)

And perf top of the cpu dedicated to the thread doing the recvmsg() is
(after patch):

-------------------------------------------------------------------------------
   PerfTop:    1001 irqs/sec  kernel:98.0% [1000Hz cycles],  (all, cpu: 1)
-------------------------------------------------------------------------------

             samples  pcnt function                      DSO
             _______ _____ _____________________________ ________

             5463.00 45.5% _raw_spin_lock_bh             vmlinux
              761.00  6.3% copy_user_generic_string      vmlinux
              662.00  5.5% sock_recv_ts_and_drops        vmlinux
              645.00  5.4% kfree                         vmlinux
              568.00  4.7% _raw_spin_lock                vmlinux
              494.00  4.1% __skb_recv_datagram           vmlinux
              488.00  4.1% skb_copy_datagram_iovec       vmlinux
              467.00  3.9% __slab_free                   vmlinux
              176.00  1.5% udp_recvmsg                   vmlinux
              168.00  1.4% ia32_sysenter_target          vmlinux
              161.00  1.3% kmem_cache_free               vmlinux
              161.00  1.3% _raw_spin_lock_irqsave        vmlinux
              151.00  1.3% memcpy_toiovec                vmlinux
              131.00  1.1% fget_light                    vmlinux
              130.00  1.1% sock_rfree                    vmlinux
              104.00  0.9% inet_recvmsg                  vmlinux
               99.00  0.8% dst_release                   vmlinux
               98.00  0.8% skb_release_head_state        vmlinux
               83.00  0.7% __sk_mem_reclaim              vmlinux
               75.00  0.6% sys_recvfrom                  vmlinux
               61.00  0.5% sysexit_from_sys_call         vmlinux
               59.00  0.5% fput                          vmlinux
               56.00  0.5% schedule                      vmlinux
               56.00  0.5% sock_recvmsg                  vmlinux
               54.00  0.4% move_addr_to_user             vmlinux
               51.00  0.4% compat_sys_socketcall         vmlinux
               48.00  0.4% _raw_spin_unlock_bh           vmlinux
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 28 Apr 2010 16:06:45 +0200

> [PATCH net-next-2.6] net: speedup udp receive path
>
> Since commit 95766fff ([UDP]: Add memory accounting.),
> each received packet needs one extra sock_lock()/sock_release() pair.
>
> This added latency because of possible backlog handling. Then later,
> ticket spinlocks added yet another latency source in case of DDOS.
>
> This patch introduces lock_sock_bh() and unlock_sock_bh()
> synchronization primitives, avoiding one atomic operation and backlog
> processing.
>
> skb_free_datagram_locked() uses them instead of full blown
> lock_sock()/release_sock(). skb is orphaned inside locked section for
> proper socket memory reclaim, and finally freed outside of it.
>
> UDP receive path now take the socket spinlock only once.
>
> Signed-off-by: Eric DUmazet <eric.dumazet@gmail.com>

Clever, let's see what this breaks :-)

Applied, thanks Eric.
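The changelog is terse, so here is a minimal C sketch of what it describes. This is an illustration reconstructed from the description above, not the committed patch; the only kernel facts assumed beyond the changelog are that struct sock exposes its spinlock as sk->sk_lock.slock and that skb_orphan() runs the skb destructor (sock_rfree), uncharging the socket's receive memory while the lock is held.

```c
/* Sketch only -- reconstructed from the changelog, not the exact patch.
 * lock_sock_bh()/unlock_sock_bh() grab just the socket spinlock with
 * BHs disabled, skipping the lock_sock()/release_sock() slow path
 * (owner handshake plus backlog processing).
 */
static inline void lock_sock_bh(struct sock *sk)
{
	spin_lock_bh(&sk->sk_lock.slock);
}

static inline void unlock_sock_bh(struct sock *sk)
{
	spin_unlock_bh(&sk->sk_lock.slock);
}

/* Free a received datagram: orphan the skb (its destructor, sock_rfree,
 * uncharges the socket receive memory) while the spinlock is held, then
 * free the skb outside the locked section so the free itself does not
 * lengthen the critical section.
 */
void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
{
	lock_sock_bh(sk);
	skb_orphan(skb);
	unlock_sock_bh(sk);

	kfree_skb(skb);		/* skb has no owner at this point */
}
```

With this, a UDP receive takes sk_lock.slock once per packet instead of going through the full lock_sock()/release_sock() pair mentioned in the changelog.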
On Wed, 2010-04-28 at 16:06 +0200, Eric Dumazet wrote:
> Here it is ;)
Sorry - things got a little hectic with TheMan.
I am afraid I don't have good news.
Actually, I should say I don't have good news in regards to rps.
For my sample app, two things seem to be happening:
a) The overall performance has gotten better for both rps
and non-rps.
b) non-rps is now performing relatively better
This is just what I see in net-next, not related to your patch.
It seems the kernels I tested prior to April 23 showed rps better.
The one I tested on Apr 23 showed rps being about the same as non-rps.
As I stated in my last result posting, I thought I hadn't tested properly
but I did again today and saw the same thing. And now non-rps is
_consistently_ better.
So some regression is going on...
Your patch has improved the performance of rps relative to what is in
net-next very lightly; but it has also improved the performance of
non-rps;->
My traces look different for the app cpu than yours - likely because
the apps are different.
At the moment I don't have time to dig deeper into the code, but I could
test as cycles show up.
I am attaching the profile traces and results.
cheers,
jamal
April 23 net-next
kernel      sink     cpu all   cpuint   cpuapp
---------------------------------------------------------
nn          93.95%   84.5%     99.8%    79.8%
nn-rps      96.41%   85.4%     95.5%    82.5%
nn-cl       97.29%   84.0%     99.9%    79.6%
nn-cl-rps   97.76%   86.5%     96.5%    84.8%
nn: Basic net-next from Apr23
nn-rps: Basic net-next from Apr23 with rps mask ee and irq affinity to cpu0
nn-cl: Basic net-next from Apr23 + Changli patch
nn-cl-rps: Basic net-next from Apr23 + Changli patch + rps mask ee,irq aff cpu0
sink: the amount of traffic the system was able to sink in.
cpu all: avg % system cpu consumed in test
cpuint: avg %cpu consumed by the cpu where interrupts happened
cpuapp: avg %cpu consumed by a sample cpu which did app processing
Now repeat with Eric's changes and the kernel from Apr 28
kernel       sink     cpu all   cpuint   cpuapp
---------------------------------------------------------
nn2          98.78%   83.6%     100.0%   82.8%
nn2-rps      94.43%   84.2%     98.1%    82.0%
nn2-ed       98.74%   83.2%     99.9%    81.6%
nn2-ed-rps   95.15%   84.5%     97.3%    82.1%
nn2: Basic net-next from Apr28
nn2-rps: Basic net-next from Apr28 with rps mask ee and irq affinity to cpu0
nn2-ed: Basic net-next from Apr28 + Eric patch
nn2-ed-rps: Basic net-next from Apr28 + Eric patch + rps mask ee,irq aff cpu0
I: net-next
Average udp sink: 98.78%
--------------------------------------------------------------------------------------------------
PerfTop: 3632 irqs/sec kernel:83.7% [1000Hz cycles], (all, 8 CPUs)
--------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ____________________
2738.00 9.8% sky2_poll [sky2]
1543.00 5.5% _raw_spin_lock_irqsave [kernel]
1019.00 3.7% system_call [kernel]
740.00 2.7% copy_user_generic_string [kernel]
687.00 2.5% fget [kernel]
640.00 2.3% _raw_spin_unlock_irqrestore [kernel]
634.00 2.3% sys_epoll_ctl [kernel]
613.00 2.2% datagram_poll [kernel]
553.00 2.0% _raw_spin_lock_bh [kernel]
530.00 1.9% kmem_cache_free [kernel]
522.00 1.9% schedule [kernel]
487.00 1.7% vread_tsc [kernel].vsyscall_fn
467.00 1.7% _raw_spin_lock [kernel]
432.00 1.5% udp_recvmsg [kernel]
426.00 1.5% kmem_cache_alloc [kernel]
418.00 1.5% __udp4_lib_lookup [kernel]
417.00 1.5% sys_epoll_wait [kernel]
376.00 1.3% fput [kernel]
361.00 1.3% ip_route_input [kernel]
344.00 1.2% local_bh_enable_ip [kernel]
326.00 1.2% ip_rcv [kernel]
321.00 1.2% first_packet_length [kernel]
307.00 1.1% ep_remove [kernel]
303.00 1.1% dst_release [kernel]
301.00 1.1% skb_copy_datagram_iovec [kernel]
297.00 1.1% mutex_lock [kernel]
--------------------------------------------------------------------------------------------------
PerfTop: 4018 irqs/sec kernel:83.3% [1000Hz cycles], (all, 8 CPUs)
--------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ______________________
4274.00 9.7% sky2_poll [sky2]
2473.00 5.6% _raw_spin_lock_irqsave [kernel]
1585.00 3.6% system_call [kernel]
1179.00 2.7% copy_user_generic_string [kernel]
1089.00 2.5% fget [kernel]
1019.00 2.3% _raw_spin_unlock_irqrestore [kernel]
1011.00 2.3% sys_epoll_ctl [kernel]
965.00 2.2% datagram_poll [kernel]
902.00 2.0% kmem_cache_free [kernel]
841.00 1.9% _raw_spin_lock_bh [kernel]
837.00 1.9% schedule [kernel]
735.00 1.7% vread_tsc [kernel].vsyscall_fn
730.00 1.7% udp_recvmsg [kernel]
729.00 1.7% _raw_spin_lock [kernel]
678.00 1.5% kmem_cache_alloc [kernel]
651.00 1.5% sys_epoll_wait [kernel]
635.00 1.4% __udp4_lib_lookup [kernel]
595.00 1.3% fput [kernel]
568.00 1.3% local_bh_enable_ip [kernel]
562.00 1.3% ip_route_input [kernel]
516.00 1.2% dst_release [kernel]
502.00 1.1% ep_remove [kernel]
485.00 1.1% skb_copy_datagram_iovec [kernel]
484.00 1.1% first_packet_length [kernel]
476.00 1.1% ip_rcv [kernel]
470.00 1.1% __alloc_skb [kernel]
459.00 1.0% epoll_ctl /lib/libc-2.7.so
458.00 1.0% mutex_lock [kernel]
--------------------------------------------------------------------------------------------------
PerfTop: 1000 irqs/sec kernel:100.0% [1000Hz cycles], (all, cpu: 0)
--------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ________
3534.00 34.7% sky2_poll [sky2]
545.00 5.3% __udp4_lib_lookup [kernel]
537.00 5.3% ip_route_input [kernel]
427.00 4.2% _raw_spin_lock_irqsave [kernel]
401.00 3.9% __alloc_skb [kernel]
360.00 3.5% ip_rcv [kernel]
332.00 3.3% _raw_spin_lock [kernel]
292.00 2.9% sock_queue_rcv_skb [kernel]
291.00 2.9% __udp4_lib_rcv [kernel]
273.00 2.7% sock_def_readable [kernel]
269.00 2.6% __netif_receive_skb [kernel]
209.00 2.1% __wake_up_common [kernel]
196.00 1.9% __kmalloc [kernel]
164.00 1.6% _raw_read_lock [kernel]
157.00 1.5% kmem_cache_alloc [kernel]
157.00 1.5% ep_poll_callback [kernel]
133.00 1.3% resched_task [kernel]
128.00 1.3% task_rq_lock [kernel]
120.00 1.2% swiotlb_sync_single [kernel]
120.00 1.2% sky2_rx_submit [sky2]
117.00 1.1% udp_queue_rcv_skb [kernel]
108.00 1.1% ip_local_deliver [kernel]
104.00 1.0% try_to_wake_up [kernel]
102.00 1.0% _raw_spin_unlock_irqrestore [kernel]
98.00 1.0% select_task_rq_fair [kernel]
--------------------------------------------------------------------------------------------------
PerfTop: 1000 irqs/sec kernel:100.0% [1000Hz cycles], (all, cpu: 0)
--------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ________
4601.00 34.0% sky2_poll [sky2]
732.00 5.4% __udp4_lib_lookup [kernel]
724.00 5.3% ip_route_input [kernel]
527.00 3.9% _raw_spin_lock_irqsave [kernel]
520.00 3.8% __alloc_skb [kernel]
483.00 3.6% ip_rcv [kernel]
441.00 3.3% _raw_spin_lock [kernel]
401.00 3.0% sock_queue_rcv_skb [kernel]
373.00 2.8% __udp4_lib_rcv [kernel]
365.00 2.7% sock_def_readable [kernel]
353.00 2.6% __netif_receive_skb [kernel]
285.00 2.1% __wake_up_common [kernel]
273.00 2.0% __kmalloc [kernel]
230.00 1.7% _raw_read_lock [kernel]
208.00 1.5% ep_poll_callback [kernel]
199.00 1.5% kmem_cache_alloc [kernel]
180.00 1.3% task_rq_lock [kernel]
172.00 1.3% sky2_rx_submit [sky2]
171.00 1.3% resched_task [kernel]
165.00 1.2% ip_local_deliver [kernel]
162.00 1.2% udp_queue_rcv_skb [kernel]
158.00 1.2% _raw_spin_unlock_irqrestore [kernel]
148.00 1.1% select_task_rq_fair [kernel]
144.00 1.1% try_to_wake_up [kernel]
142.00 1.0% sky2_remove [sky2]
140.00 1.0% swiotlb_sync_single [kernel]
95.00 0.7% cache_alloc_refill [kernel]
92.00 0.7% dev_gro_receive [kernel]
82.00 0.6% is_swiotlb_buffer [kernel]
--------------------------------------------------------------------------------------------------
PerfTop: 622 irqs/sec kernel:74.9% [1000Hz cycles], (all, cpu: 2)
--------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ _____________________________________
113.00 6.5% _raw_spin_lock_irqsave /lib/modules/2.6.34-rc5/build/vmlinux
105.00 6.0% system_call /lib/modules/2.6.34-rc5/build/vmlinux
69.00 3.9% fget /lib/modules/2.6.34-rc5/build/vmlinux
64.00 3.7% datagram_poll /lib/modules/2.6.34-rc5/build/vmlinux
56.00 3.2% copy_user_generic_string /lib/modules/2.6.34-rc5/build/vmlinux
55.00 3.1% sys_epoll_ctl /lib/modules/2.6.34-rc5/build/vmlinux
53.00 3.0% _raw_spin_unlock_irqrestore /lib/modules/2.6.34-rc5/build/vmlinux
46.00 2.6% _raw_spin_lock_bh /lib/modules/2.6.34-rc5/build/vmlinux
42.00 2.4% kmem_cache_free /lib/modules/2.6.34-rc5/build/vmlinux
37.00 2.1% dst_release /lib/modules/2.6.34-rc5/build/vmlinux
37.00 2.1% schedule /lib/modules/2.6.34-rc5/build/vmlinux
35.00 2.0% mutex_lock /lib/modules/2.6.34-rc5/build/vmlinux
35.00 2.0% vread_tsc [kernel].vsyscall_fn
35.00 2.0% udp_recvmsg /lib/modules/2.6.34-rc5/build/vmlinux
34.00 1.9% sys_epoll_wait /lib/modules/2.6.34-rc5/build/vmlinux
31.00 1.8% local_bh_enable_ip /lib/modules/2.6.34-rc5/build/vmlinux
29.00 1.7% ep_remove /lib/modules/2.6.34-rc5/build/vmlinux
28.00 1.6% kmem_cache_alloc /lib/modules/2.6.34-rc5/build/vmlinux
27.00 1.5% process_recv /home/hadi/udp_sink/mcpudp
25.00 1.4% mutex_unlock /lib/modules/2.6.34-rc5/build/vmlinux
24.00 1.4% ep_send_events_proc /lib/modules/2.6.34-rc5/build/vmlinux
24.00 1.4% clock_gettime /lib/librt-2.7.so
23.00 1.3% fput /lib/modules/2.6.34-rc5/build/vmlinux
23.00 1.3% skb_copy_datagram_iovec /lib/modules/2.6.34-rc5/build/vmlinux
20.00 1.1% sock_recv_ts_and_drops /lib/modules/2.6.34-rc5/build/vmlinux
20.00 1.1% inet_recvmsg /lib/modules/2.6.34-rc5/build/vmlinux
19.00 1.1% epoll_dispatch /usr/lib/libevent-1.3e.so.1.0.3
19.00 1.1% first_packet_length /lib/modules/2.6.34-rc5/build/vmlinux
--------------------------------------------------------------------------------------------------
PerfTop: 625 irqs/sec kernel:83.0% [1000Hz cycles], (all, cpu: 2)
--------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ _____________________________________
315.00 6.8% _raw_spin_lock_irqsave /lib/modules/2.6.34-rc5/build/vmlinux
232.00 5.0% system_call /lib/modules/2.6.34-rc5/build/vmlinux
175.00 3.8% fget /lib/modules/2.6.34-rc5/build/vmlinux
174.00 3.8% datagram_poll /lib/modules/2.6.34-rc5/build/vmlinux
168.00 3.6% sys_epoll_ctl /lib/modules/2.6.34-rc5/build/vmlinux
155.00 3.4% copy_user_generic_string /lib/modules/2.6.34-rc5/build/vmlinux
144.00 3.1% kmem_cache_free /lib/modules/2.6.34-rc5/build/vmlinux
133.00 2.9% _raw_spin_lock_bh /lib/modules/2.6.34-rc5/build/vmlinux
126.00 2.7% _raw_spin_unlock_irqrestore /lib/modules/2.6.34-rc5/build/vmlinux
113.00 2.4% vread_tsc [kernel].vsyscall_fn
110.00 2.4% _raw_spin_unlock_bh /lib/modules/2.6.34-rc5/build/vmlinux
106.00 2.3% schedule /lib/modules/2.6.34-rc5/build/vmlinux
103.00 2.2% local_bh_enable_ip /lib/modules/2.6.34-rc5/build/vmlinux
101.00 2.2% udp_recvmsg /lib/modules/2.6.34-rc5/build/vmlinux
97.00 2.1% sys_epoll_wait /lib/modules/2.6.34-rc5/build/vmlinux
84.00 1.8% dst_release /lib/modules/2.6.34-rc5/build/vmlinux
78.00 1.7% fput /lib/modules/2.6.34-rc5/build/vmlinux
75.00 1.6% first_packet_length /lib/modules/2.6.34-rc5/build/vmlinux
74.00 1.6% kmem_cache_alloc /lib/modules/2.6.34-rc5/build/vmlinux
71.00 1.5% ep_remove /lib/modules/2.6.34-rc5/build/vmlinux
69.00 1.5% epoll_ctl /lib/libc-2.7.so
67.00 1.5% mutex_lock /lib/modules/2.6.34-rc5/build/vmlinux
65.00 1.4% sock_recv_ts_and_drops /lib/modules/2.6.34-rc5/build/vmlinux
65.00 1.4% inet_recvmsg /lib/modules/2.6.34-rc5/build/vmlinux
64.00 1.4% process_recv /home/hadi/udp_sink/mcpudp
62.00 1.3% skb_copy_datagram_iovec /lib/modules/2.6.34-rc5/build/vmlinux
60.00 1.3% clock_gettime /lib/librt-2.7.so
--------------------------------------------------------------------------------------------------
PerfTop: 700 irqs/sec kernel:84.3% [1000Hz cycles], (all, cpu: 2)
--------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ _____________________________________
489.00 6.4% _raw_spin_lock_irqsave /lib/modules/2.6.34-rc5/build/vmlinux
376.00 4.9% system_call /lib/modules/2.6.34-rc5/build/vmlinux
308.00 4.0% fget /lib/modules/2.6.34-rc5/build/vmlinux
302.00 3.9% copy_user_generic_string /lib/modules/2.6.34-rc5/build/vmlinux
280.00 3.6% sys_epoll_ctl /lib/modules/2.6.34-rc5/build/vmlinux
274.00 3.6% datagram_poll /lib/modules/2.6.34-rc5/build/vmlinux
249.00 3.2% kmem_cache_free /lib/modules/2.6.34-rc5/build/vmlinux
223.00 2.9% _raw_spin_unlock_irqrestore /lib/modules/2.6.34-rc5/build/vmlinux
221.00 2.9% _raw_spin_unlock_bh /lib/modules/2.6.34-rc5/build/vmlinux
221.00 2.9% local_bh_enable_ip /lib/modules/2.6.34-rc5/build/vmlinux
208.00 2.7% vread_tsc [kernel].vsyscall_fn
200.00 2.6% _raw_spin_lock_bh /lib/modules/2.6.34-rc5/build/vmlinux
191.00 2.5% schedule /lib/modules/2.6.34-rc5/build/vmlinux
188.00 2.4% sys_epoll_wait /lib/modules/2.6.34-rc5/build/vmlinux
177.00 2.3% udp_recvmsg /lib/modules/2.6.34-rc5/build/vmlinux
141.00 1.8% fput /lib/modules/2.6.34-rc5/build/vmlinux
140.00 1.8% first_packet_length /lib/modules/2.6.34-rc5/build/vmlinux
128.00 1.7% kmem_cache_alloc /lib/modules/2.6.34-rc5/build/vmlinux
119.00 1.5% dst_release /lib/modules/2.6.34-rc5/build/vmlinux
105.00 1.4% ep_remove /lib/modules/2.6.34-rc5/build/vmlinux
104.00 1.4% epoll_ctl /lib/libc-2.7.so
102.00 1.3% skb_copy_datagram_iovec /lib/modules/2.6.34-rc5/build/vmlinux
100.00 1.3% mutex_lock /lib/modules/2.6.34-rc5/build/vmlinux
95.00 1.2% mutex_unlock /lib/modules/2.6.34-rc5/build/vmlinux
94.00 1.2% sock_recv_ts_and_drops /lib/modules/2.6.34-rc5/build/vmlinux
92.00 1.2% ep_send_events_proc /lib/modules/2.6.34-rc5/build/vmlinux
92.00 1.2% clock_gettime /lib/librt-2.7.so
92.00 1.2% __skb_recv_datagram /lib/modules/2.6.34-rc5/build/vmlinux
91.00 1.2% process_recv /home/hadi/udp_sink/mcpudp
88.00 1.1% kfree /lib/modules/2.6.34-rc5/build/vmlinux
86.00 1.1% _raw_spin_lock /lib/modules/2.6.34-rc5/build/vmlinux
II: net-next with rps = ee
Average udp sink: 94.43%
--------------
--------------------------------------------------------------------------------------------------
PerfTop: 4328 irqs/sec kernel:84.0% [1000Hz cycles], (all, 8 CPUs)
--------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ______________________________ ______________________
3908.00 17.1% sky2_poll [sky2]
694.00 3.0% _raw_spin_lock_irqsave [kernel]
584.00 2.6% sky2_intr [sky2]
557.00 2.4% system_call [kernel]
490.00 2.1% _raw_spin_unlock_irqrestore [kernel]
488.00 2.1% fget [kernel]
425.00 1.9% ip_rcv [kernel]
405.00 1.8% sys_epoll_ctl [kernel]
398.00 1.7% __netif_receive_skb [kernel]
375.00 1.6% _raw_spin_lock [kernel]
365.00 1.6% copy_user_generic_string [kernel]
363.00 1.6% ip_route_input [kernel]
350.00 1.5% kmem_cache_free [kernel]
346.00 1.5% schedule [kernel]
319.00 1.4% call_function_single_interrupt [kernel]
295.00 1.3% vread_tsc [kernel].vsyscall_fn
270.00 1.2% __udp4_lib_lookup [kernel]
264.00 1.2% kmem_cache_alloc [kernel]
235.00 1.0% fput [kernel]
219.00 1.0% datagram_poll [kernel]
--------------------------------------------------------------------------------------------------
PerfTop: 3791 irqs/sec kernel:84.4% [1000Hz cycles], (all, 8 CPUs)
--------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ______________________________ ______________________
6274.00 17.2% sky2_poll [sky2]
1139.00 3.1% _raw_spin_lock_irqsave [kernel]
953.00 2.6% system_call [kernel]
942.00 2.6% sky2_intr [sky2]
785.00 2.2% _raw_spin_unlock_irqrestore [kernel]
745.00 2.0% fget [kernel]
695.00 1.9% ip_rcv [kernel]
653.00 1.8% sys_epoll_ctl [kernel]
609.00 1.7% ip_route_input [kernel]
606.00 1.7% __netif_receive_skb [kernel]
583.00 1.6% _raw_spin_lock [kernel]
569.00 1.6% kmem_cache_free [kernel]
564.00 1.5% copy_user_generic_string [kernel]
554.00 1.5% schedule [kernel]
510.00 1.4% call_function_single_interrupt [kernel]
488.00 1.3% vread_tsc [kernel].vsyscall_fn
459.00 1.3% kmem_cache_alloc [kernel]
417.00 1.1% __udp4_lib_lookup [kernel]
387.00 1.1% fput [kernel]
358.00 1.0% __udp4_lib_rcv [kernel]
347.00 1.0% event_base_loop libevent-1.3e.so.1.0.3
-----------------------------------------------------------------------------------------------
PerfTop: 997 irqs/sec kernel:98.2% [1000Hz cycles], (all, cpu: 0)
-----------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________________ ________
3926.00 61.0% sky2_poll [sky2]
671.00 10.4% sky2_intr [sky2]
192.00 3.0% __alloc_skb [kernel]
126.00 2.0% get_rps_cpu [kernel]
111.00 1.7% __kmalloc [kernel]
97.00 1.5% enqueue_to_backlog [kernel]
95.00 1.5% _raw_spin_lock_irqsave [kernel]
93.00 1.4% _raw_spin_lock [kernel]
79.00 1.2% kmem_cache_alloc [kernel]
63.00 1.0% sky2_rx_submit [sky2]
-----------------------------------------------------------------------------------------------
PerfTop: 980 irqs/sec kernel:98.0% [1000Hz cycles], (all, cpu: 0)
-----------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________________ ____________________
6945.00 61.4% sky2_poll [sky2]
1219.00 10.8% sky2_intr [sky2]
323.00 2.9% __alloc_skb [kernel]
243.00 2.1% get_rps_cpu [kernel]
195.00 1.7% __kmalloc [kernel]
161.00 1.4% _raw_spin_lock_irqsave [kernel]
149.00 1.3% enqueue_to_backlog [kernel]
139.00 1.2% _raw_spin_lock [kernel]
136.00 1.2% kmem_cache_alloc [kernel]
135.00 1.2% irq_entries_start [kernel]
108.00 1.0% sky2_rx_submit [sky2]
-----------------------------------------------------------------------------------------------
PerfTop: 458 irqs/sec kernel:80.8% [1000Hz cycles], (all, cpu: 2)
-----------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ______________________________ _____________________________________
130.00 4.7% _raw_spin_lock_irqsave /lib/modules/2.6.34-rc5/build/vmlinux
114.00 4.1% system_call /lib/modules/2.6.34-rc5/build/vmlinux
91.00 3.3% ip_rcv /lib/modules/2.6.34-rc5/build/vmlinux
82.00 3.0% _raw_spin_unlock_irqrestore /lib/modules/2.6.34-rc5/build/vmlinux
74.00 2.7% call_function_single_interrupt /lib/modules/2.6.34-rc5/build/vmlinux
74.00 2.7% fget /lib/modules/2.6.34-rc5/build/vmlinux
71.00 2.6% __netif_receive_skb /lib/modules/2.6.34-rc5/build/vmlinux
69.00 2.5% ip_route_input /lib/modules/2.6.34-rc5/build/vmlinux
66.00 2.4% schedule /lib/modules/2.6.34-rc5/build/vmlinux
63.00 2.3% kmem_cache_free /lib/modules/2.6.34-rc5/build/vmlinux
61.00 2.2% sys_epoll_ctl /lib/modules/2.6.34-rc5/build/vmlinux
61.00 2.2% __udp4_lib_lookup /lib/modules/2.6.34-rc5/build/vmlinux
57.00 2.1% copy_user_generic_string /lib/modules/2.6.34-rc5/build/vmlinux
49.00 1.8% vread_tsc [kernel].vsyscall_fn
49.00 1.8% _raw_spin_lock /lib/modules/2.6.34-rc5/build/vmlinux
47.00 1.7% ep_remove /lib/modules/2.6.34-rc5/build/vmlinux
45.00 1.6% fput /lib/modules/2.6.34-rc5/build/vmlinux
44.00 1.6% sys_epoll_wait /lib/modules/2.6.34-rc5/build/vmlinux
40.00 1.4% kmem_cache_alloc /lib/modules/2.6.34-rc5/build/vmlinux
40.00 1.4% local_bh_enable_ip /lib/modules/2.6.34-rc5/build/vmlinux
38.00 1.4% sock_recv_ts_and_drops /lib/modules/2.6.34-rc5/build/vmlinux
35.00 1.3% process_recv /home/hadi/udp_sink/mcpudp
34.00 1.2% mutex_unlock /lib/modules/2.6.34-rc5/build/vmlinux
31.00 1.1% _raw_spin_unlock_bh /lib/modules/2.6.34-rc5/build/vmlinux
31.00 1.1% event_base_loop /usr/lib/libevent-1.3e.so.1.0.3
-----------------------------------------------------------------------------------------------
PerfTop: 552 irqs/sec kernel:82.4% [1000Hz cycles], (all, cpu: 2)
-----------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ______________________________ _____________________________________
204.00 4.7% _raw_spin_lock_irqsave /lib/modules/2.6.34-rc5/build/vmlinux
169.00 3.9% system_call /lib/modules/2.6.34-rc5/build/vmlinux
151.00 3.5% _raw_spin_unlock_irqrestore /lib/modules/2.6.34-rc5/build/vmlinux
132.00 3.0% ip_rcv /lib/modules/2.6.34-rc5/build/vmlinux
129.00 3.0% fget /lib/modules/2.6.34-rc5/build/vmlinux
123.00 2.8% __netif_receive_skb /lib/modules/2.6.34-rc5/build/vmlinux
115.00 2.6% ip_route_input /lib/modules/2.6.34-rc5/build/vmlinux
112.00 2.6% call_function_single_interrupt /lib/modules/2.6.34-rc5/build/vmlinux
112.00 2.6% sys_epoll_ctl /lib/modules/2.6.34-rc5/build/vmlinux
103.00 2.4% schedule /lib/modules/2.6.34-rc5/build/vmlinux
94.00 2.2% kmem_cache_free /lib/modules/2.6.34-rc5/build/vmlinux
89.00 2.0% copy_user_generic_string /lib/modules/2.6.34-rc5/build/vmlinux
86.00 2.0% _raw_spin_lock /lib/modules/2.6.34-rc5/build/vmlinux
83.00 1.9% __udp4_lib_lookup /lib/modules/2.6.34-rc5/build/vmlinux
76.00 1.7% vread_tsc [kernel].vsyscall_fn
68.00 1.6% ep_remove /lib/modules/2.6.34-rc5/build/vmlinux
67.00 1.5% fput /lib/modules/2.6.34-rc5/build/vmlinux
64.00 1.5% kmem_cache_alloc /lib/modules/2.6.34-rc5/build/vmlinux
62.00 1.4% sys_epoll_wait /lib/modules/2.6.34-rc5/build/vmlinux
60.00 1.4% dst_release /lib/modules/2.6.34-rc5/build/vmlinux
60.00 1.4% sock_recv_ts_and_drops /lib/modules/2.6.34-rc5/build/vmlinux
56.00 1.3% _raw_spin_lock_bh /lib/modules/2.6.34-rc5/build/vmlinux
53.00 1.2% event_base_loop /usr/lib/libevent-1.3e.so.1.0.3
51.00 1.2% datagram_poll /lib/modules/2.6.34-rc5/build/vmlinux
48.00 1.1% epoll_ctl /lib/libc-2.7.so
48.00 1.1% kfree /lib/modules/2.6.34-rc5/build/vmlinux
47.00 1.1% _raw_spin_unlock_bh /lib/modules/2.6.34-rc5/build/vmlinux
47.00 1.1% mutex_unlock /lib/modules/2.6.34-rc5/build/vmlinux
45.00 1.0% __udp4_lib_rcv /lib/modules/2.6.34-rc5/build/vmlinux
45.00 1.0% tick_nohz_stop_sched_tick /lib/modules/2.6.34-rc5/build/vmlinux
-----------------------------------------------------------------------------------------------
PerfTop: 408 irqs/sec kernel:82.1% [1000Hz cycles], (all, cpu: 2)
-----------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ______________________________ _____________________________________
240.00 4.8% _raw_spin_lock_irqsave /lib/modules/2.6.34-rc5/build/vmlinux
200.00 4.0% system_call /lib/modules/2.6.34-rc5/build/vmlinux
165.00 3.3% _raw_spin_unlock_irqrestore /lib/modules/2.6.34-rc5/build/vmlinux
161.00 3.2% ip_rcv /lib/modules/2.6.34-rc5/build/vmlinux
158.00 3.1% fget /lib/modules/2.6.34-rc5/build/vmlinux
150.00 3.0% sys_epoll_ctl /lib/modules/2.6.34-rc5/build/vmlinux
135.00 2.7% __netif_receive_skb /lib/modules/2.6.34-rc5/build/vmlinux
122.00 2.4% ip_route_input /lib/modules/2.6.34-rc5/build/vmlinux
117.00 2.3% call_function_single_interrupt /lib/modules/2.6.34-rc5/build/vmlinux
114.00 2.3% schedule /lib/modules/2.6.34-rc5/build/vmlinux
110.00 2.2% _raw_spin_lock /lib/modules/2.6.34-rc5/build/vmlinux
108.00 2.1% copy_user_generic_string /lib/modules/2.6.34-rc5/build/vmlinux
101.00 2.0% kmem_cache_free /lib/modules/2.6.34-rc5/build/vmlinux
94.00 1.9% vread_tsc [kernel].vsyscall_fn
90.00 1.8% __udp4_lib_lookup /lib/modules/2.6.34-rc5/build/vmlinux
85.00 1.7% fput /lib/modules/2.6.34-rc5/build/vmlinux
78.00 1.5% dst_release /lib/modules/2.6.34-rc5/build/vmlinux
77.00 1.5% ep_remove /lib/modules/2.6.34-rc5/build/vmlinux
75.00 1.5% kmem_cache_alloc /lib/modules/2.6.34-rc5/build/vmlinux
74.00 1.5% _raw_spin_lock_bh /lib/modules/2.6.34-rc5/build/vmlinux
69.00 1.4% sys_epoll_wait /lib/modules/2.6.34-rc5/build/vmlinux
68.00 1.3% event_base_loop /usr/lib/libevent-1.3e.so.1.0.3
68.00 1.3% sock_recv_ts_and_drops /lib/modules/2.6.34-rc5/build/vmlinux
62.00 1.2% _raw_spin_unlock_bh /lib/modules/2.6.34-rc5/build/vmlinux
62.00 1.2% datagram_poll /lib/modules/2.6.34-rc5/build/vmlinux
55.00 1.1% epoll_ctl /lib/libc-2.7.so
53.00 1.1% local_bh_enable_ip /lib/modules/2.6.34-rc5/build/vmlinux
53.00 1.1% tick_nohz_stop_sched_tick /lib/modules/2.6.34-rc5/build/vmlinux
52.00 1.0% mutex_unlock /lib/modules/2.6.34-rc5/build/vmlinux
-----------------------------------------------------------------------------------------------
PerfTop: 440 irqs/sec kernel:85.0% [1000Hz cycles], (all, cpu: 2)
-----------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ______________________________ _____________________________________
226.00 4.6% _raw_spin_lock_irqsave /lib/modules/2.6.34-rc5/build/vmlinux
213.00 4.3% system_call /lib/modules/2.6.34-rc5/build/vmlinux
154.00 3.1% _raw_spin_unlock_irqrestore /lib/modules/2.6.34-rc5/build/vmlinux
148.00 3.0% ip_rcv /lib/modules/2.6.34-rc5/build/vmlinux
143.00 2.9% fget /lib/modules/2.6.34-rc5/build/vmlinux
143.00 2.9% ip_route_input /lib/modules/2.6.34-rc5/build/vmlinux
140.00 2.8% __netif_receive_skb /lib/modules/2.6.34-rc5/build/vmlinux
124.00 2.5% call_function_single_interrupt /lib/modules/2.6.34-rc5/build/vmlinux
124.00 2.5% sys_epoll_ctl /lib/modules/2.6.34-rc5/build/vmlinux
104.00 2.1% copy_user_generic_string /lib/modules/2.6.34-rc5/build/vmlinux
103.00 2.1% vread_tsc [kernel].vsyscall_fn
101.00 2.0% schedule /lib/modules/2.6.34-rc5/build/vmlinux
100.00 2.0% kmem_cache_free /lib/modules/2.6.34-rc5/build/vmlinux
99.00 2.0% _raw_spin_lock /lib/modules/2.6.34-rc5/build/vmlinux
93.00 1.9% __udp4_lib_lookup /lib/modules/2.6.34-rc5/build/vmlinux
80.00 1.6% fput /lib/modules/2.6.34-rc5/build/vmlinux
76.00 1.5% kmem_cache_alloc /lib/modules/2.6.34-rc5/build/vmlinux
75.00 1.5% sock_recv_ts_and_drops /lib/modules/2.6.34-rc5/build/vmlinux
73.00 1.5% dst_release /lib/modules/2.6.34-rc5/build/vmlinux
70.00 1.4% sys_epoll_wait /lib/modules/2.6.34-rc5/build/vmlinux
69.00 1.4% datagram_poll /lib/modules/2.6.34-rc5/build/vmlinux
65.00 1.3% event_base_loop /usr/lib/libevent-1.3e.so.1.0.3
65.00 1.3% ep_remove /lib/modules/2.6.34-rc5/build/vmlinux
III: Kernel compiled with Eric's patch, rps mask 00
Avg udp packets sunk: 98.74%
-------------------------------------------------------------------------------
PerfTop: 4202 irqs/sec kernel:82.5% [1000Hz cycles], (all, 8 CPUs)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ______________________
1639.00 9.0% sky2_poll [sky2]
1051.00 5.8% _raw_spin_lock_irqsave [kernel]
665.00 3.7% system_call [kernel]
578.00 3.2% fget [kernel]
476.00 2.6% _raw_spin_unlock_irqrestore [kernel]
457.00 2.5% copy_user_generic_string [kernel]
427.00 2.4% sys_epoll_ctl [kernel]
401.00 2.2% datagram_poll [kernel]
391.00 2.2% kmem_cache_free [kernel]
349.00 1.9% schedule [kernel]
339.00 1.9% vread_tsc [kernel].vsyscall_fn
323.00 1.8% udp_recvmsg [kernel]
292.00 1.6% kmem_cache_alloc [kernel]
285.00 1.6% _raw_spin_lock [kernel]
272.00 1.5% _raw_spin_lock_bh [kernel]
268.00 1.5% sys_epoll_wait [kernel]
260.00 1.4% fput [kernel]
234.00 1.3% ip_route_input [kernel]
221.00 1.2% __udp4_lib_lookup [kernel]
212.00 1.2% dst_release [kernel]
209.00 1.2% ip_rcv [kernel]
203.00 1.1% ep_remove [kernel]
202.00 1.1% first_packet_length [kernel]
-------------------------------------------------------------------------------
PerfTop: 3999 irqs/sec kernel:82.3% [1000Hz cycles], (all, 8 CPUs)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ______________________
3452.00 9.3% sky2_poll [sky2]
2212.00 5.9% _raw_spin_lock_irqsave [kernel]
1350.00 3.6% system_call [kernel]
1187.00 3.2% fget [kernel]
1010.00 2.7% copy_user_generic_string [kernel]
965.00 2.6% _raw_spin_unlock_irqrestore [kernel]
842.00 2.3% sys_epoll_ctl [kernel]
833.00 2.2% datagram_poll [kernel]
770.00 2.1% kmem_cache_free [kernel]
710.00 1.9% vread_tsc [kernel].vsyscall_fn
688.00 1.8% schedule [kernel]
651.00 1.7% udp_recvmsg [kernel]
603.00 1.6% _raw_spin_lock_bh [kernel]
599.00 1.6% _raw_spin_lock [kernel]
597.00 1.6% sys_epoll_wait [kernel]
594.00 1.6% kmem_cache_alloc [kernel]
553.00 1.5% ip_route_input [kernel]
528.00 1.4% fput [kernel]
496.00 1.3% __udp4_lib_lookup [kernel]
444.00 1.2% dst_release [kernel]
433.00 1.2% ip_rcv [kernel]
408.00 1.1% first_packet_length [kernel]
-------------------------------------------------------------------------------
PerfTop: 3765 irqs/sec kernel:83.7% [1000Hz cycles], (all, 8 CPUs)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ______________________
4275.00 9.5% sky2_poll [sky2]
2684.00 6.0% _raw_spin_lock_irqsave [kernel]
1654.00 3.7% system_call [kernel]
1447.00 3.2% fget [kernel]
1223.00 2.7% copy_user_generic_string [kernel]
1146.00 2.5% _raw_spin_unlock_irqrestore [kernel]
1036.00 2.3% sys_epoll_ctl [kernel]
1019.00 2.3% datagram_poll [kernel]
974.00 2.2% kmem_cache_free [kernel]
843.00 1.9% vread_tsc [kernel].vsyscall_fn
799.00 1.8% schedule [kernel]
761.00 1.7% udp_recvmsg [kernel]
736.00 1.6% kmem_cache_alloc [kernel]
719.00 1.6% _raw_spin_lock_bh [kernel]
716.00 1.6% _raw_spin_lock [kernel]
696.00 1.5% sys_epoll_wait [kernel]
680.00 1.5% ip_route_input [kernel]
657.00 1.5% fput [kernel]
613.00 1.4% __udp4_lib_lookup [kernel]
552.00 1.2% dst_release [kernel]
507.00 1.1% ip_rcv [kernel]
-------------------------------------------------------------------------------
PerfTop: 1001 irqs/sec kernel:99.9% [1000Hz cycles], (all, cpu: 0)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ________
669.00 32.2% sky2_poll [sky2]
128.00 6.2% ip_route_input [kernel]
106.00 5.1% ip_rcv [kernel]
105.00 5.1% __udp4_lib_lookup [kernel]
86.00 4.1% _raw_spin_lock [kernel]
85.00 4.1% _raw_spin_lock_irqsave [kernel]
82.00 3.9% __alloc_skb [kernel]
78.00 3.8% sock_queue_rcv_skb [kernel]
57.00 2.7% __netif_receive_skb [kernel]
53.00 2.6% __wake_up_common [kernel]
47.00 2.3% __udp4_lib_rcv [kernel]
42.00 2.0% sock_def_readable [kernel]
37.00 1.8% kmem_cache_alloc [kernel]
34.00 1.6% ep_poll_callback [kernel]
34.00 1.6% __kmalloc [kernel]
34.00 1.6% select_task_rq_fair [kernel]
30.00 1.4% _raw_read_lock [kernel]
27.00 1.3% _raw_spin_unlock_irqrestore [kernel]
24.00 1.2% sky2_rx_submit [sky2]
22.00 1.1% udp_queue_rcv_skb [kernel]
21.00 1.0% try_to_wake_up [kernel]
-------------------------------------------------------------------------------
PerfTop: 1000 irqs/sec kernel:100.0% [1000Hz cycles], (all, cpu: 0)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ________
3061.00 31.9% sky2_poll [sky2]
529.00 5.5% ip_route_input [kernel]
518.00 5.4% __udp4_lib_lookup [kernel]
424.00 4.4% ip_rcv [kernel]
390.00 4.1% _raw_spin_lock_irqsave [kernel]
389.00 4.1% __alloc_skb [kernel]
365.00 3.8% _raw_spin_lock [kernel]
326.00 3.4% sock_queue_rcv_skb [kernel]
297.00 3.1% __netif_receive_skb [kernel]
273.00 2.8% __udp4_lib_rcv [kernel]
223.00 2.3% sock_def_readable [kernel]
205.00 2.1% __wake_up_common [kernel]
181.00 1.9% __kmalloc [kernel]
151.00 1.6% kmem_cache_alloc [kernel]
147.00 1.5% _raw_read_lock [kernel]
143.00 1.5% ep_poll_callback [kernel]
136.00 1.4% sky2_rx_submit [sky2]
123.00 1.3% task_rq_lock [kernel]
118.00 1.2% _raw_spin_unlock_irqrestore [kernel]
114.00 1.2% select_task_rq_fair [kernel]
104.00 1.1% resched_task [kernel]
104.00 1.1% sky2_remove [sky2]
102.00 1.1% udp_queue_rcv_skb [kernel]
-------------------------------------------------------------------------------
PerfTop: 1001 irqs/sec kernel:100.0% [1000Hz cycles], (all, cpu: 0)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ________
3898.00 31.0% sky2_poll [sky2]
715.00 5.7% ip_route_input [kernel]
651.00 5.2% __udp4_lib_lookup [kernel]
576.00 4.6% ip_rcv [kernel]
534.00 4.2% __alloc_skb [kernel]
518.00 4.1% _raw_spin_lock_irqsave [kernel]
441.00 3.5% sock_queue_rcv_skb [kernel]
439.00 3.5% _raw_spin_lock [kernel]
396.00 3.1% __netif_receive_skb [kernel]
351.00 2.8% __udp4_lib_rcv [kernel]
300.00 2.4% sock_def_readable [kernel]
264.00 2.1% __wake_up_common [kernel]
260.00 2.1% __kmalloc [kernel]
198.00 1.6% kmem_cache_alloc [kernel]
193.00 1.5% ep_poll_callback [kernel]
192.00 1.5% _raw_read_lock [kernel]
168.00 1.3% sky2_rx_submit [sky2]
167.00 1.3% task_rq_lock [kernel]
153.00 1.2% udp_queue_rcv_skb [kernel]
149.00 1.2% _raw_spin_unlock_irqrestore [kernel]
147.00 1.2% ip_local_deliver [kernel]
144.00 1.1% resched_task [kernel]
137.00 1.1% sky2_remove [sky2]
-------------------------------------------------------------------------------
PerfTop: 663 irqs/sec kernel:81.9% [1000Hz cycles], (all, cpu: 2)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ____________________
129.00 7.0% _raw_spin_lock_irqsave [kernel]
84.00 4.5% fget [kernel]
83.00 4.5% system_call [kernel]
82.00 4.4% copy_user_generic_string [kernel]
67.00 3.6% _raw_spin_unlock_irqrestore [kernel]
63.00 3.4% datagram_poll [kernel]
57.00 3.1% udp_recvmsg [kernel]
55.00 3.0% sys_epoll_ctl [kernel]
55.00 3.0% vread_tsc [kernel].vsyscall_fn
43.00 2.3% sys_epoll_wait [kernel]
43.00 2.3% _raw_spin_lock_bh [kernel]
41.00 2.2% first_packet_length [kernel]
40.00 2.2% dst_release [kernel]
37.00 2.0% fput [kernel]
37.00 2.0% kmem_cache_free [kernel]
36.00 1.9% mutex_unlock [kernel]
35.00 1.9% schedule [kernel]
34.00 1.8% skb_copy_datagram_iovec [kernel]
34.00 1.8% ep_remove [kernel]
29.00 1.6% mutex_lock [kernel]
29.00 1.6% _raw_spin_lock [kernel]
28.00 1.5% __skb_recv_datagram [kernel]
25.00 1.4% epoll_ctl /lib/libc-2.7.so
25.00 1.4% tick_nohz_stop_sched_tick [kernel]
-------------------------------------------------------------------------------
PerfTop: 629 irqs/sec kernel:81.1% [1000Hz cycles], (all, cpu: 2)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ______________________
351.00 7.9% _raw_spin_lock_irqsave [kernel]
248.00 5.6% system_call [kernel]
219.00 5.0% fget [kernel]
194.00 4.4% copy_user_generic_string [kernel]
184.00 4.2% datagram_poll [kernel]
162.00 3.7% sys_epoll_ctl [kernel]
159.00 3.6% _raw_spin_unlock_irqrestore [kernel]
129.00 2.9% udp_recvmsg [kernel]
129.00 2.9% kmem_cache_free [kernel]
123.00 2.8% vread_tsc [kernel].vsyscall_fn
108.00 2.4% schedule [kernel]
107.00 2.4% _raw_spin_lock_bh [kernel]
104.00 2.4% sys_epoll_wait [kernel]
100.00 2.3% fput [kernel]
94.00 2.1% dst_release [kernel]
78.00 1.8% first_packet_length [kernel]
73.00 1.7% ep_remove [kernel]
69.00 1.6% epoll_ctl /lib/libc-2.7.so
66.00 1.5% skb_copy_datagram_iovec [kernel]
66.00 1.5% mutex_unlock [kernel]
64.00 1.4% __skb_recv_datagram [kernel]
64.00 1.4% mutex_lock [kernel]
57.00 1.3% sock_recv_ts_and_drops [kernel]
51.00 1.2% kmem_cache_alloc [kernel]
49.00 1.1% ep_send_events_proc [kernel]
-------------------------------------------------------------------------------
PerfTop: 457 irqs/sec kernel:72.0% [1000Hz cycles], (all, cpu: 2)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ______________________
411.00 7.8% _raw_spin_lock_irqsave [kernel]
280.00 5.3% system_call [kernel]
269.00 5.1% fget [kernel]
239.00 4.5% copy_user_generic_string [kernel]
232.00 4.4% datagram_poll [kernel]
175.00 3.3% _raw_spin_unlock_irqrestore [kernel]
170.00 3.2% sys_epoll_ctl [kernel]
169.00 3.2% kmem_cache_free [kernel]
149.00 2.8% udp_recvmsg [kernel]
144.00 2.7% vread_tsc [kernel].vsyscall_fn
129.00 2.4% sys_epoll_wait [kernel]
128.00 2.4% _raw_spin_lock_bh [kernel]
115.00 2.2% fput [kernel]
112.00 2.1% schedule [kernel]
108.00 2.0% dst_release [kernel]
88.00 1.7% first_packet_length [kernel]
86.00 1.6% ep_remove [kernel]
83.00 1.6% mutex_lock [kernel]
79.00 1.5% skb_copy_datagram_iovec [kernel]
76.00 1.4% mutex_unlock [kernel]
75.00 1.4% epoll_ctl /lib/libc-2.7.so
73.00 1.4% sock_recv_ts_and_drops [kernel]
67.00 1.3% __skb_recv_datagram [kernel]
65.00 1.2% tick_nohz_stop_sched_tick [kernel]
Interesting stuff; check the cache-miss contributions - wow, how low
eth_type_trans is.. and yet we keep optimizing it!
-------------------------------------------------------------------------------
PerfTop: 1021 irqs/sec kernel:98.8% [1000Hz cache-misses], (all, 8 CPUs)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _______________________________ ________
5271.00 77.8% sky2_poll [sky2]
706.00 10.4% kmem_cache_alloc [kernel]
154.00 2.3% dev_gro_receive [kernel]
149.00 2.2% __napi_gro_receive [kernel]
128.00 1.9% napi_gro_receive [kernel]
106.00 1.6% __alloc_skb [kernel]
57.00 0.8% eth_type_trans [kernel]
45.00 0.7% skb_gro_reset_offset [kernel]
26.00 0.4% drain_array [kernel]
23.00 0.3% perf_session__mmap_read_counter perf
10.00 0.1% cache_alloc_refill [kernel]
9.00 0.1% __netdev_alloc_skb [kernel]
9.00 0.1% event__preprocess_sample perf
-------------------------------------------------------------------------------
PerfTop: 997 irqs/sec kernel:100.0% [1000Hz cache-misses], (all, cpu: 0)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ____________________ ________
3019.00 79.4% sky2_poll [sky2]
360.00 9.5% kmem_cache_alloc [kernel]
91.00 2.4% dev_gro_receive [kernel]
86.00 2.3% __alloc_skb [kernel]
83.00 2.2% __napi_gro_receive [kernel]
69.00 1.8% napi_gro_receive [kernel]
45.00 1.2% eth_type_trans [kernel]
25.00 0.7% skb_gro_reset_offset [kernel]
9.00 0.2% __netdev_alloc_skb [kernel]
5.00 0.1% cache_alloc_refill [kernel]
5.00 0.1% skb_pull [kernel]
-------------------------------------------------------------------------------
PerfTop: 997 irqs/sec kernel:100.0% [1000Hz cache-misses], (all, cpu: 0)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ____________________ ________
8887.00 79.8% sky2_poll [sky2]
1138.00 10.2% kmem_cache_alloc [kernel]
273.00 2.5% __napi_gro_receive [kernel]
246.00 2.2% dev_gro_receive [kernel]
189.00 1.7% napi_gro_receive [kernel]
159.00 1.4% __alloc_skb [kernel]
119.00 1.1% eth_type_trans [kernel]
86.00 0.8% skb_gro_reset_offset [kernel]
13.00 0.1% __netdev_alloc_skb [kernel]
8.00 0.1% skb_pull [kernel]
7.00 0.1% cache_alloc_refill [kernel]
Not much going on on the other cpus.. i.e. hardly anything shows up in
the profile..
IV: rps with ee and irq affinity to cpu0
Avg udp packets sunk: 95.15%
-------------------------------------------------------------------------------
PerfTop: 3558 irqs/sec kernel:84.6% [1000Hz cycles], (all, 8 CPUs)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _____________________________ ______________________
3096.00 17.1% sky2_poll [sky2]
645.00 3.6% _raw_spin_lock_irqsave [kernel]
493.00 2.7% system_call [kernel]
462.00 2.6% sky2_intr [sky2]
416.00 2.3% _raw_spin_unlock_irqrestore [kernel]
382.00 2.1% fget [kernel]
361.00 2.0% __netif_receive_skb [kernel]
342.00 1.9% ip_rcv [kernel]
334.00 1.8% _raw_spin_lock [kernel]
320.00 1.8% sys_epoll_ctl [kernel]
298.00 1.6% copy_user_generic_string [kernel]
288.00 1.6% call_function_single_interrup [kernel]
277.00 1.5% load_balance [kernel]
271.00 1.5% ip_route_input [kernel]
270.00 1.5% vread_tsc [kernel].vsyscall_fn
256.00 1.4% kmem_cache_free [kernel]
222.00 1.2% __udp4_lib_lookup [kernel]
222.00 1.2% schedule [kernel]
194.00 1.1% fput [kernel]
189.00 1.0% kmem_cache_alloc [kernel]
171.00 0.9% sys_epoll_wait [kernel]
164.00 0.9% ep_remove [kernel]
-------------------------------------------------------------------------------
PerfTop: 3452 irqs/sec kernel:84.3% [1000Hz cycles], (all, 8 CPUs)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _____________________________ ______________________
5033.00 16.2% sky2_poll [sky2]
1147.00 3.7% _raw_spin_lock_irqsave [kernel]
888.00 2.9% system_call [kernel]
774.00 2.5% sky2_intr [sky2]
757.00 2.4% _raw_spin_unlock_irqrestore [kernel]
702.00 2.3% fget [kernel]
630.00 2.0% __netif_receive_skb [kernel]
609.00 2.0% _raw_spin_lock [kernel]
607.00 2.0% ip_rcv [kernel]
553.00 1.8% sys_epoll_ctl [kernel]
514.00 1.7% ip_route_input [kernel]
508.00 1.6% call_function_single_interrup [kernel]
504.00 1.6% copy_user_generic_string [kernel]
466.00 1.5% kmem_cache_free [kernel]
452.00 1.5% schedule [kernel]
450.00 1.4% vread_tsc [kernel].vsyscall_fn
390.00 1.3% load_balance [kernel]
377.00 1.2% fput [kernel]
364.00 1.2% __udp4_lib_lookup [kernel]
329.00 1.1% kmem_cache_alloc [kernel]
314.00 1.0% ep_remove [kernel]
289.00 0.9% dst_release [kernel]
276.00 0.9% sys_epoll_wait [kernel]
265.00 0.9% datagram_poll [kernel]
-------------------------------------------------------------------------------
PerfTop: 3328 irqs/sec kernel:85.7% [1000Hz cycles], (all, 8 CPUs)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _____________________________ ______________________
6788.00 17.5% sky2_poll [sky2]
1413.00 3.6% _raw_spin_lock_irqsave [kernel]
1042.00 2.7% system_call [kernel]
997.00 2.6% sky2_intr [sky2]
903.00 2.3% _raw_spin_unlock_irqrestore [kernel]
837.00 2.2% fget [kernel]
740.00 1.9% _raw_spin_lock [kernel]
725.00 1.9% __netif_receive_skb [kernel]
722.00 1.9% ip_rcv [kernel]
651.00 1.7% sys_epoll_ctl [kernel]
609.00 1.6% call_function_single_interrup [kernel]
604.00 1.6% ip_route_input [kernel]
601.00 1.5% copy_user_generic_string [kernel]
573.00 1.5% schedule [kernel]
561.00 1.4% kmem_cache_free [kernel]
538.00 1.4% load_balance [kernel]
515.00 1.3% vread_tsc [kernel].vsyscall_fn
480.00 1.2% fput [kernel]
421.00 1.1% kmem_cache_alloc [kernel]
418.00 1.1% __udp4_lib_lookup [kernel]
377.00 1.0% ep_remove [kernel]
347.00 0.9% datagram_poll [kernel]
335.00 0.9% dst_release [kernel]
-------------------------------------------------------------------------------
PerfTop: 1000 irqs/sec kernel:96.2% [1000Hz cycles], (all, cpu: 0)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _____________________________ ______________________
2109.00 61.3% sky2_poll [sky2]
366.00 10.6% sky2_intr [sky2]
84.00 2.4% __alloc_skb [kernel]
57.00 1.7% _raw_spin_lock_irqsave [kernel]
56.00 1.6% get_rps_cpu [kernel]
52.00 1.5% __kmalloc [kernel]
39.00 1.1% irq_entries_start [kernel]
39.00 1.1% enqueue_to_backlog [kernel]
34.00 1.0% kmem_cache_alloc [kernel]
33.00 1.0% default_send_IPI_mask_sequenc [kernel]
32.00 0.9% sky2_rx_submit [sky2]
30.00 0.9% swiotlb_sync_single [kernel]
28.00 0.8% _raw_spin_lock [kernel]
23.00 0.7% sky2_remove [sky2]
22.00 0.6% __smp_call_function_single [kernel]
19.00 0.6% system_call [kernel]
18.00 0.5% sys_epoll_ctl [kernel]
18.00 0.5% fget [kernel]
17.00 0.5% cache_alloc_refill [kernel]
16.00 0.5% copy_user_generic_string [kernel]
16.00 0.5% _raw_spin_unlock_irqrestore [kernel]
15.00 0.4% dev_gro_receive [kernel]
14.00 0.4% net_rx_action [kernel]
-------------------------------------------------------------------------------
PerfTop: 1000 irqs/sec kernel:97.9% [1000Hz cycles], (all, cpu: 0)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _______________________________ ____________________
4479.00 60.9% sky2_poll [sky2]
849.00 11.5% sky2_intr [sky2]
163.00 2.2% __alloc_skb [kernel]
155.00 2.1% get_rps_cpu [kernel]
121.00 1.6% _raw_spin_lock_irqsave [kernel]
92.00 1.3% __kmalloc [kernel]
89.00 1.2% _raw_spin_lock [kernel]
83.00 1.1% enqueue_to_backlog [kernel]
79.00 1.1% irq_entries_start [kernel]
78.00 1.1% kmem_cache_alloc [kernel]
69.00 0.9% sky2_rx_submit [sky2]
65.00 0.9% swiotlb_sync_single [kernel]
58.00 0.8% default_send_IPI_mask_sequence_ [kernel]
50.00 0.7% system_call [kernel]
45.00 0.6% fget [kernel]
40.00 0.5% sky2_remove [sky2]
37.00 0.5% __smp_call_function_single [kernel]
36.00 0.5% datagram_poll [kernel]
36.00 0.5% _raw_spin_unlock_irqrestore [kernel]
34.00 0.5% cache_alloc_refill [kernel]
31.00 0.4% net_rx_action [kernel]
28.00 0.4% kmem_cache_free [kernel]
27.00 0.4% _raw_spin_lock_bh [kernel]
27.00 0.4% copy_user_generic_string [kernel]
25.00 0.3% dev_gro_receive [kernel]
-------------------------------------------------------------------------------
PerfTop: 980 irqs/sec kernel:97.3% [1000Hz cycles], (all, cpu: 0)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _______________________________ ____________________
6544.00 61.6% sky2_poll [sky2]
1098.00 10.3% sky2_intr [sky2]
248.00 2.3% __alloc_skb [kernel]
198.00 1.9% get_rps_cpu [kernel]
182.00 1.7% _raw_spin_lock_irqsave [kernel]
144.00 1.4% __kmalloc [kernel]
138.00 1.3% _raw_spin_lock [kernel]
127.00 1.2% kmem_cache_alloc [kernel]
125.00 1.2% irq_entries_start [kernel]
119.00 1.1% enqueue_to_backlog [kernel]
93.00 0.9% sky2_rx_submit [sky2]
91.00 0.9% swiotlb_sync_single [kernel]
83.00 0.8% default_send_IPI_mask_sequence_ [kernel]
82.00 0.8% system_call [kernel]
64.00 0.6% sky2_remove [sky2]
60.00 0.6% fget [kernel]
58.00 0.5% cache_alloc_refill [kernel]
57.00 0.5% _raw_spin_unlock_irqrestore [kernel]
51.00 0.5% datagram_poll [kernel]
47.00 0.4% copy_user_generic_string [kernel]
-------------------------------------------------------------------------------
PerfTop: 315 irqs/sec kernel:81.0% [1000Hz cycles], (all, cpu: 2)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _____________________________ ______________________
114.00 4.5% system_call [kernel]
98.00 3.9% _raw_spin_lock_irqsave [kernel]
89.00 3.5% _raw_spin_unlock_irqrestore [kernel]
89.00 3.5% ip_rcv [kernel]
83.00 3.3% call_function_single_interrup [kernel]
76.00 3.0% __netif_receive_skb [kernel]
67.00 2.6% fget [kernel]
62.00 2.4% ip_route_input [kernel]
59.00 2.3% vread_tsc [kernel].vsyscall_fn
54.00 2.1% kmem_cache_free [kernel]
54.00 2.1% sys_epoll_ctl [kernel]
51.00 2.0% schedule [kernel]
49.00 1.9% _raw_spin_lock [kernel]
49.00 1.9% __udp4_lib_lookup [kernel]
44.00 1.7% ep_remove [kernel]
44.00 1.7% copy_user_generic_string [kernel]
41.00 1.6% fput [kernel]
38.00 1.5% sys_epoll_wait [kernel]
37.00 1.5% tick_nohz_stop_sched_tick [kernel]
36.00 1.4% kmem_cache_alloc [kernel]
34.00 1.3% datagram_poll [kernel]
33.00 1.3% __udp4_lib_rcv [kernel]
31.00 1.2% process_recv mcpudp
-------------------------------------------------------------------------------
PerfTop: 292 irqs/sec kernel:82.9% [1000Hz cycles], (all, cpu: 2)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _____________________________ ______________________
154.00 4.7% _raw_spin_lock_irqsave [kernel]
140.00 4.2% system_call [kernel]
111.00 3.4% ip_rcv [kernel]
106.00 3.2% _raw_spin_unlock_irqrestore [kernel]
96.00 2.9% call_function_single_interrup [kernel]
95.00 2.9% fget [kernel]
90.00 2.7% __netif_receive_skb [kernel]
89.00 2.7% sys_epoll_ctl [kernel]
77.00 2.3% copy_user_generic_string [kernel]
77.00 2.3% ip_route_input [kernel]
76.00 2.3% kmem_cache_free [kernel]
74.00 2.2% _raw_spin_lock [kernel]
71.00 2.1% schedule [kernel]
69.00 2.1% vread_tsc [kernel].vsyscall_fn
58.00 1.8% __udp4_lib_lookup [kernel]
52.00 1.6% __udp4_lib_rcv [kernel]
51.00 1.5% fput [kernel]
47.00 1.4% ep_remove [kernel]
47.00 1.4% event_base_loop libevent-1.3e.so.1.0.3
39.00 1.2% process_recv mcpudp
39.00 1.2% sys_epoll_wait [kernel]
38.00 1.2% udp_recvmsg [kernel]
38.00 1.2% sock_recv_ts_and_drops [kernel]
37.00 1.1% __switch_to [kernel]
-------------------------------------------------------------------------------
PerfTop: 290 irqs/sec kernel:82.1% [1000Hz cycles], (all, cpu: 2)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _____________________________ ______________________
175.00 4.7% _raw_spin_lock_irqsave [kernel]
153.00 4.2% system_call [kernel]
122.00 3.3% ip_rcv [kernel]
114.00 3.1% _raw_spin_unlock_irqrestore [kernel]
114.00 3.1% fget [kernel]
105.00 2.8% __netif_receive_skb [kernel]
101.00 2.7% sys_epoll_ctl [kernel]
100.00 2.7% call_function_single_interrup [kernel]
90.00 2.4% copy_user_generic_string [kernel]
84.00 2.3% schedule [kernel]
76.00 2.1% kmem_cache_free [kernel]
76.00 2.1% _raw_spin_lock [kernel]
72.00 2.0% ip_route_input [kernel]
70.00 1.9% vread_tsc [kernel].vsyscall_fn
68.00 1.8% __udp4_lib_lookup [kernel]
68.00 1.8% __udp4_lib_rcv [kernel]
57.00 1.5% ep_remove [kernel]
57.00 1.5% fput [kernel]
55.00 1.5% kmem_cache_alloc [kernel]
51.00 1.4% process_recv mcpudp
On Wed, 2010-04-28 at 19:45 -0400, jamal wrote:

> Your patch has improved the performance of rps relative to what is in
> net-next very lightly; but it has also improved the performance of
> non-rps;->

Correction: Last part of the sentence is not true (obvious if you look
at the results I attached)

cheers,
jamal
On Wed, 2010-04-28 at 19:44 -0400, jamal wrote:
> On Wed, 2010-04-28 at 16:06 +0200, Eric Dumazet wrote:
>
> > Here it is ;)
>
> Sorry - things got a little hectic with TheMan.
>
> I am afraid I don't have good news.
> Actually, I should say I don't have good news in regards to rps.
> For my sample app, two things seem to be happening:
> a) The overall performance has gotten better for both rps
> and non-rps.
> b) non-rps is now performing relatively better
>
> This is just what I see in net-next, not related to your patch.
> It seems the kernels I tested prior to April 23 showed rps better.
> The one I tested on Apr 23 showed rps being about the same as non-rps.
> As I stated in my last result posting, I thought I hadn't tested properly
> but I did again today and saw the same thing. And now non-rps is
> _consistently_ better.
> So some regression is going on...
>
> Your patch has improved the performance of rps relative to what is in
> net-next very lightly; but it has also improved the performance of
> non-rps;->
> My traces look different for the app cpu than yours - likely because
> the apps are different.
>
> At the moment I don't have time to dig deeper into the code, but I could
> test as cycles show up.
>
> I am attaching the profile traces and results.
>
> cheers,
> jamal

Hi Jamal

I don't see in your results the number of pps, number of udp ports,
number of flows.

In my latest results, I can handle more pps than before, regardless of
rps being on or off, and with various number of udp ports (one user
thread per port), number of flows (many src addr so that rps spread
packets on many cpus)

If/when contention windows are smaller, cpu can run uncontended, and can
consume more cycles to process more frames ?

With a non yet published patch, I even can reach 600.000 pps in DDOS
situations, instead of 400.000.

Thanks !
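Eric's point about "many src addr so that rps spread packets on many cpus" comes down to how RPS picks a target cpu: a hash over the packet's flow identifiers indexes into the per-queue cpu map built from the rps_cpus mask, so one flow always lands on one cpu while many distinct flows spread over the whole map. Below is a simplified model of that selection step; it is only an approximation of get_rps_cpu() in net/core/dev.c, with made-up structure names, not the kernel code itself.

```c
/* Simplified model of the RPS cpu pick -- an approximation of
 * get_rps_cpu() in net/core/dev.c, not the kernel code.
 */
#include <stdint.h>

struct rps_map_model {
	unsigned int len;	/* number of cpus enabled in the rps_cpus mask */
	uint16_t cpus[8];	/* e.g. {1, 2, 3, 5, 6, 7} for mask 0xee */
};

/* flow_hash is a hash over the packet's addresses and ports: the same
 * flow always maps to the same cpu, and many distinct source addresses
 * (= many flows) spread across every cpu in the map.
 */
static uint16_t pick_rps_cpu(const struct rps_map_model *map, uint32_t flow_hash)
{
	return map->cpus[((uint64_t)flow_hash * map->len) >> 32];
}
```

With only 8 flows, the hash can take at most 8 distinct values, so the spread over the mask may be much less even than in Eric's many-source-address bench.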
On Thu, 2010-04-29 at 06:09 +0200, Eric Dumazet wrote:

> I don't see in your results the number of pps, number of udp ports,
> number of flows.

My test scenario is still the same: send 1M packets of 8 flows
round-robin at 750Kpps. Repeat the test 4-6 times and average out.
8 flows map to 8 cpus. Any rate above 750Kpps and the driver starts
dropping. The flows are {fixed dst IP, fixed src IP, fixed src port,
8 variable dst ports}. ip_rcv and friends show up in the profile as we
have already discussed - but I don't want to change the test
characteristics because then I can't do a fair backward comparison.
Also I use rps mask ee to use all the cpus except the core doing demux
(core 0).
In the results, when I say "udp sink 90%" it means 90% of 750Kpps was
successfully received by the app (on the multiple cpus).

> In my latest results, I can handle more pps than before, regardless of
> rps being on or off,

Same here - even in my worst case scenario 88.5% of 750Kpps > 600Kpps.
Attached are history results to make more sense of what I am saying:
we have net-next kernels from apr14, apr23, apr23 with Changli's change,
apr28, apr28 with your change. What you'll see is non-rps (blue) gets
better and rps (Orange) gets better slowly, then by apr28 it is worse.

> and with various number of udp ports (one user
> thread per port), number of flows (many src addr so that rps spread
> packets on many cpus)

This is true for me except for non-rps getting relatively better and
rps getting worse in plain net-next for Apr 28. Sorry, I don't have time
to dissect where things changed, but I figured if I reported it, it will
point to something obvious.

> If/when contention windows are smaller, cpu can run uncontended, and can
> consume more cycles to process more frames ?
>
> With a non yet published patch, I even can reach 600.000 pps in DDOS
> situations, instead of 400.000.

So my tests are simpler. What I was hoping to see was, at minimum, rps
maintaining its gap of 6-7% more capacity. I don't mind seeing rps get
better. If both rps and non-rps get better, that's even more interesting.

cheers,
jamal
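jamal's mcpudp sink itself is not posted in this thread; the profiles only show that it receives via recvfrom()/epoll through libevent (process_recv, sys_epoll_ctl, sys_epoll_wait). As a purely hypothetical stand-in, one worker of such a sink (one UDP port per thread, matching the "one user thread per port" setup Eric mentioned) could be as small as the sketch below; names like sink_port are invented for illustration.

```c
/* Hypothetical stand-in for one udp sink worker: bind one port and count
 * datagrams.  The real mcpudp uses libevent/epoll; this only shows the
 * per-port receive loop whose kernel side appears in the profiles above
 * (udp_recvmsg, __skb_recv_datagram, skb_copy_datagram_iovec, ...).
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static unsigned long sink_port(uint16_t port)
{
	struct sockaddr_in addr;
	char buf[2048];
	unsigned long received = 0;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return 0;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return 0;
	}

	for (;;) {		/* each recvfrom() is one udp_recvmsg() call */
		ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
		if (n < 0)
			break;
		received++;
	}

	close(fd);
	return received;
}

int main(int argc, char **argv)
{
	uint16_t port = (argc > 1) ? (uint16_t)atoi(argv[1]) : 5000;

	printf("sank %lu datagrams on port %d\n", sink_port(port), port);
	return 0;
}
```

"udp sink 90%" then simply means the counters across the 8 such workers add up to 90% of the 750Kpps that were generated.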
On Thu, Apr 29, 2010 at 7:35 PM, jamal <hadi@cyberus.ca> wrote: > > Same here - even in my worst case scenario 88.5% of 750Kpps > 600Kpps. > Attached is history results to make more sense of what i am saying: > we have net-next kernels from apr14, apr23, apr23 with changlis change, > apr28, apr28 with your change. What you'll see is non-rps (blue) gets > better and rps (Orange) gets better slowly then by apr28 it is worse. Did the number of IPIs increase in the apr28 test? The final patch with Eric's change may introduce more IPIs. And I am wondering why 23rdcl-non-rps is better than before. Maybe it is a side effect of my patch that enlarged netdev_max_backlog.
Le jeudi 29 avril 2010 à 20:12 +0800, Changli Gao a écrit : > On Thu, Apr 29, 2010 at 7:35 PM, jamal <hadi@cyberus.ca> wrote: > > > > Same here - even in my worst case scenario 88.5% of 750Kpps > 600Kpps. > > Attached is history results to make more sense of what i am saying: > > we have net-next kernels from apr14, apr23, apr23 with changlis change, > > apr28, apr28 with your change. What you'll see is non-rps (blue) gets > > better and rps (Orange) gets better slowly then by apr28 it is worse. > > Did the number of IPIs increase in the apr28 test? The finial patch > with Eric's change may introduce more IPIs. And I am wondering why > 23rdcl-non-rps is better than before. Maybe it is the side effect of > my patch: enlarge the netdev_max_backlog. > > Changli, I wonder how you can cook "performance" patches without testing them at all for real... This cannot be true ? When the cpu doing the device softirq is flooded, it handles 300 packets per net_rx_action() round (netdev_budget), so sends at most 6 ipis per 300 packets, with or without my patch, with or without your patch as well. (At most because if remote cpus are flooded as well, they dont napi_complete so no IPI needed at all) (My patch had an effect only on normal load, ie one packet received in a while... up to 50.000 pps I would say). And it also has a nice effect on non RPS loads (mostly the more typical load for following years). If a second packet comes 3us after the first one, and before 2nd CPU handled it, we _can_ afford an extra IPI. 750.000/50 = 15.000 IPI per second. Even with 200.000 IPI per second, 'perf top -C CPU_IPI_sender' shows that sending IPI is very cheap (maybe ~1% of cpu cycles) # Samples: 32033467127 # # Overhead Command Shared Object Symbol # ........ .............. ................. ...... 
# 18.05% init [kernel.kallsyms] [k] poll_idle 10.91% init [kernel.kallsyms] [k] bnx2x_rx_int 10.42% init [kernel.kallsyms] [k] eth_type_trans 5.72% init [kernel.kallsyms] [k] kmem_cache_alloc_node 5.43% init [kernel.kallsyms] [k] __memset 5.20% init [kernel.kallsyms] [k] get_rps_cpu 4.82% init [kernel.kallsyms] [k] __slab_alloc 4.34% init [kernel.kallsyms] [k] get_partial_node 4.22% init [kernel.kallsyms] [k] _raw_spin_lock 3.41% init [kernel.kallsyms] [k] __kmalloc_node_track_caller 3.01% init [kernel.kallsyms] [k] __alloc_skb 2.22% init [kernel.kallsyms] [k] enqueue_to_backlog 2.10% init [kernel.kallsyms] [k] vlan_gro_common 1.34% init [kernel.kallsyms] [k] swiotlb_map_page 1.25% init [kernel.kallsyms] [k] skb_put 1.06% init [kernel.kallsyms] [k] _raw_spin_lock_irqsave 0.92% init [kernel.kallsyms] [k] dev_gro_receive 0.88% init [kernel.kallsyms] [k] swiotlb_dma_mapping_error 0.83% init [kernel.kallsyms] [k] vlan_gro_receive 0.83% init [kernel.kallsyms] [k] __phys_addr 0.83% init [kernel.kallsyms] [k] __napi_complete 0.83% init [kernel.kallsyms] [k] default_send_IPI_mask_sequence_phys 0.77% init [kernel.kallsyms] [k] is_swiotlb_buffer 0.76% init [kernel.kallsyms] [k] __netdev_alloc_skb 0.74% init [kernel.kallsyms] [k] deactivate_slab 0.73% init [kernel.kallsyms] [k] netif_receive_skb 0.72% init [kernel.kallsyms] [k] unmap_single 0.69% init [kernel.kallsyms] [k] csd_lock 0.63% init [kernel.kallsyms] [k] bnx2x_poll 0.61% init [kernel.kallsyms] [k] bnx2x_msix_fp_int 0.59% init [kernel.kallsyms] [k] irq_entries_start 0.59% init [kernel.kallsyms] [k] swiotlb_sync_single 0.54% init [kernel.kallsyms] [k] get_slab 0.46% init [kernel.kallsyms] [k] napi_skb_finish -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
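As a back-of-the-envelope check of the bound Eric describes (a sketch only, not kernel code; the 750 kpps load, the netdev_budget of 300 and the six remote cpus of the 'ee' mask are the figures used in this thread):

#include <stdio.h>

/* Rough upper bound, not kernel code: assume one net_rx_action() round
 * drains up to netdev_budget packets and sends at most one IPI per
 * remote cpu per round, as described above. */
int main(void)
{
	double pps = 750000.0;	/* offered load in these tests */
	int budget = 300;	/* netdev_budget */
	int remote_cpus = 6;	/* cpus in the 'ee' rps mask */
	double rounds = pps / budget;
	double max_ipis = rounds * remote_cpus;

	printf("<= %.0f rounds/s, <= %.0f IPIs/s (one IPI per %d packets)\n",
	       rounds, max_ipis, budget / remote_cpus);
	return 0;
}

It prints <= 2500 rounds/s and <= 15000 IPIs/s, i.e. the one-IPI-per-50-packets figure quoted above.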
On Thu, 2010-04-29 at 14:45 +0200, Eric Dumazet wrote: > > Changli, I wonder how you can cook "performance" patches without testing > them at all for real... This cannot be true ? Eric, I am with you, however you are in the minority of people who test and produce numbers ;-> The system rewards people for sending patches not much for anything else - so i cant blame Changli ;-> > When the cpu doing the device softirq is flooded, it handles 300 packets > per net_rx_action() round (netdev_budget), so sends at most 6 ipis per > 300 packets, with or without my patch, with or without your patch as > well. > > (At most because if remote cpus are flooded as well, they dont > napi_complete so no IPI needed at all) > > (My patch had an effect only on normal load, ie one packet received in a > while... up to 50.000 pps I would say). And it also has a nice effect on > non RPS loads (mostly the more typical load for following years). > If a second packet comes 3us after the first one, and before 2nd CPU > handled it, we _can_ afford an extra IPI. > > 750.000/50 = 15.000 IPI per second. Could we have some stat in there that shows IPIs being produced? I think it would help to at least observe any changes over variety of tests. I did try to patch my system during the first few tests to record IPIs but it seems to make more sense to have it as a perf stat. > Even with 200.000 IPI per second, 'perf top -C CPU_IPI_sender' shows > that sending IPI is very cheap (maybe ~1% of cpu cycles) > > # Samples: 32033467127 > # One thing i observed is our profiles seem different. Could you send me your .config for a single nehalem and i will try to go as close as possible to it? I have a sky2 instead of bnx - but i suspect everything else will be very similar... I apologize i dont have much time to look into details - but what i can do is test at least. cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le jeudi 29 avril 2010 à 09:17 -0400, jamal a écrit : > Could we have some stat in there that shows IPIs being produced? I think > it would help to at least observe any changes over variety of tests. > I did try to patch my system during the first few tests to record IPIs > but it seems to make more sense to have it as a perf stat. > > > Even with 200.000 IPI per second, 'perf top -C CPU_IPI_sender' shows > > that sending IPI is very cheap (maybe ~1% of cpu cycles) > > > > # Samples: 32033467127 > > # > > One thing i observed is our profiles seem different. Could you send me > your .config for a single nehalem and i will try to go as close as > possible to it? I have a sky2 instead of bnx - but i suspect everything > else will be very similar... > I apologize i dont have much time to look into details - but what i can > do is test at least. I'am going to redo some test on my 'old machine', with tg3 driver. You could try following program : #include <stdio.h> #include <string.h> #include <stdlib.h> #include <unistd.h> struct softnet_stat_vals { int flip; unsigned int tab[2][10]; }; int read_file(struct softnet_stat_vals *v) { char buffer[1024]; FILE *F = fopen("/proc/net/softnet_stat", "r"); v->flip ^= 1; if (!F) return -1; memset(v->tab[v->flip], 0, 10 * sizeof(unsigned int)); while (fgets(buffer, sizeof(buffer), F)) { int i, pos = 0; unsigned int val; for (i = 0; ;) { if (sscanf(buffer + pos, "%08x", &val) != 1) break; v->tab[v->flip][i] += val; pos += 9; if (++i == 10) break; } } fclose(F); } int main(int argc, char *argv[]) { struct softnet_stat_vals *v = calloc(sizeof(struct softnet_stat_vals), 1); read_file(v); for (;;) { sleep(1); read_file(v); printf("%u rps\n", v->tab[v->flip][9] - v->tab[v->flip^1][9]); } } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2010-04-29 at 15:21 +0200, Eric Dumazet wrote: > > You could try following program : > Will do later today (test machine is not on the network and is about 20 minutes from here; so worst case i will get you results by end of day) I guess this program is good enough since it tells me the system wide ipi count - what my patch did was also to break it down by which cpu got how many IPIs (served to check if there was uneven distribution) > > Is your application mono threaded and receiving data to 8 sockets ? > I fork one instance per detected cpu and bind to different ports each time. Example bind to port 8200 on cpu0, 8201 on cpu1, etc. cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
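A per-cpu variant of Eric's reader would give the per-cpu breakdown jamal mentions wanting; a minimal sketch, assuming the same /proc/net/softnet_stat layout Eric's program relies on (one row per online cpu, 8-hex-digit space-separated fields, the 10th field being the rps/IPI counter):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAXCPU 64

/* Each field is 8 hex digits plus a space, so the 10th field (the
 * rps/IPI counter) starts at offset 9 * 9 in the row. */
static int read_rps(unsigned int *tab)
{
	char buffer[1024];
	FILE *F = fopen("/proc/net/softnet_stat", "r");
	int cpu = 0;

	if (!F)
		return 0;
	while (cpu < MAXCPU && fgets(buffer, sizeof(buffer), F)) {
		unsigned int val = 0;

		if (strlen(buffer) > 9 * 9)
			sscanf(buffer + 9 * 9, "%08x", &val);
		tab[cpu++] = val;
	}
	fclose(F);
	return cpu;
}

int main(void)
{
	unsigned int prev[MAXCPU] = {0}, cur[MAXCPU] = {0};
	int i, n;

	read_rps(prev);
	for (;;) {
		sleep(1);
		n = read_rps(cur);
		for (i = 0; i < n; i++)
			printf("cpu%d:%u ", i, cur[i] - prev[i]);
		printf("rps/s\n");
		memcpy(prev, cur, sizeof(prev));
	}
	return 0;
}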
Le jeudi 29 avril 2010 à 09:37 -0400, jamal a écrit : > On Thu, 2010-04-29 at 15:21 +0200, Eric Dumazet wrote: > > > > > > You could try following program : > > > > Will do later today (test machine is not on the network and is about 20 > minutes from here; so worst case i will get you results by end of day) > I guess this program is good enough since it tells me the system wide > ipi count - what my patch did was also to break it down by which cpu got > how many IPIs (served to check if there was uneven distribution) > > > > > Is your application mono threaded and receiving data to 8 sockets ? > > > > I fork one instance per detected cpu and bind to different ports each > time. Example bind to port 8200 on cpu0, 8201 on cpu1, etc. > I guess this is the problem ;) With RPS, you should not bind your threads to cpu. This is the rps hash who will decide for you. I am using following program : /* * Usage: udpsink [ -p baseport] nbports * */ #include <sys/socket.h> #include <netinet/in.h> #include <arpa/inet.h> #include <string.h> #include <stdio.h> #include <errno.h> #include <unistd.h> #include <stdlib.h> #include <fcntl.h> struct worker_data { int fd; unsigned long pack_count; unsigned long bytes_count; unsigned long _padd[16 - 3]; /* alignment */ }; void usage(int code) { fprintf(stderr, "Usage: udpsink [-p baseport] nbports\n"); exit(code); } void *worker_func(void *arg) { struct worker_data *wdata = (struct worker_data *)arg; char buffer[4096]; struct sockaddr_in addr; int lu; while (1) { socklen_t len = sizeof(addr); lu = recvfrom(wdata->fd, buffer, sizeof(buffer), 0, (struct sockaddr *)&addr, &len); if (lu > 0) { wdata->pack_count++; wdata->bytes_count += lu; } } } int main(int argc, char *argv[]) { int c; int baseport = 4000; int nbthreads; struct worker_data *wdata; unsigned long ototal = 0; int concurrent = 0; int verbose = 0; int i; while ((c = getopt(argc, argv, "cvp:")) != -1) { if (c == 'p') baseport = atoi(optarg); else if (c == 'c') concurrent = 1; else if (c == 'v') verbose++; else usage(1); } if (optind == argc) usage(1); nbthreads = atoi(argv[optind]); wdata = calloc(sizeof(struct worker_data), nbthreads); if (!wdata) { perror("calloc"); return 1; } for (i = 0; i < nbthreads; i++) { struct sockaddr_in addr; pthread_t tid; if (i && concurrent) { wdata[i].fd = wdata[0].fd ; } else { wdata[i].fd = socket(PF_INET, SOCK_DGRAM, 0); if (wdata[i].fd == -1) { perror("socket"); return 1; } memset(&addr, 0, sizeof(addr)); addr.sin_family = AF_INET; // addr.sin_addr.s_addr = inet_addr(argv[optind]); addr.sin_port = htons(baseport + i); if (bind(wdata[i].fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) { perror("bind"); return 1; } // fcntl(wdata[i].fd, F_SETFL, O_NDELAY); } pthread_create(&tid, NULL, worker_func, wdata + i); } for (;;) { unsigned long total; long delta; sleep(1); total = 0; for (i = 0; i < nbthreads;i++) { total += wdata[i].pack_count; } delta = total - ototal; if (delta) { printf("%lu pps (%lu", delta, total); if (verbose) { for (i = 0; i < nbthreads;i++) { if (wdata[i].pack_count) printf(" %d:%lu", i, wdata[i].pack_count); } } printf(")\n"); } ototal = total; } } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2010-04-29 at 15:49 +0200, Eric Dumazet wrote: > > I fork one instance per detected cpu and bind to different ports each > > time. Example bind to port 8200 on cpu0, 8201 on cpu1, etc. > > > > I guess this is the problem ;) > > With RPS, you should not bind your threads to cpu. > This is the rps hash who will decide for you. > Sorry - I was not clear; i have the option of binding to cpu vs the setsched api; but what i meant in this case is: - for each cpu detected, fork -- open socket ---bind to udp port cpu# + 8200 I could also bind to a cpu in the last step and i did notice it improved distribution - but all my tests since apr23 dont do that ;-> > > I am using following program : > I will try your program instead so we can reduce the variables cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
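For reference, a minimal sketch of the fork-per-cpu, port-per-cpu receiver pattern jamal describes (hypothetical code, not his actual program; port base 8200 as in his example, and the optional cpu pinning that he says his tests since apr23 leave disabled):

#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* One child per detected cpu; each binds its own UDP port (8200 + cpu)
 * and sits in a blocking recvfrom() loop.  The sched_setaffinity()
 * branch is the optional pinning; pass pin = 0 for the unpinned setup. */
static void child(int cpu, int pin)
{
	struct sockaddr_in addr;
	char buf[2048];
	int fd = socket(PF_INET, SOCK_DGRAM, 0);

	if (pin) {
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET(cpu, &set);
		sched_setaffinity(0, sizeof(set), &set);
	}
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(8200 + cpu);
	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		exit(1);
	for (;;)
		recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
}

int main(void)
{
	int cpu, ncpus = sysconf(_SC_NPROCESSORS_ONLN);

	for (cpu = 0; cpu < ncpus; cpu++)
		if (fork() == 0)
			child(cpu, 0);	/* pinning off, as in the apr23+ tests */
	for (;;)
		pause();
	return 0;
}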
On Thu, 2010-04-29 at 09:56 -0400, jamal wrote: > > I will try your program instead so we can reduce the variables Results attached. With your app rps does a hell lot better and non-rps worse ;-> With my proggie, non-rps does much better than yours and rps does a lot worse for same setup. I see the scheduler kicking quiet a bit in non-rps for you... The main difference between us as i see it is: a) i use epoll - actually linked to libevent (1.0.something) b) I fork processes and you use pthreads. I dont have time to chase it today, but 1) I am either going to change yours to use libevent or make mine get rid of it then 2) move towards pthreads or have yours fork.. then observe if that makes any difference.. cheers, jamal No RPS; same kernel as yesterday with Eric's changes ------------------------------------------------------------------------------- PerfTop: 2572 irqs/sec kernel:94.7% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ___________________________ ________ 2901.00 17.4% sky2_poll [sky2] 781.00 4.7% schedule [kernel] 574.00 3.4% __skb_recv_datagram [kernel] 518.00 3.1% _raw_spin_lock_irqsave [kernel] 460.00 2.8% udp_recvmsg [kernel] 457.00 2.7% copy_user_generic_string [kernel] 397.00 2.4% _raw_spin_lock_bh [kernel] 340.00 2.0% __udp4_lib_lookup [kernel] 320.00 1.9% ip_route_input [kernel] 295.00 1.8% _raw_spin_lock [kernel] 293.00 1.8% dst_release [kernel] 282.00 1.7% ip_rcv [kernel] 275.00 1.6% skb_copy_datagram_iovec [kernel] 263.00 1.6% __switch_to [kernel] 257.00 1.5% __alloc_skb [kernel] 256.00 1.5% system_call [kernel] 243.00 1.5% sock_recv_ts_and_drops [kernel] 227.00 1.4% sock_queue_rcv_skb [kernel] 225.00 1.3% _raw_spin_unlock_irqrestore [kernel] 220.00 1.3% fget_light [kernel] 218.00 1.3% pick_next_task_fair [kernel] ------------------------------------------------------------------------------- PerfTop: 1000 irqs/sec kernel:100.0% [1000Hz cycles], (all, cpu: 0) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ___________________________ ________ 1508.00 37.9% sky2_poll [sky2] 198.00 5.0% ip_route_input [kernel] 184.00 4.6% __udp4_lib_lookup [kernel] 172.00 4.3% ip_rcv [kernel] 139.00 3.5% _raw_spin_lock [kernel] 131.00 3.3% __alloc_skb [kernel] 130.00 3.3% sock_queue_rcv_skb [kernel] 111.00 2.8% __udp4_lib_rcv [kernel] 101.00 2.5% __netif_receive_skb [kernel] 78.00 2.0% select_task_rq_fair [kernel] 74.00 1.9% try_to_wake_up [kernel] 73.00 1.8% sock_def_readable [kernel] 72.00 1.8% _raw_spin_lock_irqsave [kernel] 67.00 1.7% task_rq_lock [kernel] 66.00 1.7% _raw_read_lock [kernel] 64.00 1.6% __kmalloc [kernel] 62.00 1.6% resched_task [kernel] 61.00 1.5% sky2_rx_submit [sky2] 52.00 1.3% ip_local_deliver [kernel] 51.00 1.3% kmem_cache_alloc [kernel] 51.00 1.3% swiotlb_sync_single [kernel] 43.00 1.1% sky2_remove [sky2] 41.00 1.0% udp_queue_rcv_skb [kernel] 39.00 1.0% __wake_up_common [kernel] ------------------------------------------------------------------------------- PerfTop: 368 irqs/sec kernel:95.9% [1000Hz cycles], (all, cpu: 1) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ___________________________ ________ 279.00 8.2% schedule [kernel] 260.00 7.7% __skb_recv_datagram [kernel] 196.00 5.8% _raw_spin_lock_bh [kernel] 180.00 5.3% copy_user_generic_string [kernel] 176.00 5.2% udp_recvmsg [kernel] 150.00 4.4% 
_raw_spin_lock_irqsave [kernel] 142.00 4.2% dst_release [kernel] 106.00 3.1% skb_copy_datagram_iovec [kernel] 97.00 2.9% sock_recv_ts_and_drops [kernel] 93.00 2.7% tick_nohz_stop_sched_tick [kernel] 89.00 2.6% sys_recvfrom [kernel] 89.00 2.6% __switch_to [kernel] 86.00 2.5% pick_next_task_fair [kernel] 82.00 2.4% sock_rfree [kernel] 75.00 2.2% system_call [kernel] 73.00 2.2% fget_light [kernel] 70.00 2.1% _raw_spin_lock_irq [kernel] 63.00 1.9% kmem_cache_free [kernel] 61.00 1.8% _raw_spin_unlock_irqrestore [kernel] 60.00 1.8% kfree [kernel] 56.00 1.7% select_nohz_load_balancer [kernel] 55.00 1.6% finish_task_switch [kernel] 48.00 1.4% inet_recvmsg [kernel] 41.00 1.2% security_socket_recvmsg [kernel] ------------------------------------------------------------------------------- PerfTop: 97 irqs/sec kernel:81.4% [1000Hz cycles], (all, cpu: 7) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ____________________________ ________ 55.00 10.8% schedule [kernel] 38.00 7.5% __skb_recv_datagram [kernel] 36.00 7.1% udp_recvmsg [kernel] 32.00 6.3% _raw_spin_lock_irqsave [kernel] 31.00 6.1% _raw_spin_lock_bh [kernel] 30.00 5.9% copy_user_generic_string [kernel] 29.00 5.7% sock_recv_ts_and_drops [kernel] 27.00 5.3% skb_copy_datagram_iovec [kernel] 17.00 3.3% system_call [kernel] 17.00 3.3% dst_release [kernel] 14.00 2.7% _raw_spin_unlock_irqrestore [kernel] 12.00 2.4% __switch_to [kernel] 12.00 2.4% pick_next_task_fair [kernel] 11.00 2.2% inet_recvmsg [kernel] 11.00 2.2% sys_recvfrom [kernel] 10.00 2.0% finish_task_switch [kernel] 10.00 2.0% sock_rfree [kernel] 10.00 2.0% select_nohz_load_balancer [kernel] 7.00 1.4% rcu_enter_nohz [kernel] 7.00 1.4% tick_nohz_stop_sched_tick [kernel] 7.00 1.4% tick_nohz_restart_sched_tick [kernel] 5.00 1.0% ktime_get [kernel] Run1 ---- 557257 pps (557257 0:69750 1:69417 2:69063 3:68818 4:70139 5:69824 6:70135 7:70113) 737468 pps (1294725 0:162765 1:162430 2:162075 3:155770 4:163150 5:162838 6:163150 7:162549) 744238 pps (2038963 0:255795 1:255460 2:255105 3:248800 4:256180 5:255867 6:256180 7:255579) 719343 pps (2758306 0:348825 1:348202 2:348135 3:338166 4:349210 5:333030 6:349210 7:343528) 741830 pps (3500136 0:440870 1:440933 2:441165 3:430162 4:442240 5:425970 6:442240 7:436558) 686289 pps (4186425 0:533900 1:533749 2:515637 3:511486 4:531997 5:504717 6:525536 7:529406) 681708 pps (4868133 0:613701 1:617409 2:608667 3:599774 4:607480 5:589487 6:609802 7:621817) 697577 pps (5565710 0:704183 1:710439 2:688904 3:681696 4:689120 5:673932 6:702448 7:714988) 729284 pps (6294994 0:797213 1:803469 2:775863 3:770959 4:781160 5:766105 6:792207 7:808018) 734160 pps (7029154 0:886389 1:896504 2:868898 3:863506 4:868426 5:859138 6:885242 7:901053) 728541 pps (7757695 0:978789 1:989534 2:961928 3:946834 4:961458 5:952170 6:978272 7:988714) 709578 pps (8467273 0:1071819 1:1079000 2:1041101 3:1038974 4:1047215 5:1037254 6:1070168 7:1081744) 684154 pps (9151427 0:1160855 1:1158471 2:1122874 3:1129012 4:1136563 5:1120258 6:1153624 7:1169773) 498291 pps (9649718 0:1224303 1:1214178 2:1185737 3:1191467 4:1200058 5:1183753 6:1217121 7:1233101) Essentially sink in about 96.5% of 10M packet run2 --- 402553 pps (402553 0:51530 1:53289 2:53625 3:45748 4:53625 5:49484 6:42292 7:52960) 711539 pps (1114092 0:144028 1:146426 2:144237 3:124551 4:146760 5:142619 6:119376 7:146095) 692319 pps (1806411 0:208285 1:239557 2:220103 3:211096 4:239890 5:235749 6:212506 7:239225) 731896 pps (2538307 0:301450 1:332723 2:308718 
3:304264 4:333055 5:320036 6:305671 7:332390) 712869 pps (3251176 0:393270 1:418806 2:397578 3:396844 4:426245 5:406943 6:398861 7:412629) 681513 pps (3932689 0:486300 1:501926 2:490613 3:489874 4:466455 5:499973 6:491891 7:505659) 697308 pps (4629997 0:567969 1:585032 2:583643 3:576712 4:548243 5:589399 6:581080 7:597922) 712903 pps (5342900 0:657579 1:660221 2:676673 3:669744 4:641273 5:682222 6:674110 7:681082) 687765 pps (6030665 0:744421 1:752470 2:764631 3:751445 4:722250 5:771799 6:761224 7:762426) 695799 pps (6726464 0:832438 1:842797 2:853337 3:844470 4:804427 5:857412 6:846918 7:844668) 720011 pps (7446475 0:925210 1:934696 2:934883 3:937280 4:894644 5:949883 6:932740 7:937142) 712021 pps (8158496 0:1017246 1:1027726 2:1016841 3:1024712 4:978513 5:1042913 6:1023516 7:1027031) 709810 pps (8868306 0:1098522 1:1111823 2:1109871 3:1117444 4:1070124 5:1131774 6:1109841 7:1118909) 591817 pps (9460123 0:1178005 1:1185698 2:1189381 3:1196367 4:1143880 5:1198406 6:1176121 7:1192265) 94.6% run3 --- 682714 pps (682714 0:83336 1:86683 2:86895 3:86243 4:84616 5:81152 6:86895 7:86895) 691212 pps (1373926 0:164602 1:179240 2:171897 3:174162 4:176509 5:158115 6:174083 7:175321) 661913 pps (2035839 0:243004 1:263829 2:259312 3:267160 4:268875 5:231009 6:253411 7:249239) 715612 pps (2751451 0:336034 1:350220 2:346461 3:360190 4:359219 5:317625 6:346441 7:335265) 655354 pps (3406805 0:419339 1:434934 2:432010 3:442138 4:437837 5:394805 6:427064 7:418679) 592126 pps (3998931 0:494253 1:511454 2:508829 3:511992 4:508978 5:474866 6:496884 7:491679) 697177 pps (4696108 0:584474 1:601703 2:589111 3:602252 4:598767 5:565114 6:582153 7:572539) 681004 pps (5377112 0:662864 1:684427 2:678825 3:688402 4:685441 5:651962 6:673697 7:651495) 669622 pps (6046734 0:740275 1:765126 2:762764 3:773772 4:772144 5:731330 6:762339 7:738987) 645906 pps (6692640 0:825606 1:850550 2:846793 3:858243 4:850408 5:812402 6:838248 7:810391) 705873 pps (7398513 0:916877 1:937693 2:929956 3:950433 4:938179 5:894913 6:928125 7:902337) 735460 pps (8133973 0:1009907 1:1030722 2:1022986 3:1037959 4:1031209 5:987943 6:1021155 7:992092) 707605 pps (8841578 0:1102933 1:1122367 2:1101160 3:1129212 4:1124239 5:1063617 6:1112929 7:1085122) 347807 pps (9189385 0:1149677 1:1168026 2:1147905 3:1170556 4:1158858 5:1110362 6:1152134 7:1131867) 91.9% run4 ---- 552606 pps (552606 0:72743 1:75411 2:67732 3:70204 4:63741 5:64934 6:66096 7:71746) 684450 pps (1237056 0:162839 1:165064 2:148974 3:160417 4:153919 5:135895 6:156238 7:153710) 696799 pps (1933855 0:254440 1:252304 2:240107 3:249399 4:246028 5:228009 6:247409 7:216161) 676546 pps (2610401 0:341132 1:336959 2:325332 3:330438 4:336250 5:305238 6:336208 7:298848) 712251 pps (3322652 0:432976 1:428990 2:413228 3:419977 4:425918 5:386917 6:426275 7:388371) 615680 pps (3938332 0:515679 1:497421 2:491618 3:505449 4:489452 5:462820 6:505336 7:470561) 635467 pps (4573799 0:597340 1:582917 2:555389 3:582751 4:573273 5:545378 6:584378 7:552373) 725581 pps (5299380 0:690038 1:675870 2:636347 3:676029 4:666231 5:632208 6:677337 7:645324) 699015 pps (5998395 0:783068 1:763654 2:725184 3:762784 4:752559 5:709123 6:764439 7:737586) 674472 pps (6672867 0:872645 1:847669 2:808333 3:827766 4:842267 5:798997 6:853779 7:821412) 680913 pps (7353780 0:961487 1:926760 2:887273 3:919158 4:925165 5:891082 6:929793 7:913064) 666279 pps (8020059 0:1050823 1:1012028 2:972691 3:988738 4:1009904 5:974127 6:1017940 7:993808) 680615 pps (8700674 0:1124223 1:1087779 2:1057541 3:1080546 4:1094373 5:1066880 6:1102496 
7:1086838) 420306 pps (9120980 0:1177541 1:1130287 2:1111621 3:1134624 4:1148453 5:1120960 6:1156576 7:1140918) 91.2% run5 ------ 294229 pps (294229 0:38805 1:30946 2:32655 3:36613 4:38805 5:38805 6:38800 7:38801) 694748 pps (988977 0:124394 1:123976 2:114107 3:128079 4:111317 5:131835 6:131835 7:123434) 690185 pps (1679162 0:217405 1:216988 2:194192 3:204091 4:195948 5:224678 6:220924 7:204937) 726561 pps (2405723 0:307828 1:309671 2:278163 3:296811 4:286642 5:317346 6:311296 7:297967) 695974 pps (3101697 0:391228 1:395256 2:371056 3:388790 4:379533 5:410242 6:393051 7:372541) 665395 pps (3767092 0:473134 1:484367 2:447394 3:462837 4:471026 5:491170 6:473947 7:463219) 671483 pps (4438575 0:562883 1:574014 2:534258 3:544512 4:534064 5:581420 6:560073 7:547353) 679400 pps (5117975 0:641135 1:663809 2:618019 3:633448 4:605085 5:674433 6:649865 7:632183) 696263 pps (5814238 0:734516 1:743715 2:711049 3:717481 4:693193 5:758493 6:740374 7:715417) 681791 pps (6496029 0:823596 1:836004 2:795579 3:809104 4:783457 5:820061 6:820219 7:808010) 670672 pps (7166701 0:911202 1:927618 2:888127 3:875504 4:874363 5:889342 6:911838 7:888707) 743444 pps (7910145 0:1004233 1:1020652 2:981157 3:968534 4:967393 5:982078 6:1004362 7:981737) 725623 pps (8635768 0:1096546 1:1113682 2:1059978 3:1061564 4:1060423 5:1072761 6:1097392 7:1073423) 662504 pps (9298272 0:1171688 1:1197579 2:1137559 3:1154595 4:1146405 5:1161670 6:1176001 7:1152776) 12979 pps (9311251 0:1173488 1:1199379 2:1137914 3:1156399 4:1148209 5:1163475 6:1177806 7:1154581) 93.1% Average for no-rps 93.5% of 10M incoming at ~ 750Kpps. # echo 1 > /proc/irq/55/smp_affinity # echo ee > /sys/class/net/eth0/queues/rx-0/rps_cpus ------------------------------------------------------------------------------- PerfTop: 2273 irqs/sec kernel:93.7% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ______________________________ ________ 922.00 10.3% sky2_poll [sky2] 402.00 4.5% __netif_receive_skb [kernel] 400.00 4.4% ip_rcv [kernel] 356.00 4.0% call_function_single_interrupt [kernel] 339.00 3.8% ip_route_input [kernel] 282.00 3.1% schedule [kernel] 194.00 2.2% _raw_spin_lock_irqsave [kernel] 180.00 2.0% sock_recv_ts_and_drops [kernel] 178.00 2.0% _raw_spin_lock [kernel] 173.00 1.9% __udp4_lib_lookup [kernel] 171.00 1.9% __udp4_lib_rcv [kernel] 162.00 1.8% system_call [kernel] 154.00 1.7% kfree [kernel] 147.00 1.6% __skb_recv_datagram [kernel] 146.00 1.6% copy_user_generic_string [kernel] 136.00 1.5% dst_release [kernel] 136.00 1.5% _raw_spin_unlock_irqrestore [kernel] 126.00 1.4% fget_light [kernel] 126.00 1.4% sky2_intr [sky2] 122.00 1.4% udp_recvmsg [kernel] 111.00 1.2% sock_queue_rcv_skb [kernel] ------------------------------------------------------------------------------- PerfTop: 325 irqs/sec kernel:93.2% [1000Hz cycles], (all, cpu: 0) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ___________________________________ ________ 1033.00 62.9% sky2_poll [sky2] 159.00 9.7% sky2_intr [sky2] 119.00 7.3% irq_entries_start [kernel] 51.00 3.1% __alloc_skb [kernel] 48.00 2.9% get_rps_cpu [kernel] 24.00 1.5% __kmalloc [kernel] 23.00 1.4% swiotlb_sync_single [kernel] 20.00 1.2% _raw_spin_lock [kernel] 17.00 1.0% sky2_rx_submit [sky2] 15.00 0.9% enqueue_to_backlog [kernel] 14.00 0.9% kmem_cache_alloc [kernel] 11.00 0.7% default_send_IPI_mask_sequence_phys [kernel] 10.00 0.6% sky2_remove [sky2] 
10.00 0.6% cache_alloc_refill [kernel] 8.00 0.5% _raw_spin_lock_irqsave [kernel] 7.00 0.4% dev_gro_receive [kernel] 6.00 0.4% net_rx_action [kernel] 6.00 0.4% __netdev_alloc_skb [kernel] 6.00 0.4% load_balance [kernel] 5.00 0.3% __smp_call_function_single [kernel] ------------------------------------------------------------------------------- PerfTop: 347 irqs/sec kernel:96.3% [1000Hz cycles], (all, cpu: 1) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ______________________________ ________ 104.00 6.7% call_function_single_interrupt [kernel] 104.00 6.7% __netif_receive_skb [kernel] 95.00 6.1% ip_rcv [kernel] 93.00 6.0% ip_route_input [kernel] 62.00 4.0% schedule [kernel] 49.00 3.2% sock_recv_ts_and_drops [kernel] 46.00 3.0% system_call [kernel] 46.00 3.0% dst_release [kernel] 45.00 2.9% _raw_spin_lock [kernel] 41.00 2.7% _raw_spin_lock_irqsave [kernel] 40.00 2.6% _raw_spin_unlock_irqrestore [kernel] 36.00 2.3% copy_user_generic_string [kernel] 34.00 2.2% __udp4_lib_rcv [kernel] 30.00 1.9% fget_light [kernel] 30.00 1.9% sock_queue_rcv_skb [kernel] 28.00 1.8% udp_recvmsg [kernel] 28.00 1.8% __udp4_lib_lookup [kernel] 26.00 1.7% select_task_rq_fair [kernel] 25.00 1.6% tick_nohz_stop_sched_tick [kernel] 23.00 1.5% __napi_complete [kernel] 20.00 1.3% __switch_to [kernel] 20.00 1.3% finish_task_switch [kernel] 20.00 1.3% kmem_cache_free [kernel] 20.00 1.3% sys_recvfrom [kernel] 19.00 1.2% kfree [kernel] 19.00 1.2% __skb_recv_datagram [kernel] ------------------------------------------------------------------------------- PerfTop: 243 irqs/sec kernel:95.5% [1000Hz cycles], (all, cpu: 7) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ______________________________ ________ 92.00 7.3% ip_rcv [kernel] 74.00 5.9% __netif_receive_skb [kernel] 57.00 4.6% ip_route_input [kernel] 49.00 3.9% sock_recv_ts_and_drops [kernel] 49.00 3.9% system_call [kernel] 47.00 3.8% schedule [kernel] 39.00 3.1% _raw_spin_lock_irqsave [kernel] 36.00 2.9% call_function_single_interrupt [kernel] 34.00 2.7% udp_recvmsg [kernel] 32.00 2.6% __udp4_lib_rcv [kernel] 31.00 2.5% copy_user_generic_string [kernel] 31.00 2.5% fget_light [kernel] 30.00 2.4% __udp4_lib_lookup [kernel] 26.00 2.1% kfree [kernel] 25.00 2.0% __skb_recv_datagram [kernel] 25.00 2.0% sock_queue_rcv_skb [kernel] 23.00 1.8% __switch_to [kernel] 22.00 1.8% sock_recvmsg [kernel] 22.00 1.8% _raw_spin_unlock_irqrestore [kernel] 21.00 1.7% select_task_rq_fair [kernel] 18.00 1.4% _raw_spin_lock [kernel] 17.00 1.4% process_backlog [kernel] 17.00 1.4% sys_recvfrom [kernel] 17.00 1.4% _raw_spin_lock_bh [kernel] run1 ---- 590479 pps (590479 0:73820 1:73817 2:73820 3:73819 4:73815 5:73815 6:73815 7:73815) 744641 pps (1335120 0:166895 1:166895 2:166895 3:166895 4:166895 5:166895 6:166895 7:166895) 744374 pps (2079494 0:259940 1:259940 2:259940 3:259940 4:259940 5:259940 6:259940 7:259940) 744340 pps (2823834 0:352985 1:352985 2:352985 3:352985 4:352985 5:352985 6:352980 7:352985) 744390 pps (3568224 0:446035 1:446035 2:446035 3:446035 4:446035 5:446035 6:446032 7:446030) 744404 pps (4312628 0:539085 1:539085 2:539085 3:539081 4:539085 5:539085 6:539085 7:539085) 744369 pps (5056997 0:632130 1:632130 2:632130 3:632130 4:632130 5:632130 6:632130 7:632130) 744394 pps (5801391 0:725180 1:725180 2:725180 3:725180 4:725180 5:725180 6:725180 7:725180) 744399 pps (6545790 0:818230 1:818230 2:818229 3:818230 4:818230 5:818226 6:818225 
7:818225) 744354 pps (7290144 0:911275 1:911275 2:911275 3:911275 4:911270 5:911270 6:911270 7:911270) 744363 pps (8034507 0:1004320 1:1004320 2:1004320 3:1004320 4:1004320 5:1004306 6:1004320 7:1004317) 744379 pps (8778886 0:1097370 1:1097368 2:1097370 3:1097370 4:1097370 5:1097356 6:1097367 7:1097365) 744449 pps (9523335 0:1190425 1:1190425 2:1190425 3:1190421 4:1190425 5:1190411 6:1190425 7:1190425) 476651 pps (9999986 0:1250000 1:1250000 2:1250000 3:1250000 4:1250000 5:1249986 6:1250000 7:1250000) 99.9% ! rps counter.. 865721 rps 1067721 rps run2 ---- 573759 pps (573759 0:71720 1:71720 2:71720 3:71723 4:71721 5:71720 6:71720 7:71719) 744249 pps (1318008 0:164755 1:164753 2:164750 3:164750 4:164750 5:164750 6:164750 7:164750) 744260 pps (2062268 0:257785 1:257785 2:257785 3:257785 4:257785 5:257783 6:257780 7:257780) 744238 pps (2806506 0:350815 1:350815 2:350815 3:350815 4:350815 5:350811 6:350810 7:350810) 744233 pps (3550739 0:443845 1:443845 2:443845 3:443845 4:443844 5:443841 6:443841 7:443840) 744236 pps (4294975 0:536875 1:536875 2:536875 3:536870 4:536870 5:536870 6:536870 7:536870) 744244 pps (5039219 0:629905 1:629905 2:629905 3:629905 4:629905 5:629901 6:629901 7:629900) 744240 pps (5783459 0:722935 1:722935 2:722935 3:722934 4:722930 5:722930 6:722930 7:722930) 744214 pps (6527673 0:815962 1:815960 2:815965 3:815963 4:815962 5:815960 6:815955 7:815955) 744268 pps (7271941 0:908995 1:908995 2:908995 3:908995 4:908991 5:908990 6:908990 7:908990) 744239 pps (8016180 0:1002025 1:1002025 2:1002025 3:1002025 4:1002020 5:1002020 6:1002020 7:1002020) 744241 pps (8760421 0:1095055 1:1095055 2:1095052 3:1095055 4:1095055 5:1095050 6:1095050 7:1095050) 744234 pps (9504655 0:1188085 1:1188085 2:1188084 3:1188085 4:1188085 5:1188081 6:1188080 7:1188080) 495345 pps (10000000 0:1250000 1:1250000 2:1250000 3:1250000 4:1250000 5:1250000 6:1250000 7:1250000) 100.0% !!! rps count .. 3651 rps 1455997 rps 498777 rps run3 ---- 72947 pps (72947 0:9120 1:9120 2:9120 3:9120 4:9120 5:9117 6:9115 7:9115) 744616 pps (817563 0:102198 1:102195 2:102195 3:102195 4:102195 5:102195 6:102195 7:102195) 744710 pps (1562273 0:195285 1:195285 2:195285 3:195285 4:195285 5:195285 6:195285 7:195283) 744478 pps (2306751 0:288345 1:288345 2:288345 3:288345 4:288345 5:288345 6:288341 7:288340) 744603 pps (3051354 0:381422 1:381420 2:381420 3:381414 4:381420 5:381420 6:381420 7:381420) 744475 pps (3795829 0:474480 1:474480 2:474480 3:474472 4:474480 5:474480 6:474480 7:474477) 744740 pps (4540569 0:567575 1:567575 2:567575 3:567564 4:567570 5:567570 6:567570 7:567570) 744641 pps (5285210 0:660655 1:660655 2:660655 3:660646 4:660650 5:660650 6:660650 7:660650) 744300 pps (6029510 0:753695 1:753690 2:753690 3:753682 4:753690 5:753690 6:753690 7:753690) 744249 pps (6773759 0:846725 1:846725 2:846725 3:846712 4:846720 5:846720 6:846720 7:846720) 744709 pps (7518468 0:939814 1:939810 2:939810 3:939802 4:939810 5:939810 6:939810 7:939810) 744647 pps (8263115 0:1032893 1:1032890 2:1032890 3:1032882 4:1032890 5:1032890 6:1032890 7:1032890) 744672 pps (9007787 0:1125976 1:1125975 2:1125975 3:1125967 4:1125975 5:1125975 6:1125975 7:1125970) 744692 pps (9752479 0:1219065 1:1219065 2:1219062 3:1219056 4:1219060 5:1219060 6:1219060 7:1219060) 247513 pps (9999992 0:1250000 1:1250000 2:1250000 3:1249992 4:1250000 5:1250000 6:1250000 7:1250000) 99.9%! rps count ... 
1118484 rps 842940 rps run4 ---- 288558 pps (288558 0:36070 1:36070 2:36070 3:36070 4:36070 5:36070 6:36070 7:36068) 744237 pps (1032795 0:129103 1:129100 2:129105 3:129100 4:129100 5:129100 6:129095 7:129095) 742988 pps (1775783 0:222135 1:222135 2:222135 3:222135 4:220853 5:222130 6:222130 7:222130) 744210 pps (2519993 0:315160 1:315160 2:315160 3:315160 4:313883 5:315160 6:315155 7:315155) 744214 pps (3264207 0:408189 1:408185 2:408185 3:408185 4:406908 5:408185 6:408185 7:408185) 744278 pps (4008485 0:501223 1:501220 2:501220 3:501220 4:499943 5:501220 6:501220 7:501220) 743699 pps (4752184 0:594252 1:594250 2:593718 3:594250 4:592973 5:594250 6:594248 7:594245) 744243 pps (5496427 0:687280 1:687280 2:686748 3:687280 4:686003 5:687280 6:687280 7:687276) 744231 pps (6240658 0:780310 1:780310 2:779778 3:780310 4:779033 5:780300 6:780310 7:780307) 743958 pps (6984616 0:873342 1:873340 2:872808 3:873340 4:872063 5:873043 6:873340 7:873340) 744241 pps (7728857 0:966373 1:966370 2:965838 3:966370 4:965093 5:966073 6:966370 7:966370) 744232 pps (8473089 0:1059400 1:1059400 2:1058868 3:1059400 4:1058123 5:1059103 6:1059397 7:1059398) 743660 pps (9216749 0:1152434 1:1152430 2:1151898 3:1152430 4:1151153 5:1151556 6:1152427 7:1152430) 744251 pps (9961000 0:1245463 1:1245460 2:1244928 3:1245460 4:1244183 5:1244586 6:1245460 7:1245460) 36317 pps (9997317 0:1250000 1:1250000 2:1249468 3:1250000 4:1248723 5:1249126 6:1250000 7:1250000) 99.9%! rps count 818552 rps 1146570 rps run 5 ---- 686211 pps (686211 0:85780 1:85780 2:85775 3:85779 4:85780 5:85780 6:85775 7:85775) 744260 pps (1430471 0:178810 1:178810 2:178810 3:178810 4:178810 5:178810 6:178806 7:178805) 744242 pps (2174713 0:271840 1:271840 2:271840 3:271840 4:271840 5:271840 6:271838 7:271835) 744241 pps (2918954 0:364870 1:364870 2:364870 3:364870 4:364870 5:364870 6:364869 7:364865) 744238 pps (3663192 0:457900 1:457900 2:457900 3:457900 4:457900 5:457900 6:457900 7:457899) 744240 pps (4407432 0:550930 1:550930 2:550930 3:550930 4:550930 5:550930 6:550927 7:550925) 744244 pps (5151676 0:643960 1:643960 2:643960 3:643960 4:643960 5:643960 6:643960 7:643956) 744236 pps (5895912 0:736990 1:736990 2:736990 3:736990 4:736990 5:736990 6:736987 7:736985) 744241 pps (6640153 0:830020 1:830020 2:830020 3:830020 4:830020 5:830020 6:830018 7:830015) 744235 pps (7384388 0:923050 1:923050 2:923050 3:923050 4:923050 5:923049 6:923045 7:923047) 744244 pps (8128632 0:1016080 1:1016080 2:1016080 3:1016080 4:1016080 5:1016080 6:1016079 7:1016075) 744231 pps (8872863 0:1109110 1:1109110 2:1109110 3:1109110 4:1109108 5:1109105 6:1109105 7:1109105) 744258 pps (9617121 0:1202141 1:1202140 2:1202140 3:1202140 4:1202140 5:1202140 6:1202140 7:1202140) 382879 pps (10000000 0:1250000 1:1250000 2:1250000 3:1250000 4:1250000 5:1250000 6:1250000 7:1250000) 100% rpsipi count .. 768383 rps 1178132 rps
On Thu, Apr 29, 2010 at 8:45 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > Changli, I wonder how you can cook "performance" patches without testing > them at all for real... This cannot be true ? > I am sorry. But I wasn't against your patch, and I just wanted to understand the test result from jamal. It is my fault for submitting a performance patch without testing it. I should not rely on code inspection for a performance patch.
Eric Dumazet wrote: > Here is last 'patch of the day' for me ;) > Next one will be able to coalesce wakeup calls (they'll be delayed at > the end of net_rx_action(), like a patch I did last year to help > multicast reception) > > vger seems to be down, I suspect I'll have to resend it later. > > [PATCH net-next-2.6] net: sock_def_readable() and friends RCU conversion > > sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we > need two atomic operations (and associated dirtying) per incoming > packet. > This patch boots for me, I haven't noticed any strangeness yet. I ran a few benchmarks (the multicast fan-out mcasttest.c from last year, a few other things we have lying around). I think I see a modest improvement from this and your other 2 patches. Presumably the big wins are where multiple cores perform bh for the same socket; that's not the case in these benchmarks. If it's appropriate: Tested-by: Brian Bloniarz <bmb@athenacr.com> > Next one will be able to coalesce wakeup calls (they'll be delayed at > the end of net_rx_action(), like a patch I did last year to help > multicast reception) Keep em coming :) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric! I managed to mod your program to look conceptually similar to mine and i reproduced the results with same test kernel from yesterday. So it is likely the issue is in using epoll vs not using any async as in your case. Results attached as well as modified program. Note: the key things to remember: rps with this program gets worse over time and different net-next kernels since Apr14 (look at graph i supplied). Sorry, I am really busy-ed out to dig any further. cheers, jamal On Thu, 2010-04-29 at 16:36 -0400, jamal wrote: > On Thu, 2010-04-29 at 09:56 -0400, jamal wrote: > > > > > I will try your program instead so we can reduce the variables > > Results attached. > With your app rps does a hell lot better and non-rps worse ;-> > With my proggie, non-rps does much better than yours and rps does > a lot worse for same setup. I see the scheduler kicking quiet a bit in > non-rps for you... > > The main difference between us as i see it is: > a) i use epoll - actually linked to libevent (1.0.something) > b) I fork processes and you use pthreads. > > I dont have time to chase it today, but 1) I am either going to change > yours to use libevent or make mine get rid of it then 2) move towards > pthreads or have yours fork.. > then observe if that makes any difference.. > > > cheers, > jamal First a few runs with Eric's code + epoll/libevent ------------------------------------------------------------------------------- PerfTop: 4009 irqs/sec kernel:83.4% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ___________________________ ____________________ 2097.00 8.6% sky2_poll [sky2] 1742.00 7.2% _raw_spin_lock_irqsave [kernel] 831.00 3.4% system_call [kernel] 654.00 2.7% copy_user_generic_string [kernel] 654.00 2.7% datagram_poll [kernel] 647.00 2.7% fget [kernel] 623.00 2.6% _raw_spin_unlock_irqrestore [kernel] 547.00 2.3% _raw_spin_lock_bh [kernel] 506.00 2.1% sys_epoll_ctl [kernel] 475.00 2.0% kmem_cache_free [kernel] 466.00 1.9% schedule [kernel] 436.00 1.8% vread_tsc [kernel].vsyscall_fn 417.00 1.7% fput [kernel] 415.00 1.7% sys_epoll_wait [kernel] 402.00 1.7% _raw_spin_lock [kernel] ------------------------------------------------------------------------------- PerfTop: 616 irqs/sec kernel:98.7% [1000Hz cycles], (all, cpu: 0) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ______________________ ________ 2534.00 28.6% sky2_poll [sky2] 503.00 5.7% ip_route_input [kernel] 438.00 4.9% _raw_spin_lock_irqsave [kernel] 418.00 4.7% __udp4_lib_lookup [kernel] 378.00 4.3% __alloc_skb [kernel] 364.00 4.1% ip_rcv [kernel] 323.00 3.6% _raw_spin_lock [kernel] 315.00 3.5% sock_queue_rcv_skb [kernel] 284.00 3.2% __netif_receive_skb [kernel] 281.00 3.2% __udp4_lib_rcv [kernel] 266.00 3.0% __wake_up_common [kernel] 238.00 2.7% sock_def_readable [kernel] 181.00 2.0% __kmalloc [kernel] 163.00 1.8% kmem_cache_alloc [kernel] 150.00 1.7% ep_poll_callback [kernel] ------------------------------------------------------------------------------- PerfTop: 854 irqs/sec kernel:80.2% [1000Hz cycles], (all, cpu: 2) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ___________________________ ____________________ 341.00 8.0% _raw_spin_lock_irqsave [kernel] 235.00 5.5% system_call [kernel] 174.00 4.1% datagram_poll [kernel] 174.00 4.1% fget [kernel] 173.00 4.1% 
copy_user_generic_string [kernel] 135.00 3.2% _raw_spin_unlock_irqrestore [kernel] 125.00 2.9% _raw_spin_lock_bh [kernel] 122.00 2.9% schedule [kernel] 113.00 2.6% sys_epoll_ctl [kernel] 113.00 2.6% kmem_cache_free [kernel] 108.00 2.5% vread_tsc [kernel].vsyscall_fn 105.00 2.5% sys_epoll_wait [kernel] 102.00 2.4% udp_recvmsg [kernel] 95.00 2.2% mutex_lock [kernel] Average 97.55% of 10M packets at 750Kpps Turn on rps mask ee and irq affinity to cpu0 ------------------------------------------------------------------------------- PerfTop: 3885 irqs/sec kernel:83.6% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ______________________________ ________ 2945.00 16.7% sky2_poll [sky2] 653.00 3.7% _raw_spin_lock_irqsave [kernel] 460.00 2.6% system_call [kernel] 420.00 2.4% _raw_spin_unlock_irqrestore [kernel] 414.00 2.3% sky2_intr [sky2] 392.00 2.2% fget [kernel] 360.00 2.0% ip_rcv [kernel] 324.00 1.8% sys_epoll_ctl [kernel] 323.00 1.8% __netif_receive_skb [kernel] 310.00 1.8% schedule [kernel] 292.00 1.7% ip_route_input [kernel] 292.00 1.7% _raw_spin_lock [kernel] 291.00 1.7% copy_user_generic_string [kernel] 284.00 1.6% kmem_cache_free [kernel] 262.00 1.5% call_function_single_interrupt [kernel] ------------------------------------------------------------------------------- PerfTop: 1000 irqs/sec kernel:98.1% [1000Hz cycles], (all, cpu: 0) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ___________________________________ ________ 4170.00 61.9% sky2_poll [sky2] 723.00 10.7% sky2_intr [sky2] 159.00 2.4% __alloc_skb [kernel] 140.00 2.1% get_rps_cpu [kernel] 106.00 1.6% __kmalloc [kernel] 95.00 1.4% enqueue_to_backlog [kernel] 86.00 1.3% kmem_cache_alloc [kernel] 85.00 1.3% irq_entries_start [kernel] 85.00 1.3% _raw_spin_lock_irqsave [kernel] 82.00 1.2% _raw_spin_lock [kernel] 66.00 1.0% swiotlb_sync_single [kernel] 58.00 0.9% sky2_remove [sky2] 49.00 0.7% default_send_IPI_mask_sequence_phys [kernel] 47.00 0.7% sky2_rx_submit [sky2] 36.00 0.5% _raw_spin_unlock_irqrestore [kernel] ------------------------------------------------------------------------------- PerfTop: 344 irqs/sec kernel:84.3% [1000Hz cycles], (all, cpu: 2) ------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ______________________________ ____________________ 114.00 5.2% _raw_spin_lock_irqsave [kernel] 79.00 3.6% fget [kernel] 78.00 3.6% ip_rcv [kernel] 78.00 3.6% system_call [kernel] 75.00 3.4% _raw_spin_unlock_irqrestore [kernel] 67.00 3.1% sys_epoll_ctl [kernel] 65.00 3.0% schedule [kernel] 61.00 2.8% ip_route_input [kernel] 48.00 2.2% vread_tsc [kernel].vsyscall_fn 48.00 2.2% call_function_single_interrupt [kernel] 46.00 2.1% kmem_cache_free [kernel] 45.00 2.1% __netif_receive_skb [kernel] 41.00 1.9% process_recv snkudp 40.00 1.8% kfree [kernel] 39.00 1.8% _raw_spin_lock [kernel] 92.97% of 10M packets at 750Kpps Ok, so this is exactly what i saw with my app. non-rps is better. To summarize: It used to be the opposite on net-next before around Apr14. rps has gotten worse.
Le vendredi 30 avril 2010 à 15:30 -0400, jamal a écrit : > Eric! > > I managed to mod your program to look conceptually similar to mine > and i reproduced the results with same test kernel from yesterday. > So it is likely the issue is in using epoll vs not using any async as > in your case. > Results attached as well as modified program. > > Note: the key things to remember: > rps with this program gets worse over time and different net-next > kernels since Apr14 (look at graph i supplied). Sorry, I am really > busy-ed out to dig any further. > > cheers, > jamal > I am lost. I used your program, and with RPS off, I can get at most 220.000 pps with my "old" hardware. I dont understand how you can reach 700.000 pps with RPS off. Or is it with your Nehalem ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2010-04-30 at 22:40 +0200, Eric Dumazet wrote: > > I used your program, and with RPS off, I can get at most 220.000 pps > with my "old" hardware. I dont understand how you can reach 700.000 pps > with RPS off. Or is it with your Nehalem ? Yes, Nehalem. RPS off is better (~700Kpp) than RPS on(~650kpps). Are you seeing the same trend on the old hardware? cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 30 avril 2010 à 20:06 -0400, jamal a écrit : > Yes, Nehalem. > RPS off is better (~700Kpp) than RPS on(~650kpps). Are you seeing the > same trend on the old hardware? > Of course not ! Or else RPS would be useless :( I changed your program a bit to use EV_PERSIST, (to avoid epoll_ctl() overhead for each packet...) RPS off : 220.000 pps RPS on (ee mask) : 700.000 pps (with a slightly modified tg3 driver) 96% of delivered packets This is on tg3 adapter, and tg3 has copybreak feature : small packets are copied into skb of the right size. define TG3_RX_COPY_THRESHOLD 256 -> 40 ... We really should disable this feature for RPS workload, unfortunatly ethtool cannot tweak this. So profile of cpu 0 (RPS ON) looks like : ------------------------------------------------------------------------------------------------------------------------ PerfTop: 1001 irqs/sec kernel:99.7% [1000Hz cycles], (all, cpu: 0) ------------------------------------------------------------------------------------------------------------------------ samples pcnt function DSO _______ _____ ______________________ _______ 819.00 12.6% __alloc_skb vmlinux 592.00 9.1% eth_type_trans vmlinux 509.00 7.8% _raw_spin_lock vmlinux 475.00 7.3% __kmalloc_track_caller vmlinux 358.00 5.5% tg3_read32 vmlinux 345.00 5.3% __netdev_alloc_skb vmlinux 329.00 5.0% kmem_cache_alloc vmlinux 307.00 4.7% _raw_spin_lock_irqsave vmlinux 284.00 4.4% bnx2_interrupt vmlinux 277.00 4.2% skb_pull vmlinux 248.00 3.8% tg3_poll_work vmlinux 202.00 3.1% __slab_alloc vmlinux 197.00 3.0% get_rps_cpu vmlinux 106.00 1.6% enqueue_to_backlog vmlinux 87.00 1.3% _raw_spin_lock_bh vmlinux 80.00 1.2% __copy_to_user_ll vmlinux 77.00 1.2% nommu_map_page vmlinux 77.00 1.2% __napi_gro_receive vmlinux 65.00 1.0% tg3_alloc_rx_skb vmlinux 60.00 0.9% skb_gro_reset_offset vmlinux 57.00 0.9% skb_put vmlinux 57.00 0.9% __slab_free vmlinux /* * Usage: udpsnkfrk [ -p baseport] nbports */ #include <sys/socket.h> #include <netinet/in.h> #include <arpa/inet.h> #include <string.h> #include <stdio.h> #include <errno.h> #include <unistd.h> #include <stdlib.h> #include <fcntl.h> #include <event.h> struct worker_data { struct event *snk_ev; struct event_base *base; struct timeval t; unsigned long pack_count; unsigned long bytes_count; unsigned long tout; int fd; /* move to avoid hole on 64-bit */ int pad1; unsigned long _padd[99]; /* avoid false sharing */ }; void usage(int code) { fprintf(stderr, "Usage: udpsink [-p baseport] nbports\n"); exit(code); } void process_recv(int fd, short ev, void *arg) { char buffer[4096]; struct sockaddr_in addr; socklen_t len = sizeof(addr); struct worker_data *wdata = (struct worker_data *)arg; int lu = 0; if (ev == EV_TIMEOUT) { wdata->tout++; if ((event_add(wdata->snk_ev, &wdata->t)) < 0) { perror("cb event_add"); return; } } else { do { lu = recvfrom(wdata->fd, buffer, sizeof(buffer), 0, (struct sockaddr *)&addr, &len); if (lu > 0) { wdata->pack_count++; wdata->bytes_count += lu; } } while (lu > 0); } } int prep_thread(struct worker_data *wdata) { wdata->t.tv_sec = 1; wdata->t.tv_usec = random() % 50000L; wdata->base = event_init(); event_set(wdata->snk_ev, wdata->fd, EV_READ|EV_PERSIST, process_recv, wdata); event_base_set(wdata->base, wdata->snk_ev); if ((event_add(wdata->snk_ev, &wdata->t)) < 0) { perror("event_add"); return -1; } return 0; } void *worker_func(void *arg) { struct worker_data *wdata = (struct worker_data *)arg; return (void *)event_base_loop(wdata->base, 0); } int main(int argc, char *argv[]) { int c; int baseport = 4000; 
int nbthreads; struct worker_data *wdata; unsigned long ototal = 0; int concurrent = 0; int verbose = 0; int i; while ((c = getopt(argc, argv, "cvp:")) != -1) { if (c == 'p') baseport = atoi(optarg); else if (c == 'c') concurrent = 1; else if (c == 'v') verbose++; else usage(1); } if (optind == argc) usage(1); nbthreads = atoi(argv[optind]); wdata = calloc(sizeof(struct worker_data), nbthreads); if (!wdata) { perror("calloc"); return 1; } for (i = 0; i < nbthreads; i++) { struct sockaddr_in addr; pthread_t tid; if (i && concurrent) { wdata[i].fd = wdata[0].fd; } else { wdata[i].snk_ev = malloc(sizeof(struct event)); if (!wdata[i].snk_ev) return 1; memset(wdata[i].snk_ev, 0, sizeof(struct event)); wdata[i].fd = socket(PF_INET, SOCK_DGRAM, 0); if (wdata[i].fd == -1) { free(wdata[i].snk_ev); perror("socket"); return 1; } memset(&addr, 0, sizeof(addr)); addr.sin_family = AF_INET; // addr.sin_addr.s_addr = inet_addr(argv[optind]); addr.sin_port = htons(baseport + i); if (bind (wdata[i].fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { free(wdata[i].snk_ev); perror("bind"); return 1; } fcntl(wdata[i].fd, F_SETFL, O_NDELAY); } if (prep_thread(wdata + i)) { printf("failed to allocate thread %d, exit\n", i); exit(0); } pthread_create(&tid, NULL, worker_func, wdata + i); } for (;;) { unsigned long total; long delta; sleep(1); total = 0; for (i = 0; i < nbthreads; i++) { total += wdata[i].pack_count; } delta = total - ototal; if (delta) { printf("%lu pps (%lu", delta, total); if (verbose) { for (i = 0; i < nbthreads; i++) { if (wdata[i].pack_count) printf(" %d:%lu", i, wdata[i].pack_count); } } printf(")\n"); } ototal = total; } } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le samedi 01 mai 2010 à 07:57 +0200, Eric Dumazet a écrit : > Le vendredi 30 avril 2010 à 20:06 -0400, jamal a écrit : > > > Yes, Nehalem. > > RPS off is better (~700Kpp) than RPS on(~650kpps). Are you seeing the > > same trend on the old hardware? > > > > Of course not ! Or else RPS would be useless :( > > I changed your program a bit to use EV_PERSIST, (to avoid epoll_ctl() > overhead for each packet...) > > RPS off : 220.000 pps > > RPS on (ee mask) : 700.000 pps (with a slightly modified tg3 driver) > 96% of delivered packets BTW, using ee mask, cpu4 is not used at _all_, even for the user threads. Scheduler does a bad job IMHO. Using fe mask, I get all packets (sent at 733311pps by my pktgen machine), and my CPU0 even has idle time !!! Limit seems to be around 800.000 pps ------------------------------------------------------------------------------------------------------------------------ PerfTop: 5616 irqs/sec kernel:93.9% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------------------------------------------------ samples pcnt function DSO _______ _____ ___________________________ _______ 3492.00 6.2% __slab_free vmlinux 2334.00 4.2% _raw_spin_lock vmlinux 2314.00 4.1% _raw_spin_lock_irqsave vmlinux 1807.00 3.2% ip_rcv vmlinux 1605.00 2.9% schedule vmlinux 1474.00 2.6% __netif_receive_skb vmlinux 1464.00 2.6% kfree vmlinux 1405.00 2.5% ip_route_input vmlinux 1318.00 2.4% __copy_to_user_ll vmlinux 1214.00 2.2% __alloc_skb vmlinux 1160.00 2.1% nf_hook_slow vmlinux 1020.00 1.8% eth_type_trans vmlinux 860.00 1.5% sched_clock_local vmlinux 775.00 1.4% read_tsc vmlinux 773.00 1.4% ipt_do_table vmlinux 766.00 1.4% _raw_spin_unlock_irqrestore vmlinux 748.00 1.3% sock_recv_ts_and_drops vmlinux 747.00 1.3% ia32_sysenter_target vmlinux 740.00 1.3% select_nohz_load_balancer vmlinux 644.00 1.2% __kmalloc_track_caller vmlinux 596.00 1.1% tg3_read32 vmlinux 566.00 1.0% __udp4_lib_lookup vmlinux -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
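To make the masks concrete: ee = 0xee selects cpus 1,2,3,5,6,7 (cpu0 and its SMT sibling cpu4 are left out of the RPS set), while fe = 0xfe adds cpu4 and selects cpus 1-7. A throwaway decoder (hypothetical helper, not from this thread):

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	/* e.g. ./rpsmask ee   or   ./rpsmask fe */
	unsigned long mask = strtoul(argc > 1 ? argv[1] : "ee", NULL, 16);
	int cpu;

	printf("mask 0x%lx ->", mask);
	for (cpu = 0; mask; cpu++, mask >>= 1)
		if (mask & 1)
			printf(" cpu%d", cpu);
	printf("\n");
	return 0;
}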
On Sat, May 1, 2010 at 2:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > BTW, using ee mask, cpu4 is not used at _all_, even for the user > threads. Scheduler does a bad job IMHO. > > Using fe mask, I get all packets (sent at 733311pps by my pktgen > machine), and my CPU0 even has idle time !!! > > Limit seems to be around 800.000 pps > > ------------------------------------------------------------------------------------------------------------------------ > PerfTop: 5616 irqs/sec kernel:93.9% [1000Hz cycles], (all, 8 CPUs) > ------------------------------------------------------------------------------------------------------------------------ > Oh, cpu0 usage is about 100-(100-93.9)*8 = 51.2%(Am I right?). If we can do weighted packet distributing: cpu0's weight is 1, and other cpus are 2. maybe we can utilize all the cpu power.
Le samedi 01 mai 2010 à 18:24 +0800, Changli Gao a écrit : > On Sat, May 1, 2010 at 2:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > BTW, using ee mask, cpu4 is not used at _all_, even for the user > > threads. Scheduler does a bad job IMHO. > > > > Using fe mask, I get all packets (sent at 733311pps by my pktgen > > machine), and my CPU0 even has idle time !!! > > > > Limit seems to be around 800.000 pps > > > > ------------------------------------------------------------------------------------------------------------------------ > > PerfTop: 5616 irqs/sec kernel:93.9% [1000Hz cycles], (all, 8 CPUs) > > ------------------------------------------------------------------------------------------------------------------------ > > > > Oh, cpu0 usage is about 100-(100-93.9)*8 = 51.2%(Am I right?). If we > can do weighted packet distributing: cpu0's weight is 1, and other > cpus are 2. maybe we can utilize all the cpu power. > Nope, cpu0 was at 100% in this test, other cpus were about at 50% each. weigthed would be ok if I wanted to use cpu0 in the 'slave' cpus (RPS targets). But I know the workload I am interested to, and ability to resist to DDOS, want to keep cpu0 outside of IP/TCP/UDP stack. Later, skb_pull() inline in eth_type_trans() permitted to reach 840.000 pps. top - 12:42:55 up 3:00, 2 users, load average: 0.44, 0.11, 0.03 Tasks: 126 total, 1 running, 125 sleeping, 0 stopped, 0 zombie Cpu(s): 2.2%us, 16.5%sy, 0.0%ni, 46.5%id, 11.4%wa, 0.9%hi, 22.5%si, 0.0%st Mem: 4148112k total, 211152k used, 3936960k free, 15228k buffers Swap: 4192928k total, 0k used, 4192928k free, 121804k cached You can see average idle of 46% So there is probably more optimizations to do to reach maybe 1.300.000 pps ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 2010-05-01 at 07:57 +0200, Eric Dumazet wrote:

> I changed your program a bit to use EV_PERSIST, (to avoid epoll_ctl()
> overhead for each packet...)

That's a different test case then ;-> You can also get rid of the timer
(I doubt it will show much difference in results) - I have it in there
because I am trying to replicate what I saw causing the regression.

> RPS off : 220.000 pps
>
> RPS on (ee mask) : 700.000 pps (with a slightly modified tg3 driver)
> 96% of delivered packets
>

That's a very, very huge gap. What were the numbers before you changed
to EV_PERSIST?
Note: I did not add any of your other patches for dst refcnt, sockets
etc. Were you running with those patches in these tests? I will try it
the next opportunity I get to have the latest kernel + those patches.

> This is on a tg3 adapter, and tg3 has a copybreak feature : small packets
> are copied into an skb of the right size.

OK, so the driver tuning is also important then (and it shows in the
profile).

cheers,
jamal
On Sat, 2010-05-01 at 08:14 +0200, Eric Dumazet wrote:

> BTW, using ee mask, cpu4 is not used at _all_, even for the user
> threads. The scheduler does a bad job IMHO.

I have the opposite frustration ;-> I did notice it got used. My goal
was to totally avoid using it, for the simple reason that it is an SMT
thread that shares the same core as cpu0. In retrospect I should
probably set the irq affinity to cpus 0 and 4.

> Using fe mask, I get all packets (sent at 733311 pps by my pktgen
> machine), and my CPU0 even has idle time !!!

I will try this next time I get the chance.

cheers,
jamal
On Saturday, 1 May 2010 at 07:23 -0400, jamal wrote:
> On Sat, 2010-05-01 at 07:57 +0200, Eric Dumazet wrote:
>
> > I changed your program a bit to use EV_PERSIST, (to avoid epoll_ctl()
> > overhead for each packet...)
>
> That's a different test case then ;-> You can also get rid of the timer
> (I doubt it will show much difference in results) - I have it in there
> because I am trying to replicate what I saw causing the regression.
>
> > RPS off : 220.000 pps
> >
> > RPS on (ee mask) : 700.000 pps (with a slightly modified tg3 driver)
> > 96% of delivered packets
> >
>
> That's a very, very huge gap. What were the numbers before you changed
> to EV_PERSIST?

But the whole point of epoll is to not change the interest set each time
you get an event.

Without EV_PERSIST, you need two more syscalls per recvfrom():

epoll_wait()
epoll_ctl(REMOVE)
epoll_ctl(ADD)
recvfrom()

Even poll() would be faster in your case:

poll(one fd)
recvfrom()

> Note: I did not add any of your other patches for dst refcnt, sockets
> etc. Were you running with those patches in these tests? I will try it
> the next opportunity I get to have the latest kernel + those patches.
>
> > This is on a tg3 adapter, and tg3 has a copybreak feature : small packets
> > are copied into an skb of the right size.
>
> OK, so the driver tuning is also important then (and it shows in the
> profile).

I always thought copybreak was borderline...
It can help to reduce the memory footprint (allocating 128 bytes instead
of 2048/4096 bytes per frame), but with RPS, it would make sense to
perform copybreak after RPS, not before.

Reducing the memory footprint also means fewer changes to
udp_memory_allocated/tcp_memory_allocated (memory reclaim logic).
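To make the syscall accounting above concrete, here is a sketch of the two
receive loops being compared, written against raw epoll rather than libevent
(EV_PERSIST is libevent's flag for a registration that stays armed). The
buffer size and error handling are simplified assumptions.

/*
 * With a persistent registration the fd stays in the epoll set, so the
 * steady state is one epoll_wait() + one recvfrom() per datagram.
 * Re-arming the fd around every event adds two epoll_ctl() calls.
 */
#include <sys/epoll.h>
#include <sys/socket.h>

static void rx_loop_persistent(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
	char buf[2048];

	epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);	/* register once */
	for (;;) {
		struct epoll_event out;

		if (epoll_wait(epfd, &out, 1, -1) > 0)
			recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
	}
}

static void rx_loop_rearm(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
	char buf[2048];

	for (;;) {
		struct epoll_event out;

		epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);	/* extra syscall */
		if (epoll_wait(epfd, &out, 1, -1) > 0) {
			epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL); /* extra syscall */
			recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
		}
	}
}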
On Sat, 2010-05-01 at 13:42 +0200, Eric Dumazet wrote:

> But the whole point of epoll is to not change the interest set each time
> you get an event.
>
> Without EV_PERSIST, you need two more syscalls per recvfrom()
>
> epoll_wait()
> epoll_ctl(REMOVE)
> epoll_ctl(ADD)
> recvfrom()
>
> Even poll() would be faster in your case
>
> poll(one fd)
> recvfrom()
>

This is true - but my goal was/is to replicate the regression I was
seeing[1].
I will try with PERSIST next opportunity. If it gets better
then it is something that needs documentation in the doc Tom
promised ;->

> I always thought copybreak was borderline...
> It can help to reduce the memory footprint (allocating 128 bytes instead
> of 2048/4096 bytes per frame), but with RPS, it would make sense to
> perform copybreak after RPS, not before.
>
> Reducing the memory footprint also means fewer changes to
> udp_memory_allocated/tcp_memory_allocated (memory reclaim logic)

Indeed, something that didn't cross my mind in the rush to test - it is
one of those things that need to be mentioned in some doc somewhere.
Tom, are you listening? ;->

cheers,
jamal

[1] i.e. with this program rps was getting worse (it was much better
before, say, net-next of Apr 14) and non-rps has been getting better
numbers since. The regression is real - but it is likely in another
subsystem.
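For reference, "copybreak" is the driver-side trick discussed above: below a
size threshold, the payload is copied into a freshly allocated skb of just
the right size and the original large RX buffer is recycled. A generic sketch
follows; this is not the actual tg3 code, and the 256-byte threshold is an
assumption for illustration.

/*
 * Sketch of a generic RX copybreak helper: small frames are copied into
 * a right-sized skb, large frames keep the original buffer.  The caller
 * re-posts 'big' to the RX ring when a copy was made.
 */
#include <linux/netdevice.h>
#include <linux/skbuff.h>

#define RX_COPYBREAK	256	/* illustrative threshold */

static struct sk_buff *rx_copybreak(struct net_device *dev,
				    struct sk_buff *big, unsigned int len)
{
	struct sk_buff *copy;

	if (len >= RX_COPYBREAK)
		return big;		/* keep the original buffer */

	copy = netdev_alloc_skb(dev, len + NET_IP_ALIGN);
	if (!copy)
		return big;		/* fall back to the large skb */

	skb_reserve(copy, NET_IP_ALIGN);
	skb_copy_from_linear_data(big, skb_put(copy, len), len);
	return copy;
}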
On Saturday, 1 May 2010 at 07:56 -0400, jamal wrote:
>
> [1] i.e. with this program rps was getting worse (it was much better
> before, say, net-next of Apr 14) and non-rps has been getting better
> numbers since. The regression is real - but it is likely in another
> subsystem.
>

You must understand that the whole 'bench' is mostly governed by
scheduler artifacts. The regression you mention is probably a side
effect.

By slowing down one part, it's possible to zap all calls to the scheduler
and go maybe 300% faster (because consumer threads can avoid the
scheduling cost 3/4 of the time).

Conversely, optimizing one part of the network stack might make threads
hit an empty queue and need to call the scheduler more often.

This is why some highly specialized programs never block/schedule and
perform busy loops instead.
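A sketch of the "never block" pattern Eric refers to: a non-blocking receive
loop that spins on EAGAIN instead of sleeping, trading a fully burned CPU for
the cost of scheduler wakeups. The buffer size and error handling are
assumptions for illustration.

#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

static void busy_rx_loop(int fd)
{
	char buf[2048];

	for (;;) {
		ssize_t n = recvfrom(fd, buf, sizeof(buf), MSG_DONTWAIT,
				     NULL, NULL);
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			continue;	/* nothing queued: spin, never schedule */
		/* ... process n bytes ... */
	}
}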
On Sat, 2010-05-01 at 15:22 +0200, Eric Dumazet wrote:

> You must understand that the whole 'bench' is mostly governed by
> scheduler artifacts. The regression you mention is probably a side
> effect.

Likely.

> By slowing down one part, it's possible to zap all calls to the scheduler
> and go maybe 300% faster (because consumer threads can avoid the
> scheduling cost 3/4 of the time).
>
> Conversely, optimizing one part of the network stack might make threads
> hit an empty queue and need to call the scheduler more often.

It is fair to say that what I am seeing is _not_ fatal, because it is rps
that is regressing; non-rps is fine. I would consider non-rps to be the
common use scenario, and if that were doing badly then it would be a
problem. The good news is it is getting better - likely because of some
changes made on behalf of rps ;->
With rps, one could follow some instructions on how to make it better.
I am hoping that some of the system "magic" gets documented, as Tom
mentioned he would.

> This is why some highly specialized programs never block/schedule and
> perform busy loops instead.

Agreed. My brain cells should learn to accept this fact ;->

cheers,
jamal
On Sat, 2010-05-01 at 07:56 -0400, jamal wrote:
> On Sat, 2010-05-01 at 13:42 +0200, Eric Dumazet wrote:
>
> > But the whole point of epoll is to not change the interest set each
> > time you get an event.
> >
> > Without EV_PERSIST, you need two more syscalls per recvfrom()
> >
> > epoll_wait()
> > epoll_ctl(REMOVE)
> > epoll_ctl(ADD)
> > recvfrom()
> >
> > Even poll() would be faster in your case
> >
> > poll(one fd)
> > recvfrom()
> >
>
> This is true - but my goal was/is to replicate the regression I was
> seeing[1].
> I will try with PERSIST next opportunity. If it gets better
> then it is something that needs documentation in the doc Tom
> promised ;->

I tried it with PERSIST and today's net-next and you are right: rps did
better than non-rps (99.4% vs 98.1% of 750Kpps). If however I removed the
PERSIST, i.e. both rps and non-rps make the two extra syscalls, rps again
performed worse (93.2% vs 97.8% of 750Kpps).

Eric, I know the answer is not to use the non-PERSIST mode for rps ;->
But let's just ignore that for a sec: what the heck is going on? I would
expect the degradation to be the same for both rps and non-rps.
I also want to do the broken-record reminder that kernels before net-next
of Apr 14 were doing about 97% (as opposed to 93% currently for the same
test).

cheers,
jamal
diff --git a/include/net/sock.h b/include/net/sock.h
index cf12b1e..d361c77 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1021,6 +1021,16 @@ extern void release_sock(struct sock *sk);
 				SINGLE_DEPTH_NESTING)
 #define bh_unlock_sock(__sk)	spin_unlock(&((__sk)->sk_lock.slock))
 
+static inline void lock_sock_bh(struct sock *sk)
+{
+	spin_lock_bh(&sk->sk_lock.slock);
+}
+
+static inline void unlock_sock_bh(struct sock *sk)
+{
+	spin_unlock_bh(&sk->sk_lock.slock);
+}
+
 extern struct sock		*sk_alloc(struct net *net, int family,
 					  gfp_t priority,
 					  struct proto *prot);
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 5574a5d..95b851f 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -229,9 +229,13 @@ EXPORT_SYMBOL(skb_free_datagram);
 
 void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 {
-	lock_sock(sk);
-	skb_free_datagram(sk, skb);
-	release_sock(sk);
+	lock_sock_bh(sk);
+	skb_orphan(skb);
+	sk_mem_reclaim_partial(sk);
+	unlock_sock_bh(sk);
+
+	/* skb is now orphaned, might be freed outside of locked section */
+	consume_skb(skb);
 }
 
 EXPORT_SYMBOL(skb_free_datagram_locked);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 63eb56b..1f86965 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1062,10 +1062,10 @@ static unsigned int first_packet_length(struct sock *sk)
 	spin_unlock_bh(&rcvq->lock);
 
 	if (!skb_queue_empty(&list_kill)) {
-		lock_sock(sk);
+		lock_sock_bh(sk);
 		__skb_queue_purge(&list_kill);
 		sk_mem_reclaim_partial(sk);
-		release_sock(sk);
+		unlock_sock_bh(sk);
 	}
 	return res;
 }
@@ -1196,10 +1196,10 @@ out:
 	return err;
 
 csum_copy_err:
-	lock_sock(sk);
+	lock_sock_bh(sk);
 	if (!skb_kill_datagram(sk, skb, flags))
 		UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
-	release_sock(sk);
+	unlock_sock_bh(sk);
 
 	if (noblock)
 		return -EAGAIN;
@@ -1624,9 +1624,9 @@ int udp_rcv(struct sk_buff *skb)
 
 void udp_destroy_sock(struct sock *sk)
 {
-	lock_sock(sk);
+	lock_sock_bh(sk);
 	udp_flush_pending_frames(sk);
-	release_sock(sk);
+	unlock_sock_bh(sk);
 }
 
 /*
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 3ead20a..91c60f0 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -424,7 +424,7 @@ out:
 	return err;
 
 csum_copy_err:
-	lock_sock(sk);
+	lock_sock_bh(sk);
 	if (!skb_kill_datagram(sk, skb, flags)) {
 		if (is_udp4)
 			UDP_INC_STATS_USER(sock_net(sk),
@@ -433,7 +433,7 @@ csum_copy_err:
 		UDP6_INC_STATS_USER(sock_net(sk),
 				UDP_MIB_INERRORS, is_udplite);
 	}
-	release_sock(sk);
+	unlock_sock_bh(sk);
 
 	if (flags & MSG_DONTWAIT)
 		return -EAGAIN;