Message ID | 492C919E.3050108@cosmosbay.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <dada1@cosmosbay.com> Date: Wed, 26 Nov 2008 01:00:30 +0100 > Hum... for a tcp connection, dst refcount should already be pinned > by a sk->sk_dst_cache. Maybe test refcount value, and if this value > is > 1, dont take a reference. (given rcu_read_lock() is done before > calling ip_rcv_finish()) Input route is different from output route. sk->sk_dst_cache holds the output route. So I can't see how this can help here. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <dada1@cosmosbay.com> Date: Wed, 26 Nov 2008 01:00:30 +0100 > In the meantime, what do you think of the following patch ? > > [PATCH] net: release skb->dst in sock_queue_rcv_skb() > > When queuing a skb to sk->sk_receive_queue, we can release its dst, not > anymore needed. > Since current cpu did the dst_hold(), refcount is probably still hot > int this cpu caches. > > This avoids readers to access the original dst to decrement its refcount, > possibly a long time after packet reception. This should speedup UDP > and RAW receive path. > > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> I guess the idea is that if we release quickly we'll not have to reget the cacheline in owned state. I wonder if this might actually slightly hurt loads like tbench where we are banging on the refcnt constantly on every cpu anyways. Can you do a quick check? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller a écrit : > From: Eric Dumazet <dada1@cosmosbay.com> > Date: Wed, 26 Nov 2008 01:00:30 +0100 > >> In the meantime, what do you think of the following patch ? >> >> [PATCH] net: release skb->dst in sock_queue_rcv_skb() >> >> When queuing a skb to sk->sk_receive_queue, we can release its dst, not >> anymore needed. >> Since current cpu did the dst_hold(), refcount is probably still hot >> int this cpu caches. >> >> This avoids readers to access the original dst to decrement its refcount, >> possibly a long time after packet reception. This should speedup UDP >> and RAW receive path. >> >> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> > > I guess the idea is that if we release quickly we'll not have > to reget the cacheline in owned state. Yes, this is the idea. > > I wonder if this might actually slightly hurt loads like tbench where > we are banging on the refcnt constantly on every cpu anyways. Yes, the only way to reduce the load on this case is not moving the increments/decrements : It could help on some machines, and hurt on others. In the long term, only reducing the number of dirtying can help the average. > > Can you do a quick check? > > Sure I can do a check :) No impact at all on tbench, unless this bench can run in UDP mode :) CPU: Core 2, speed 3000.1 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000 samples cum. samples % cum. % symbol name 599983 599983 11.2948 11.2948 copy_from_user 549349 1149332 10.3416 21.6364 ipt_do_table 245697 1395029 4.6253 26.2617 copy_to_user 221663 1616692 4.1729 30.4346 schedule 144888 1761580 2.7275 33.1621 tcp_sendmsg 136967 1898547 2.5784 35.7406 tcp_ack 115513 2014060 2.1746 37.9151 tcp_transmit_skb 99517 2113577 1.8734 39.7885 sysenter_past_esp 99438 2213015 1.8719 41.6605 ip_queue_xmit 98171 2311186 1.8481 43.5086 tcp_recvmsg 80763 2391949 1.5204 45.0290 __switch_to 79621 2471570 1.4989 46.5278 tcp_v4_rcv 77210 2548780 1.4535 47.9813 dst_release 69387 2618167 1.3062 49.2876 tcp_rcv_established 64709 2682876 1.2182 50.5057 __tcp_push_pending_frames 55223 2738099 1.0396 51.5453 lock_sock_nested 53754 2791853 1.0119 52.5572 sys_socketcall 50092 2841945 0.9430 53.5002 netif_receive_skb 49499 2891444 0.9318 54.4321 release_sock 47796 2939240 0.8998 55.3318 __inet_lookup_established 45162 2984402 0.8502 56.1820 update_curr 44895 3029297 0.8452 57.0272 ip_rcv 42945 3072242 0.8084 57.8356 dev_queue_xmit 42892 3115134 0.8075 58.6431 tcp_event_data_recv 42768 3157902 0.8051 59.4482 local_bh_enable 41555 3199457 0.7823 60.2305 netif_rx 38613 3238070 0.7269 60.9574 __alloc_skb 38016 3276086 0.7157 61.6730 ip_finish_output 36867 3312953 0.6940 62.3671 tcp_current_mss 36759 3349712 0.6920 63.0591 skb_release_data 35560 3385272 0.6694 63.7285 local_bh_enable_ip 34100 3419372 0.6419 64.3704 sock_recvmsg 33829 3453201 0.6368 65.0073 __kfree_skb 32949 3486150 0.6203 65.6275 sched_clock_cpu -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <dada1@cosmosbay.com> Date: Wed, 26 Nov 2008 08:39:30 +0100 > David Miller a écrit : > > From: Eric Dumazet <dada1@cosmosbay.com> > > Date: Wed, 26 Nov 2008 01:00:30 +0100 > > > > Can you do a quick check? > > > > Sure I can do a check :) > > No impact at all on tbench, unless this bench can run in UDP mode :) Durrr, of course! Patch applied, thanks Eric. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, On Wed, 2008-11-26 at 01:00 +0100, Eric Dumazet wrote: > [PATCH] net: release skb->dst in sock_queue_rcv_skb() > > When queuing a skb to sk->sk_receive_queue, we can release its dst, not > anymore needed. > Since current cpu did the dst_hold(), refcount is probably still hot > int this cpu caches. > > This avoids readers to access the original dst to decrement its refcount, > possibly a long time after packet reception. This should speedup UDP > and RAW receive path. > > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> > plain text document attachment (sock_queue_rcv_skb.patch) > diff --git a/net/core/sock.c b/net/core/sock.c > index a4e840e..b287645 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -289,7 +289,11 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) > > skb->dev = NULL; > skb_set_owner_r(skb, sk); > - > + /* > + * release dst right now while its hot > + */ > + dst_release(skb->dst); > + skb->dst = NULL; IP_PKTINFO cmsg data is one post-queueing user: static void ip_cmsg_recv_pktinfo(struct msghdr *msg, struct sk_buff *skb) { struct in_pktinfo info; struct rtable *rt = skb->rtable; info.ipi_addr.s_addr = ip_hdr(skb)->daddr; if (rt) { info.ipi_ifindex = rt->rt_iif; info.ipi_spec_dst.s_addr = rt->rt_spec_dst; } else { info.ipi_ifindex = 0; info.ipi_spec_dst.s_addr = 0; } put_cmsg(msg, SOL_IP, IP_PKTINFO, sizeof(info), &info); } (i.e. skb->rtable is NULL at this point) I'm seeing dnsmasq not working on net-next-2.6 because of this and reverting commit 7035560 makes things work as expected again. Cheers, Mark. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Mark McLoughlin <markmc@redhat.com> Date: Wed, 17 Dec 2008 11:25:01 +0000 > On Wed, 2008-11-26 at 01:00 +0100, Eric Dumazet wrote: > > > [PATCH] net: release skb->dst in sock_queue_rcv_skb() ... > IP_PKTINFO cmsg data is one post-queueing user: Eric, we'll need to rever this change I think. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/sock.c b/net/core/sock.c index a4e840e..b287645 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -289,7 +289,11 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) skb->dev = NULL; skb_set_owner_r(skb, sk); - + /* + * release dst right now while its hot + */ + dst_release(skb->dst); + skb->dst = NULL; /* Cache the SKB length before we tack it onto the receive * queue. Once it is added it no longer belongs to us and * may be freed by other threads of control pulling packets