Message ID:   4AC710DF.5070705@gmail.com
State:        RFC, archived
Delegated to: David Miller
Eric Dumazet wrote:
> Eric Dumazet wrote:
>> Rafael J. Wysocki wrote:
>>> This message has been generated automatically as a part of a report
>>> of regressions introduced between 2.6.30 and 2.6.31.
>>>
>>> The following bug entry is on the current list of known regressions
>>> introduced between 2.6.30 and 2.6.31. Please verify if it still should
>>> be listed and let me know (either way).
>>>
>>>
>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=14301
>>> Subject		: WARNING: at net/ipv4/af_inet.c:154
>>> Submitter	: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
>>> Date		: 2009-09-30 12:24 (2 days old)
>>> References	: http://marc.info/?l=linux-kernel&m=125431350218137&w=4
>>>
>>>
>> If commit d99927f4d93f36553699573b279e0ff98ad7dea6
>> (net: Fix sock_wfree() race) doesn't fix this problem, then
>> maybe we should take a look at an old patch.
>>
>> < data mining... running... output results to lkml/netdev >
>>
>> Random guesses:
>>
>> 1) commit d55d87fdff8252d0e2f7c28c2d443aee17e9d70f
>>    (net: Move rx skb_orphan call to where needed)
>>
>>    A similar problem on SCTP was fixed by commit
>>    1bc4ee4088c9a502db0e9c87f675e61e57fa1734
>>    (sctp: fix warning at inet_sock_destruct() while release sctp socket)
>>
>> 2) CORK and UDP sockets
>>    It seems we can leave a UDP socket with a frame in sk_write_queue.
>>    Purging of this queue is done by udp_flush_pending_frames(),
>>    which calls ip_flush_pending_frames().
>>    But that function only calls kfree_skb(), not sk_wmem_free_skb()...
>>
>> Could you try the following patch?
>>
> Hmm, I missed the ip_cork_release(); here is an updated version.
>

Please ignore this patch, I was wrong: sk_forward_alloc is not used on the
xmit side for UDP, only on the receive side. CORK/UDP should be fine.

Investigation still needed...

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote:
> Eric Dumazet wrote:
>> Eric Dumazet wrote:
>>> Rafael J. Wysocki wrote:
>>>> This message has been generated automatically as a part of a report
>>>> of regressions introduced between 2.6.30 and 2.6.31.
>>>>
>>>> The following bug entry is on the current list of known regressions
>>>> introduced between 2.6.30 and 2.6.31. Please verify if it still should
>>>> be listed and let me know (either way).
>>>>
>>>>
>>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=14301
>>>> Subject		: WARNING: at net/ipv4/af_inet.c:154
>>>> Submitter	: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
>>>> Date		: 2009-09-30 12:24 (2 days old)
>>>> References	: http://marc.info/?l=linux-kernel&m=125431350218137&w=4
>>>>
>
> Investigation still needed...
>

OK, my last (buggy ???) feeling is about commit 95766fff6b9a78d1
([UDP]: Add memory accounting.)

(It's a two-year-old patch, oh well...)

The problem is in udp_poll():

We check that the first frame to be dequeued from sk_receive_queue has a
good checksum. If it doesn't, we drop the frame (calling kfree_skb(skb)).

The problem is that we now perform memory accounting on UDP, so this
kfree_skb() should be done with the socket locked, but we are not allowed
to call lock_sock() from this udp_poll() context:

unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
{
	unsigned int mask = datagram_poll(file, sock, wait);
	struct sock *sk = sock->sk;
	int is_lite = IS_UDPLITE(sk);

	/* Check for false positives due to checksum errors */
	if ((mask & POLLRDNORM) && !(file->f_flags & O_NONBLOCK) &&
	    !(sk->sk_shutdown & RCV_SHUTDOWN)) {
		struct sk_buff_head *rcvq = &sk->sk_receive_queue;
		struct sk_buff *skb;

		spin_lock_bh(&rcvq->lock);
		while ((skb = skb_peek(rcvq)) != NULL &&
		       udp_lib_checksum_complete(skb)) {
			UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_INERRORS,
					 is_lite);
			__skb_unlink(skb, rcvq);
			/* <<HERE>> */
			kfree_skb(skb);
		}
		spin_unlock_bh(&rcvq->lock);

David, Herbert, any idea how to solve this problem?
1) Allow false positives

or

2) Maybe we should finally convert sk_forward_alloc to an atomic_t after
all... This would make things easier, and speed up UDP (no more need to
lock_sock()).

or

3) ???
diff --git a/include/net/ip.h b/include/net/ip.h
index 2f47e54..c8d8828 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -117,6 +117,7 @@ extern int ip_generic_getfrag(void *from, char *to, int offset, int len, int od
 extern ssize_t	ip_append_page(struct sock *sk, struct page *page,
 			int offset, size_t size, int flags);
 extern int	ip_push_pending_frames(struct sock *sk);
+extern void	ip_cork_release(struct inet_sock *);
 extern void	ip_flush_pending_frames(struct sock *sk);

 /* datagram.c */
diff --git a/include/net/sock.h b/include/net/sock.h
index 1621935..7c80fec 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -882,6 +882,16 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 	__kfree_skb(skb);
 }

+/* write queue abstraction */
+static inline void sk_write_queue_purge(struct sock *sk)
+{
+	struct sk_buff *skb;
+
+	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
+		sk_wmem_free_skb(sk, skb);
+	sk_mem_reclaim(sk);
+}
+
 /* Used by processes to "lock" a socket state, so that
  * interrupts and bottom half handlers won't change it
  * from under us.
  * It essentially blocks any incoming
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..4c7036a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1220,16 +1220,6 @@ static inline void tcp_put_md5sig_pool(void)
 	put_cpu();
 }

-/* write queue abstraction */
-static inline void tcp_write_queue_purge(struct sock *sk)
-{
-	struct sk_buff *skb;
-
-	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
-		sk_wmem_free_skb(sk, skb);
-	sk_mem_reclaim(sk);
-}
-
 static inline struct sk_buff *tcp_write_queue_head(struct sock *sk)
 {
 	return skb_peek(&sk->sk_write_queue);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 64d0af6..0124f5b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1992,7 +1992,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	tcp_clear_xmit_timers(sk);
 	__skb_queue_purge(&sk->sk_receive_queue);
-	tcp_write_queue_purge(sk);
+	sk_write_queue_purge(sk);
 	__skb_queue_purge(&tp->out_of_order_queue);
 #ifdef CONFIG_NET_DMA
 	__skb_queue_purge(&sk->sk_async_wait_queue);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7cda24b..76e59df 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1845,7 +1845,7 @@ void tcp_v4_destroy_sock(struct sock *sk)
 	tcp_cleanup_congestion_control(sk);

 	/* Cleanup up the write buffer. */
-	tcp_write_queue_purge(sk);
+	sk_write_queue_purge(sk);

 	/* Cleans up our, hopefully empty, out_of_order_queue. */
 	__skb_queue_purge(&tp->out_of_order_queue);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6ec6a8a..b6370d0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -464,7 +464,8 @@ void udp_flush_pending_frames(struct sock *sk)
 	if (up->pending) {
 		up->len = 0;
 		up->pending = 0;
-		ip_flush_pending_frames(sk);
+		sk_write_queue_purge(sk);
+		ip_cork_release(inet_sk(sk));
 	}
 }
 EXPORT_SYMBOL(udp_flush_pending_frames);