diff mbox

[Bug,#14301] WARNING: at net/ipv4/af_inet.c:154

Message ID 4AC710DF.5070705@gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Oct. 3, 2009, 8:52 a.m. UTC
Eric Dumazet a écrit :
> Rafael J. Wysocki a écrit :
>> This message has been generated automatically as a part of a report
>> of regressions introduced between 2.6.30 and 2.6.31.
>>
>> The following bug entry is on the current list of known regressions
>> introduced between 2.6.30 and 2.6.31.  Please verify if it still should
>> be listed and let me know (either way).
>>
>>
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=14301
>> Subject		: WARNING: at net/ipv4/af_inet.c:154
>> Submitter	: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
>> Date		: 2009-09-30 12:24 (2 days old)
>> References	: http://marc.info/?l=linux-kernel&m=125431350218137&w=4
>>
>>
> 
> If commit d99927f4d93f36553699573b279e0ff98ad7dea6
> (net: Fix sock_wfree() race) doesnt fix this problem, then
> maybe we should take a look at an old patch.
> 
> < data mining... running... output results to lkml/netdev >
> 
> Random guesses
> 
>  1) : commit d55d87fdff8252d0e2f7c28c2d443aee17e9d70f
> (net: Move rx skb_orphan call to where needed)
> 
> A similar problem on SCTP was fixed by commit 
> 1bc4ee4088c9a502db0e9c87f675e61e57fa1734
> (sctp: fix warning at inet_sock_destruct() while release sctp socket)
> 
> 2) CORK and UDP sockets
>   It seems we can leave an UDP socket with a frame in sk_write_queue
>   Purge of this queue is done by udp_flush_pending_frames()
>    This calls ip_flush_pending_frames()
>    But this function only calls kfree_skb(), not sk_wmem_free_skb()...
> 
> 
> Could you try following patch ?
> 

Hmm, I missed the ip_cork_release(), here is an updated version.


[PATCH] net: UDP should not use ip_flush_pending_frames()

Now xmit UDP messages are charged, we must take care of calling right
skb freeing function.

In case a close() is performed on a socket where CORKED frame
is still queued in sk_write_queue, calling ip_flush_pending_frames()
leads to sk_forward_alloc leak.

Fix this by calling sk_write_queue_purge() and ip_cork_release()
instead.

Reported-by: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/ip.h    |    1 +
 include/net/sock.h  |   10 ++++++++++
 include/net/tcp.h   |   10 ----------
 net/ipv4/tcp.c      |    2 +-
 net/ipv4/tcp_ipv4.c |    2 +-
 net/ipv4/udp.c      |    3 ++-
 6 files changed, 15 insertions(+), 13 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Oct. 3, 2009, 5:53 p.m. UTC | #1
Eric Dumazet a écrit :
> Eric Dumazet a écrit :
>> Rafael J. Wysocki a écrit :
>>> This message has been generated automatically as a part of a report
>>> of regressions introduced between 2.6.30 and 2.6.31.
>>>
>>> The following bug entry is on the current list of known regressions
>>> introduced between 2.6.30 and 2.6.31.  Please verify if it still should
>>> be listed and let me know (either way).
>>>
>>>
>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=14301
>>> Subject		: WARNING: at net/ipv4/af_inet.c:154
>>> Submitter	: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
>>> Date		: 2009-09-30 12:24 (2 days old)
>>> References	: http://marc.info/?l=linux-kernel&m=125431350218137&w=4
>>>
>>>
>> If commit d99927f4d93f36553699573b279e0ff98ad7dea6
>> (net: Fix sock_wfree() race) doesnt fix this problem, then
>> maybe we should take a look at an old patch.
>>
>> < data mining... running... output results to lkml/netdev >
>>
>> Random guesses
>>
>>  1) : commit d55d87fdff8252d0e2f7c28c2d443aee17e9d70f
>> (net: Move rx skb_orphan call to where needed)
>>
>> A similar problem on SCTP was fixed by commit 
>> 1bc4ee4088c9a502db0e9c87f675e61e57fa1734
>> (sctp: fix warning at inet_sock_destruct() while release sctp socket)
>>
>> 2) CORK and UDP sockets
>>   It seems we can leave an UDP socket with a frame in sk_write_queue
>>   Purge of this queue is done by udp_flush_pending_frames()
>>    This calls ip_flush_pending_frames()
>>    But this function only calls kfree_skb(), not sk_wmem_free_skb()...
>>
>>
>> Could you try following patch ?
>>
> 
> Hmm, I missed the ip_cork_release(), here is an updated version.
> 

Please ignore this patch, I was wrong, sk_forward_alloc is not used
on xmit side for udp, only receive side. CORK/UDP should be fine

Investigation still needed...


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 7, 2009, 3:41 p.m. UTC | #2
Eric Dumazet a écrit :
> Eric Dumazet a écrit :
>> Eric Dumazet a écrit :
>>> Rafael J. Wysocki a écrit :
>>>> This message has been generated automatically as a part of a report
>>>> of regressions introduced between 2.6.30 and 2.6.31.
>>>>
>>>> The following bug entry is on the current list of known regressions
>>>> introduced between 2.6.30 and 2.6.31.  Please verify if it still should
>>>> be listed and let me know (either way).
>>>>
>>>>
>>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=14301
>>>> Subject		: WARNING: at net/ipv4/af_inet.c:154
>>>> Submitter	: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
>>>> Date		: 2009-09-30 12:24 (2 days old)
>>>> References	: http://marc.info/?l=linux-kernel&m=125431350218137&w=4
>>>>
> 
> Investigation still needed...
> 

OK, my last (buggy ???) feeling is about commit 95766fff6b9a78d1

[UDP]: Add memory accounting.

(Its a two years old patch, oh well...)

Problem is the udp_poll() :

We check the first frame to be dequeued from sk_receive_queue has a good checksum.

If it doesnt, we drop the frame ( calling kfree_skb(skb); )

Problem is now we perform memory accounting on UDP, this kfree_skb()
should be done with socket locked, but we are allowed to
call lock_sock() from this udp_poll() context

unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
{
        unsigned int mask = datagram_poll(file, sock, wait);
        struct sock *sk = sock->sk;
        int     is_lite = IS_UDPLITE(sk);

        /* Check for false positives due to checksum errors */
        if ((mask & POLLRDNORM) &&
            !(file->f_flags & O_NONBLOCK) &&
            !(sk->sk_shutdown & RCV_SHUTDOWN)) {
                struct sk_buff_head *rcvq = &sk->sk_receive_queue;
                struct sk_buff *skb;

                spin_lock_bh(&rcvq->lock);
                while ((skb = skb_peek(rcvq)) != NULL &&
                       udp_lib_checksum_complete(skb)) {
                        UDP_INC_STATS_BH(sock_net(sk),
                                        UDP_MIB_INERRORS, is_lite);
                        __skb_unlink(skb, rcvq);
<<HERE>>                kfree_skb(skb);
                }
                spin_unlock_bh(&rcvq->lock);



David, Herbert, any idea how to solve this problem ?

1) Allow false positives

Or

2) Maybe we should finally convert sk_forward_alloc to an atomic_t after all...
   This would make things easier, and speedup UDP (no more need to lock_sock())

Or 

3) ???

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/ip.h b/include/net/ip.h
index 2f47e54..c8d8828 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -117,6 +117,7 @@  extern int		ip_generic_getfrag(void *from, char *to, int offset, int len, int od
 extern ssize_t		ip_append_page(struct sock *sk, struct page *page,
 				int offset, size_t size, int flags);
 extern int		ip_push_pending_frames(struct sock *sk);
+extern void     ip_cork_release(struct inet_sock *);
 extern void		ip_flush_pending_frames(struct sock *sk);
 
 /* datagram.c */
diff --git a/include/net/sock.h b/include/net/sock.h
index 1621935..7c80fec 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -882,6 +882,16 @@  static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 	__kfree_skb(skb);
 }
 
+/* write queue abstraction */
+static inline void sk_write_queue_purge(struct sock *sk)
+{
+	struct sk_buff *skb;
+
+	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
+		sk_wmem_free_skb(sk, skb);
+	sk_mem_reclaim(sk);
+}
+
 /* Used by processes to "lock" a socket state, so that
  * interrupts and bottom half handlers won't change it
  * from under us. It essentially blocks any incoming
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..4c7036a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1220,16 +1220,6 @@  static inline void		tcp_put_md5sig_pool(void)
 	put_cpu();
 }
 
-/* write queue abstraction */
-static inline void tcp_write_queue_purge(struct sock *sk)
-{
-	struct sk_buff *skb;
-
-	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
-		sk_wmem_free_skb(sk, skb);
-	sk_mem_reclaim(sk);
-}
-
 static inline struct sk_buff *tcp_write_queue_head(struct sock *sk)
 {
 	return skb_peek(&sk->sk_write_queue);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 64d0af6..0124f5b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1992,7 +1992,7 @@  int tcp_disconnect(struct sock *sk, int flags)
 
 	tcp_clear_xmit_timers(sk);
 	__skb_queue_purge(&sk->sk_receive_queue);
-	tcp_write_queue_purge(sk);
+	sk_write_queue_purge(sk);
 	__skb_queue_purge(&tp->out_of_order_queue);
 #ifdef CONFIG_NET_DMA
 	__skb_queue_purge(&sk->sk_async_wait_queue);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7cda24b..76e59df 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1845,7 +1845,7 @@  void tcp_v4_destroy_sock(struct sock *sk)
 	tcp_cleanup_congestion_control(sk);
 
 	/* Cleanup up the write buffer. */
-	tcp_write_queue_purge(sk);
+	sk_write_queue_purge(sk);
 
 	/* Cleans up our, hopefully empty, out_of_order_queue. */
 	__skb_queue_purge(&tp->out_of_order_queue);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6ec6a8a..b6370d0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -464,7 +464,8 @@  void udp_flush_pending_frames(struct sock *sk)
 	if (up->pending) {
 		up->len = 0;
 		up->pending = 0;
-		ip_flush_pending_frames(sk);
+		sk_write_queue_purge(sk);
+		ip_cork_release(inet_sk(sk));
 	}
 }
 EXPORT_SYMBOL(udp_flush_pending_frames);