tcp_use_frto crashes on empty tcp_write_queue

Message ID 511C0127.1040604@citrix.com
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Zoltan Kiss Feb. 13, 2013, 9:09 p.m. UTC
Hi,

I see the following WARN and then crash on 2.6.32.12:

<4>WARNING: at net/ipv4/tcp_timer.c:293 tcp_retransmit_timer+0x5dd/0x630()

...

<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<c033f2b5>] tcp_use_frto+0x45/0x90
...
<0>Call Trace:
<4> [<c0343ab9>] ? tcp_retransmit_timer+0xd9/0x630
<4> [<c0120e58>] ? __wake_up_common+0x48/0x70
<4> [<c0344580>] ? tcp_write_timer+0xe0/0x1a0
<4> [<c0137fe1>] ? run_timer_softirq+0x151/0x200
<4> [<c02af069>] ? maybe_schedule_tx_action+0x39/0x40
<4> [<c03444a0>] ? tcp_write_timer+0x0/0x1a0
<4> [<c013359a>] ? __do_softirq+0xba/0x180
<4> [<c015e7a7>] ? move_native_irq+0x47/0x50
...

I've checked the code, tcp_use_frto() crashes because

skb = tcp_write_queue_head(sk);

returns a NULL, as the queue is empty, and in the next line:

if (tcp_skb_is_last(sk, skb))


Credit goes to Frediano for the patch.

Regards,

Zoli
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Feb. 13, 2013, 9:37 p.m. UTC | #1
On Wed, 2013-02-13 at 21:09 +0000, Zoltan Kiss wrote:
> Hi,
> 
> I see the following WARN and then crash on 2.6.32.12:
> 
> <4>WARNING: at net/ipv4/tcp_timer.c:293 tcp_retransmit_timer+0x5dd/0x630()
> 
> ...
> 
> <1>BUG: unable to handle kernel NULL pointer dereference at (null)
> <1>IP: [<c033f2b5>] tcp_use_frto+0x45/0x90
> ...
> <0>Call Trace:
> <4> [<c0343ab9>] ? tcp_retransmit_timer+0xd9/0x630
> <4> [<c0120e58>] ? __wake_up_common+0x48/0x70
> <4> [<c0344580>] ? tcp_write_timer+0xe0/0x1a0
> <4> [<c0137fe1>] ? run_timer_softirq+0x151/0x200
> <4> [<c02af069>] ? maybe_schedule_tx_action+0x39/0x40
> <4> [<c03444a0>] ? tcp_write_timer+0x0/0x1a0
> <4> [<c013359a>] ? __do_softirq+0xba/0x180
> <4> [<c015e7a7>] ? move_native_irq+0x47/0x50
> ...
> 
> I've checked the code, tcp_use_frto() crashes because
> 
> skb = tcp_write_queue_head(sk);
> 
> returns a NULL, as the queue is empty, and in the next line:
> 
> if (tcp_skb_is_last(sk, skb))
> ===>
> static inline bool tcp_skb_is_last(const struct sock *sk,
>                     const struct sk_buff *skb)
> {
>      return skb_queue_is_last(&sk->sk_write_queue, skb);
> }
> ===>
> static inline bool skb_queue_is_last(const struct sk_buff_head *list,
>                       const struct sk_buff *skb)
> {
>      return (skb->next == (struct sk_buff *) list);
> }
> 
> That skb->next access causes the NULL pointer dereference.
> 
> I've checked this in upstream, and it seems this would fail in the same 
> way. Wouldn't it be more reasonable to return from 
> tcp_retransmit_timer() instead of just issuing a WARN? Something like this:
> 
> diff -r 7a748d2cb9f1 -r bb8257f0730a net/ipv4/tcp_timer.c
> --- a/net/ipv4/tcp_timer.c    Wed Feb 13 15:02:50 2013 +0000
> +++ b/net/ipv4/tcp_timer.c    Wed Feb 13 15:03:18 2013 +0000
> @@ -287,11 +287,9 @@ void tcp_retransmit_timer(struct sock *s
>       struct tcp_sock *tp = tcp_sk(sk);
>       struct inet_connection_sock *icsk = inet_csk(sk);
> 
> -    if (!tp->packets_out)
> +    if (!tp->packets_out || tcp_write_queue_empty(sk))
>           goto out;
> 
> -    WARN_ON(tcp_write_queue_empty(sk));
> -
>       if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) &&
>           !((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))) {
>           /* Receiver dastardly shrinks window. Our retransmits

This doesn't seem to be a standard kernel.



Zoltan Kiss Feb. 13, 2013, 10:14 p.m. UTC | #2
Hi,

On 13/02/13 21:37, Eric Dumazet wrote:
> This doesn't seem to be a standard kernel.

Indeed, this is a XenServer 6.0.2 kernel, which contains changes; 
however, the TCP parts are barely modified. The related code paths were 
not changed.
I'm trying to determine if this bug could be fixed in an obvious way, so 
let me rephrase my question: if the socket write queue is empty, 
shouldn't we just stop going further instead of dropping a WARN? Or, if 
there is a reason to do so, shouldn't we check at least that returned 
pointer in tcp_use_frto()?

Regards,

Zoltan Kiss
Eric Dumazet Feb. 13, 2013, 11:24 p.m. UTC | #3
On Wed, 2013-02-13 at 22:14 +0000, Zoltan Kiss wrote:
> Hi,
> 
> On 13/02/13 21:37, Eric Dumazet wrote:
> > This doesn't seem to be a standard kernel.
> 
> Indeed, this is a XenServer 6.0.2 kernel, which contains changes; 
> however, the TCP parts are barely modified. The related code paths were 
> not changed.
> I'm trying to determine if this bug could be fixed in an obvious way, so 
> let me rephrase my question: if the socket write queue is empty, 
> shouldn't we just stop going further instead of dropping a WARN? Or, if 
> there is a reason to do so, shouldn't we check at least that returned 
> pointer in tcp_use_frto()?

You could change the WARN_ON() to BUG_ON(), as there is a severe bug in
your tree (and possibly in current trees as well, but it's hard to say,
given 2.6.32 is probably missing some tcp fixes).

Trying to hide the bug won't really help.

How can write_queue be empty while packets_out is non-zero?
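
To make that question concrete, here is a toy user-space model of the
invariant. It is a sketch and not kernel code: it assumes that packets_out
counts segments that were sent and not yet acknowledged, and that those
segments stay on the write queue until they are acknowledged. The names
model_sock, model_queue, model_send, model_ack, model_buggy_purge and
model_invariant_holds are made up for this illustration.

```c
#include <stdbool.h>

/* Hypothetical model: two counters stand in for the real socket state. */
struct model_sock {
    unsigned int queued;      /* segments on the write queue, sent or unsent */
    unsigned int packets_out; /* segments sent and not yet acknowledged */
};

/* Application data is appended to the write queue. */
static void model_queue(struct model_sock *sk)
{
    sk->queued++;
}

/* Transmit one unsent segment; it stays queued until it is acknowledged. */
static bool model_send(struct model_sock *sk)
{
    if (sk->packets_out == sk->queued)
        return false; /* nothing unsent on the queue */
    sk->packets_out++;
    return true;
}

/* An ACK removes the segment from the queue and from the in-flight count. */
static bool model_ack(struct model_sock *sk)
{
    if (sk->packets_out == 0)
        return false; /* nothing in flight */
    sk->packets_out--;
    sk->queued--;
    return true;
}

/* Assumed bug: the queue is emptied without fixing up packets_out. */
static void model_buggy_purge(struct model_sock *sk)
{
    sk->queued = 0;
}

/* The condition the WARN_ON() guards: in-flight data implies a non-empty queue. */
static bool model_invariant_holds(const struct model_sock *sk)
{
    return sk->packets_out == 0 || sk->queued != 0;
}
```

In this model the invariant holds after any sequence of model_queue(),
model_send() and model_ack(). Only a path like model_buggy_purge(), which
touches one counter and not the other, breaks it. That is the kind of bug a
bare "goto out" in the timer would mask.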





Patch

===>
static inline bool tcp_skb_is_last(const struct sock *sk,
                    const struct sk_buff *skb)
{
     return skb_queue_is_last(&sk->sk_write_queue, skb);
}
===>
static inline bool skb_queue_is_last(const struct sk_buff_head *list,
                      const struct sk_buff *skb)
{
     return (skb->next == (struct sk_buff *) list);
}

That skb->next access causes the NULL pointer dereference.
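
For illustration only, the same failure mode can be modelled in user space.
The sketch below is not kernel code. It assumes a circular list whose head
acts as a sentinel node, and a peek helper that returns NULL on an empty
queue, as tcp_write_queue_head() does here. The helpers queue_init,
queue_tail, queue_head, is_last and head_is_last_checked are hypothetical
names for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct sk_buff: only the list links are modelled. */
struct sk_buff {
    struct sk_buff *next;
    struct sk_buff *prev;
};

/* Simplified queue head: a sentinel node closes the circular list. */
struct sk_buff_head {
    struct sk_buff sentinel;
};

static void queue_init(struct sk_buff_head *list)
{
    list->sentinel.next = &list->sentinel;
    list->sentinel.prev = &list->sentinel;
}

/* Append skb at the tail of the queue. */
static void queue_tail(struct sk_buff_head *list, struct sk_buff *skb)
{
    struct sk_buff *last = list->sentinel.prev;

    skb->next = &list->sentinel;
    skb->prev = last;
    last->next = skb;
    list->sentinel.prev = skb;
}

/* Peek at the first skb; returns NULL when the queue is empty. */
static struct sk_buff *queue_head(const struct sk_buff_head *list)
{
    struct sk_buff *skb = list->sentinel.next;

    return skb == &list->sentinel ? NULL : skb;
}

/* Mirrors skb_queue_is_last(): dereferences skb, so skb must not be NULL. */
static bool is_last(const struct sk_buff_head *list, const struct sk_buff *skb)
{
    return skb->next == &list->sentinel;
}

/* Guarded variant: an empty queue is reported as "not last", no dereference. */
static bool head_is_last_checked(const struct sk_buff_head *list)
{
    const struct sk_buff *skb = queue_head(list);

    return skb != NULL && is_last(list, skb);
}
```

Calling is_last(&q, queue_head(&q)) on an empty queue dereferences NULL, which
is the pattern in the trace above. head_is_last_checked() avoids the
dereference, but it says nothing about why the queue was empty.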

I've checked this in upstream, and it seems this would fail in the same 
way. Wouldn't it be more reasonable to return from 
tcp_retransmit_timer() instead of just issuing a WARN? Something like this:

diff -r 7a748d2cb9f1 -r bb8257f0730a net/ipv4/tcp_timer.c
--- a/net/ipv4/tcp_timer.c    Wed Feb 13 15:02:50 2013 +0000
+++ b/net/ipv4/tcp_timer.c    Wed Feb 13 15:03:18 2013 +0000
@@ -287,11 +287,9 @@  void tcp_retransmit_timer(struct sock *s
      struct tcp_sock *tp = tcp_sk(sk);
      struct inet_connection_sock *icsk = inet_csk(sk);

-    if (!tp->packets_out)
+    if (!tp->packets_out || tcp_write_queue_empty(sk))
          goto out;

-    WARN_ON(tcp_write_queue_empty(sk));
-
      if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) &&
          !((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))) {
          /* Receiver dastardly shrinks window. Our retransmits