diff mbox series

[v2] datagram: return from __skb_recv_datagram() as soon as possible

Message ID 1531653191-6456-1-git-send-email-baoyou.xie@gmail.com
State Changes Requested, archived
Delegated to: David Miller
Headers show
Series [v2] datagram: return from __skb_recv_datagram() as soon as possible | expand

Commit Message

Baoyou Xie July 15, 2018, 11:13 a.m. UTC
We got a soft lockup in a heavy busy cloud server where RIP is
at _raw_spin_unlock_irqrestore+0x1b/0x40:
        [] finish_wait+0x56/0x70
        [] __skb_recv_datagram+0x3fb/0x5a0
        [] ? datagram_poll+0x100/0x100
        [] skb_recv_datagram+0x41/0x60
        [] netlink_recvmsg+0x62/0x450
        [] sock_recvmsg+0xbf/0x100
        [] ? futex_wait+0x193/0x280
        [] ? finish_task_switch+0x108/0x170
        [] SYSC_recvfrom+0xe8/0x160
        [] ? __schedule+0x3c8/0x990
        [] SyS_recvfrom+0xe/0x10
        [] system_call_fastpath+0x16/0x1b

In fact, a mistake exists in __skb_recv_datagram(). For example,
if a datagram come in persistently after go through the socket
queue, then __skb_wait_for_more_packets() will find out that the
last peeked skb is not the real last one, so it return 0. this
results in long time outer loop, and can trigger soft lockup.

So this patch changes the loop condition to prevent soft lockup.

Signed-off-by: Baoyou Xie <baoyou.xie@gmail.com>
---
 net/core/datagram.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Willem de Bruijn July 16, 2018, 10:17 p.m. UTC | #1
On Sun, Jul 15, 2018 at 4:17 AM Baoyou Xie <baoyou.xie@gmail.com> wrote:
>
> We got a soft lockup in a heavy busy cloud server where RIP is
> at _raw_spin_unlock_irqrestore+0x1b/0x40:
>         [] finish_wait+0x56/0x70
>         [] __skb_recv_datagram+0x3fb/0x5a0
>         [] ? datagram_poll+0x100/0x100
>         [] skb_recv_datagram+0x41/0x60
>         [] netlink_recvmsg+0x62/0x450
>         [] sock_recvmsg+0xbf/0x100
>         [] ? futex_wait+0x193/0x280
>         [] ? finish_task_switch+0x108/0x170
>         [] SYSC_recvfrom+0xe8/0x160
>         [] ? __schedule+0x3c8/0x990
>         [] SyS_recvfrom+0xe/0x10
>         [] system_call_fastpath+0x16/0x1b
>
> In fact, a mistake exists in __skb_recv_datagram(). For example,
> if a datagram come in persistently after go through the socket
> queue, then __skb_wait_for_more_packets() will find out that the
> last peeked skb is not the real last one, so it return 0. this
> results in long time outer loop, and can trigger soft lockup.

Is this with MSG_PEEK?

If the above occurs, that implies that the queue is not empty so the
next iteration of the loop in __skb_recv_datagram should return
the oldest packet on the queue.

This is a netlink socket. Those do not support SO_PEEK_OFF,
simplifying the problem somewhat.

I do not yet see how this can loop until timeout if data is queued.

> So this patch changes the loop condition to prevent soft lockup.

Bounding waiting time in this manner should not be needed, as
__skb_wait_for_more_packets reduces remaining timeo on wake.
David Miller July 18, 2018, 5:15 a.m. UTC | #2
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Mon, 16 Jul 2018 15:17:54 -0700

> If the above occurs, that implies that the queue is not empty so the
> next iteration of the loop in __skb_recv_datagram should return
> the oldest packet on the queue.

Isn't it possible, with two threads pulling from the socket in
parallel, for one of them to be constantly unable to pass that
test:

	if (sk->sk_receive_queue.prev != skb)
		goto out;

because the other one empties the queue too quickly every time?

We sample 'last' with the queue lock held, but the above test is done
without that lock.
diff mbox series

Patch

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 9938952..76c1001 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -295,9 +295,11 @@  struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 				    int *peeked, int *off, int *err)
 {
 	struct sk_buff *skb, *last;
+	unsigned long expire;
 	long timeo;
 
 	timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	expire = jiffies + timeo;
 
 	do {
 		skb = __skb_try_recv_datagram(sk, flags, destructor, peeked,
@@ -307,7 +309,7 @@  struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 
 		if (*err != -EAGAIN)
 			break;
-	} while (timeo &&
+	} while (time_before(jiffies, expire) &&
 		!__skb_wait_for_more_packets(sk, err, &timeo, last));
 
 	return NULL;