Message ID | 1531653191-6456-1-git-send-email-baoyou.xie@gmail.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Series | [v2] datagram: return from __skb_recv_datagram() as soon as possible | expand |
On Sun, Jul 15, 2018 at 4:17 AM Baoyou Xie <baoyou.xie@gmail.com> wrote: > > We got a soft lockup in a heavy busy cloud server where RIP is > at _raw_spin_unlock_irqrestore+0x1b/0x40: > [] finish_wait+0x56/0x70 > [] __skb_recv_datagram+0x3fb/0x5a0 > [] ? datagram_poll+0x100/0x100 > [] skb_recv_datagram+0x41/0x60 > [] netlink_recvmsg+0x62/0x450 > [] sock_recvmsg+0xbf/0x100 > [] ? futex_wait+0x193/0x280 > [] ? finish_task_switch+0x108/0x170 > [] SYSC_recvfrom+0xe8/0x160 > [] ? __schedule+0x3c8/0x990 > [] SyS_recvfrom+0xe/0x10 > [] system_call_fastpath+0x16/0x1b > > In fact, a mistake exists in __skb_recv_datagram(). For example, > if a datagram come in persistently after go through the socket > queue, then __skb_wait_for_more_packets() will find out that the > last peeked skb is not the real last one, so it return 0. this > results in long time outer loop, and can trigger soft lockup. Is this with MSG_PEEK? If the above occurs, that implies that the queue is not empty so the next iteration of the loop in __skb_recv_datagram should return the oldest packet on the queue. This is a netlink socket. Those do not support SO_PEEK_OFF, simplifying the problem somewhat. I do not yet see how this can loop until timeout if data is queued. > So this patch changes the loop condition to prevent soft lockup. Bounding waiting time in this manner should not be needed, as __skb_wait_for_more_packets reduces remaining timeo on wake.
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Date: Mon, 16 Jul 2018 15:17:54 -0700 > If the above occurs, that implies that the queue is not empty so the > next iteration of the loop in __skb_recv_datagram should return > the oldest packet on the queue. Isn't it possible, with two threads pulling from the socket in parallel, for one of them to be constantly unable to pass that test: if (sk->sk_receive_queue.prev != skb) goto out; because the other one empties the queue too quickly every time? We sample 'last' with the queue lock held, but the above test is done without that lock.
diff --git a/net/core/datagram.c b/net/core/datagram.c index 9938952..76c1001 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -295,9 +295,11 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags, int *peeked, int *off, int *err) { struct sk_buff *skb, *last; + unsigned long expire; long timeo; timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); + expire = jiffies + timeo; do { skb = __skb_try_recv_datagram(sk, flags, destructor, peeked, @@ -307,7 +309,7 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags, if (*err != -EAGAIN) break; - } while (timeo && + } while (time_before(jiffies, expire) && !__skb_wait_for_more_packets(sk, err, &timeo, last)); return NULL;
We got a soft lockup in a heavy busy cloud server where RIP is at _raw_spin_unlock_irqrestore+0x1b/0x40: [] finish_wait+0x56/0x70 [] __skb_recv_datagram+0x3fb/0x5a0 [] ? datagram_poll+0x100/0x100 [] skb_recv_datagram+0x41/0x60 [] netlink_recvmsg+0x62/0x450 [] sock_recvmsg+0xbf/0x100 [] ? futex_wait+0x193/0x280 [] ? finish_task_switch+0x108/0x170 [] SYSC_recvfrom+0xe8/0x160 [] ? __schedule+0x3c8/0x990 [] SyS_recvfrom+0xe/0x10 [] system_call_fastpath+0x16/0x1b In fact, a mistake exists in __skb_recv_datagram(). For example, if a datagram come in persistently after go through the socket queue, then __skb_wait_for_more_packets() will find out that the last peeked skb is not the real last one, so it return 0. this results in long time outer loop, and can trigger soft lockup. So this patch changes the loop condition to prevent soft lockup. Signed-off-by: Baoyou Xie <baoyou.xie@gmail.com> --- net/core/datagram.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)