diff mbox series

af_unix: utilize skb's fragment list for sending large datagrams

Message ID 1566470311-4089-1-git-send-email-jan.dakinevich@virtuozzo.com
State Rejected
Delegated to: David Miller
Series af_unix: utilize skb's fragment list for sending large datagrams

Commit Message

Jan Dakinevich Aug. 22, 2019, 10:38 a.m. UTC
When somebody tries to send a big datagram, the kernel tries to avoid a
high-order allocation by splitting the datagram between the skb's data
buffer and the skb's paged part (->frags).

However, paged part can not exceed MAX_SKB_FRAGS * PAGE_SIZE, and large
datagram causes increasing skb's data buffer. Thus, if any user-space
program sets send buffer (by calling setsockopt(SO_SNDBUF, ...)) to
maximum allowed size (wmem_max) it becomes able to cause any amount
of uncontrolled high-order kernel allocations.
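
As a user-space illustration of the precondition described above, here is a
minimal sketch, assuming Linux. It requests a larger send buffer on a socket
and reads back the size the kernel reports. The helper name request_sndbuf
is hypothetical and not part of this patch. The kernel may clamp the request
(to wmem_max for unprivileged callers), so the reported value can differ
from the requested one.

```c
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the patch): request a send buffer of
 * `bytes` on socket `fd` and return the size the kernel reports
 * afterwards, or -1 if either socket option call fails.
 */
static int request_sndbuf(int fd, int bytes)
{
	int granted = 0;
	socklen_t optlen = sizeof(granted);

	/* Ask for the new send buffer size; the kernel may clamp it. */
	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
		return -1;

	/* Read back the value that is actually in effect. */
	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &optlen) < 0)
		return -1;

	return granted;
}
```

A caller would typically create the socket with
socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) and pass sv[0] to the helper.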

To avoid this, do not pass more than SKB_MAX_ALLOC for the skb's data
buffer, and use the skb's fragment list (->frag_list) in addition to
the paged part for huge datagrams.
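
The three-way split described above can be sketched in user space as follows.
This is only an illustration of the arithmetic, not the kernel code. The
DEMO_* constants are assumed stand-ins (4 KiB pages, 17 page fragments, and
a data buffer limit of 16064 bytes); the real PAGE_SIZE, MAX_SKB_FRAGS and
SKB_MAX_ALLOC depend on the architecture and kernel configuration. The
struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed illustrative values; real ones depend on arch and kernel config. */
#define DEMO_PAGE_SIZE      4096UL
#define DEMO_MAX_SKB_FRAGS  17UL
#define DEMO_SKB_MAX_ALLOC  16064UL	/* stand-in for SKB_MAX_ALLOC */

struct dgram_split {
	size_t header_len;	/* linear data buffer of the head skb */
	size_t paged_len;	/* paged part of the head skb */
	size_t frag_len;	/* remainder carried on the fragment list */
	size_t frag_count;	/* extra skbs needed to hold frag_len */
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Split a datagram of `len` bytes the way the commit message describes. */
static struct dgram_split split_dgram(size_t len)
{
	struct dgram_split s;

	/* The linear part never exceeds the data buffer limit. */
	s.header_len = min_size(len, DEMO_SKB_MAX_ALLOC);
	/* The paged part is capped by the number of page fragments. */
	s.paged_len = min_size(len - s.header_len,
			       DEMO_MAX_SKB_FRAGS * DEMO_PAGE_SIZE);
	/* Whatever is left goes to the fragment list in limited chunks. */
	s.frag_len = len - s.header_len - s.paged_len;
	s.frag_count = (s.frag_len + DEMO_SKB_MAX_ALLOC - 1) /
		       DEMO_SKB_MAX_ALLOC;

	assert(s.header_len + s.paged_len + s.frag_len == len);
	return s;
}
```

With these assumed constants, split_dgram(200000) gives a 16064-byte linear
part, a 69632-byte paged part, and 114304 bytes spread over 8 fragment-list
skbs.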

Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
---
 net/unix/af_unix.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

Comments

David Miller Aug. 22, 2019, 7:04 p.m. UTC | #1
From: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Date: Thu, 22 Aug 2019 10:38:39 +0000

> However, paged part can not exceed MAX_SKB_FRAGS * PAGE_SIZE, and large
> datagram causes increasing skb's data buffer. Thus, if any user-space
> program sets send buffer (by calling setsockopt(SO_SNDBUF, ...)) to
> maximum allowed size (wmem_max) it becomes able to cause any amount
> of uncontrolled high-order kernel allocations.

So?  You want huge SKBs you get the high order allocations, seems
rather reasonable to me.

SKBs using fragment lists are the most difficult and cpu intensive
geometry for an SKB to have and we should avoid using it where
feasible.

I don't want to apply this, sorry.

Denis V. Lunev Aug. 24, 2019, 8:38 p.m. UTC | #2
On 8/22/19 9:04 PM, David Miller wrote:
> From: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
> Date: Thu, 22 Aug 2019 10:38:39 +0000
>
>> However, paged part can not exceed MAX_SKB_FRAGS * PAGE_SIZE, and large
>> datagram causes increasing skb's data buffer. Thus, if any user-space
>> program sets send buffer (by calling setsockopt(SO_SNDBUF, ...)) to
>> maximum allowed size (wmem_max) it becomes able to cause any amount
>> of uncontrolled high-order kernel allocations.
> So?  You want huge SKBs you get the high order allocations, seems
> rather reasonable to me.
>
> SKBs using fragment lists are the most difficult and cpu intensive
> geometry for an SKB to have and we should avoid using it where
> feasible.
>
> I don't want to apply this, sorry.
Under even moderate memory pressure this will either take seconds or fail,
which does not look good. We can try to allocate high-order memory,
but not too hard, and switch to fragments when possible.

Please also note that even an ordinary user could trigger really big
allocations and thus force the whole node to dance.

Den


Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 67e87db..0c13937 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1580,7 +1580,9 @@  static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	struct sk_buff *skb;
 	long timeo;
 	struct scm_cookie scm;
-	int data_len = 0;
+	unsigned long frag_len;
+	unsigned long paged_len;
+	unsigned long header_len;
 	int sk_locked;
 
 	wait_for_unix_gc();
@@ -1613,27 +1615,41 @@  static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	if (len > sk->sk_sndbuf - 32)
 		goto out;
 
-	if (len > SKB_MAX_ALLOC) {
-		data_len = min_t(size_t,
-				 len - SKB_MAX_ALLOC,
-				 MAX_SKB_FRAGS * PAGE_SIZE);
-		data_len = PAGE_ALIGN(data_len);
+	BUILD_BUG_ON(SKB_MAX_ALLOC < PAGE_SIZE);
 
-		BUILD_BUG_ON(SKB_MAX_ALLOC < PAGE_SIZE);
-	}
+	header_len = min(len, SKB_MAX_ALLOC);
+	paged_len = min(len - header_len, MAX_SKB_FRAGS * PAGE_SIZE);
+	frag_len = len - header_len - paged_len;
 
-	skb = sock_alloc_send_pskb(sk, len - data_len, data_len,
+	skb = sock_alloc_send_pskb(sk, header_len, paged_len,
 				   msg->msg_flags & MSG_DONTWAIT, &err,
 				   PAGE_ALLOC_COSTLY_ORDER);
 	if (skb == NULL)
 		goto out;
 
+	while (frag_len) {
+		unsigned long size = min(SKB_MAX_ALLOC, frag_len);
+		struct sk_buff *frag;
+
+		frag = sock_alloc_send_pskb(sk, size, 0,
+					    msg->msg_flags & MSG_DONTWAIT,
+					    &err, 0);
+		if (!frag)
+			goto out_free;
+
+		skb_put(frag, size);
+		frag->next = skb_shinfo(skb)->frag_list;
+		skb_shinfo(skb)->frag_list = frag;
+
+		frag_len -= size;
+	}
+
 	err = unix_scm_to_skb(&scm, skb, true);
 	if (err < 0)
 		goto out_free;
 
-	skb_put(skb, len - data_len);
-	skb->data_len = data_len;
+	skb_put(skb, header_len);
+	skb->data_len = len - header_len;
 	skb->len = len;
 	err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, len);
 	if (err)