Message ID | 1566470311-4089-1-git-send-email-jan.dakinevich@virtuozzo.com |
---|---|
State | Rejected |
Delegated to: | David Miller |
Series | af_unix: utilize skb's fragment list for sending large datagrams |
From: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Date: Thu, 22 Aug 2019 10:38:39 +0000

> However, paged part can not exceed MAX_SKB_FRAGS * PAGE_SIZE, and large
> datagram causes increasing skb's data buffer. Thus, if any user-space
> program sets send buffer (by calling setsockopt(SO_SNDBUF, ...)) to
> maximum allowed size (wmem_max) it becomes able to cause any amount
> of uncontrolled high-order kernel allocations.

So? You want huge SKBs you get the high order allocations, seems
rather reasonable to me.

SKBs using fragment lists are the most difficult and cpu intensive
geometry for an SKB to have and we should avoid using it where
feasible.

I don't want to apply this, sorry.
On 8/22/19 9:04 PM, David Miller wrote:
> From: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
> Date: Thu, 22 Aug 2019 10:38:39 +0000
>
>> However, paged part can not exceed MAX_SKB_FRAGS * PAGE_SIZE, and large
>> datagram causes increasing skb's data buffer. Thus, if any user-space
>> program sets send buffer (by calling setsockopt(SO_SNDBUF, ...)) to
>> maximum allowed size (wmem_max) it becomes able to cause any amount
>> of uncontrolled high-order kernel allocations.
>
> So? You want huge SKBs you get the high order allocations, seems
> rather reasonable to me.
>
> SKBs using fragment lists are the most difficult and cpu intensive
> geometry for an SKB to have and we should avoid using it where
> feasible.
>
> I don't want to apply this, sorry.

Under even mediocre memory pressure this will either take seconds or fail,
which does not look good. We can try to allocate memory of big order but
not that hard and switch to fragments when possible.

Please also note that even ordinary user could trigger really big
allocations and thus force the whole node to dance.

Den
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 67e87db..0c13937 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1580,7 +1580,9 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	struct sk_buff *skb;
 	long timeo;
 	struct scm_cookie scm;
-	int data_len = 0;
+	unsigned long frag_len;
+	unsigned long paged_len;
+	unsigned long header_len;
 	int sk_locked;
 
 	wait_for_unix_gc();
@@ -1613,27 +1615,41 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	if (len > sk->sk_sndbuf - 32)
 		goto out;
 
-	if (len > SKB_MAX_ALLOC) {
-		data_len = min_t(size_t,
-				 len - SKB_MAX_ALLOC,
-				 MAX_SKB_FRAGS * PAGE_SIZE);
-		data_len = PAGE_ALIGN(data_len);
+	BUILD_BUG_ON(SKB_MAX_ALLOC < PAGE_SIZE);
 
-		BUILD_BUG_ON(SKB_MAX_ALLOC < PAGE_SIZE);
-	}
+	header_len = min(len, SKB_MAX_ALLOC);
+	paged_len = min(len - header_len, MAX_SKB_FRAGS * PAGE_SIZE);
+	frag_len = len - header_len - paged_len;
 
-	skb = sock_alloc_send_pskb(sk, len - data_len, data_len,
+	skb = sock_alloc_send_pskb(sk, header_len, paged_len,
 				   msg->msg_flags & MSG_DONTWAIT, &err,
 				   PAGE_ALLOC_COSTLY_ORDER);
 	if (skb == NULL)
 		goto out;
 
+	while (frag_len) {
+		unsigned long size = min(SKB_MAX_ALLOC, frag_len);
+		struct sk_buff *frag;
+
+		frag = sock_alloc_send_pskb(sk, size, 0,
+					    msg->msg_flags & MSG_DONTWAIT,
+					    &err, 0);
+		if (!frag)
+			goto out_free;
+
+		skb_put(frag, size);
+		frag->next = skb_shinfo(skb)->frag_list;
+		skb_shinfo(skb)->frag_list = frag;
+
+		frag_len -= size;
+	}
+
 	err = unix_scm_to_skb(&scm, skb, true);
 	if (err < 0)
 		goto out_free;
 
-	skb_put(skb, len - data_len);
-	skb->data_len = data_len;
+	skb_put(skb, header_len);
+	skb->data_len = len - header_len;
 	skb->len = len;
 	err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, len);
 	if (err)
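The length split performed by the patch above can be tried out in user space. What follows is a minimal sketch, not kernel code: the `EX_*` constants are illustrative stand-ins (4 KiB pages, 17 page fragments, a 16 KiB linear limit) and are assumptions, since the kernel's real `SKB_MAX_ALLOC` also accounts for skb overhead and is configuration dependent. The names `struct dgram_split`, `min_ul` and `split_datagram` are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>

/* Illustrative stand-ins, NOT the kernel's definitions (assumed values). */
#define EX_PAGE_SIZE     4096UL
#define EX_MAX_SKB_FRAGS 17UL
#define EX_SKB_MAX_ALLOC 16384UL

/* Hypothetical container for the three parts computed in the patch. */
struct dgram_split {
	unsigned long header_len;   /* bytes placed in the skb's linear buffer */
	unsigned long paged_len;    /* bytes placed in the skb's page fragments */
	unsigned long frag_len;     /* bytes left over for the fragment list */
	unsigned long nr_frag_skbs; /* extra skbs the while loop would allocate */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Mirrors the header_len/paged_len/frag_len arithmetic from the patch. */
static struct dgram_split split_datagram(unsigned long len)
{
	struct dgram_split s;

	s.header_len = min_ul(len, EX_SKB_MAX_ALLOC);
	s.paged_len = min_ul(len - s.header_len,
			     EX_MAX_SKB_FRAGS * EX_PAGE_SIZE);
	s.frag_len = len - s.header_len - s.paged_len;

	/* The loop takes at most EX_SKB_MAX_ALLOC bytes per extra skb. */
	s.nr_frag_skbs = (s.frag_len + EX_SKB_MAX_ALLOC - 1) / EX_SKB_MAX_ALLOC;

	/* The three parts always add up to the datagram length. */
	assert(s.header_len + s.paged_len + s.frag_len == len);
	return s;
}
```

Under these assumed constants, `split_datagram(1000000)` yields a 16384-byte linear part, a 69632-byte paged part and 913984 bytes spread over 56 fragment-list skbs, whereas any datagram of at most 86016 bytes needs no fragment list at all.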
When somebody tries to send a big datagram, the kernel attempts to avoid
a high-order allocation by placing it into both the skb's data buffer and
the skb's paged part (->frag).

However, paged part can not exceed MAX_SKB_FRAGS * PAGE_SIZE, and large
datagram causes increasing skb's data buffer. Thus, if any user-space
program sets send buffer (by calling setsockopt(SO_SNDBUF, ...)) to
maximum allowed size (wmem_max) it becomes able to cause any amount
of uncontrolled high-order kernel allocations.

To avoid this, do not pass more than SKB_MAX_ALLOC for the skb's data
buffer, and use the skb's fragment list (->frag_list) in addition to the
paged part for huge datagrams.

Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
---
 net/unix/af_unix.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)
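For context, the user-space side described in the commit message (raising the send buffer with `setsockopt(SO_SNDBUF, ...)`) might look roughly like the sketch below. It only sizes the buffer on an `AF_UNIX` datagram socket pair and reads back the value the kernel actually applied. The function name `dgram_pair_sndbuf` is hypothetical, and the sketch assumes Linux behaviour, where the request is clamped to `wmem_max` for unprivileged callers.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an AF_UNIX datagram socket pair, request
 * a send buffer of `requested` bytes on one end, and return the size
 * the kernel reports back. Returns -1 on any error.
 */
static int dgram_pair_sndbuf(int requested)
{
	int sv[2];
	int effective = -1;
	socklen_t optlen = sizeof(effective);

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	/* Ask for the buffer size; the kernel may clamp the request. */
	if (setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF,
		       &requested, sizeof(requested)) < 0 ||
	    getsockopt(sv[0], SOL_SOCKET, SO_SNDBUF,
		       &effective, &optlen) < 0)
		effective = -1;
	else
		assert(optlen == sizeof(effective));

	close(sv[0]);
	close(sv[1]);
	return effective;
}
```

Per the length check visible in the diff (`len > sk->sk_sndbuf - 32`), the reported value bounds how large a single datagram the sender may attempt, and therefore how much memory one `sendmsg()` call can ask the kernel to allocate.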