From patchwork Tue Sep 21 09:23:13 2010
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 65287
X-Patchwork-Delegate: davem@davemloft.net
Subject: RE: [PATCH] qlcnic: dont assume NET_IP_ALIGN is 2
From: Eric Dumazet
To: Amit Salecha
Cc: David Miller, "netdev@vger.kernel.org", Ameen Rahman, Anirban Chakraborty
In-Reply-To: <99737F4847ED0A48AECC9F4A1974A4B80F86F80278@MNEXMB2.qlogic.org>
Date: Tue, 21 Sep 2010 11:23:13 +0200
Message-ID: <1285060993.2617.163.camel@edumazet-laptop>

On Tuesday, 21 September 2010 at 03:41 -0500, Amit Salecha wrote:
> > Amit, if you believe this is a problem, you should address it for all
> > NICs, not only qlcnic.
> >
> > qlcnic was lying to the stack, because it consumed 2 KB blocks and
> > pretended they were consuming only skb->len bytes.
> > (assuming MTU=1500; the problem is worse if the MTU is bigger)
> >
> > So in order to improve "throughput", you were allowing memory
> > exhaustion and a freeze of the _machine_?
> >
> This won't lead to such a problem. truesize is used for accounting only.
You must have big machines in your lab and never hit OOM? You really
should take a look at the various files in the net/core and net/ipv4
trees, and at knobs like /proc/sys/net/ipv4/tcp_mem and
/proc/sys/net/ipv4/udp_mem.

In fact, truesize is _underestimated_: we don't account for
struct skb_shared_info, nor for kmalloc() rounding.

We probably should use this patch (without having to check every
network driver!). The problem is that it would slow down alloc_skb(),
so this patch is not for inclusion.

A cheap alternative would be to use
size + sizeof(struct sk_buff) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)).

If you think about it, truesize will grow anyway when 128-bit arches
come. If some tuning is needed in our stack, we'll do it. (The socket
API SO_RCVBUF / SO_SNDBUF is the problem, because applications are not
aware of packetization or kernel internals.)

SOCK_MIN_RCVBUF is way too small, since sizeof(struct sk_buff) is
already close to 256. I guess we cannot even receive a single frame.

 include/net/sock.h |    2 +-
 net/core/skbuff.c  |    2 +-
 net/core/sock.c    |    8 ++++----
 3 files changed, 6 insertions(+), 6 deletions(-)
---

diff --git a/include/net/sock.h b/include/net/sock.h
index 8ae97c4..348fc9e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1558,7 +1558,7 @@ static inline void sk_wake_async(struct sock *sk, int how, int band)
 }
 
 #define SOCK_MIN_SNDBUF 2048
-#define SOCK_MIN_RCVBUF 256
+#define SOCK_MIN_RCVBUF 1024
 
 static inline void sk_stream_moderate_sndbuf(struct sock *sk)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 752c197..5ab2e8e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -196,7 +196,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * the tail pointer in struct sk_buff!
 	 */
 	memset(skb, 0, offsetof(struct sk_buff, tail));
-	skb->truesize = size + sizeof(struct sk_buff);
+	skb->truesize = ksize(data) + sizeof(struct sk_buff);
 	atomic_set(&skb->users, 1);
 	skb->head = data;
 	skb->data = data;
diff --git a/net/core/sock.c b/net/core/sock.c
index f3a06c4..803e041 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -535,10 +535,10 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 			val = sysctl_wmem_max;
 set_sndbuf:
 		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
-		if ((val * 2) < SOCK_MIN_SNDBUF)
+		if ((val * 4) < SOCK_MIN_SNDBUF)
 			sk->sk_sndbuf = SOCK_MIN_SNDBUF;
 		else
-			sk->sk_sndbuf = val * 2;
+			sk->sk_sndbuf = val * 4;
 
 		/*
 		 *	Wake up sending tasks if we
@@ -579,10 +579,10 @@ set_rcvbuf:
 		 * returning the value we actually used in getsockopt
 		 * is the most desirable behavior.
		 */
-		if ((val * 2) < SOCK_MIN_RCVBUF)
+		if ((val * 4) < SOCK_MIN_RCVBUF)
			sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
 		else
-			sk->sk_rcvbuf = val * 2;
+			sk->sk_rcvbuf = val * 4;
 		break;
 
 	case SO_RCVBUFFORCE:
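
[Editorial note: to make the accounting gap discussed above concrete, here is a
rough userspace sketch (not kernel code) of the arithmetic. It compares what gets
charged when only skb->len is counted against the "cheap alternative" estimate
size + sizeof(struct sk_buff) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
applied to a 2 KB receive block. The struct sizes and the alignment macro are
stand-in assumptions for illustration, not values from any particular kernel build.]

/*
 * Userspace illustration only -- not kernel code.  The sizes of
 * struct sk_buff and struct skb_shared_info are assumed values,
 * and SKB_DATA_ALIGN is approximated by 64-byte alignment.
 */
#include <stdio.h>

#define ASSUMED_SKB_SIZE     240u   /* assumed sizeof(struct sk_buff) */
#define ASSUMED_SHINFO_SIZE  320u   /* assumed sizeof(struct skb_shared_info) */
#define CACHE_LINE           64u
#define ALIGN_UP(x)          (((x) + CACHE_LINE - 1) & ~(CACHE_LINE - 1))

int main(void)
{
	unsigned int len      = 66;     /* e.g. a small TCP segment */
	unsigned int rx_block = 2048;   /* per-frame receive buffer used by the driver */
	unsigned int frames   = 100000; /* frames sitting on receive queues */

	/* truesize if the driver charges only the payload length */
	unsigned long long per_skb_len_based = len + ASSUMED_SKB_SIZE;

	/* the "cheap alternative" estimate applied to the 2 KB block */
	unsigned long long per_skb_estimate =
		rx_block + ASSUMED_SKB_SIZE + ALIGN_UP(ASSUMED_SHINFO_SIZE);

	printf("len-based : %llu MB charged for %u frames\n",
	       per_skb_len_based * frames >> 20, frames);
	printf("estimate  : %llu MB charged for %u frames\n",
	       per_skb_estimate * frames >> 20, frames);
	return 0;
}

With these assumed numbers, roughly 29 MB would be charged against socket limits
while something closer to 250 MB is actually committed, which is the kind of gap
that lets receive queues outrun tcp_mem/udp_mem before the accounting notices.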
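
[Editorial note: the SO_RCVBUF point -- that applications cannot see the kernel's
per-packet overhead -- is easy to observe from userspace. The snippet below is a
demo of the existing Linux behaviour, not of this patch: the kernel scales the
value passed to setsockopt() (historically doubling it) and clamps it to a
minimum, and getsockopt() returns the scaled value. With the patch above, the
scale factor becomes 4 and the floor 1024 instead of 256; exact output depends on
the running kernel and its rmem sysctls.]

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int asked = 512;            /* deliberately tiny request */
	int got = 0;
	socklen_t len = sizeof(got);

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &asked, sizeof(asked));
	getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len);

	/* e.g. asked=512, kernel reports a larger, scaled and clamped value */
	printf("asked for %d bytes, kernel set %d bytes\n", asked, got);
	close(fd);
	return 0;
}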