From patchwork Mon Nov 7 17:28:50 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Glauber Costa
X-Patchwork-Id: 124148
X-Patchwork-Delegate: davem@davemloft.net
From: Glauber Costa
To: Glauber Costa, "linux-kernel@vger.kernel.org"
CC: "paul@paulmenage.org", "lizf@cn.fujitsu.com",
 "kamezawa.hiroyu@jp.fujitsu.com", "ebiederm@xmission.com",
 "davem@davemloft.net", "gthelen@google.com", "netdev@vger.kernel.org",
 "linux-mm@kvack.org", "kirill@shutemov.name", Andrey Vagin,
 "devel@openvz.org", "eric.dumazet@gmail.com", Glauber Costa,
 "kamezawa.hiroyu@jp.fujtisu.com"
Subject: RE: [PATCH v5 04/10] per-cgroup tcp buffers control
Thread-Topic: [PATCH v5 04/10] per-cgroup tcp buffers control
Date: Mon, 7 Nov 2011 17:28:50 +0000
Accept-Language: en-US
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Ok, I forgot to change the temporary name I was using for the jump label.
Shame on me :)

--- Original Message ---
From: Glauber Costa
Sent: 7 November 2011 (07/11/11)
To: linux-kernel@vger.kernel.org
Cc: paul@paulmenage.org, lizf@cn.fujitsu.com, kamezawa.hiroyu@jp.fujitsu.com,
 ebiederm@xmission.com, davem@davemloft.net, gthelen@google.com,
 netdev@vger.kernel.org, linux-mm@kvack.org, kirill@shutemov.name,
 Andrey Vagin, devel@openvz.org, eric.dumazet@gmail.com, Glauber Costa,
 KAMEZAWA Hiroyuki
Subject: [PATCH v5 04/10] per-cgroup tcp buffers control

With all the infrastructure in place, this patch implements per-cgroup
control for tcp memory pressure handling. A resource counter is used to
control allocated memory, except for the root cgroup, which will keep
using the global counters.

This patch is the one that actually enables/disables the jump labels
controlling cgroup. Up to this point, they were always disabled.

Signed-off-by: Glauber Costa
CC: KAMEZAWA Hiroyuki
CC: David S. Miller
CC: Eric W. Biederman
CC: Eric Dumazet
---
 include/net/tcp.h       |   18 +++++++
 include/net/transp_v6.h |    1 +
 mm/memcontrol.c         |  125 ++++++++++++++++++++++++++++++++++++++++++++++-
 net/core/sock.c         |   46 +++++++++++++++--
 net/ipv4/af_inet.c      |    3 +
 net/ipv4/tcp_ipv4.c     |   12 +++++
 net/ipv6/af_inet6.c     |    3 +
 net/ipv6/tcp_ipv6.c     |   10 ++++
 8 files changed, 211 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index ccaa3b6..7301ca8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -253,6 +253,22 @@ extern int sysctl_tcp_cookie_size;
 extern int sysctl_tcp_thin_linear_timeouts;
 extern int sysctl_tcp_thin_dupack;
 
+struct tcp_memcontrol {
+	/* per-cgroup tcp memory pressure knobs */
+	struct res_counter tcp_memory_allocated;
+	struct percpu_counter tcp_sockets_allocated;
+	/* those two are read-mostly, leave them at the end */
+	long tcp_prot_mem[3];
+	int tcp_memory_pressure;
+};
+
+long *sysctl_mem_tcp(struct mem_cgroup *memcg);
+struct percpu_counter *sockets_allocated_tcp(struct mem_cgroup *memcg);
+int *memory_pressure_tcp(struct mem_cgroup *memcg);
+struct res_counter *memory_allocated_tcp(struct mem_cgroup *memcg);
+int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss);
+void tcp_destroy_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss);
+
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
 extern int tcp_memory_pressure;
@@ -305,6 +321,7 @@ static inline int tcp_synq_no_recent_overflow(const struct sock *sk)
 }
 
 extern struct proto tcp_prot;
+extern struct cg_proto tcp_cg_prot;
 
 #define TCP_INC_STATS(net, field)	SNMP_INC_STATS((net)->mib.tcp_statistics, field)
 #define TCP_INC_STATS_BH(net, field)	SNMP_INC_STATS_BH((net)->mib.tcp_statistics, field)
@@ -1022,6 +1039,7 @@ static inline void tcp_openreq_init(struct request_sock *req,
 	ireq->loc_port = tcp_hdr(skb)->dest;
 }
 
+extern void tcp_enter_memory_pressure_cg(struct sock *sk);
 extern void tcp_enter_memory_pressure(struct sock *sk);
 
 static inline int keepalive_intvl_when(const struct tcp_sock *tp)
diff --git a/include/net/transp_v6.h b/include/net/transp_v6.h
index 498433d..1e18849 100644
--- a/include/net/transp_v6.h
+++ b/include/net/transp_v6.h
@@ -11,6 +11,7 @@ extern struct proto rawv6_prot;
 extern struct proto udpv6_prot;
 extern struct proto udplitev6_prot;
 extern struct proto tcpv6_prot;
+extern struct cg_proto tcpv6_cg_prot;
 
 struct flowi6;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7d684d0..f14d7d2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -49,6 +49,9 @@
 #include
 #include
 #include "internal.h"
+#ifdef CONFIG_INET
+#include
+#endif
 
 #include
 
@@ -294,6 +297,10 @@ struct mem_cgroup {
 	 */
 	struct mem_cgroup_stat_cpu nocpu_base;
 	spinlock_t pcp_counter_lock;
+
+#ifdef CONFIG_INET
+	struct tcp_memcontrol tcp;
+#endif
 };
 
 /* Stuffs for move charges at task migration. */
@@ -377,7 +384,7 @@ enum mem_type {
 #define MEM_CGROUP_RECLAIM_SOFT	(1 << MEM_CGROUP_RECLAIM_SOFT_BIT)
 
 static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg);
-
+static struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont);
 static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
 {
 	return (mem == root_mem_cgroup);
@@ -387,6 +394,7 @@ static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
 #ifdef CONFIG_INET
 #include
+#include
 
 void sock_update_memcg(struct sock *sk)
 {
@@ -451,6 +459,93 @@ u64 memcg_memory_allocated_read(struct mem_cgroup *memcg, struct cg_proto *prot)
 	RES_USAGE) >> PAGE_SHIFT ;
 }
 EXPORT_SYMBOL(memcg_memory_allocated_read);
+/*
+ * Pressure flag: try to collapse.
+ * Technical note: it is used by multiple contexts non atomically.
+ * All the __sk_mem_schedule() is of this nature: accounting
+ * is strict, actions are advisory and have some latency.
+ */
+void tcp_enter_memory_pressure_cg(struct sock *sk)
+{
+	struct mem_cgroup *memcg = sk->sk_cgrp;
+	if (!memcg->tcp.tcp_memory_pressure) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPMEMORYPRESSURES);
+		memcg->tcp.tcp_memory_pressure = 1;
+	}
+}
+EXPORT_SYMBOL(tcp_enter_memory_pressure_cg);
+
+long *sysctl_mem_tcp(struct mem_cgroup *memcg)
+{
+	return memcg
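
For anyone reading the changelog without the rest of the series at hand, here
is a rough user-space sketch of the behaviour it describes: non-root groups
are charged against their own counter, the root group keeps using a global
one, and the pressure flag is only raised (and counted) on the first
transition. This is not the kernel code. struct tcp_group_model,
model_charge() and model_enter_memory_pressure() are hypothetical names made
up for illustration, and tying the pressure flag to a failed charge is an
assumption of this sketch, not something the patch text above states.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-cgroup state kept in struct tcp_memcontrol. */
struct tcp_group_model {
	long allocated;       /* pages currently charged to this group */
	long limit;           /* hard limit in pages (assumed, see lead-in) */
	int memory_pressure;  /* advisory flag, mirrors tcp_memory_pressure */
	bool is_root;         /* root group bypasses the per-group counter */
};

static long global_allocated; /* models the global tcp_memory_allocated */
static long pressure_events;  /* models the LINUX_MIB_TCPMEMORYPRESSURES stat */

/* Raise the flag once; repeated calls do not bump the statistic again. */
static void model_enter_memory_pressure(struct tcp_group_model *g)
{
	if (!g->memory_pressure) {
		pressure_events++;
		g->memory_pressure = 1;
	}
}

/*
 * Charge pages to a group. The root group only updates the global
 * counter (no limit is modelled for it here). Other groups are refused
 * once the charge would exceed their limit, and the refusal raises the
 * pressure flag. Returns true if the charge was accounted.
 */
static bool model_charge(struct tcp_group_model *g, long pages)
{
	if (g->is_root) {
		global_allocated += pages;
		return true;
	}
	if (g->allocated + pages > g->limit) {
		model_enter_memory_pressure(g);
		return false;
	}
	g->allocated += pages;
	return true;
}
```

The point of the sketch is the split between strict accounting (the charge
either fits or it does not) and the advisory flag, which matches the comment
above tcp_enter_memory_pressure_cg() in the diff.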