
memcg/tcp: fix warning caused by res->usage going negative.

Message ID 4F750FE8.2030800@jp.fujitsu.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

KAMEZAWA Hiroyuki March 30, 2012, 1:44 a.m. UTC
Maybe what we can do before the LSF/MM summit is this (avoid the warning).
This patch is against Linus's git tree. The patch description is updated.

Thanks.
-Kame
==
From 4ab80f84bbcb02a790342426c1de84aeb17fcbe9 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Thu, 29 Mar 2012 14:59:04 +0900
Subject: [PATCH] memcg/tcp: fix warning caused by res->usage going negative.

tcp memcontrol starts accounting only after res->limit is set. So, if a
socket starts before res->limit is set, resources are already in use that
were never charged. Once accounting starts, those resources are eventually
uncharged and push the res_counter below 0, because they were never charged.
This causes a warning:

==
kernel: ------------[ cut here ]----------
kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
 <snip>
kernel: Pid: 17753, comm: bash Tainted: G  W    3.3.0+ #99
kernel: Call Trace:
kernel: <IRQ>  [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
...
==

This patch fixes that by adding res_counter_uncharge_nowarn(), which
hides the wrong accounting.

The reason for this fix is:
 - To avoid overhead, we do not account tcp usage by a cgroup before the
   limit is set. So, we cannot know the amount of possible error when
   accounting starts. As long as we keep this on/off switch for
   performance, the error is unavoidable.

Consideration:
 - As in memcg, if resources could be marked as used (as with PCG_USED),
   we would not have this problem. But such marking adds overhead.

TODO:
 - When tcp accounting is enabled/disabled frequently, usage accounting
   will leak. This should be fixed. (Still under discussion.)
 - sockets_allocated has the same problem, but it emits no warning and
   never displays a negative value even when the internal count is
   negative.

Changelog:
 - updated patch description.

Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/res_counter.h |    2 ++
 include/net/sock.h          |    3 ++-
 kernel/res_counter.c        |   18 ++++++++++++++++++
 3 files changed, 22 insertions(+), 1 deletions(-)

Patch

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index da81af0..e081948 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -134,6 +134,8 @@  int __must_check res_counter_charge_nofail(struct res_counter *counter,
 
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
 void res_counter_uncharge(struct res_counter *counter, unsigned long val);
+void res_counter_uncharge_nowarn(struct res_counter *counter,
+				unsigned long val);
 
 /**
  * res_counter_margin - calculate chargeable space of a counter
diff --git a/include/net/sock.h b/include/net/sock.h
index a6ba1f8..a1b3f4802 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1048,7 +1048,8 @@  static inline void memcg_memory_allocated_add(struct cg_proto *prot,
 static inline void memcg_memory_allocated_sub(struct cg_proto *prot,
 					      unsigned long amt)
 {
-	res_counter_uncharge(prot->memory_allocated, amt << PAGE_SHIFT);
+	res_counter_uncharge_nowarn(prot->memory_allocated,
+				amt << PAGE_SHIFT);
 }
 
 static inline u64 memcg_memory_allocated_read(struct cg_proto *prot)
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index d508363..2bb01ac 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -113,6 +113,24 @@  void res_counter_uncharge(struct res_counter *counter, unsigned long val)
 	local_irq_restore(flags);
 }
 
+void res_counter_uncharge_nowarn(struct res_counter *counter,
+					unsigned long val)
+{
+	struct res_counter *c;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	for (c = counter; c != NULL; c = c->parent) {
+		spin_lock(&c->lock);
+		if (c->usage < val)
+			val = c->usage;
+		res_counter_uncharge_locked(c, val);
+		spin_unlock(&c->lock);
+	}
+	local_irq_restore(flags);
+}
+
 
 static inline unsigned long long *
 res_counter_member(struct res_counter *counter, int member)