Patchwork ext4: use percpu counter for extent cache count

Submitter Eric Sandeen
Date March 1, 2013, 4:42 p.m.
Message ID <5130DA71.4040808@redhat.com>
Permalink /patch/224407/
State Accepted

Comments

Eric Sandeen - March 1, 2013, 4:42 p.m.
Use a percpu counter rather than atomic types for shrinker accounting.
There's no need for ultimate accuracy in the shrinker, so this
should come a little more cheaply.  The percpu struct is somewhat
large, but there was a big gap before the cache-aligned
s_es_lru_lock anyway, and it fits nicely in there.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - March 1, 2013, 6 p.m.
On Fri, Mar 01, 2013 at 10:42:25AM -0600, Eric Sandeen wrote:
> Use a percpu counter rather than atomic types for shrinker accounting.
> There's no need for ultimate accuracy in the shrinker, so this
> should come a little more cheaply.  The percpu struct is somewhat
> large, but there was a big gap before the cache-aligned
> s_es_lru_lock anyway, and it fits nicely in there.

I thought about using percpu counters, but I was worried about the
size on really big machines.  OTOH, it will be the really large NUMA
machines where atomic_t will really hurt, so maybe we should use
percpu counters and not really worry about it.  It's on a per file
system basis, so even if it is a few hundred bytes it shouldn't break
the bank.

						- Ted
Eric Sandeen - March 1, 2013, 6:02 p.m.
On 3/1/13 12:00 PM, Theodore Ts'o wrote:
> On Fri, Mar 01, 2013 at 10:42:25AM -0600, Eric Sandeen wrote:
>> Use a percpu counter rather than atomic types for shrinker accounting.
>> There's no need for ultimate accuracy in the shrinker, so this
>> should come a little more cheaply.  The percpu struct is somewhat
>> large, but there was a big gap before the cache-aligned
>> s_es_lru_lock anyway, and it fits nicely in there.
> 
> I thought about using percpu counters, but I was worried about the
> size on really big machines.  OTOH, it will be the really large NUMA
> machines where atomic_t will really hurt, so maybe we should use
> percpu counters and not really worry about it.  It's on a per file
> system basis, so even if it is a few hundred bytes it shouldn't break
> the bank.
> 
> 						- Ted
> 

I was mostly keying off what quota felt was best, I guess.
I'm not wedded to either approach, it was just a thought.
So you can take it or leave it. :)

Thanks,
-Eric

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 96c1093..4a01ba3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1268,7 +1268,6 @@  struct ext4_sb_info {
 	atomic_t s_mb_preallocated;
 	atomic_t s_mb_discarded;
 	atomic_t s_lock_busy;
-	atomic_t s_extent_cache_cnt;
 
 	/* locality groups */
 	struct ext4_locality_group __percpu *s_locality_groups;
@@ -1310,6 +1309,7 @@  struct ext4_sb_info {
 	/* Reclaim extents from extent status tree */
 	struct shrinker s_es_shrinker;
 	struct list_head s_es_lru;
+	struct percpu_counter s_extent_cache_cnt;
 	spinlock_t s_es_lru_lock ____cacheline_aligned_in_smp;
 };
 
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 27fcdd2..95796a1 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -305,7 +305,7 @@  ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len,
 	 */
 	if (!ext4_es_is_delayed(es)) {
 		EXT4_I(inode)->i_es_lru_nr++;
-		atomic_inc(&EXT4_SB(inode->i_sb)->s_extent_cache_cnt);
+		percpu_counter_inc(&EXT4_SB(inode->i_sb)->s_extent_cache_cnt);
 	}
 
 	return es;
@@ -317,7 +317,7 @@  static void ext4_es_free_extent(struct inode *inode, struct extent_status *es)
 	if (!ext4_es_is_delayed(es)) {
 		BUG_ON(EXT4_I(inode)->i_es_lru_nr == 0);
 		EXT4_I(inode)->i_es_lru_nr--;
-		atomic_dec(&EXT4_SB(inode->i_sb)->s_extent_cache_cnt);
+		percpu_counter_dec(&EXT4_SB(inode->i_sb)->s_extent_cache_cnt);
 	}
 
 	kmem_cache_free(ext4_es_cachep, es);
@@ -678,7 +678,7 @@  static int ext4_es_shrink(struct shrinker *shrink, struct shrink_control *sc)
 	int nr_to_scan = sc->nr_to_scan;
 	int ret, nr_shrunk = 0;
 
-	ret = atomic_read(&sbi->s_extent_cache_cnt);
+	ret = percpu_counter_read_positive(&sbi->s_extent_cache_cnt);
 	trace_ext4_es_shrink_enter(sbi->s_sb, nr_to_scan, ret);
 
 	if (!nr_to_scan)
@@ -711,7 +711,7 @@  static int ext4_es_shrink(struct shrinker *shrink, struct shrink_control *sc)
 	list_splice_tail(&scanned, &sbi->s_es_lru);
 	spin_unlock(&sbi->s_es_lru_lock);
 
-	ret = atomic_read(&sbi->s_extent_cache_cnt);
+	ret = percpu_counter_read_positive(&sbi->s_extent_cache_cnt);
 	trace_ext4_es_shrink_exit(sbi->s_sb, nr_shrunk, ret);
 	return ret;
 }