Patchwork slub: Lockout validation scans during freeing of object

login
register
mail settings
Submitter Christoph Lameter
Date Nov. 22, 2011, 4:53 p.m.
Message ID <alpine.DEB.2.00.1111221052130.28197@router.home>
Download mbox | patch
Permalink /patch/127130/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Christoph Lameter - Nov. 22, 2011, 4:53 p.m.
A bit heavy handed locking but this should do the trick.

Subject: slub: Lockout validation scans during freeing of object

Slab validation can run right now while the slab free paths prepare
the redzone fields etc around the objects in preparation of the
actual freeing of the object. This can lead to false positives.

Take the node lock unconditionally during free so that the validation
can examine objects without them being disturbed by freeing operations.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Nov. 22, 2011, 5:21 p.m.
Le mardi 22 novembre 2011 à 10:53 -0600, Christoph Lameter a écrit :
> A bit heavy handed locking but this should do the trick.
> 
> Subject: slub: Lockout validation scans during freeing of object
> 
> Slab validation can run right now while the slab free paths prepare
> the redzone fields etc around the objects in preparation of the
> actual freeing of the object. This can lead to false positives.
> 
> Take the node lock unconditionally during free so that the validation
> can examine objects without them being disturbed by freeing operations.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> ---
>  mm/slub.c |   12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2011-11-22 10:42:19.000000000 -0600
> +++ linux-2.6/mm/slub.c	2011-11-22 10:44:34.000000000 -0600
> @@ -2391,8 +2391,15 @@ static void __slab_free(struct kmem_cach
> 
>  	stat(s, FREE_SLOWPATH);
> 
> -	if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr))
> -		return;
> +	if (kmem_cache_debug(s)) {
> +
> +		/* Lock out any concurrent validate_slab calls */
> +		n = get_node(s, page_to_nid(page));
> +		spin_lock_irqsave(&n->list_lock, flags);
> +
> +		if (!free_debug_processing(s, page, x, addr))
> +			goto out;
> +	}
> 
>  	do {
>  		prior = page->freelist;
> @@ -2471,6 +2478,7 @@ static void __slab_free(struct kmem_cach
>  			stat(s, FREE_ADD_PARTIAL);
>  		}
>  	}
> +out:
>  	spin_unlock_irqrestore(&n->list_lock, flags);
>  	return;
> 


This seems better, but I still have some warnings :

[  162.117574] SLUB: selinux_inode_security 136 slabs counted but counter=137
[  179.879907] SLUB: task_xstate 1 slabs counted but counter=2
[  179.881745] SLUB: vm_area_struct 47 slabs counted but counter=48
[  180.381964] SLUB: kmalloc-64 46 slabs counted but counter=47
[  192.366437] SLUB: vm_area_struct 82 slabs counted but counter=83
[  195.016732] SLUB: names_cache 3 slabs counted but counter=4
[  196.073166] SLUB: dentry 623 slabs counted but counter=624
[  196.093857] SLUB: names_cache 3 slabs counted but counter=4
[  196.631420] SLUB: names_cache 5 slabs counted but counter=6
[  198.760180] SLUB: kmalloc-16 30 slabs counted but counter=31
[  198.773492] SLUB: names_cache 5 slabs counted but counter=6
[  198.783637] SLUB: selinux_inode_security 403 slabs counted but counter=404
[  201.717932] SLUB: filp 53 slabs counted but counter=54
[  202.984756] SLUB: filp 40 slabs counted but counter=41
[  203.819525] SLUB: task_xstate 3 slabs counted but counter=4
[  205.699916] SLUB: cfq_io_context 1 slabs counted but counter=2
[  206.526646] SLUB: skbuff_head_cache 4 slabs counted but counter=5
[  208.431951] SLUB: names_cache 3 slabs counted but counter=4
[  210.672056] SLUB: vm_area_struct 88 slabs counted but counter=89
[  213.160055] SLUB: vm_area_struct 94 slabs counted but counter=95
[  215.604856] SLUB: cfq_queue 1 slabs counted but counter=2
[  217.872494] SLUB: filp 56 slabs counted but counter=57
[  220.184599] SLUB: names_cache 3 slabs counted but counter=4
[  221.783732] SLUB: anon_vma_chain 53 slabs counted but counter=54
[  221.816662] SLUB: kmalloc-16 66 slabs counted but counter=67
[  221.828582] SLUB: names_cache 3 slabs counted but counter=4
[  221.848231] SLUB: vm_area_struct 99 slabs counted but counter=100
[  224.125411] SLUB: kmalloc-16 30 slabs counted but counter=31
[  224.126313] SLUB: kmalloc-16 67 slabs counted but counter=68
[  224.138768] SLUB: names_cache 3 slabs counted but counter=4
[  224.921409] SLUB: anon_vma 36 slabs counted but counter=37
[  224.927833] SLUB: buffer_head 294 slabs counted but counter=295
[  226.473891] SLUB: kmalloc-16 67 slabs counted but counter=68
[  228.801716] SLUB: names_cache 5 slabs counted but counter=6
[  229.610225] SLUB: filp 47 slabs counted but counter=48
[  232.050811] SLUB: filp 53 slabs counted but counter=54
[  235.835888] SLUB: names_cache 3 slabs counted but counter=4
[  236.625318] SLUB: filp 48 slabs counted but counter=49
[  236.634563] SLUB: kmalloc-16 30 slabs counted but counter=31
[  236.635667] SLUB: kmalloc-16 67 slabs counted but counter=68
[  237.500016] SLUB: radix_tree_node 100 slabs counted but counter=101
[  238.248677] SLUB: filp 48 slabs counted but counter=49
[  239.097674] SLUB: filp 49 slabs counted but counter=50
[  239.975020] SLUB: names_cache 3 slabs counted but counter=4
[  241.569766] SLUB: vm_area_struct 102 slabs counted but counter=103
[  242.388502] SLUB: names_cache 5 slabs counted but counter=6
[  243.152519] SLUB: anon_vma_chain 56 slabs counted but counter=57
[  245.661970] SLUB: filp 49 slabs counted but counter=50
[  247.298004] SLUB: filp 48 slabs counted but counter=50
[  248.851148] SLUB: journal_handle 3 slabs counted but counter=4
[  249.674320] SLUB: names_cache 3 slabs counted but counter=4
[  250.414476] SLUB: bio-0 24 slabs counted but counter=25
[  250.461655] SLUB: kmalloc-96 49 slabs counted but counter=50
[  250.477188] SLUB: sgpool-16 0 slabs counted but counter=1
[  251.298554] SLUB: kmalloc-32 9 slabs counted but counter=10
[  252.096119] SLUB: names_cache 3 slabs counted but counter=4
[  256.179892] SLUB: filp 58 slabs counted but counter=59
[  256.188385] SLUB: kmalloc-16 30 slabs counted but counter=31
[  257.040508] SLUB: kmalloc-16 30 slabs counted but counter=31
[  258.704236] SLUB: buffer_head 502 slabs counted but counter=503
[  258.745777] SLUB: kmalloc-32 9 slabs counted but counter=10
[  258.752285] SLUB: names_cache 3 slabs counted but counter=4
[  260.412312] SLUB: filp 54 slabs counted but counter=56
[  261.213526] SLUB: filp 44 slabs counted but counter=45
[  262.846810] SLUB: kmalloc-16 31 slabs counted but counter=32
[  262.859062] SLUB: names_cache 5 slabs counted but counter=6
[  262.881728] SLUB: task_xstate 3 slabs counted but counter=4
[  263.672055] SLUB: filp 54 slabs counted but counter=55
[  266.191191] SLUB: kmalloc-16 30 slabs counted but counter=31
[  266.203799] SLUB: names_cache 5 slabs counted but counter=6
[  268.486964] SLUB: filp 52 slabs counted but counter=53
[  268.509446] SLUB: names_cache 5 slabs counted but counter=6
[  269.365745] SLUB: vm_area_struct 88 slabs counted but counter=89



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter - Nov. 22, 2011, 5:40 p.m.
On Tue, 22 Nov 2011, Eric Dumazet wrote:

> This seems better, but I still have some warnings :

Trying to reproduce with a kernel configured to do preempt. This is
actually quite interesting since its always off by 1.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Markus Trippelsdorf - Nov. 22, 2011, 6:55 p.m.
On 2011.11.22 at 11:40 -0600, Christoph Lameter wrote:
> On Tue, 22 Nov 2011, Eric Dumazet wrote:
> 
> > This seems better, but I still have some warnings :
> 
> Trying to reproduce with a kernel configured to do preempt. This is
> actually quite interesting since its always off by 1.

BTW there are some obvious overflows in the "slabinfo -l" output on my machine:

Name                   Objects Objsize    Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
:t-0000024                 680      24    16.3K          0/0/4  170 0   0  99 *
:t-0000032                 768      32    24.5K          0/0/6  128 0   0 100 *
:t-0000040                1632      40    65.5K         13/0/3  102 0   0  99 *
:t-0000048                4533      48   241.6K 18446744073709551583/9/92   85 0  15  90 *
:t-0000064                3392      64   217.0K        30/0/23   64 0   0 100 *
:t-0000072                4368      72   319.4K         73/0/5   56 0   0  98 *
:t-0000112                 720     112    81.9K         0/0/20   36 0   0  98 *
:t-0000128                1684     128   233.4K 18446744073709551607/7/66   32 0  12  92 *
:t-0000136                 120     136    16.3K          0/0/4   30 0   0  99 *
:t-0000144                8036     144     1.1M        283/0/4   28 0   0  98 *
:t-0000192                4452     192   872.4K       165/0/48   21 0   0  97 *
:t-0000216                  90     216    20.4K          1/0/4   18 0   0  94 *
:t-0000256                 539     256   147.4K 18446744073709551609/3/43   16 0   8  93 *
:t-0000320                2425     320   794.6K         93/0/4   25 1   0  97 *A
:t-0000400                  60     400    24.5K          0/0/3   20 1   0  97 *
:t-0000512                 640     512   327.6K        23/0/17   16 1   0 100 *
:t-0000704                 483     704   344.0K         0/0/21   23 2   0  98 *A
:t-0001024                 373    1024   475.1K       15/10/14   16 2  34  80 *
:t-0002048                 288    2048   589.8K         10/0/8   16 3   0 100 *
:t-0004096                 119    4096   524.2K         4/3/12    8 3  18  92 *
Acpi-State                 204      80    16.3K          0/0/4   51 0   0  99 
anon_vma                  3528      64   258.0K         9/0/54   56 0   0  87 
bdev_cache                  84     728    65.5K          0/0/4   21 2   0  93 Aa
blkdev_queue                38    1664    65.5K          0/0/2   19 3   0  96 
blkdev_requests            161     344    57.3K          0/0/7   23 1   0  96 
buffer_head               1014     104   106.4K         5/0/21   39 0   0  99 a
cfq_queue                  102     232    24.5K          1/0/5   17 0   0  96 
dentry                   15897     192     3.1M       723/0/34   21 0   0  98 a
idr_layer_cache            390     544   212.9K          8/0/5   30 2   0  99 
inode_cache               6448     512     3.4M        199/0/9   31 2   0  96 a
ip_fib_trie                219      56    12.2K          0/0/3   73 0   0  99 
kmalloc-16                3072      16    49.1K          7/0/5  256 0   0 100 
kmalloc-8                 4608       8    36.8K          4/0/5  512 0   0 100 
kmalloc-8192                20    8192   163.8K          1/0/4    4 3   0 100 
kmalloc-96                 840      96    81.9K         7/0/13   42 0   0  98 
kmem_cache                  21     192     4.0K          0/0/1   21 0   0  98 *A
kmem_cache_node            192      64    12.2K          0/0/3   64 0   0 100 *A
mm_struct                  171     832   147.4K          0/0/9   19 2   0  96 A
mqueue_inode_cache          19     800    16.3K          0/0/1   19 2   0  92 A
nf_conntrack_ffffffff8199e380       60     264    16.3K          0/0/2   30 1   0  96 
proc_inode_cache           420     576   245.7K         2/0/13   28 2   0  98 a
radix_tree_node           3472     560     2.0M        120/0/4   28 2   0  95 a
RAW                         42     712    32.7K          0/0/2   21 2   0  91 A
shmem_inode_cache         1036     576   606.2K        21/0/16   28 2   0  98 
sighand_cache              210    2088   458.7K         3/0/11   15 3   0  95 A
signal_cache               268     920   360.4K 18446744073709551614/7/24   17 2  31  68 A
sigqueue                   125     160    20.4K          0/0/5   25 0   0  97 
skbuff_fclone_cache         72     420    32.7K          0/0/4   18 1   0  92 A
sock_inode_cache           196     560   114.6K          0/0/7   28 2   0  95 Aa
task_struct                215    1504   557.0K          9/9/8   21 3  52  58 
TCP                         21    1504    32.7K          0/0/1   21 3   0  96 A
UDP                         63     736    49.1K          0/0/3   21 2   0  94 A
vm_area_struct            3278     168   602.1K       89/33/58   24 0  22  91 
xfs_btree_cur               76     208    16.3K          0/0/4   19 0   0  96 
xfs_buf_item               108     224    24.5K          0/0/6   18 0   0  98 
xfs_da_state                64     488    32.7K          0/0/4   16 1   0  95 
xfs_inode                 7055     896     6.7M        411/0/4   17 2   0  92 Aa
xfs_log_ticket              80     200    16.3K          0/0/4   20 0   0  97 
xfs_trans                  116     280    32.7K          0/0/4   29 1   0  99
Christoph Lameter - Nov. 22, 2011, 7:20 p.m.
On Tue, 22 Nov 2011, Markus Trippelsdorf wrote:

> On 2011.11.22 at 11:40 -0600, Christoph Lameter wrote:
> > On Tue, 22 Nov 2011, Eric Dumazet wrote:
> >
> > > This seems better, but I still have some warnings :
> >
> > Trying to reproduce with a kernel configured to do preempt. This is
> > actually quite interesting since its always off by 1.
>
> BTW there are some obvious overflows in the "slabinfo -l" output on my machine:

Could you get me the value of the "slabs" field for the slabs showing the
wierd values. I.e. do

cat /sys/kernel/slab/signal_cache/slabs

> signal_cache               268     920   360.4K 18446744073709551614/7/24   17 2  31  68 A
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Markus Trippelsdorf - Nov. 22, 2011, 7:32 p.m.
On 2011.11.22 at 13:20 -0600, Christoph Lameter wrote:
> On Tue, 22 Nov 2011, Markus Trippelsdorf wrote:
> 
> > On 2011.11.22 at 11:40 -0600, Christoph Lameter wrote:
> > > On Tue, 22 Nov 2011, Eric Dumazet wrote:
> > >
> > > > This seems better, but I still have some warnings :
> > >
> > > Trying to reproduce with a kernel configured to do preempt. This is
> > > actually quite interesting since its always off by 1.
> >
> > BTW there are some obvious overflows in the "slabinfo -l" output on my machine:
> 
> Could you get me the value of the "slabs" field for the slabs showing the
> wierd values. I.e. do
> 
> cat /sys/kernel/slab/signal_cache/slabs
> 
> > signal_cache               268     920   360.4K 18446744073709551614/7/24   17 2  31  68 A
> 

It's quite easy to explain. You're using unsigned ints in:
snprintf(dist_str, 40, "%lu/%lu/%d", s->slabs - s->cpu_slabs, s->partial, s->cpu_slabs);

and  (s->slabs - s->cpu_slabs) can get negative. For example:

task_struct                269    1504   557.0K 18446744073709551601/5/32   21 3  29  72

Here s-slabs is 17 and s->cpu_slabs is 32. 
That gives: 17-32=18446744073709551601.

Patch

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-22 10:42:19.000000000 -0600
+++ linux-2.6/mm/slub.c	2011-11-22 10:44:34.000000000 -0600
@@ -2391,8 +2391,15 @@  static void __slab_free(struct kmem_cach

 	stat(s, FREE_SLOWPATH);

-	if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr))
-		return;
+	if (kmem_cache_debug(s)) {
+
+		/* Lock out any concurrent validate_slab calls */
+		n = get_node(s, page_to_nid(page));
+		spin_lock_irqsave(&n->list_lock, flags);
+
+		if (!free_debug_processing(s, page, x, addr))
+			goto out;
+	}

 	do {
 		prior = page->freelist;
@@ -2471,6 +2478,7 @@  static void __slab_free(struct kmem_cach
 			stat(s, FREE_ADD_PARTIAL);
 		}
 	}
+out:
 	spin_unlock_irqrestore(&n->list_lock, flags);
 	return;