
ext4: memory leakage in ext4_discard_preallocations

Message ID ac8f92701003180539h7228040bm82a0c69d678ec93b@mail.gmail.com
State New, archived

Commit Message

jing zhang March 18, 2010, 12:39 p.m. UTC
From: Jing Zhang <zj.barak@gmail.com>

Date: Thu Mar 18 20:33:44 2010

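When an unexpected error occurs in ext4_discard_preallocations(),
preallocations that could not be released are left behind and their
memory leaks. Retry the release up to two more times, and call BUG()
if entries still remain.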

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Jing Zhang <zj.barak@gmail.com>

---

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Theodore Ts'o March 18, 2010, 5:46 p.m. UTC | #1
>  		ext4_get_group_no_and_offset(sb, pa->pa_pstart, &group, NULL);
> @@ -3811,6 +3813,12 @@ repeat:
>  		list_del(&pa->u.pa_tmp_list);
>  		call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
>  	}
> +	if (! list_empty(&list)) {
> +		if (occurs++ < 2)
> +			goto best_efforts;
> +		else
> +			BUG();
> +	}
>  	if (ac)
>  		kmem_cache_free(ext4_ac_cachep, ac);
>  }

Hmm, I'm not sure that BUG() is appropriate here.  If there is an
I/O error reading the block bitmap, #1, retrying isn't going to help,
and #2, bringing down the entire system just because of an I/O error
in reading the block bitmap doesn't seem right.

Right now, if there is a problem, we just end up leaving the
preallocated list on the inode.  Does that cause problems later on
down the line which you have observed?

					- Ted


jing zhang March 19, 2010, 2:17 p.m. UTC | #2
>>  		ext4_get_group_no_and_offset(sb, pa->pa_pstart, &group, NULL);
>> @@ -3811,6 +3813,12 @@ repeat:
>>  		list_del(&pa->u.pa_tmp_list);
>>  		call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
>>  	}
>> +	if (! list_empty(&list)) {
>> +		if (occurs++ < 2)
>> +			goto best_efforts;
>> +		else
>> +			BUG();
>> +	}
>>  	if (ac)
>>  		kmem_cache_free(ext4_ac_cachep, ac);
>>  }
>
> Hmm, I'm not sure that BUG() is appropriate here.  If there is an
> I/O error reading the block bitmap, #1, retrying isn't going to help,
> and #2, bringing down the entire system just because of an I/O error
> in reading the block bitmap doesn't seem right.

But disk hardware errors are not rare,

> Right now, if there is a problem, we just end up leaving the
> preallocated list on the inode.  Does that cause problems later on
> down the line which you have observed?
>
> 					- Ted

and is there still a chance that
       call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
gets called for these entries later on? (I am not yet sure that such a
chance exists.)

If there is no such chance, what happens to the kmem_cache subsystem?
After a reboot, is the file system still reliable, or does it just have
a few lost blocks?

So it is necessary, at least for me, to find out whether that chance
exists.
                                      - zj
Andreas Dilger March 19, 2010, 5:27 p.m. UTC | #3
On 2010-03-19, at 08:17, jing zhang wrote:
>>> 		ext4_get_group_no_and_offset(sb, pa->pa_pstart, &group, NULL);
>>> @@ -3811,6 +3813,12 @@ repeat:
>>> 		list_del(&pa->u.pa_tmp_list);
>>> 		call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
>>> 	}
>>> +	if (! list_empty(&list)) {
>>> +		if (occurs++ < 2)
>>> +			goto best_efforts;
>>> +		else
>>> +			BUG();
>>> +	}
>>> 	if (ac)
>>> 		kmem_cache_free(ext4_ac_cachep, ac);
>>> }
>>
>> Hmm, I'm not sure that BUG() is appropriate here.  If there is an
>> I/O error reading the block bitmap, #1, retrying isn't going to help,
>> and #2, bringing down the entire system just because of an I/O error
>> in reading the block bitmap doesn't seem right.
>
> But disk hardware errors are not rare,

Exactly, which is the reason why it should not cause the system to
hang.  The filesystem should handle such errors gracefully where
possible: return an error to the application, and/or mark the
filesystem as being in error so that it will be checked on the next
boot, or similar.
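
To illustrate the policy described above, here is a minimal user-space
sketch in C. It is not code from this thread or from mballoc.c: the
types fake_pa and fake_fs, the helper fake_read_bitmap(), the constant
FAKE_EIO, and the function discard_all() are all hypothetical stand-ins.
The sketch makes a single pass, skips any entry whose simulated bitmap
read fails, records the first error for the caller, and sets a flag on
the filesystem instead of retrying or aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_EIO 5  /* hypothetical stand-in for an I/O error code */

/* Hypothetical model of one preallocation; not the kernel structure. */
struct fake_pa {
	int  group;            /* block group the preallocation belongs to */
	bool bitmap_io_error;  /* simulate a failed bitmap read for this group */
	bool released;         /* set once the entry has been released */
};

/* Hypothetical model of per-filesystem state. */
struct fake_fs {
	bool error_flagged;    /* models "check this filesystem on next boot" */
};

/* Simulated bitmap read: 0 on success, -FAKE_EIO on failure. */
static int fake_read_bitmap(const struct fake_pa *pa)
{
	return pa->bitmap_io_error ? -FAKE_EIO : 0;
}

/*
 * Make one pass over the entries. A failing entry is skipped, the first
 * error is stored in *first_err, and the filesystem is flagged. There is
 * no retry and no abort. Returns the number of entries left unreleased.
 */
static size_t discard_all(struct fake_fs *fs, struct fake_pa *pas,
			  size_t n, int *first_err)
{
	size_t skipped = 0;

	assert(fs != NULL && first_err != NULL && (pas != NULL || n == 0));
	*first_err = 0;

	for (size_t i = 0; i < n; i++) {
		int err = fake_read_bitmap(&pas[i]);

		if (err) {
			if (*first_err == 0)
				*first_err = err;
			fs->error_flagged = true;
			skipped++;
			continue;
		}
		pas[i].released = true;
	}
	return skipped;
}
```

A caller would pass an array of entries and inspect both the return
value and *first_err. What to do with the skipped entries (free them
anyway, or keep them for a later attempt) is exactly the open question
in this thread, and the sketch does not settle it.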

>> Right now, if there is a problem, we just end up leaving the
>> preallocated list on the inode.  Does that cause problems later on
>> down the line which you have observed?
>>
>> 					- Ted
>
> and is there still a chance that
>       call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
> gets called for these entries later on? (I am not yet sure that such a
> chance exists.)
>
> If there is no such chance, what happens to the kmem_cache subsystem?
> After a reboot, is the file system still reliable, or does it just have
> a few lost blocks?
>
> So it is necessary, at least for me, to find out whether that chance
> exists.
>                                      - zj


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


Patch

--- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
+++ zj/mballoc.c	2010-03-18 20:41:32.000000000 +0800
@@ -3717,6 +3717,7 @@  void ext4_discard_preallocations(struct
 	struct list_head list;
 	struct ext4_buddy e4b;
 	int err;
+	int occurs = 0;

 	if (!S_ISREG(inode->i_mode)) {
 		/*BUG_ON(!list_empty(&ei->i_prealloc_list));*/
@@ -3781,6 +3782,7 @@  repeat:
 	}
 	spin_unlock(&ei->i_prealloc_lock);

+best_efforts:
 	list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
 		BUG_ON(pa->pa_type != MB_INODE_PA);
 		ext4_get_group_no_and_offset(sb, pa->pa_pstart, &group, NULL);
@@ -3811,6 +3813,12 @@  repeat:
 		list_del(&pa->u.pa_tmp_list);
 		call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
 	}
+	if (! list_empty(&list)) {
+		if (occurs++ < 2)
+			goto best_efforts;
+		else
+			BUG();
+	}
 	if (ac)
 		kmem_cache_free(ext4_ac_cachep, ac);
 }
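
The control flow added by the last hunk can be modeled in user space to
see how many passes it makes before reaching BUG(). The sketch below is
an assumption-laden model, not kernel code: model_retry(), the enum
outcome, and the parameter clears_on_pass (the 1-based pass on which the
simulated error goes away, 0 meaning never) are hypothetical names, and
"list is empty" is reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Result of the modeled cleanup. */
enum outcome { CLEANED, WOULD_BUG };

/*
 * Model of the patch's retry logic. The label and the `occurs` counter
 * mirror the patch; everything else is a simplification. The number of
 * passes made over the list is stored in *passes_out.
 */
static enum outcome model_retry(int clears_on_pass, int *passes_out)
{
	int occurs = 0;
	int passes = 0;
	bool list_is_empty;

	assert(passes_out != NULL);

best_efforts:
	passes++;
	/* The list drains only once the simulated error has cleared. */
	list_is_empty = (clears_on_pass != 0 && passes >= clears_on_pass);

	if (!list_is_empty) {
		if (occurs++ < 2)
			goto best_efforts;
		*passes_out = passes;
		return WOULD_BUG;	/* the patch calls BUG() here */
	}
	*passes_out = passes;
	return CLEANED;
}
```

Under this model, an error that never clears reaches the BUG() branch
after three passes in total, and only an error that clears by the third
pass avoids it. That matches the objection raised in the comments: a
persistent I/O error is not helped by the two extra passes.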