diff mbox

ext4: fix null pointer deref on mount

Message ID 4961603B.5020505@ph.tum.de
State Not Applicable, archived
Headers show

Commit Message

Thiemo Nagel Jan. 5, 2009, 1:19 a.m. UTC
I came across a null pointer dereference when mounting an intentionally 
corrupted filesystem (cf. debug.dmesg).  In my opinion, the problem lies 
in ext4_fill_super(), where truncation may occur on setting the integer 
db_count, which results in too little memory being allocated for 
sbi->s_group_desc.  The attached patch (against 2.6.28) fixes this by 
changing the type of db_count to unsigned long.  I also took the 
opportunity to make the check against sign extension in calculation of 
db_count more strict, so that it now excludes cases in which db_count 
comes out as zero.

Comments are welcome!

Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de>
[ 1381.571583] kobject: 'crc16' (ffffffffa001e7d0): kobject_add_internal: parent: 'module', set: 'module'
[ 1381.571854] kobject: 'holders' (ffff88007e46c940): kobject_add_internal: parent: 'crc16', set: '<NULL>'
[ 1381.571867] kobject: 'crc16' (ffffffffa001e7d0): kobject_uevent_env
[ 1381.571874] kobject: 'crc16' (ffffffffa001e7d0): fill_kobj_path: path = '/module/crc16'
[ 1381.571929] kobject: 'notes' (ffff88007a4a1fc0): kobject_add_internal: parent: 'crc16', set: '<NULL>'
[ 1381.656692] kobject: 'jbd2' (ffffffffa04f5550): kobject_add_internal: parent: 'module', set: 'module'
[ 1381.661578] kobject: 'holders' (ffff880077547e00): kobject_add_internal: parent: 'jbd2', set: '<NULL>'
[ 1381.661592] kobject: 'jbd2' (ffffffffa04f5550): kobject_uevent_env
[ 1381.661598] kobject: 'jbd2' (ffffffffa04f5550): fill_kobj_path: path = '/module/jbd2'
[ 1381.661680] kobject: 'notes' (ffff880077547200): kobject_add_internal: parent: 'jbd2', set: '<NULL>'
[ 1381.677700] kobject: 'ext4' (ffffffffa0531a50): kobject_add_internal: parent: 'module', set: 'module'
[ 1381.685726] kobject: 'holders' (ffff880077547640): kobject_add_internal: parent: 'ext4', set: '<NULL>'
[ 1381.685742] kobject: 'ext4' (ffffffffa0531a50): kobject_uevent_env
[ 1381.685748] kobject: 'ext4' (ffffffffa0531a50): fill_kobj_path: path = '/module/ext4'
[ 1381.685825] kobject: 'notes' (ffff880077547fc0): kobject_add_internal: parent: 'ext4', set: '<NULL>'
[ 1381.704772] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 1381.704778] IP: [<ffffffffa04f8689>] ext4_get_group_desc+0xc9/0x120 [ext4]
[ 1381.704798] PGD 7202d067 PUD 7e408067 PMD 0 
[ 1381.704803] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1381.704810] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/usb2/idVendor
[ 1381.704813] CPU 1 
[ 1381.704815] Modules linked in: ext4 jbd2 crc16 binfmt_misc battery ppdev lp ipv6 powernow_k8 cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_stats freq_table nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc loop snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_usb_lib snd_hwdep snd_seq_dummy arc4 ecb snd_seq_oss psmouse ath5k snd_seq_midi pcspkr serio_raw mac80211 snd_rawmidi led_class k8temp cfg80211 snd_seq_midi_event snd_seq snd_timer snd_seq_device parport_pc parport snd i2c_nforce2 button soundcore i2c_core evdev ext3 jbd mbcache sha256_generic aes_x86_64 aes_generic cbc dm_crypt dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sd_mod ide_cd_mod cdrom ide_pci_generic sata_nv amd74xx ide_core floppy ata_generic ohci1394 ieee1394 forcedeth libata scsi_mod ohci_hcd ehci_hcd thermal processor fan thermal_sys
[ 1381.704903] Pid: 9868, comm: mount Not tainted 2.6.28-tn1-dbg #3
[ 1381.704906] RIP: 0010:[<ffffffffa04f8689>]  [<ffffffffa04f8689>] ext4_get_group_desc+0xc9/0x120 [ext4]
[ 1381.704919] RSP: 0000:ffff88007a41bc08  EFLAGS: 00010296
[ 1381.704922] RAX: 0000000000000010 RBX: 0000000000000000 RCX: 0000000000000000
[ 1381.704925] RDX: ffff880080bc4000 RSI: 0000000000000001 RDI: 0000000000000001
[ 1381.704927] RBP: ffff88007a41bc38 R08: 0000000000000002 R09: 0000000000000000
[ 1381.704930] R10: 0000000000000001 R11: 0000000000000001 R12: ffff880077555000
[ 1381.704932] R13: ffff880079c42000 R14: 0000000000000000 R15: 0000000000000000
[ 1381.704936] FS:  00007f11107fc7c0(0000) GS:ffff88007fb94b40(0000) knlGS:00000000f7e116b0
[ 1381.704939] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1381.704941] CR2: 0000000000000010 CR3: 0000000079476000 CR4: 00000000000006e0
[ 1381.704944] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1381.704947] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1381.704950] Process mount (pid: 9868, threadinfo ffff88007a41a000, task ffff8800720bb0c0)
[ 1381.704952] Stack:
[ 1381.704954]  0000000000000000 0000000000000001 0000000000002041 ffff880077555000
[ 1381.704960]  ffff880077555000 ffff880079c42000 ffff88007a41bd68 ffffffffa050b92a
[ 1381.704966]  ffff88007a41bd28 ffffffff803991b9 0000003000000020 000000007a41bd38
[ 1381.704973] Call Trace:
[ 1381.704975]  [<ffffffffa050b92a>] ext4_fill_super+0xc5a/0x2110 [ext4]
[ 1381.704989]  [<ffffffff803991b9>] ? snprintf+0x59/0x60
[ 1381.704996]  [<ffffffff80271f05>] ? mark_held_locks+0x65/0xc0
[ 1381.705001]  [<ffffffff80265614>] ? up+0x34/0x50
[ 1381.705005]  [<ffffffff80272161>] ? trace_hardirqs_on_caller+0x131/0x190
[ 1381.705010]  [<ffffffff802e2b72>] get_sb_bdev+0x162/0x190
[ 1381.705015]  [<ffffffffa050acd0>] ? ext4_fill_super+0x0/0x2110 [ext4]
[ 1381.705028]  [<ffffffff802b4522>] ? kstrdup+0x52/0x70
[ 1381.705034]  [<ffffffffa0507943>] ext4_get_sb+0x13/0x20 [ext4]
[ 1381.705047]  [<ffffffff802e21e9>] vfs_kern_mount+0x79/0x170
[ 1381.705053]  [<ffffffff802e234e>] do_kern_mount+0x4e/0x110
[ 1381.705058]  [<ffffffff802faaa6>] do_mount+0x676/0x8b0
[ 1381.705063]  [<ffffffff802fad98>] sys_mount+0xb8/0xf0
[ 1381.705067]  [<ffffffff80506fe6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1381.705072]  [<ffffffff8020c50a>] system_call_fastpath+0x16/0x1b
[ 1381.705077] Code: 52 a0 48 89 45 d0 31 c0 e8 d1 ba 00 e0 49 8b 54 24 68 31 c0 4c 89 f9 4c 89 e6 48 c7 c7 b0 24 52 a0 e8 b8 ba 00 e0 49 8b 44 24 68 <4a> 8b 14 f8 48 85 d2 74 24 49 8b 85 78 04 00 00 48 8b 00 48 0f 
[ 1381.705130] RIP  [<ffffffffa04f8689>] ext4_get_group_desc+0xc9/0x120 [ext4]
[ 1381.705130]  RSP <ffff88007a41bc08>
[ 1381.705130] CR2: 0000000000000010
[ 1381.705976] ---[ end trace 6048071769394822 ]---

Comments

Theodore Ts'o Jan. 5, 2009, 5:02 p.m. UTC | #1
On Mon, Jan 05, 2009 at 02:19:55AM +0100, Thiemo Nagel wrote:
> I came across a null pointer dereference when mounting an intentionally  
> corrupted filesystem (cf. debug.dmesg).  In my opinion, the problem lies  
> in ext4_fill_super(), where truncation may occur on setting the integer  
> db_count, which results in too little memory being allocated for  
> sbi->s_group_desc.  The attached patch (against 2.6.28) fixes this by  
> changing the type of db_count to unsigned long.  I also took the  
> opportunity to make the check against sign extension in calculation of  
> db_count more strict, so that it now excludes cases in which db_count  
> comes out as zero.

Usigned unsigned long is almost always wrong, because it's not a fixed
size; it's 32 bits on x86_32, but 64 bits on x86_64.  In this
particular case, db_count is always going to well under 32-bits for
any legitimate filesystem.  If it isn't we need to have better checks;
it sounds like the checks we need are ones that do a better job
checking s_blocks_per_group; am I right in assuming that
s_blocks_per_group was something ridiculous and that is what caused
the overflow?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thiemo Nagel Jan. 5, 2009, 8:50 p.m. UTC | #2
Theodore Tso wrote:
> On Mon, Jan 05, 2009 at 02:19:55AM +0100, Thiemo Nagel wrote:
>> I came across a null pointer dereference when mounting an intentionally  
>> corrupted filesystem (cf. debug.dmesg).  In my opinion, the problem lies  
>> in ext4_fill_super(), where truncation may occur on setting the integer  
>> db_count, which results in too little memory being allocated for  
>> sbi->s_group_desc.  The attached patch (against 2.6.28) fixes this by  
>> changing the type of db_count to unsigned long.  I also took the  
>> opportunity to make the check against sign extension in calculation of  
>> db_count more strict, so that it now excludes cases in which db_count  
>> comes out as zero.
> 
> Usigned unsigned long is almost always wrong, because it's not a fixed
> size; it's 32 bits on x86_32, but 64 bits on x86_64.  In this
> particular case, db_count is always going to well under 32-bits for
> any legitimate filesystem. 

I have chosen unsigned long for the sole reason to avoid truncation in 
the assignment

db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
	   EXT4_DESC_PER_BLOCK(sb);

where the operands on the right side are of type unsigned long and 
ext4_group_t (which is typedef unsigned long), so I don't think to make 
db_count an unsigned long is hurting anything.

But maybe it's not desireable to allow filesystems which are mountable 
on x86_64 but not on x86_32?  Then a different solution would be to 
enforce s_groups_count < (1<<31).

But there is another caveat:  We also need to take care of the overflow 
in the argument to kmalloc(), and that further reduces the allowed range 
of s_groups_count for x86_32 (but not for x86_64):

sbi->s_group_desc = kmalloc(db_count * sizeof(struct buffer_head *),
			    GFP_KERNEL);

So, which approach do you think would be best?

> If it isn't we need to have better checks;
> it sounds like the checks we need are ones that do a better job
> checking s_blocks_per_group; am I right in assuming that
> s_blocks_per_group was something ridiculous and that is what caused
> the overflow?

No, it was a very large block count (but the small blocks per group 
helped, too):

block count 562949953423360, first data block 8257, blocks per group 512

BTW:  In case anybody likes to have a look at the corrupt filesystem:
It's available at 
http://www.e18.physik.tu-muenchen.de/~tnagel/misc/ext4.null_deref.image.bz2
The size of the download is 88k.

Kind regards,
Thiemo
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Jan. 5, 2009, 9:39 p.m. UTC | #3
On Mon, Jan 05, 2009 at 09:50:13PM +0100, Thiemo Nagel wrote:
>
> I have chosen unsigned long for the sole reason to avoid truncation in  
> the assignment
>
> db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
> 	   EXT4_DESC_PER_BLOCK(sb);
>
> where the operands on the right side are of type unsigned long and  
> ext4_group_t (which is typedef unsigned long), so I don't think to make  
> db_count an unsigned long is hurting anything.

Err, no.  ext4_group_t is typedef'ed to be an unsigned int.  And there
are plenty of places in both the kernel and userspace code where the
number of groups is assumed to a quantity that can be held in a 2**32
bit field.  This isn't a problem, because normally the number of
blocks per group is fs->blocksize*8.  So for a 4k block filesystem,
the number of blocks per group is 32768, or 2**15.  So that means an
effective limit of 2**47 blocks before we overflow 2**32 block group
type width, and with 4k blocks, that means a max volume size of 512
petabytes.   

> But maybe it's not desireable to allow filesystems which are mountable  
> on x86_64 but not on x86_32?  Then a different solution would be to  
> enforce s_groups_count < (1<<31).

I'd say enforce s_groups_count < 2**32, because that's the limit we
have everywhere else.

> But there is another caveat:  We also need to take care of the overflow  
> in the argument to kmalloc(), and that further reduces the allowed range  
> of s_groups_count for x86_32 (but not for x86_64):
>
> sbi->s_group_desc = kmalloc(db_count * sizeof(struct buffer_head *),
> 			    GFP_KERNEL);
>
> So, which approach do you think would be best?

Well, obviously we need to check for this restriction, too.  At the
end of the day, though, we simply shouldn't allow s_blocks_count to be
bigger than either 2**32, or a limit which causes the above kmalloc
from overflowing on 32-bit systems.  Given that ext4_group_t is an
unsigned int, on 32-bit systems there will definitely be problems.

>> If it isn't we need to have better checks;
>> it sounds like the checks we need are ones that do a better job
>> checking s_blocks_per_group; am I right in assuming that
>> s_blocks_per_group was something ridiculous and that is what caused
>> the overflow?
>
> No, it was a very large block count (but the small blocks per group  
> helped, too):
>
> block count 562949953423360, first data block 8257, blocks per group 512
>

Well, as I pointed out, for 4k block filesystems, the number of blocks
per group is normally 32768.  There are times when we will use a
smaller number of blocks per group just to test how scalable various
filesystems will be at large sizes without having to create a huge
filesystem, 

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- linux-2.6.28-orig/fs/ext4/super.c	2008-12-25 00:26:37.000000000 +0100
+++ linux-2.6.28/fs/ext4/super.c	2009-01-05 01:38:23.000000000 +0100
@@ -1873,7 +1873,7 @@ 
 	char *cp;
 	int ret = -EINVAL;
 	int blocksize;
-	int db_count;
+	unsigned long db_count;
 	int i;
 	int needs_recovery, has_huge_files;
 	__le32 features;
@@ -2145,9 +2145,11 @@ 
 	if (EXT4_BLOCKS_PER_GROUP(sb) == 0)
 		goto cantfind_ext4;
 
-	/* ensure blocks_count calculation below doesn't sign-extend */
-	if (ext4_blocks_count(es) + EXT4_BLOCKS_PER_GROUP(sb) <
-	    le32_to_cpu(es->s_first_data_block) + 1) {
+	/*
+	 * ensure blocks_count calculation below doesn't sign-extend
+	 * and after do_div() still blocks_count > 0
+	 */
+	if (ext4_blocks_count(es) < le32_to_cpu(es->s_first_data_block) + 1) {
 		printk(KERN_WARNING "EXT4-fs: bad geometry: block count %llu, "
 		       "first data block %u, blocks per group %lu\n",
 			ext4_blocks_count(es),