[RFC] ext4: don't remove already removed extent

Message ID 20230911094038.3602508-1-usama.anjum@collabora.com
State New

Commit Message

Muhammad Usama Anjum Sept. 11, 2023, 9:40 a.m. UTC
Syzbot has hit the following bug on current and all older kernels:
BUG: KASAN: out-of-bounds in ext4_ext_rm_leaf fs/ext4/extents.c:2736 [inline]
BUG: KASAN: out-of-bounds in ext4_ext_remove_space+0x2482/0x4d90 fs/ext4/extents.c:2958
Read of size 18446744073709551508 at addr ffff888073aea078 by task syz-executor420/6443

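On investigation, I've found that eh->eh_entries is zero while ex still
refers to the last entry. With no entries, EXT_LAST_EXTENT(eh) refers to
the slot just before the first entry, i.e. the header itself. Hence
EXT_LAST_EXTENT(eh) - ex becomes negative, the length derived from it
wraps around, and this causes the wrong buffer read.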

element: FFFF8882F8F0D06C       <----- ex
element: FFFF8882F8F0D060
element: FFFF8882F8F0D054
element: FFFF8882F8F0D048
element: FFFF8882F8F0D03C
element: FFFF8882F8F0D030
element: FFFF8882F8F0D024
element: FFFF8882F8F0D018
element: FFFF8882F8F0D00C	<------  EXT_FIRST_EXTENT(eh)
header:  FFFF8882F8F0D000	<------  EXT_LAST_EXTENT(eh) and eh
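
The read size in the KASAN report can be derived from this layout. Below
is a minimal sketch, assuming the removal path passes
(EXT_LAST_EXTENT(eh) - ex) * sizeof(struct ext4_extent) to memmove() and
that size_t is 64 bits wide. The struct is a simplified stand-in for the
on-disk extent, and move_len() is a hypothetical helper, not a kernel
function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the 12-byte on-disk extent (assumption:
 * endianness annotations and kernel types are omitted). */
struct ext4_extent {
	uint32_t ee_block;
	uint16_t ee_len;
	uint16_t ee_start_hi;
	uint32_t ee_start_lo;
};

/*
 * Hypothetical helper: byte count that results from
 * (EXT_LAST_EXTENT(eh) - ex) * sizeof(struct ext4_extent).
 * ex_index is the position of ex relative to EXT_FIRST_EXTENT(eh).
 * Indices are used instead of pointers so the sketch stays well defined.
 */
static size_t move_len(uint16_t eh_entries, ptrdiff_t ex_index)
{
	/* EXT_LAST_EXTENT(eh) is EXT_FIRST_EXTENT(eh) + eh_entries - 1 */
	ptrdiff_t last_index = (ptrdiff_t)eh_entries - 1;
	ptrdiff_t count = last_index - ex_index; /* negative if ex is past last */

	assert(sizeof(struct ext4_extent) == 12);
	/* A negative count wraps around when converted to size_t */
	return (size_t)count * sizeof(struct ext4_extent);
}
```

With eh_entries = 0 and ex eight slots past EXT_FIRST_EXTENT(eh), as in the
dump above, move_len(0, 8) is -9 * 12 = -108 converted to size_t. On a
64-bit build that is 18446744073709551508, the read size KASAN printed.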

Cc: stable@vger.kernel.org
Reported-by: syzbot+6e5f2db05775244c73b7@syzkaller.appspotmail.com
Closes: https://groups.google.com/g/syzkaller-bugs/c/G6zS-LKgDW0/m/63MgF6V7BAAJ
Fixes: d583fb87a3ff ("ext4: punch out extents")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
This patch only fixes the local issue. There may be a bigger bug: why
is ex set to the last entry if eh->eh_entries is 0? If any ext4
developer wants to look at the bug, please don't hesitate.
---
 fs/ext4/extents.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Whitney Sept. 20, 2023, 12:41 a.m. UTC | #1

Hi:

First, thanks for taking the time to look at this.

I'm suspicious that syzbot may be fuzzing an extent header or other extent
tree components.  As you noticed, eh_entries and ex appear to be inconsistent.
Also, note the long series of corrupted file system reports in the console log
occurring before the KASAN bug - ext4 had been detecting and rejecting bad
data up to that point.  The file system on the disk image provided by syzbot
indicates that metadata checksumming was enabled (and it fscks cleanly).
That should have caught a corrupted extent header or inode, but perhaps
there's a problem.
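
The inconsistency can be stated as a simple invariant: ex must lie among
the eh_entries valid slots of the leaf. Here is a minimal sketch of such a
check, using a hypothetical struct leaf_view and helper
leaf_view_consistent() that are not part of ext4 (the kernel's own
validation is more thorough):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a leaf block, for illustration only. */
struct leaf_view {
	uint16_t eh_entries; /* number of valid entries, from the header */
	uint16_t eh_max;     /* capacity of the leaf, from the header */
	ptrdiff_t ex_index;  /* position of ex relative to the first entry */
};

/* Returns true if the header counts and the ex position agree. */
static bool leaf_view_consistent(const struct leaf_view *v)
{
	assert(v != NULL);
	if (v->eh_entries > v->eh_max)
		return false;
	/* With eh_entries == 0 no position of ex is valid. */
	return v->ex_index >= 0 && v->ex_index < (ptrdiff_t)v->eh_entries;
}
```

For the state described in the report (eh_entries = 0, ex at slot 8) the
check fails, whatever the capacity of the leaf.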

The console log indicates that the problem occurred on inode #16.  Does the
information you've provided above come from testing you did on inode #16
(looks like the name was /bin/base64)?

By any chance, have you found a simpler reproducer than what syzbot provides?

Thanks,
Eric
Muhammad Usama Anjum Oct. 6, 2023, 10:47 a.m. UTC | #2
On 9/20/23 5:41 AM, Eric Whitney wrote:
> 
> Hi:
> 
> First, thanks for taking the time to look at this.
Thank you for replying and for the pointers. I need to trace the problem
from the first warning up to the bug, which will be difficult until I
debug it more carefully and learn at least the basics of ext4.

> 
> I'm suspicious that syzbot may be fuzzing an extent header or other extent
> tree components.  As you noticed, eh_entries and ex appear to be inconsistent.
> Also, note the long series of corrupted file system reports in the console log
> occurring before the KASAN bug - ext4 had been detecting and rejecting bad
> data up to that point.  The file system on the disk image provided by syzbot
> indicates that metadata checksumming was enabled (and it fscks cleanly).
> That should have caught a corrupted extent header or inode, but perhaps
> there's a problem.
> 
> The console log indicates that the problem occurred on inode #16.  Does the
> information you've provided above come from testing you did on inode #16
> (looks like the name was /bin/base64)?
I couldn't analyze the problem more broadly. There must be something
bigger wrong here.

> 
> By any chance, have you found a simpler reproducer than what syzbot provides?
Not yet; the bug only reproduces after a while. I'll try to come up with
a better reproducer if I can.

> 
> Thanks,
> Eric
> 
>
Eric Whitney Oct. 8, 2023, 9:10 p.m. UTC | #3
* Muhammad Usama Anjum <usama.anjum@collabora.com>:
> On 9/20/23 5:41 AM, Eric Whitney wrote:
> > 
> > Hi:
> > 
> > First, thanks for taking the time to look at this.
> Thank you for replying and for the pointers. I need to trace the problem
> from the first warning up to the bug, which will be difficult until I
> debug it more carefully and learn at least the basics of ext4.
> 
> > 
> > I'm suspicious that syzbot may be fuzzing an extent header or other extent
> > tree components.  As you noticed, eh_entries and ex appear to be inconsistent.
> > Also, note the long series of corrupted file system reports in the console log
> > occurring before the KASAN bug - ext4 had been detecting and rejecting bad
> > data up to that point.  The file system on the disk image provided by syzbot
> > indicates that metadata checksumming was enabled (and it fscks cleanly).
> > That should have caught a corrupted extent header or inode, but perhaps
> > there's a problem.
> > 
> > The console log indicates that the problem occurred on inode #16.  Does the
> > information you've provided above come from testing you did on inode #16
> > (looks like the name was /bin/base64)?
> I couldn't analyze the problem more broadly. There must be something
> bigger wrong here.
> 
> > 
> > By any chance, have you found a simpler reproducer than what syzbot provides?
> Not yet; the bug only reproduces after a while. I'll try to come up with
> a better reproducer if I can.
> 

My suggestion would be to first determine whether syzbot has disabled
metadata checksumming by the point in time when the problem occurs (or
whether temporarily modifying ext4 to make it impossible to disable
metadata checksumming also makes it impossible to reproduce the failure).
It may have done this as part of its test.  If so, this becomes a very low
priority bug for ext4, and you could avoid the effort to find a reproducer.
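
One part of that check, whether the metadata_csum feature flag is set in a
given superblock, can be sketched as follows. This minimal sketch only
inspects a raw superblock buffer (for example, 1024 bytes read from byte
offset 1024 of a disk image), so it says nothing about what the running
kernel has done. has_metadata_csum() is a hypothetical helper name; the
offsets and flag value are taken from the ext4 on-disk layout
documentation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SB_OFF_MAGIC      0x38   /* s_magic, little-endian 16 bit */
#define SB_OFF_RO_COMPAT  0x64   /* s_feature_ro_compat, little-endian 32 bit */
#define EXT4_MAGIC        0xEF53
#define RO_COMPAT_METADATA_CSUM 0x0400

static uint16_t rd_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

static uint32_t rd_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: sb points to at least 1024 bytes of superblock.
 * Returns 1 if metadata_csum is set, 0 if clear, -1 if the magic is wrong.
 */
static int has_metadata_csum(const unsigned char *sb)
{
	assert(sb != NULL);
	if (rd_le16(sb + SB_OFF_MAGIC) != EXT4_MAGIC)
		return -1;
	return (rd_le32(sb + SB_OFF_RO_COMPAT) & RO_COMPAT_METADATA_CSUM) ? 1 : 0;
}
```

Running this against the image before and after the reproducer would show
whether the flag was cleared on disk during the test.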

Eric

Patch

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e4115d338f101..7b7779b4cb87f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2726,7 +2726,7 @@  ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
 		 * If the extent was completely released,
 		 * we need to remove it from the leaf
 		 */
-		if (num == 0) {
+		if (num == 0 && eh->eh_entries) {
 			if (end != EXT_MAX_BLOCKS - 1) {
 				/*
 				 * For hole punching, we need to scoot all the