[0/1] ext4: fix potential negative array index in do_split

Message ID d08d63e9-8f74-b571-07c7-828b9629ce6a@redhat.com
Series ext4: fix potential negative array index in do_split

Message

Eric Sandeen June 17, 2020, 7:01 p.m. UTC
We recently had a report of a panic in do_split(): the filesystem in question
panicked a distribution kernel when trying to add a new directory entry, and
the bug persists upstream.

The directory block in question had lots of unused and un-coalesced
entries, like this, printed from the loop in ext4_insert_dentry():

[32778.024654] reclen 44 for name len 36
[32778.028745] start: de ffff9f4cb5309800 top ffff9f4cb5309bd4
[32778.034971]  offset 0 nlen 28 rlen 40, rlen-nlen 12, reclen 44 name <empty>
[32778.042744]  offset 40 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
[32778.050521]  offset 68 nlen 32 rlen 32, rlen-nlen 0, reclen 44 name <empty>
[32778.058294]  offset 100 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
[32778.066166]  offset 128 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
[32778.074035]  offset 156 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
[32778.081907]  offset 184 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name <empty>
[32778.089779]  offset 208 nlen 36 rlen 36, rlen-nlen 0, reclen 44 name <empty>
[32778.097648]  offset 244 nlen 12 rlen 12, rlen-nlen 0, reclen 44 name REDACTED
[32778.105227]  offset 256 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name <empty>
[32778.113099]  offset 280 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name REDACTED
[32778.122134]  offset 304 nlen 20 rlen 20, rlen-nlen 0, reclen 44 name REDACTED
[32778.130780]  offset 324 nlen 16 rlen 16, rlen-nlen 0, reclen 44 name REDACTED
[32778.138746]  offset 340 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name <empty>
[32778.146616]  offset 364 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
[32778.154487]  offset 392 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name <empty>
[32778.162362]  offset 416 nlen 16 rlen 16, rlen-nlen 0, reclen 44 name <empty>
...

The file we were trying to insert needed a record length of 44, and none of the
non-coalesced <empty> slots was big enough, so the insert failed and we told
do_split() to get to work.
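The following user-space C sketch replays the visible log rows so the arithmetic can be checked. It rests on two assumptions. First, the minimal record length is the 8-byte dirent header plus the name, rounded up to a multiple of 4 (the EXT4_DIR_REC_LEN rule). That rule gives 44 for a 36-byte name, which matches the log. Second, a deleted entry offers its whole rlen, while a live entry offers only its slack, rlen - nlen. This second test is modelled on the kernel's leaf-block free-space search as we understand it. The names slot, find_slot() and logged are hypothetical. Unlike the kernel, this sketch does not stop at the `top` limit.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal record length for a name: 8-byte dirent header plus the name,
 * rounded up to a multiple of 4 (assumed to mirror EXT4_DIR_REC_LEN). */
static unsigned rec_len_for_name(unsigned name_len)
{
	assert(name_len <= 255);	/* ext4 names are at most 255 bytes */
	return (name_len + 8 + 3) & ~3u;
}

/* Hypothetical flattened view of one dirent, using the log's columns. */
struct slot {
	unsigned offset;	/* byte offset within the block */
	unsigned nlen;		/* minimal record length of the stored name */
	unsigned rlen;		/* rec_len actually recorded on disk */
	int in_use;		/* 0 for <empty>, 1 for a live (REDACTED) entry */
};

/* Hypothetical helper: index of the first slot that can hold reclen bytes,
 * or -1. A deleted slot offers its whole rec_len; a live one only its slack. */
static int find_slot(const struct slot *s, size_t n, unsigned reclen)
{
	for (size_t i = 0; i < n; i++) {
		unsigned avail = s[i].in_use ? s[i].rlen - s[i].nlen : s[i].rlen;

		if (avail >= reclen)
			return (int)i;
	}
	return -1;
}

/* The 17 rows printed in the log (the rest were elided with "..."). */
static const struct slot logged[] = {
	{0, 28, 40, 0},   {40, 28, 28, 0},  {68, 32, 32, 0},  {100, 28, 28, 0},
	{128, 28, 28, 0}, {156, 28, 28, 0}, {184, 24, 24, 0}, {208, 36, 36, 0},
	{244, 12, 12, 1}, {256, 24, 24, 0}, {280, 24, 24, 1}, {304, 20, 20, 1},
	{324, 16, 16, 1}, {340, 24, 24, 0}, {364, 28, 28, 0}, {392, 24, 24, 0},
	{416, 16, 16, 0},
};
```

With these rows, find_slot(logged, 17, rec_len_for_name(36)) returns -1. The largest <empty> slot is 40 bytes and every live entry has zero slack, so the insert falls through to do_split().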

However, the sum of the in-use entries didn't exceed half the block size, so
the loop in do_split() iterated over all of the entries without breaking, move
ended up equal to count, and we were told to split at (count - move), which is
zero. Eventually:

        continued = hash2 == map[split - 1].hash;

exploded on the negative index (map[-1]).
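The split computation can be replayed the same way. This C sketch is modelled on the do_split() loop as we read it. sizes[] stands in for the rec_len of each in-use entry in the hash-sorted map; deleted entries are assumed not to appear in it. The names split_point() and split_point_defensive() are hypothetical. The fallback to count / 2 is one possible defensive choice. It is an assumption here, not necessarily what the accompanying patch does.

```c
#include <assert.h>

/* Sketch of the size-wise split point: walk the map from the end, counting
 * entries to move until roughly half the block would have moved. */
static int split_point(const unsigned *sizes, int count, unsigned blocksize)
{
	unsigned size = 0;
	int move = 0;

	for (int i = count - 1; i >= 0; i--) {
		/* is more than half of this entry in the 2nd half of the block? */
		if (size + sizes[i] / 2 > blocksize / 2)
			break;
		size += sizes[i];
		move++;
	}
	/* If the loop never broke, move == count and this returns 0, so a
	 * later map[split - 1] would read map[-1]. */
	return count - move;
}

/* Hypothetical defensive variant: if every entry fit within half a block,
 * split by entry count instead, so each half still has at least one entry. */
static int split_point_defensive(const unsigned *sizes, int count,
				 unsigned blocksize)
{
	int split;

	assert(count >= 2);	/* need at least two entries to split at all */
	split = split_point(sizes, count, blocksize);
	if (split == 0)
		split = count / 2;
	return split;
}
```

The following inputs assume a 1024-byte block. That is our inference: the log's `top` lies 980 bytes past `start`, which fits a 1024-byte buffer minus the 44-byte reclen. Fed only the four in-use record lengths visible in the log (12, 24, 20 and 16 bytes), split_point() returns 0, whereas split_point_defensive() returns 2. With three 300-byte entries the loop breaks early, and both functions return 1.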

It's an open question how this directory got into this format; I'm not sure
whether this should ever happen. But at a minimum, I think we should be
defensive here, hence [PATCH 1/1] does that as an expedient, backportable fix
for this situation. If this layout is unexpected, there may be some other
underlying problem that led to it, and maybe a fix for that can come as another
patch if anyone can investigate.

Thanks,
-Eric

Comments

Andreas Dilger June 19, 2020, 2:31 a.m. UTC
On Jun 17, 2020, at 1:01 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> It's an open question as to how this directory got into this format; I'm not
> sure if this should ever happen or not.  But at a minimum, I think we should
> be defensive here, hence [PATCH 1/1] will do that as an expedient fix and
> backportable patch for this situation.  There may be some other underlying
> problem which led to this directory structure if it's unexpected, and maybe that
> can come as another patch if anyone can investigate.

I thought this might be a bit of a conundrum. There is *supposed* to be
merging of adjacent entries, but some quick testing on RHEL7 (kernel
3.10.0-957.12.1.el7, and the same with Debian 4.14.79) shows this to be broken
if the files are deleted in dirent order (which would seem to be the most
common order):

# mkdir tmp; cd tmp
# touch file{1..100}
# rm file{33,36,37,39,41,42,43,46,47}
# debugfs -c -R "ls -ld tmp" /dev/sda1
   366  100644 (1)      0      0       0 18-Jun-2020 18:43 file30
<   369>      0 (1)      0      0   <reclen=  16> <deleted> file33
<   372>      0 (1)      0      0   <reclen=  16> <deleted> file36
<   373>      0 (1)      0      0   <reclen=  16> <deleted> file37
<   375>      0 (1)      0      0   <reclen=  16> <deleted> file39
<   377>      0 (1)      0      0   <reclen=  16> <deleted> file41
<   378>      0 (1)      0      0   <reclen=  16> <deleted> file42
<   379>      0 (1)      0      0   <reclen=  16> <deleted> file43
<   382>      0 (1)      0      0   <reclen=  16> <deleted> file46
<   383>      0 (1)      0      0   <reclen=  16> <deleted> file47
    386  100644 (1)      0      0       0 18-Jun-2020 18:43 file50

The output above (from a debugfs modified to show reclen for deleted files)
shows that the dirents are *not* combined. If the dirent *before* the
other entries is deleted, then they are merged:

# rm file30
<   366>      0 (1)      0      0   <reclen= 160> <deleted> file30
<   369>      0 (1)      0      0   <reclen=  16> <deleted> file33
<   372>      0 (1)      0      0   <reclen=  16> <deleted> file36
<   373>      0 (1)      0      0   <reclen=  16> <deleted> file37
<   375>      0 (1)      0      0   <reclen=  16> <deleted> file39
<   377>      0 (1)      0      0   <reclen=  16> <deleted> file41
<   378>      0 (1)      0      0   <reclen=  16> <deleted> file42
<   379>      0 (1)      0      0   <reclen=  16> <deleted> file43
<   382>      0 (1)      0      0   <reclen=  16> <deleted> file46
<   383>      0 (1)      0      0   <reclen=  16> <deleted> file47
    386  100644 (1)      0      0       0 18-Jun-2020 18:43 file50

Cheers, Andreas