Message ID | f53e246b-647c-64bb-16ec-135383c70ad7@redhat.com |
---|---|
State | Awaiting Upstream |
Series | ext4: fix potential negative array index in do_split |
On Jun 17, 2020, at 1:19 PM, Eric Sandeen <sandeen@redhat.com> wrote: > > If for any reason a directory passed to do_split() does not have enough > active entries to exceed half the size of the block, we can end up > iterating over all "count" entries without finding a split point. > > In this case, count == move, and split will be zero, and we will > attempt a negative index into map[]. > > Guard against this by detecting this case, and falling back to > split-to-half-of-count instead; in this case we will still have > plenty of space (> half blocksize) in each split block. > > Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks") > Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> > --- > > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c > index a8aca4772aaa..8b60881f07ee 100644 > --- a/fs/ext4/namei.c > +++ b/fs/ext4/namei.c > @@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, > blocksize, hinfo, map); > map -= count; > dx_sort_map(map, count); > - /* Split the existing block in the middle, size-wise */ > + /* Ensure that neither split block is over half full */ > size = 0; > move = 0; > for (i = count-1; i >= 0; i--) { > @@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, > size += map[i].size; > move++; > } > - /* map index at which we will split */ > - split = count - move; > + /* > + * map index at which we will split > + * > + * If the sum of active entries didn't exceed half the block size, just > + * split it in half by count; each resulting block will have at least > + * half the space free. > + */ > + if (i > 0) > + split = count - move; > + else > + split = count/2; > + > hash2 = map[split].hash; > continued = hash2 == map[split - 1].hash; > dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n", > > Cheers, Andreas
On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote: > If for any reason a directory passed to do_split() does not have enough > active entries to exceed half the size of the block, we can end up > iterating over all "count" entries without finding a split point. > > In this case, count == move, and split will be zero, and we will > attempt a negative index into map[]. > > Guard against this by detecting this case, and falling back to > split-to-half-of-count instead; in this case we will still have > plenty of space (> half blocksize) in each split block. > > Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks") > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > --- > > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c > index a8aca4772aaa..8b60881f07ee 100644 > --- a/fs/ext4/namei.c > +++ b/fs/ext4/namei.c > @@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, > blocksize, hinfo, map); > map -= count; > dx_sort_map(map, count); > - /* Split the existing block in the middle, size-wise */ > + /* Ensure that neither split block is over half full */ > size = 0; > move = 0; > for (i = count-1; i >= 0; i--) { > @@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, > size += map[i].size; > move++; > } > - /* map index at which we will split */ > - split = count - move; > + /* > + * map index at which we will split > + * > + * If the sum of active entries didn't exceed half the block size, just > + * split it in half by count; each resulting block will have at least > + * half the space free. > + */ > + if (i > 0) > + split = count - move; > + else > + split = count/2; Won't we have exactly the same problem as we did before your commit ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much space we actually moved we might have not made enough space for the new entry ? 
Also, since we have move == count when the problem appears, it's
clear that we never hit the condition

	/* is more than half of this entry in 2nd half of the block? */
	if (size + map[i].size/2 > blocksize/2)
		break;

in the loop. This is surprising, but it means that the entries must have
gaps between them that are small enough that we can't fit the entry
right in? Shouldn't we try to compact it before splitting, or is it
the case that this should have been done somewhere else?

If we really want to be fair and split it right in the middle
of the entries size-wise, then we need to keep track of the sum of the
entries and decide based on that, not blocksize/2. But maybe the problem
could be solved by compacting the entries together, because the condition
seems to rely on that.

-Lukas

> +
>  	hash2 = map[split].hash;
>  	continued = hash2 == map[split - 1].hash;
>  	dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
On Fri, Jun 19, 2020 at 08:41:22AM +0200, Lukas Czerner wrote: > On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote: > > If for any reason a directory passed to do_split() does not have enough > > active entries to exceed half the size of the block, we can end up > > iterating over all "count" entries without finding a split point. > > > > In this case, count == move, and split will be zero, and we will > > attempt a negative index into map[]. > > > > Guard against this by detecting this case, and falling back to > > split-to-half-of-count instead; in this case we will still have > > plenty of space (> half blocksize) in each split block. > > > > Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks") > > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > > --- > > > > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c > > index a8aca4772aaa..8b60881f07ee 100644 > > --- a/fs/ext4/namei.c > > +++ b/fs/ext4/namei.c > > @@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, > > blocksize, hinfo, map); > > map -= count; > > dx_sort_map(map, count); > > - /* Split the existing block in the middle, size-wise */ > > + /* Ensure that neither split block is over half full */ > > size = 0; > > move = 0; > > for (i = count-1; i >= 0; i--) { > > @@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, > > size += map[i].size; > > move++; > > } > > - /* map index at which we will split */ > > - split = count - move; > > + /* > > + * map index at which we will split > > + * > > + * If the sum of active entries didn't exceed half the block size, just > > + * split it in half by count; each resulting block will have at least > > + * half the space free. > > + */ > > + if (i > 0) > > + split = count - move; > > + else > > + split = count/2; > > Won't we have exactly the same problem as we did before your commit > ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? 
Since we do not know how much > space we actually moved we might have not made enough space for the new > entry ? > > Also since we have the move == count when the problem appears then it's > clear that we never hit the condition > > 1865 → → /* is more than half of this entry in 2nd half of the block? */ > 1866 → → if (size + map[i].size/2 > blocksize/2) > 1867 → → → break; > > in the loop. This is surprising but it means the the entries must have > gaps between them that are small enough that we can't fit the entry > right in ? Should not we try to compact it before splitting, or is it > the case that this should have been done somewhere else ? The other possibility is that map[i].size is not right and indeed there seems to be a bug in dx_make_map() map_tail->size = le16_to_cpu(de->rec_len); should be map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize)); right ? Otherwise with large enough records the size will be smaller than it really is. A quick look at fs/ext4/namei.c reveals couple of places there rec_len is used without the conversion and we should check whether it needs fixing. -Lukas > > If we really want ot be fair and we want to split it right in the middle > of the entries size-wise then we need to keep track of of sum of the > entries and decide based on that, not blocksize/2. But maybe the problem > could be solved by compacting the entries together because the condition > seems to rely on that. > > -Lukas > > > + > > hash2 = map[split].hash; > > continued = hash2 == map[split - 1].hash; > > dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n", > > > > >
On Fri, Jun 19, 2020 at 09:08:54AM +0200, Lukas Czerner wrote: > On Fri, Jun 19, 2020 at 08:41:22AM +0200, Lukas Czerner wrote: > > On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote: > > > If for any reason a directory passed to do_split() does not have enough > > > active entries to exceed half the size of the block, we can end up > > > iterating over all "count" entries without finding a split point. > > > > > > In this case, count == move, and split will be zero, and we will > > > attempt a negative index into map[]. > > > > > > Guard against this by detecting this case, and falling back to > > > split-to-half-of-count instead; in this case we will still have > > > plenty of space (> half blocksize) in each split block. > > > > > > Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks") > > > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > > > --- > > > > > > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c > > > index a8aca4772aaa..8b60881f07ee 100644 > > > --- a/fs/ext4/namei.c > > > +++ b/fs/ext4/namei.c > > > @@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, > > > blocksize, hinfo, map); > > > map -= count; > > > dx_sort_map(map, count); > > > - /* Split the existing block in the middle, size-wise */ > > > + /* Ensure that neither split block is over half full */ > > > size = 0; > > > move = 0; > > > for (i = count-1; i >= 0; i--) { > > > @@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, > > > size += map[i].size; > > > move++; > > > } > > > - /* map index at which we will split */ > > > - split = count - move; > > > + /* > > > + * map index at which we will split > > > + * > > > + * If the sum of active entries didn't exceed half the block size, just > > > + * split it in half by count; each resulting block will have at least > > > + * half the space free. 
> > > + */ > > > + if (i > 0) > > > + split = count - move; > > > + else > > > + split = count/2; > > > > Won't we have exactly the same problem as we did before your commit > > ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much > > space we actually moved we might have not made enough space for the new > > entry ? > > > > Also since we have the move == count when the problem appears then it's > > clear that we never hit the condition > > > > 1865 → → /* is more than half of this entry in 2nd half of the block? */ > > 1866 → → if (size + map[i].size/2 > blocksize/2) > > 1867 → → → break; > > > > in the loop. This is surprising but it means the the entries must have > > gaps between them that are small enough that we can't fit the entry > > right in ? Should not we try to compact it before splitting, or is it > > the case that this should have been done somewhere else ? > > The other possibility is that map[i].size is not right and indeed there > seems to be a bug in dx_make_map() > > map_tail->size = le16_to_cpu(de->rec_len); > > should be > > map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize)); > > right ? Otherwise with large enough records the size will be smaller > than it really is. > > A quick look at fs/ext4/namei.c reveals couple of places there rec_len > is used without the conversion and we should check whether it needs > fixing. > > -Lukas And indeed the following patch seems to have fixed the issue we were seeing. Eric I think that this might be a proper fix. But we still need to check the other uses of rec_len to make sure it's ok as well. 
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 94ec882..5509fdc 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1068,7 +1068,7 @@ static int dx_make_map(struct ext4_dir_entry_2 *de, unsigned blocksize,
 			map_tail--;
 			map_tail->hash = h.hash;
 			map_tail->offs = ((char *) de - base)>>2;
-			map_tail->size = le16_to_cpu(de->rec_len);
+			map_tail->size = ext4_rec_len_from_disk(le16_to_cpu(de->rec_len), blocksize);
 			count++;
 			cond_resched();
 		}

> > If we really want ot be fair and we want to split it right in the middle
> > of the entries size-wise then we need to keep track of of sum of the
> > entries and decide based on that, not blocksize/2. But maybe the problem
> > could be solved by compacting the entries together because the condition
> > seems to rely on that.
> >
> > -Lukas
> >
> > > +
> > >  	hash2 = map[split].hash;
> > >  	continued = hash2 == map[split - 1].hash;
> > >  	dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
On 6/19/20 1:41 AM, Lukas Czerner wrote: > On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote: >> If for any reason a directory passed to do_split() does not have enough >> active entries to exceed half the size of the block, we can end up >> iterating over all "count" entries without finding a split point. >> >> In this case, count == move, and split will be zero, and we will >> attempt a negative index into map[]. >> >> Guard against this by detecting this case, and falling back to >> split-to-half-of-count instead; in this case we will still have >> plenty of space (> half blocksize) in each split block. ... >> + /* >> + * map index at which we will split >> + * >> + * If the sum of active entries didn't exceed half the block size, just >> + * split it in half by count; each resulting block will have at least >> + * half the space free. >> + */ >> + if (i > 0) >> + split = count - move; >> + else >> + split = count/2; > > Won't we have exactly the same problem as we did before your commit > ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much > space we actually moved we might have not made enough space for the new > entry ? I don't think so - while we don't have the original reproducer, I assume that it was the case where the block was very full, and splitting by count left us with one of the split blocks still over half full (because ensuring that we split in half by size seemed to fix it) In this case, the sum of the active entries was <= half the block size. So if we split by count, we're guaranteed to have >= half the block size free in each side of the split. > Also since we have the move == count when the problem appears then it's > clear that we never hit the condition > > 1865 → → /* is more than half of this entry in 2nd half of the block? */ > 1866 → → if (size + map[i].size/2 > blocksize/2) > 1867 → → → break; > > in the loop. 
> This is surprising but it means the entries must have
> gaps between them that are small enough that we can't fit the entry
> right in ? Should not we try to compact it before splitting, or is it
> the case that this should have been done somewhere else ?

Yes, that's exactly what happened - see my 0/1 cover letter. Maybe that should
be in the patch description itself. Also, yes, compaction would help, but I was
unclear as to whether that should be done here, is the side effect of some other
bug, etc. In general, we do seem to do compaction elsewhere and I don't know
how we got to this point.

> If we really want to be fair and we want to split it right in the middle
> of the entries size-wise then we need to keep track of the sum of the
> entries and decide based on that, not blocksize/2. But maybe the problem
> could be solved by compacting the entries together because the condition
> seems to rely on that.

I thought about that as well, but it took a bit more code to do; we could make
make_map() return both count and total size, for example. But based on my
theory above that both sides of the split will have >= half block free, it
didn't seem necessary, particularly since this seems like an edge case?

-Eric

> -Lukas
>
>> +
>>  	hash2 = map[split].hash;
>>  	continued = hash2 == map[split - 1].hash;
>>  	dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
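For reference, the alternative discussed above (choosing the split point from the sum of the live entry sizes rather than from blocksize/2) could look roughly like the user-space sketch below. This is not what the patch does and not kernel code: struct map_entry, total_size() and choose_split_by_total() are hypothetical names, and the sketch assumes the map is already hash-sorted and holds at least two entries.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's dx map entry. */
struct map_entry {
	unsigned int hash;
	unsigned int size;	/* rec_len of one live directory entry */
};

/* Sum of live entry sizes; assumed to be what a modified
 * dx_make_map() could hand back alongside count. */
static unsigned int total_size(const struct map_entry *map, int count)
{
	unsigned int total = 0;
	int i;

	for (i = 0; i < count; i++)
		total += map[i].size;
	return total;
}

/*
 * Return the map index at which to split so that entries
 * [split, count) carry roughly half of the live bytes.
 * The result is always in [1, count - 1], so map[split - 1]
 * is a valid index.
 */
static int choose_split_by_total(const struct map_entry *map, int count)
{
	unsigned int total = total_size(map, count);
	unsigned int size = 0;
	int i;

	assert(count >= 2);

	/* Walk down from the top of the map, as do_split() does,
	 * but never consume entry 0. */
	for (i = count - 1; i > 0; i--) {
		/* is more than half of this entry past the midpoint? */
		if (size + map[i].size / 2 > total / 2)
			break;
		size += map[i].size;
	}
	return i + 1;
}
```

With four 100-byte entries this returns 2, the same answer as count/2, which is one way to see why the simpler fallback in the patch was considered sufficient for the sparse case.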
On 6/19/20 2:08 AM, Lukas Czerner wrote: > On Fri, Jun 19, 2020 at 08:41:22AM +0200, Lukas Czerner wrote: >> On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote: >>> If for any reason a directory passed to do_split() does not have enough >>> active entries to exceed half the size of the block, we can end up >>> iterating over all "count" entries without finding a split point. >>> >>> In this case, count == move, and split will be zero, and we will >>> attempt a negative index into map[]. >>> >>> Guard against this by detecting this case, and falling back to >>> split-to-half-of-count instead; in this case we will still have >>> plenty of space (> half blocksize) in each split block. >>> >>> Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks") >>> Signed-off-by: Eric Sandeen <sandeen@redhat.com> >>> --- >>> >>> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c >>> index a8aca4772aaa..8b60881f07ee 100644 >>> --- a/fs/ext4/namei.c >>> +++ b/fs/ext4/namei.c >>> @@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, >>> blocksize, hinfo, map); >>> map -= count; >>> dx_sort_map(map, count); >>> - /* Split the existing block in the middle, size-wise */ >>> + /* Ensure that neither split block is over half full */ >>> size = 0; >>> move = 0; >>> for (i = count-1; i >= 0; i--) { >>> @@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, >>> size += map[i].size; >>> move++; >>> } >>> - /* map index at which we will split */ >>> - split = count - move; >>> + /* >>> + * map index at which we will split >>> + * >>> + * If the sum of active entries didn't exceed half the block size, just >>> + * split it in half by count; each resulting block will have at least >>> + * half the space free. 
>>> + */ >>> + if (i > 0) >>> + split = count - move; >>> + else >>> + split = count/2; >> >> Won't we have exactly the same problem as we did before your commit >> ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much >> space we actually moved we might have not made enough space for the new >> entry ? >> >> Also since we have the move == count when the problem appears then it's >> clear that we never hit the condition >> >> 1865 → → /* is more than half of this entry in 2nd half of the block? */ >> 1866 → → if (size + map[i].size/2 > blocksize/2) >> 1867 → → → break; >> >> in the loop. This is surprising but it means the the entries must have >> gaps between them that are small enough that we can't fit the entry >> right in ? Should not we try to compact it before splitting, or is it >> the case that this should have been done somewhere else ? > > The other possibility is that map[i].size is not right and indeed there > seems to be a bug in dx_make_map() > > map_tail->size = le16_to_cpu(de->rec_len); > > should be > > map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize)); > > right ? Otherwise with large enough records the size will be smaller > than it really is. well, those are the same thing unless (PAGE_SIZE >= 65536) so I don't think that's the issue here. static inline unsigned int ext4_rec_len_from_disk(__le16 dlen, unsigned blocksize) { unsigned len = le16_to_cpu(dlen); #if (PAGE_SIZE >= 65536) ... #else return len; #endif } Should be fixed for consistency, but seems to not be a root cause here. > A quick look at fs/ext4/namei.c reveals couple of places there rec_len > is used without the conversion and we should check whether it needs > fixing. ...
On 6/19/20 6:16 AM, Lukas Czerner wrote:

>> The other possibility is that map[i].size is not right and indeed there
>> seems to be a bug in dx_make_map()
>>
>> 	map_tail->size = le16_to_cpu(de->rec_len);
>>
>> should be
>>
>> 	map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize);
>>
>> right ? Otherwise with large enough records the size will be smaller
>> than it really is.
>>
>> A quick look at fs/ext4/namei.c reveals a couple of places where rec_len
>> is used without the conversion and we should check whether it needs
>> fixing.
>>
>> -Lukas
>
> And indeed the following patch seems to have fixed the issue we were
> seeing. Eric I think that this might be a proper fix. But we still need
> to check the other uses of rec_len to make sure it's ok as well.
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 94ec882..5509fdc 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1068,7 +1068,7 @@ static int dx_make_map(struct ext4_dir_entry_2 *de, unsigned blocksize,
>  			map_tail--;
>  			map_tail->hash = h.hash;
>  			map_tail->offs = ((char *) de - base)>>2;
> -			map_tail->size = le16_to_cpu(de->rec_len);
> +			map_tail->size = ext4_rec_len_from_disk(le16_to_cpu(de->rec_len), blocksize);

That isn't right, ext4_rec_len_from_disk /takes/ an __le16 :)

-			map_tail->size = le16_to_cpu(de->rec_len);
+			map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize);

would be more correct, but won't matter for PAGE_SIZE < 65536 right?

-Eric
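To illustrate why passing an already-converted value to a helper that takes an __le16 is wrong, here is a small user-space model of the double conversion. It is only a sketch: the *_model functions are hypothetical stand-ins, the big_endian flag simulates the CPU byte order, and only the small-page branch quoted in this thread is modelled. On a little-endian machine both conversions are no-ops, which would explain why the first version of the patch appeared to work; in the kernel, sparse's endianness checking is the tool that would normally flag the mismatch.

```c
#include <assert.h>
#include <stdint.h>

/* Byte swap: what le16_to_cpu() amounts to on a big-endian CPU. */
static uint16_t swab16_model(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Model of le16_to_cpu(); big_endian selects the simulated CPU. */
static uint16_t le16_to_cpu_model(uint16_t raw, int big_endian)
{
	return big_endian ? swab16_model(raw) : raw;
}

/* Model of cpu_to_le16(): how a little-endian on-disk field looks
 * when loaded as a native 16-bit integer. */
static uint16_t cpu_to_le16_model(uint16_t cpu, int big_endian)
{
	return big_endian ? swab16_model(cpu) : cpu;
}

/* Model of the small-page branch of ext4_rec_len_from_disk():
 * it expects the raw on-disk value and converts it itself. */
static unsigned int rec_len_from_disk_model(uint16_t dlen, int big_endian)
{
	return le16_to_cpu_model(dlen, big_endian);
}
```

For an on-disk rec_len of 16 on a simulated big-endian CPU, rec_len_from_disk_model(raw, 1) yields 16, while rec_len_from_disk_model(le16_to_cpu_model(raw, 1), 1) swaps twice and yields 4096.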
On Fri, Jun 19, 2020 at 08:42:23AM -0500, Eric Sandeen wrote: > On 6/19/20 2:08 AM, Lukas Czerner wrote: > > On Fri, Jun 19, 2020 at 08:41:22AM +0200, Lukas Czerner wrote: > >> On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote: > >>> If for any reason a directory passed to do_split() does not have enough > >>> active entries to exceed half the size of the block, we can end up > >>> iterating over all "count" entries without finding a split point. > >>> > >>> In this case, count == move, and split will be zero, and we will > >>> attempt a negative index into map[]. > >>> > >>> Guard against this by detecting this case, and falling back to > >>> split-to-half-of-count instead; in this case we will still have > >>> plenty of space (> half blocksize) in each split block. > >>> > >>> Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks") > >>> Signed-off-by: Eric Sandeen <sandeen@redhat.com> > >>> --- > >>> > >>> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c > >>> index a8aca4772aaa..8b60881f07ee 100644 > >>> --- a/fs/ext4/namei.c > >>> +++ b/fs/ext4/namei.c > >>> @@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, > >>> blocksize, hinfo, map); > >>> map -= count; > >>> dx_sort_map(map, count); > >>> - /* Split the existing block in the middle, size-wise */ > >>> + /* Ensure that neither split block is over half full */ > >>> size = 0; > >>> move = 0; > >>> for (i = count-1; i >= 0; i--) { > >>> @@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, > >>> size += map[i].size; > >>> move++; > >>> } > >>> - /* map index at which we will split */ > >>> - split = count - move; > >>> + /* > >>> + * map index at which we will split > >>> + * > >>> + * If the sum of active entries didn't exceed half the block size, just > >>> + * split it in half by count; each resulting block will have at least > >>> + * half the space free. 
> >>> + */ > >>> + if (i > 0) > >>> + split = count - move; > >>> + else > >>> + split = count/2; > >> > >> Won't we have exactly the same problem as we did before your commit > >> ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much > >> space we actually moved we might have not made enough space for the new > >> entry ? > >> > >> Also since we have the move == count when the problem appears then it's > >> clear that we never hit the condition > >> > >> 1865 → → /* is more than half of this entry in 2nd half of the block? */ > >> 1866 → → if (size + map[i].size/2 > blocksize/2) > >> 1867 → → → break; > >> > >> in the loop. This is surprising but it means the the entries must have > >> gaps between them that are small enough that we can't fit the entry > >> right in ? Should not we try to compact it before splitting, or is it > >> the case that this should have been done somewhere else ? > > > > The other possibility is that map[i].size is not right and indeed there > > seems to be a bug in dx_make_map() > > > > map_tail->size = le16_to_cpu(de->rec_len); > > > > should be > > > > map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize)); > > > > right ? Otherwise with large enough records the size will be smaller > > than it really is. > > well, those are the same thing unless (PAGE_SIZE >= 65536) so I don't > think that's the issue here. > > static inline unsigned int > ext4_rec_len_from_disk(__le16 dlen, unsigned blocksize) > { > unsigned len = le16_to_cpu(dlen); > > #if (PAGE_SIZE >= 65536) > ... > #else > return len; > #endif > } Ah you're right. The reproducer for this is kind of unreliable as well so that's why it looked to be fxied with this I guess. > > Should be fixed for consistency, but seems to not be a root cause here. Agreed. -Lukas > > > A quick look at fs/ext4/namei.c reveals couple of places there rec_len > > is used without the conversion and we should check whether it needs > > fixing. > > ... >
On Fri, Jun 19, 2020 at 08:44:19AM -0500, Eric Sandeen wrote:
> On 6/19/20 6:16 AM, Lukas Czerner wrote:
>
> >> The other possibility is that map[i].size is not right and indeed there
> >> seems to be a bug in dx_make_map()
> >>
> >> 	map_tail->size = le16_to_cpu(de->rec_len);
> >>
> >> should be
> >>
> >> 	map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize);
> >>
> >> right ? Otherwise with large enough records the size will be smaller
> >> than it really is.
> >>
> >> A quick look at fs/ext4/namei.c reveals a couple of places where rec_len
> >> is used without the conversion and we should check whether it needs
> >> fixing.
> >>
> >> -Lukas
> >
> > And indeed the following patch seems to have fixed the issue we were
> > seeing. Eric I think that this might be a proper fix. But we still need
> > to check the other uses of rec_len to make sure it's ok as well.
> >
> > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > index 94ec882..5509fdc 100644
> > --- a/fs/ext4/namei.c
> > +++ b/fs/ext4/namei.c
> > @@ -1068,7 +1068,7 @@ static int dx_make_map(struct ext4_dir_entry_2 *de, unsigned blocksize,
> >  			map_tail--;
> >  			map_tail->hash = h.hash;
> >  			map_tail->offs = ((char *) de - base)>>2;
> > -			map_tail->size = le16_to_cpu(de->rec_len);
> > +			map_tail->size = ext4_rec_len_from_disk(le16_to_cpu(de->rec_len), blocksize);
>
> That isn't right, ext4_rec_len_from_disk /takes/ an __le16 :)
>
> -			map_tail->size = le16_to_cpu(de->rec_len);
> +			map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize);

Yep, my bad.

> would be more correct, but won't matter for PAGE_SIZE < 65536 right?

True, it's not the problem we're seeing.

-Lukas

> -Eric
On Fri 19-06-20 08:39:53, Eric Sandeen wrote: > On 6/19/20 1:41 AM, Lukas Czerner wrote: > > On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote: > >> If for any reason a directory passed to do_split() does not have enough > >> active entries to exceed half the size of the block, we can end up > >> iterating over all "count" entries without finding a split point. > >> > >> In this case, count == move, and split will be zero, and we will > >> attempt a negative index into map[]. > >> > >> Guard against this by detecting this case, and falling back to > >> split-to-half-of-count instead; in this case we will still have > >> plenty of space (> half blocksize) in each split block. > > ... > > >> + /* > >> + * map index at which we will split > >> + * > >> + * If the sum of active entries didn't exceed half the block size, just > >> + * split it in half by count; each resulting block will have at least > >> + * half the space free. > >> + */ > >> + if (i > 0) > >> + split = count - move; > >> + else > >> + split = count/2; > > > > Won't we have exactly the same problem as we did before your commit > > ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much > > space we actually moved we might have not made enough space for the new > > entry ? > > I don't think so - while we don't have the original reproducer, I assume that > it was the case where the block was very full, and splitting by count left us > with one of the split blocks still over half full (because ensuring that we > split in half by size seemed to fix it) > > In this case, the sum of the active entries was <= half the block size. > So if we split by count, we're guaranteed to have >= half the block size free > in each side of the split. > > > Also since we have the move == count when the problem appears then it's > > clear that we never hit the condition > > > > 1865 → → /* is more than half of this entry in 2nd half of the block? 
*/
> > 	if (size + map[i].size/2 > blocksize/2)
> > 		break;
> >
> > in the loop. This is surprising but it means the entries must have
> > gaps between them that are small enough that we can't fit the entry
> > right in ? Should not we try to compact it before splitting, or is it
> > the case that this should have been done somewhere else ?
>
> Yes, that's exactly what happened - see my 0/1 cover letter. Maybe that should
> be in the patch description itself. Also, yes compaction would help but I was
> unclear as to whether that should be done here, is the side effect of some other
> bug, etc. In general, we do seem to do compaction elsewhere and I don't know
> how we got to this point.
>
> > If we really want to be fair and we want to split it right in the middle
> > of the entries size-wise then we need to keep track of the sum of the
> > entries and decide based on that, not blocksize/2. But maybe the problem
> > could be solved by compacting the entries together because the condition
> > seems to rely on that.
>
> I thought about that as well, but it took a bit more code to do; we could make
> make_map() return both count and total size, for example. But based on my
> theory above that both sides of the split will have >= half block free, it
> didn't seem necessary, particularly since this seems like an edge case?

This didn't seem to conclude in any way? The patch looks good to me FWIW so
feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

Ted, can you please pick this patch up? Thanks!

								Honza
On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote:
> If for any reason a directory passed to do_split() does not have enough
> active entries to exceed half the size of the block, we can end up
> iterating over all "count" entries without finding a split point.
>
> In this case, count == move, and split will be zero, and we will
> attempt a negative index into map[].
>
> Guard against this by detecting this case, and falling back to
> split-to-half-of-count instead; in this case we will still have
> plenty of space (> half blocksize) in each split block.
>
> Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks")
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

Thanks, applied.

					- Ted
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a8aca4772aaa..8b60881f07ee 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
 			     blocksize, hinfo, map);
 	map -= count;
 	dx_sort_map(map, count);
-	/* Split the existing block in the middle, size-wise */
+	/* Ensure that neither split block is over half full */
 	size = 0;
 	move = 0;
 	for (i = count-1; i >= 0; i--) {
@@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
 		size += map[i].size;
 		move++;
 	}
-	/* map index at which we will split */
-	split = count - move;
+	/*
+	 * map index at which we will split
+	 *
+	 * If the sum of active entries didn't exceed half the block size, just
+	 * split it in half by count; each resulting block will have at least
+	 * half the space free.
+	 */
+	if (i > 0)
+		split = count - move;
+	else
+		split = count/2;
+
 	hash2 = map[split].hash;
 	continued = hash2 == map[split - 1].hash;
 	dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
If for any reason a directory passed to do_split() does not have enough
active entries to exceed half the size of the block, we can end up
iterating over all "count" entries without finding a split point.

In this case, count == move, and split will be zero, and we will
attempt a negative index into map[].

Guard against this by detecting this case, and falling back to
split-to-half-of-count instead; in this case we will still have
plenty of space (> half blocksize) in each split block.

Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
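The failure mode described above can be reproduced outside the kernel with a small model of the split-point loop. The sketch below is an illustration only: struct map_entry, choose_split() and the guard parameter are hypothetical, and the loop body is copied from the hunk context and the condition quoted in the review thread. With guard set to 0 it models the code before this patch; with guard set to 1 it applies the fallback the patch adds.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's dx map entry; only the
 * size field matters when choosing the split point. */
struct map_entry {
	unsigned int hash;
	unsigned int size;	/* rec_len of one live directory entry */
};

/*
 * Return the map index at which to split. A result of 0 is the
 * bug: the caller goes on to read map[split - 1], i.e. map[-1].
 */
static int choose_split(const struct map_entry *map, int count,
			unsigned int blocksize, int guard)
{
	unsigned int size = 0;
	int move = 0;
	int i;

	assert(count > 0);

	for (i = count - 1; i >= 0; i--) {
		/* is more than half of this entry in 2nd half of the block? */
		if (size + map[i].size / 2 > blocksize / 2)
			break;
		size += map[i].size;
		move++;
	}

	/* Without the guard, or when the loop stopped above index 0,
	 * use the size-based split point. */
	if (!guard || i > 0)
		return count - move;

	/* Live entries never filled half the block: split by count. */
	return count / 2;
}
```

For a 4096-byte block holding only four 100-byte live entries, the loop runs to completion, so the unguarded variant returns 0 and the guarded one returns 2. For four 1024-byte entries both variants return 2, since the loop breaks at i == 1.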