Message ID | 20081211121635.ff58193f.akpm@linux-foundation.org (mailing list archive)
---|---
State | Not Applicable, archived |
On Thu, Dec 11, 2008 at 12:16:35PM -0800, Andrew Morton wrote:

> > +#if (8 * THREAD_SIZE) > PAGE_SIZE
> > 	max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
> > +#else
> > +	max_threads = mempages * (PAGE_SIZE / (8 * THREAD_SIZE));
> > +#endif
>
> The expression you've chosen here can be quite inaccurate, because
> (PAGE_SIZE / (8 * THREAD_SIZE)) is a small number.  The way to
> preserve accuracy is
>
> 	max_threads = (mempages * PAGE_SIZE) / (8 * THREAD_SIZE);
>
> so how about avoiding the nasty ifdefs and doing

Are you sure?  Do they actually cross the page boundaries?
On Thu, 11 Dec 2008 20:28:00 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Thu, Dec 11, 2008 at 12:16:35PM -0800, Andrew Morton wrote:
> > > +#if (8 * THREAD_SIZE) > PAGE_SIZE
> > > 	max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
> > > +#else
> > > +	max_threads = mempages * (PAGE_SIZE / (8 * THREAD_SIZE));
> > > +#endif
> >
> > The expression you've chosen here can be quite inaccurate, because
> > (PAGE_SIZE / (8 * THREAD_SIZE)) is a small number.  The way to
> > preserve accuracy is
> >
> > 	max_threads = (mempages * PAGE_SIZE) / (8 * THREAD_SIZE);
> >
> > so how about avoiding the nasty ifdefs and doing
>
> Are you sure?

No, not at all.  It's all too hard.  Which is why I'm looking for
simplification.

> Do they actually cross the page boundaries?

Some flavours of slab have at times done an order-1 allocation for
objects which would fit into an order-0 page (etc) if it looks like
that will be beneficial from a packing POV.  I'm unsure whether that
still happens - I tried to get it stamped out for reliability reasons.
Hello Andrew,

On Thursday, December 11, 2008 you wrote:

[snip]

> The expression you've chosen here can be quite inaccurate, because
> (PAGE_SIZE / (8 * THREAD_SIZE)) is a small number.

But why is it bad?  We do multiplication to 'mempages', not division.
All the numbers in the multiplier are powers of 2, so both expressions:

 mempages * (PAGE_SIZE / (8 * THREAD_SIZE))

and

 max_threads = (mempages * PAGE_SIZE) / (8 * THREAD_SIZE)

are finally equal.

> The way to preserve accuracy is
>
> 	max_threads = (mempages * PAGE_SIZE) / (8 * THREAD_SIZE);
>
> so how about avoiding the nasty ifdefs and doing

I'm OK with the approach below, but, though it leads to the same
result, it adds some overhead to code which had no such overhead
before this patch: e.g. your implementation finally boils down to ~5
times more processor instructions than there were before, plus
operations with the stack for the 'm' variable.

On the other hand, my approach with nasty (I agree) ifdefs doesn't add
overhead to the code which does not need it: i.e. the most common
situation of small PAGE_SIZEs.  A big PAGE_SIZE is the exception, so I
believe that the more common cases should not suffer because of this.

> --- a/kernel/fork.c~fork_init-fix-division-by-zero
> +++ a/kernel/fork.c
> @@ -69,6 +69,7 @@
>  #include <asm/mmu_context.h>
>  #include <asm/cacheflush.h>
>  #include <asm/tlbflush.h>
> +#include <asm/div64.h>
>
>  /*
>   * Protected counters by write_lock_irq(&tasklist_lock)
> @@ -185,10 +186,15 @@ void __init fork_init(unsigned long memp
>
>  	/*
>  	 * The default maximum number of threads is set to a safe
> -	 * value: the thread structures can take up at most half
> -	 * of memory.
> +	 * value: the thread structures can take up at most
> +	 * (1/8) part of memory.
>  	 */
> -	max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
> +	{
> +		/* max_threads = (mempages * PAGE_SIZE) / THREAD_SIZE / 8; */
> +		u64 m = mempages * PAGE_SIZE;
> +		do_div(m, THREAD_SIZE * 8);
> +		max_threads = m;
> +	}
>
>  	/*
>  	 * we need to allow at least 20 threads to boot a system
> _
> ?
>
> The code is also inaccurate because it assumes that <whatever
> allocator is used for threads> will pack the thread_structs into
> pages with best possible density, which isn't necessarily the case.
> Let's not worry about that.
>
> OT:
>
> max_threads is wildly wrong anyway.
>
> - the caller passes in num_physpages, which includes highmem.  And
>   we can't allocate thread structs from highmem.
>
> - num_physpages includes kernel pages and other stuff which can
>   never be allocated via the page allocator.
>
> A suitable fix would be to switch the caller to the strangely-named
> nr_free_buffer_pages().
>
> If you grep the tree for `num_physpages', you will find a splendid
> number of similar bugs.  num_physpages should be unexported, burnt,
> deleted, etc.  It's just an invitation to write buggy code.

 Regards, Yuri

 --
 Yuri Tikhonov, Senior Software Engineer
 Emcraft Systems, www.emcraft.com
On Fri, 12 Dec 2008 01:22:32 +0300 Yuri Tikhonov <yur@emcraft.com> wrote:

> > so how about avoiding the nasty ifdefs and doing
>
> I'm OK with the approach below, but, though it leads to the same
> result, it adds some overhead to code which had no such overhead
> before this patch: e.g. your implementation finally boils down to ~5
> times more processor instructions than there were before, plus
> operations with the stack for the 'm' variable.
>
> On the other hand, my approach with nasty (I agree) ifdefs doesn't
> add overhead to the code which does not need it: i.e. the most
> common situation of small PAGE_SIZEs.  A big PAGE_SIZE is the
> exception, so I believe that the more common cases should not suffer
> because of this.

yes, but...

> > --- a/kernel/fork.c~fork_init-fix-division-by-zero
> > +++ a/kernel/fork.c
> > @@ -69,6 +69,7 @@
> >  #include <asm/mmu_context.h>
> >  #include <asm/cacheflush.h>
> >  #include <asm/tlbflush.h>
> > +#include <asm/div64.h>
> >
> >  /*
> >   * Protected counters by write_lock_irq(&tasklist_lock)
> > @@ -185,10 +186,15 @@ void __init fork_init(unsigned long memp

This is __init code and it gets thrown away after bootup.
Andrew Morton writes:

> > +#if (8 * THREAD_SIZE) > PAGE_SIZE
> > 	max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
> > +#else
> > +	max_threads = mempages * (PAGE_SIZE / (8 * THREAD_SIZE));
> > +#endif
>
> The expression you've chosen here can be quite inaccurate, because
> (PAGE_SIZE / (8 * THREAD_SIZE)) is a small number.  The way to
> preserve accuracy is

The assumption is that THREAD_SIZE is a power of 2, as is PAGE_SIZE.

I think Yuri should be increasing THREAD_SIZE for the larger page
sizes he's implementing, because we have on-stack arrays whose size
depends on the page size.  I suspect that having THREAD_SIZE less than
1/8 of PAGE_SIZE risks stack overflows, and the better fix is for Yuri
to make sure THREAD_SIZE is at least 1/8 of PAGE_SIZE.  (In fact, more
may be needed - someone should work out what fraction is actually
needed.)

Paul.
On Fri, 12 Dec 2008 11:48:29 +1100 Paul Mackerras <paulus@samba.org> wrote:

> Andrew Morton writes:
>
> > > +#if (8 * THREAD_SIZE) > PAGE_SIZE
> > > 	max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
> > > +#else
> > > +	max_threads = mempages * (PAGE_SIZE / (8 * THREAD_SIZE));
> > > +#endif
> >
> > The expression you've chosen here can be quite inaccurate, because
> > (PAGE_SIZE / (8 * THREAD_SIZE)) is a small number.  The way to
> > preserve accuracy is
>
> The assumption is that THREAD_SIZE is a power of 2, as is PAGE_SIZE.
>
> I think Yuri should be increasing THREAD_SIZE for the larger page
> sizes he's implementing, because we have on-stack arrays whose size
> depends on the page size.  I suspect that having THREAD_SIZE less
> than 1/8 of PAGE_SIZE risks stack overflows, and the better fix is
> for Yuri to make sure THREAD_SIZE is at least 1/8 of PAGE_SIZE.  (In
> fact, more may be needed - someone should work out what fraction is
> actually needed.)

OK, yes.  It's the MAX_BUF_PER_PAGE arrays which will hurt.  iirc they
nest three-deep on some codepaths.
On Friday 12 December 2008 07:43, Andrew Morton wrote:

> On Thu, 11 Dec 2008 20:28:00 +0000
>
> > Do they actually cross the page boundaries?
>
> Some flavours of slab have at times done an order-1 allocation for
> objects which would fit into an order-0 page (etc) if it looks like
> that will be beneficial from a packing POV.  I'm unsure whether that
> still happens - I tried to get it stamped out for reliability reasons.

Hmph, SLUB uses order-3 allocations for 832 byte sized objects by
default here (mm struct).
On Fri, 12 Dec 2008 12:31:33 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> On Friday 12 December 2008 07:43, Andrew Morton wrote:
> > On Thu, 11 Dec 2008 20:28:00 +0000
> >
> > > Do they actually cross the page boundaries?
> >
> > Some flavours of slab have at times done an order-1 allocation for
> > objects which would fit into an order-0 page (etc) if it looks
> > like that will be beneficial from a packing POV.  I'm unsure
> > whether that still happens - I tried to get it stamped out for
> > reliability reasons.
>
> Hmph, SLUB uses order-3 allocations for 832 byte sized objects
> by default here (mm struct).

That sucks, but at least it's <= PAGE_ALLOC_COSTLY_ORDER.

It's fortunate that everyone has more than 128GB of memory.
On Friday 12 December 2008 13:47, Andrew Morton wrote:

> On Fri, 12 Dec 2008 12:31:33 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > On Friday 12 December 2008 07:43, Andrew Morton wrote:
> > > On Thu, 11 Dec 2008 20:28:00 +0000
> > >
> > > > Do they actually cross the page boundaries?
> > >
> > > Some flavours of slab have at times done an order-1 allocation
> > > for objects which would fit into an order-0 page (etc) if it
> > > looks like that will be beneficial from a packing POV.  I'm
> > > unsure whether that still happens - I tried to get it stamped
> > > out for reliability reasons.
> >
> > Hmph, SLUB uses order-3 allocations for 832 byte sized objects
> > by default here (mm struct).
>
> That sucks, but at least it's <= PAGE_ALLOC_COSTLY_ORDER.

Which is somewhat arbitrary a value.  order-1 is costly compared to
order-0...

After running my system here for a while and doing various things with
it, I have the ability to allocate 898 order-0 pages (3592K), or 36
order-3 pages (1152K).  Not as bad as I expected, but the system's
only been up for an hour, and not exactly doing anything unusual (and
it has nearly 30MB free, out of 4GB).

> It's fortunate that everyone has more than 128GB of memory.

And that SLAB still works quite well :)
--- a/kernel/fork.c~fork_init-fix-division-by-zero
+++ a/kernel/fork.c
@@ -69,6 +69,7 @@
 #include <asm/mmu_context.h>
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
+#include <asm/div64.h>
 
 /*
  * Protected counters by write_lock_irq(&tasklist_lock)
@@ -185,10 +186,15 @@ void __init fork_init(unsigned long memp
 
 	/*
 	 * The default maximum number of threads is set to a safe
-	 * value: the thread structures can take up at most half
-	 * of memory.
+	 * value: the thread structures can take up at most
+	 * (1/8) part of memory.
 	 */
-	max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
+	{
+		/* max_threads = (mempages * PAGE_SIZE) / THREAD_SIZE / 8; */
+		u64 m = mempages * PAGE_SIZE;
+		do_div(m, THREAD_SIZE * 8);
+		max_threads = m;
+	}
 
 	/*
 	 * we need to allow at least 20 threads to boot a system