Message ID | 1361865914-13911-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Tue, Feb 26, 2013 at 01:34:57PM +0530, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > > We will use this later with THP changes to request for pmd table of double the size. > THP code does PTE page allocation along with large page request and deposit them > for later use. This is to ensure that we won't have any failures when we split > huge pages to regular pages. > > On powerpc we want to use the deposited PTE page for storing hash pte slot and > secondary bit information for the HPTEs. Hence we save them in the second half > of the pmd table. Looks OK, but you should explain why you made the wholesale change of "shift" to "index". Is there some important semantic difference, or do you just prefer the "index" name for some reason? Paul.
Paul Mackerras <paulus@samba.org> writes: > On Tue, Feb 26, 2013 at 01:34:57PM +0530, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> >> >> We will use this later with THP changes to request for pmd table of double the size. >> THP code does PTE page allocation along with large page request and deposit them >> for later use. This is to ensure that we won't have any failures when we split >> huge pages to regular pages. >> >> On powerpc we want to use the deposited PTE page for storing hash pte slot and >> secondary bit information for the HPTEs. Hence we save them in the second half >> of the pmd table. > > Looks OK, but you should explain why you made the wholesale change of > "shift" to "index". Is there some important semantic difference, or > do you just prefer the "index" name for some reason? > Now with table_size argument, the first arg is no more the shift value, rather it is index into the array. Hence i changed the variable name. I will split that patch to make it easy for review. -aneesh
On Mon, Mar 04, 2013 at 04:32:24PM +0530, Aneesh Kumar K.V wrote: > > Now with table_size argument, the first arg is no more the shift value, > rather it is index into the array. Hence i changed the variable name. I > will split that patch to make it easy for review. OK, so you're saying that the simple relation between index and the size of the objects in PGT_CACHE(index) no longer holds. That worries me, because now, what guarantees that two callers won't use the same index value with two different sizes? And what guarantees that we won't have two callers using different index values but the same size (which wouldn't be a disaster but would be a waste of space)? I think it would be preferable to keep the relation between shift and the size of the objects and just arrange to use a different shift value for the pmd objects when you need to. Paul.
Paul Mackerras <paulus@samba.org> writes: > On Mon, Mar 04, 2013 at 04:32:24PM +0530, Aneesh Kumar K.V wrote: >> >> Now with table_size argument, the first arg is no more the shift value, >> rather it is index into the array. Hence i changed the variable name. I >> will split that patch to make it easy for review. > > OK, so you're saying that the simple relation between index and the > size of the objects in PGT_CACHE(index) no longer holds. That worries > me, because now, what guarantees that two callers won't use the same > index value with two different sizes? And what guarantees that we > won't have two callers using different index values but the same size > (which wouldn't be a disaster but would be a waste of space)? > > I think it would be preferable to keep the relation between shift and > the size of the objects and just arrange to use a different shift > value for the pmd objects when you need to. Most of the places we get the cache pointer by doing something like. PGT_CACHE(PMD_INDEX_SIZE). What we need is that kmem_cache to return an object twice the size of PMD_TABLE_SIZE. The relevant diff in the later patch is below. +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + /* + * we store the pgtable details in the second half of PMD + */ + if (PGT_CACHE(PMD_INDEX_SIZE)) + pr_err("PMD Page cache already initialized with different size\n"); + __pgtable_cache_add(PMD_INDEX_SIZE, PMD_TABLE_SIZE * 2, pmd_ctor); +#else pgtable_cache_add(PMD_INDEX_SIZE, pmd_ctor); +#endif
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h index 0182c20..658ba7c 100644 --- a/arch/powerpc/include/asm/pgtable-ppc64.h +++ b/arch/powerpc/include/asm/pgtable-ppc64.h @@ -338,8 +338,13 @@ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry) #define pgoff_to_pte(off) ((pte_t) {((off) << PTE_RPN_SHIFT)|_PAGE_FILE}) #define PTE_FILE_MAX_BITS (BITS_PER_LONG - PTE_RPN_SHIFT) -void pgtable_cache_add(unsigned shift, void (*ctor)(void *)); +extern void __pgtable_cache_add(unsigned index, unsigned long table_size, + void (*ctor)(void *)); void pgtable_cache_init(void); +static inline void pgtable_cache_add(unsigned shift, void (*ctor)(void *)) +{ + return __pgtable_cache_add(shift, sizeof(void *) << shift, ctor); +} /* * find_linux_pte returns the address of a linux pte for a given diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index 95a4529..b378438 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -100,10 +100,10 @@ struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE]; * everything else. Caches created by this function are used for all * the higher level pagetables, and for hugepage pagetables. */ -void pgtable_cache_add(unsigned shift, void (*ctor)(void *)) +void __pgtable_cache_add(unsigned int index, unsigned long table_size, + void (*ctor)(void *)) { char *name; - unsigned long table_size = sizeof(void *) << shift; unsigned long align = table_size; /* When batching pgtable pointers for RCU freeing, we store @@ -111,7 +111,7 @@ void pgtable_cache_add(unsigned shift, void (*ctor)(void *)) * big enough to fit it. * * Likewise, hugeapge pagetable pointers contain a (different) - * shift value in the low bits. All tables must be aligned so + * huge page size in the low bits. All tables must be aligned so * as to leave enough 0 bits in the address to contain it. */ unsigned long minalign = max(MAX_PGTABLE_INDEX_SIZE + 1, HUGEPD_SHIFT_MASK + 1); @@ -121,17 +121,17 @@ void pgtable_cache_add(unsigned shift, void (*ctor)(void *)) * moment, gcc doesn't seem to recognize is_power_of_2 as a * constant expression, so so much for that. */ BUG_ON(!is_power_of_2(minalign)); - BUG_ON((shift < 1) || (shift > MAX_PGTABLE_INDEX_SIZE)); + BUG_ON((index < 1) || (index > MAX_PGTABLE_INDEX_SIZE)); - if (PGT_CACHE(shift)) + if (PGT_CACHE(index)) return; /* Already have a cache of this size */ align = max_t(unsigned long, align, minalign); - name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift); + name = kasprintf(GFP_KERNEL, "pgtable-2^%d", index); new = kmem_cache_create(name, table_size, align, 0, ctor); - PGT_CACHE(shift) = new; + PGT_CACHE(index) = new; - pr_debug("Allocated pgtable cache for order %d\n", shift); + pr_debug("Allocated pgtable cache for order %d\n", index); }