Message ID: 20120208163421.GL5938@suse.de
State: RFC, archived
Delegated to: David Miller
On Wed, 8 Feb 2012, Mel Gorman wrote:

> Ok, I looked into what is necessary to replace these with checking a page
> flag and the cost shifts quite a bit and ends up being more expensive.

That is only true if you go the slab route. Slab suffers from not having
the page struct pointer readily available. The changes are likely already
impacting slab performance without the virt_to_page patch.

> In slub, it's sufficient to check kmem_cache_cpu to know whether the
> objects in the list are pfmemalloc or not.

We try to minimize the size of kmem_cache_cpu. The page pointer is readily
available. We just removed the node field from kmem_cache_cpu because it
was less expensive to get the node number from the struct page field. The
same is certainly true for a PFMEMALLOC flag.

> Yeah, you're right on the button there. I did my checking assuming that
> PG_active+PG_slab were safe to use. The following is an untested patch that
> I probably got details wrong in but it illustrates where virt_to_page()
> starts cropping up.

Yes, you need to come up with a way to not use virt_to_page, otherwise
slab performance is significantly impacted.

On NUMA we are already doing a page struct lookup on free in slab. If you
would save the page struct pointer there and reuse it then you would not
have an issue, at least on free.

You still would need to determine which "struct slab" pointer is in use,
which will also require similar lookups in various places.

Transfer of the pfmemalloc flag (I guess you must have a pfmemalloc field
in struct slab then) in slab is best done when allocating and freeing a
slab page from the page allocator.

I think it's rather trivial to add the support you want in a non-intrusive
way to slub. Slab would require some more thought and discussion.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
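The flag transfer described above (record pfmemalloc on the struct page
once, when the slab allocator obtains the page, and clear it when the page
is returned) can be sketched as a minimal userspace model. All names here
are simplified stand-ins, not the real kernel implementation:

```c
/* Userspace model of transferring the pfmemalloc state to a page flag
 * at slab-page acquire/release time, so per-object checks need only a
 * bit test on the page. Simplified stand-in names throughout. */
#include <stdbool.h>

struct page {
	unsigned long flags;
	bool pfmemalloc;	/* set by the (modelled) page allocator */
};

enum { PG_slab, PG_pfmemalloc };

static void set_flag(struct page *p, int bit)   { p->flags |= 1UL << bit; }
static void clear_flag(struct page *p, int bit) { p->flags &= ~(1UL << bit); }
static bool test_flag(const struct page *p, int bit)
{
	return p->flags & (1UL << bit);
}

/* Called once when the slab allocator gets a page from the page
 * allocator: transfer page->pfmemalloc into a page flag. */
static void slab_acquire_page(struct page *page)
{
	set_flag(page, PG_slab);
	if (page->pfmemalloc)
		set_flag(page, PG_pfmemalloc);
}

/* Called once when the page goes back to the page allocator. */
static void slab_release_page(struct page *page)
{
	clear_flag(page, PG_pfmemalloc);
	clear_flag(page, PG_slab);
}

/* Cheap per-object query: no extra fields in slab structures needed. */
static bool slab_page_pfmemalloc(const struct page *page)
{
	return test_flag(page, PG_pfmemalloc);
}
```

The point of doing the transfer at acquire/release time is that it happens
once per slab page rather than once per object allocation or free.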
On Wed, Feb 08, 2012 at 01:49:05PM -0600, Christoph Lameter wrote:
> On Wed, 8 Feb 2012, Mel Gorman wrote:
>
> > Ok, I looked into what is necessary to replace these with checking a page
> > flag and the cost shifts quite a bit and ends up being more expensive.
>
> That is only true if you go the slab route.

Well, yes, but both slab and slub have to be supported. I see no reason
why I would choose to make this a slab-only or slub-only feature. Slob is
not supported because it's not expected that a platform using slob is also
going to use network-based swap.

> Slab suffers from not having
> the page struct pointer readily available. The changes are likely already
> impacting slab performance without the virt_to_page patch.

The performance impact only comes into play when swap is on a network
device and pfmemalloc reserves are in use. The rest of the time the check
on ac avoids all the cost, and there is a micro-optimisation later to
avoid calling a function (patch 12).

> > In slub, it's sufficient to check kmem_cache_cpu to know whether the
> > objects in the list are pfmemalloc or not.
>
> We try to minimize the size of kmem_cache_cpu. The page pointer is readily
> available. We just removed the node field from kmem_cache_cpu because it
> was less expensive to get the node number from the struct page field.
>
> The same is certainly true for a PFMEMALLOC flag.

Ok, are you asking that I use the page flag for slub and leave
kmem_cache_cpu alone in the slub case? I can certainly check it out if
that's what you are asking for.

> > Yeah, you're right on the button there. I did my checking assuming that
> > PG_active+PG_slab were safe to use. The following is an untested patch that
> > I probably got details wrong in but it illustrates where virt_to_page()
> > starts cropping up.
>
> Yes you need to come up with a way to not use virt_to_page otherwise slab
> performance is significantly impacted.
I did come up with a way: the necessary information is in ac and slabp
on slab :/ . There are not exactly many ways that the information can be
recorded.

> On NUMA we are already doing a page struct lookup on free in slab.
> If you would save the page struct pointer
> there and reuse it then you would not have an issue at least on free.

That information is only available on NUMA and only when there is more
than one node. Having cache_free_alien return the page for passing to
ac_put_obj() would also be ugly. The biggest downfall by far is that
single-node machines incur the cost of virt_to_page() where they did not
have to before. This is not a solution and it is not better than the
current simple check on a struct field.

> You still would need to determine which "struct slab" pointer is in use
> which will also require similar lookups in varous places.
>
> Transfer of the pfmemalloc flags (guess you must have a pfmemalloc
> field in struct slab then) in slab is best be done when allocating and
> freeing a slab page from the page allocator.

The page->pfmemalloc flag has already been transferred to the slab in
cache_grow.

> I think its rather trivial to add the support you want in a non intrusive
> way to slub. Slab would require some more thought and discussion.

I'm slightly confused by this sentence. Support for slub is already in
the patch and, as you say, it's fairly straightforward. Supporting a page
flag and leaving kmem_cache_cpu alone may also be easier, as
kmem_cache_cpu->page can be used instead of a kmem_cache_cpu->pfmemalloc
field.
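The set_obj_pfmemalloc()/clear_obj_pfmemalloc() helpers referenced in the
patch tag the returned object pointer itself: slab objects are at least
word-aligned, so the low bit of the pointer is free to carry the "came
from a pfmemalloc slab" marker. A hedged userspace sketch of that pointer
tagging (helper names mirror the series, but this is an illustration, not
the kernel code):

```c
/* Userspace model of low-bit pointer tagging: mark an object pointer
 * as "pfmemalloc" without any extra storage, relying on the fact that
 * slab objects are never placed at odd addresses. */
#include <stdbool.h>
#include <stdint.h>

/* Tag the pointer by setting its low bit. */
static void set_obj_pfmemalloc(void **objp)
{
	*objp = (void *)((uintptr_t)*objp | 1);
}

/* Query the tag without modifying the pointer. */
static bool is_obj_pfmemalloc(void *objp)
{
	return (uintptr_t)objp & 1;
}

/* Strip the tag, recovering the real object address. */
static void clear_obj_pfmemalloc(void **objp)
{
	*objp = (void *)((uintptr_t)*objp & ~(uintptr_t)1);
}
```

The tag must be stripped before the pointer is dereferenced, which is why
the series clears it on the allocation fast path before returning the
object to the caller.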
On Wed, 8 Feb 2012, Mel Gorman wrote:

> On Wed, Feb 08, 2012 at 01:49:05PM -0600, Christoph Lameter wrote:
> > On Wed, 8 Feb 2012, Mel Gorman wrote:
> >
> > > Ok, I looked into what is necessary to replace these with checking a page
> > > flag and the cost shifts quite a bit and ends up being more expensive.
> >
> > That is only true if you go the slab route.
>
> Well, yes but both slab and slub have to be supported. I see no reason
> why I would choose to make this a slab-only or slub-only feature. Slob is
> not supported because it's not expected that a platform using slob is also
> going to use network-based swap.

I think so far the patches, in particular to slab.c, are pretty
significant in impact.

> > Slab suffers from not having
> > the page struct pointer readily available. The changes are likely already
> > impacting slab performance without the virt_to_page patch.
>
> The performance impact only comes into play when swap is on a network
> device and pfmemalloc reserves are in use. The rest of the time the check
> on ac avoids all the cost and there is a micro-optimisation later to avoid
> calling a function (patch 12).

We have been down this road too many times. Logic is added to critical
paths and memory structures grow. This is not free. And for NBD swap
support? Pretty exotic use case.

> Ok, are you asking that I use the page flag for slub and leave kmem_cache_cpu
> alone in the slub case? I can certainly check it out if that's what you
> are asking for.

No, I am not asking for something. Still thinking about the best way to
address the issues. I think we can easily come up with a minimally
invasive patch for slub. Not sure about slab at this point. I think we
could avoid most of the new fields but this requires some tinkering. I
have a day at home tomorrow which hopefully gives me a chance to put some
focus on this issue.

> I did come up with a way: the necessary information is in ac and slabp
> on slab :/ .
> There are not exactly many ways that the information can
> be recorded.

Wish we had something that would not involve increasing the number of
fields in these slab structures.
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e90a673..108f3ce 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -432,6 +432,35 @@ static inline int PageTransCompound(struct page *page)
 }
 #endif
 
+#ifdef CONFIG_NFS_SWAP
+static inline int PageSlabPfmemalloc(struct page *page)
+{
+	VM_BUG_ON(!PageSlab(page));
+	return PageActive(page);
+}
+
+static inline void SetPageSlabPfmemalloc(struct page *page)
+{
+	VM_BUG_ON(!PageSlab(page));
+	SetPageActive(page);
+}
+
+static inline void ClearPageSlabPfmemalloc(struct page *page)
+{
+	VM_BUG_ON(!PageSlab(page));
+	ClearPageActive(page);
+}
+#else
+static inline int PageSlabPfmemalloc(struct page *page)
+{
+	return 0;
+}
+
+static inline void SetPageSlabPfmemalloc(struct page *page)
+{
+}
+#endif
+
 #ifdef CONFIG_MMU
 #define __PG_MLOCKED		(1 << PG_mlocked)
 #else
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 1d9ae40..a32bcfd 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -46,7 +46,6 @@ struct kmem_cache_cpu {
 	struct page *page;	/* The slab from which we are allocating */
 	struct page *partial;	/* Partially allocated frozen slabs */
 	int node;		/* The node of the page (or -1 for debug) */
-	bool pfmemalloc;	/* Slab page had pfmemalloc set */
 #ifdef CONFIG_SLUB_STATS
 	unsigned stat[NR_SLUB_STAT_ITEMS];
 #endif
diff --git a/mm/slab.c b/mm/slab.c
index 268cd96..3012186 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -233,7 +233,6 @@ struct slab {
 			unsigned int inuse;	/* num of objs active in slab */
 			kmem_bufctl_t free;
 			unsigned short nodeid;
-			bool pfmemalloc;	/* Slab had pfmemalloc set */
 		};
 		struct slab_rcu __slab_cover_slab_rcu;
 	};
@@ -255,8 +254,7 @@ struct array_cache {
 	unsigned int avail;
 	unsigned int limit;
 	unsigned int batchcount;
-	bool touched;
-	bool pfmemalloc;
+	unsigned int touched;
 	spinlock_t lock;
 	void *entry[];	/*
 			 * Must have this definition in here for the proper
@@ -978,6 +976,13 @@ static struct array_cache *alloc_arraycache(int node, int entries,
 	return nc;
 }
 
+static inline bool is_slab_pfmemalloc(struct slab *slabp)
+{
+	struct page *page = virt_to_page(slabp->s_mem);
+
+	return PageSlabPfmemalloc(page);
+}
+
 /* Clears ac->pfmemalloc if no slabs have pfmalloc set */
 static void check_ac_pfmemalloc(struct kmem_cache *cachep,
 						struct array_cache *ac)
@@ -985,22 +990,18 @@ static void check_ac_pfmemalloc(struct kmem_cache *cachep,
 	struct kmem_list3 *l3 = cachep->nodelists[numa_mem_id()];
 	struct slab *slabp;
 
-	if (!ac->pfmemalloc)
-		return;
-
 	list_for_each_entry(slabp, &l3->slabs_full, list)
-		if (slabp->pfmemalloc)
+		if (is_slab_pfmemalloc(slabp))
 			return;
 
 	list_for_each_entry(slabp, &l3->slabs_partial, list)
-		if (slabp->pfmemalloc)
+		if (is_slab_pfmemalloc(slabp))
 			return;
 
 	list_for_each_entry(slabp, &l3->slabs_free, list)
-		if (slabp->pfmemalloc)
+		if (is_slab_pfmemalloc(slabp))
 			return;
-	ac->pfmemalloc = false;
 }
 
 static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
@@ -1036,7 +1037,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
 		l3 = cachep->nodelists[numa_mem_id()];
 		if (!list_empty(&l3->slabs_free) && force_refill) {
 			struct slab *slabp = virt_to_slab(objp);
-			slabp->pfmemalloc = false;
+			ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem));
 			clear_obj_pfmemalloc(&objp);
 			check_ac_pfmemalloc(cachep, ac);
 			return objp;
@@ -1066,15 +1067,11 @@ static inline void *ac_get_obj(struct kmem_cache *cachep,
 static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 								void *objp)
 {
-	struct slab *slabp;
+	struct page *page = virt_to_page(objp);
 
 	/* If there are pfmemalloc slabs, check if the object is part of one */
-	if (unlikely(ac->pfmemalloc)) {
-		slabp = virt_to_slab(objp);
-
-		if (slabp->pfmemalloc)
-			set_obj_pfmemalloc(&objp);
-	}
+	if (PageSlabPfmemalloc(page))
+		set_obj_pfmemalloc(&objp);
 
 	return objp;
 }
@@ -1906,9 +1903,13 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 	else
 		add_zone_page_state(page_zone(page),
 			NR_SLAB_UNRECLAIMABLE, nr_pages);
-	for (i = 0; i < nr_pages; i++)
+	for (i = 0; i < nr_pages; i++) {
 		__SetPageSlab(page + i);
 
+		if (*pfmemalloc)
+			SetPageSlabPfmemalloc(page);
+	}
+
 	if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
 		kmemcheck_alloc_shadow(page, cachep->gfporder, flags, nodeid);
@@ -2888,7 +2889,6 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep, void *objp,
 	slabp->s_mem = objp + colour_off;
 	slabp->nodeid = nodeid;
 	slabp->free = 0;
-	slabp->pfmemalloc = false;
 
 	return slabp;
 }
@@ -3074,13 +3074,6 @@ static int cache_grow(struct kmem_cache *cachep,
 	if (!slabp)
 		goto opps1;
 
-	/* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
-	if (pfmemalloc) {
-		struct array_cache *ac = cpu_cache_get(cachep);
-		slabp->pfmemalloc = true;
-		ac->pfmemalloc = true;
-	}
-
 	slab_map_pages(cachep, slabp, objp);
 
 	cache_init_objs(cachep, slabp);
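The core trick in the page-flags.h hunk above is that PG_active is reused
as the "SlabPfmemalloc" marker while PG_slab is set, so no new page flag
is consumed. That aliasing can be modelled in userspace, with struct page
reduced to a flags word and VM_BUG_ON reduced to a plain assert; this is
an illustration of the idea, not the kernel implementation:

```c
/* Userspace model of aliasing PG_active as "SlabPfmemalloc" while the
 * page is a slab page. The VM_BUG_ON checks document the invariant
 * that the alias is only meaningful on slab pages. */
#include <assert.h>

struct page { unsigned long flags; };

enum { PG_slab, PG_active };

#define VM_BUG_ON(cond) assert(!(cond))

static int PageSlab(struct page *page)
{
	return (page->flags >> PG_slab) & 1;
}

static void __SetPageSlab(struct page *page)
{
	page->flags |= 1UL << PG_slab;
}

static int PageSlabPfmemalloc(struct page *page)
{
	VM_BUG_ON(!PageSlab(page));
	return (page->flags >> PG_active) & 1;	/* PageActive() in the patch */
}

static void SetPageSlabPfmemalloc(struct page *page)
{
	VM_BUG_ON(!PageSlab(page));
	page->flags |= 1UL << PG_active;	/* SetPageActive() */
}

static void ClearPageSlabPfmemalloc(struct page *page)
{
	VM_BUG_ON(!PageSlab(page));
	page->flags &= ~(1UL << PG_active);	/* ClearPageActive() */
}
```

The aliasing is safe because a slab page is never on the LRU, so PG_active
carries no other meaning while PG_slab is set; the flag just has to be
cleared again before the page is freed back to the page allocator.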