Message ID | 157383036914.3173.12541360542055110975.stgit@firesoul |
---|---|
State | Superseded |
Delegated to: | David Miller |
Headers | show |
Series | page_pool: followup changes to restore tracepoint features | expand |
On 15 Nov 2019, at 7:06, Jesper Dangaard Brouer wrote: > It is documented that driver API users, have to disconnect > the page_pool, before starting shutdown phase, but is it only > documentation, there is not code catching such violations. > > Given (in page_pool_empty_alloc_cache_once) alloc cache is only > flushed once, there is now an opportunity to catch this case. > > This patch blocks the RX/alloc-side cache, via pretending it is > full and poison last element. This code change will enforce that > drivers cannot use alloc cache during shutdown phase. > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> > --- > net/core/page_pool.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > index e28db2ef8e12..b31f3bb7818d 100644 > --- a/net/core/page_pool.c > +++ b/net/core/page_pool.c > @@ -364,6 +364,10 @@ static void > page_pool_empty_alloc_cache_once(struct page_pool *pool) > page = pool->alloc.cache[--pool->alloc.count]; > __page_pool_return_page(pool, page); > } > + > + /* Block alloc cache, pretend it's full and poison last element */ > + pool->alloc.cache[PP_ALLOC_CACHE_SIZE - 1] = NULL; > + pool->alloc.count = PP_ALLOC_CACHE_SIZE; > } > > static void page_pool_scrub(struct page_pool *pool) I'm really unclear on how this is going to be helpful at all. Suppose that page_pool_destroy() is called, and this is the last user of the pool, so page_pool_release() is entered, but finds there are outstanding pages, so the actual free is delayed. Case 1: Now, if the driver screws up and tries to re-use the pool and allocate another packet, it enters __page_pool_get_cached(), which will decrement the alloc.count, and return NULL. This causes a fallback to __get_alloc_pages_slow(), which bumps up the pool inflight count. A repeat call of the above results in garbage, since the alloc.count will return stale values. A call to __page_pool_put_page() may put the value back in the cache since the alloc.count was decremented. Now, the retry may sit there forever since the cache is never flushed. Case 2: An inflight packet somehow arrives at __page_pool_put_page() and enters __page_pool_recycle_direct(). With the above patch, the cache pretends it is full, (assuming nothing was allocated) so it falls back to the ring. Without the patch, the packet is placed in the cache, and then flushed out to the ring later, resulting in the same behavior. Any mistaken allocations are handled gracefully.
On Fri, 15 Nov 2019 10:38:21 -0800 "Jonathan Lemon" <jonathan.lemon@gmail.com> wrote: > Case 1: > Now, if the driver screws up and tries to re-use the pool and allocate > another packet, it enters __page_pool_get_cached(), which will decrement > the alloc.count, and return NULL. This causes a fallback to > __get_alloc_pages_slow(), which bumps up the pool inflight count. I can see that I made a mistake, and cannot use NULL as the poison value. Let me drop this patch, so others (and yours) can go in before merge window.
diff --git a/net/core/page_pool.c b/net/core/page_pool.c index e28db2ef8e12..b31f3bb7818d 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -364,6 +364,10 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool) page = pool->alloc.cache[--pool->alloc.count]; __page_pool_return_page(pool, page); } + + /* Block alloc cache, pretend it's full and poison last element */ + pool->alloc.cache[PP_ALLOC_CACHE_SIZE - 1] = NULL; + pool->alloc.count = PP_ALLOC_CACHE_SIZE; } static void page_pool_scrub(struct page_pool *pool)
It is documented that driver API users, have to disconnect the page_pool, before starting shutdown phase, but is it only documentation, there is not code catching such violations. Given (in page_pool_empty_alloc_cache_once) alloc cache is only flushed once, there is now an opportunity to catch this case. This patch blocks the RX/alloc-side cache, via pretending it is full and poison last element. This code change will enforce that drivers cannot use alloc cache during shutdown phase. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> --- net/core/page_pool.c | 4 ++++ 1 file changed, 4 insertions(+)