diff mbox series

[1/1] net/netfilter/x_tables.c: make allocation less aggressive

Message ID 5a70c7c3.JeIh2XMA2ZATeitK%akpm@linux-foundation.org
State Awaiting Upstream, archived
Delegated to: David Miller
Headers show
Series [1/1] net/netfilter/x_tables.c: make allocation less aggressive | expand

Commit Message

Andrew Morton Jan. 30, 2018, 7:30 p.m. UTC
From: Michal Hocko <mhocko@kernel.org>
Subject: net/netfilter/x_tables.c: make allocation less aggressive

syzbot has noticed that xt_alloc_table_info can allocate a lot of memory. 
This is an admin only interface but an admin in a namespace is sufficient
as well.  eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in
xt_alloc_table_info()") has changed the opencoded kmalloc->vmalloc
fallback into kvmalloc.  It has dropped __GFP_NORETRY on the way because
vmalloc has simply never fully supported __GFP_NORETRY semantic.  This is
still the case because e.g.  page tables backing the vmalloc area are
hardcoded GFP_KERNEL.

Revert back to __GFP_NORETRY as a poors man defence against excessively
large allocation request here.  We will not rule out the OOM killer
completely but __GFP_NORETRY should at least stop the large request in
most cases.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_tableLink: http://lkml.kernel.org/r/20180130140104.GE21609@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 net/netfilter/x_tables.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Eric Dumazet Jan. 30, 2018, 7:53 p.m. UTC | #1
On Tue, 2018-01-30 at 11:30 -0800, akpm@linux-foundation.org wrote:
> From: Michal Hocko <mhocko@kernel.org>
> Subject: net/netfilter/x_tables.c: make allocation less aggressive
> 
> syzbot has noticed that xt_alloc_table_info can allocate a lot of memory. 
> This is an admin only interface but an admin in a namespace is sufficient
> as well.  eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in
> xt_alloc_table_info()") has changed the opencoded kmalloc->vmalloc
> fallback into kvmalloc.  It has dropped __GFP_NORETRY on the way because
> vmalloc has simply never fully supported __GFP_NORETRY semantic.  This is
> still the case because e.g.  page tables backing the vmalloc area are
> hardcoded GFP_KERNEL.
> 
> Revert back to __GFP_NORETRY as a poors man defence against excessively
> large allocation request here.  We will not rule out the OOM killer
> completely but __GFP_NORETRY should at least stop the large request in
> most cases.
> 
> [akpm@linux-foundation.org: coding-style fixes]
> Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_tableLink: http://lkml.kernel.org/r/20180130140104.GE21609@dhcp22.suse.cz
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> Acked-by: Florian Westphal <fw@strlen.de>
> Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  net/netfilter/x_tables.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff -puN net/netfilter/x_tables.c~net-netfilter-x_tablesc-make-allocation-less-aggressive net/netfilter/x_tables.c
> --- a/net/netfilter/x_tables.c~net-netfilter-x_tablesc-make-allocation-less-aggressive
> +++ a/net/netfilter/x_tables.c
> @@ -1008,7 +1008,12 @@ struct xt_table_info *xt_alloc_table_inf
>  	if ((size >> PAGE_SHIFT) + 2 > totalram_pages)
>  		return NULL;
>  
> -	info = kvmalloc(sz, GFP_KERNEL);
> +	/* __GFP_NORETRY is not fully supported by kvmalloc but it should
> +	 * work reasonably well if sz is too large and bail out rather
> +	 * than shoot all processes down before realizing there is nothing
> +	 * more to reclaim.
> +	 */
> +	info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
>  	if (!info)
>  		return NULL;


How is __GFP_NORETRY working exactly ?

Surely, if some firewall tools attempt to load a new iptables rules, we
do not want to abort them if the request can be satisfied after few
pages moved on swap or written back to disk.

We want to avoid huge allocations, but leave reasonable ones succeed.

Thanks.
Michal Hocko Jan. 31, 2018, 8:08 a.m. UTC | #2
On Tue 30-01-18 11:53:58, Eric Dumazet wrote:
[...]
> How is __GFP_NORETRY working exactly ?

this is what the documentation says.
 * __GFP_NORETRY: The VM implementation will try only very lightweight
 *   memory direct reclaim to get some memory under memory pressure (thus
 *   it can sleep). It will avoid disruptive actions like OOM killer. The
 *   caller must handle the failure which is quite likely to happen under
 *   heavy memory pressure. The flag is suitable when failure can easily be
 *   handled at small cost, such as reduced throughput

> Surely, if some firewall tools attempt to load a new iptables rules, we
> do not want to abort them if the request can be satisfied after few
> pages moved on swap or written back to disk.

I am not sure this really goes along with "namespace admin can request
arbitrary amount of memory" very well.
 
> We want to avoid huge allocations, but leave reasonable ones succeed.

Yes, that would be the best way forward. From the previous discussion
with Florian [1] it seems that "reasonable" is not that easy to figure
out. Anyway, this patch merely gets us back to pre eacd86ca3b03 times
where __GFP_NORETRY has been used for both kmalloc and vmalloc paths.
So it is more a quick band aid than a longterm solution.

[1] http://lkml.kernel.org/r/20180129165722.GF5906@breakpoint.cc
diff mbox series

Patch

diff -puN net/netfilter/x_tables.c~net-netfilter-x_tablesc-make-allocation-less-aggressive net/netfilter/x_tables.c
--- a/net/netfilter/x_tables.c~net-netfilter-x_tablesc-make-allocation-less-aggressive
+++ a/net/netfilter/x_tables.c
@@ -1008,7 +1008,12 @@  struct xt_table_info *xt_alloc_table_inf
 	if ((size >> PAGE_SHIFT) + 2 > totalram_pages)
 		return NULL;
 
-	info = kvmalloc(sz, GFP_KERNEL);
+	/* __GFP_NORETRY is not fully supported by kvmalloc but it should
+	 * work reasonably well if sz is too large and bail out rather
+	 * than shoot all processes down before realizing there is nothing
+	 * more to reclaim.
+	 */
+	info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
 	if (!info)
 		return NULL;