From patchwork Mon Mar 11 19:31:12 2019
From: Martynas Pumputis
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
    mhocko@suse.com, m@lambda.lt
Subject: [PATCH] bpf: Try harder when allocating memory for large maps
Date: Mon, 11 Mar 2019 20:31:12 +0100
Message-Id: <20190311193112.25527-1-m@lambda.lt>

It has been observed that sometimes a higher order memory allocation
for BPF maps fails when there is no obvious memory pressure in a
system. E.g. the map (BPF_MAP_TYPE_LRU_HASH, key=38, value=56,
max_elems=524288) could not be created because vmalloc was unable to
allocate 75497472B, when the system's memory consumption (in MB) was
the following:

    Total: 3942 Used: 837 (21.24%) Free: 138 Buffers: 239 Cached: 2727

Later analysis [1] by Michal Hocko showed that vmalloc was not trying
to reclaim memory from the page cache and was failing prematurely due
to __GFP_NORETRY.
Considering dcda9b0471 ("mm, tree wide: replace __GFP_REPEAT by
__GFP_RETRY_MAYFAIL with more useful semantic") and [1], we can replace
__GFP_NORETRY with __GFP_RETRY_MAYFAIL, as it won't invoke the OOM
killer and will try harder to fulfil allocation requests.

The change has been tested with the workloads mentioned above and by
observing oom_kill value from /proc/vmstat.

[1]: https://lore.kernel.org/bpf/20190310071318.GW5232@dhcp22.suse.cz/

Signed-off-by: Martynas Pumputis
Acked-by: Yonghong Song
---
 kernel/bpf/syscall.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 62f6bced3a3c..1b0a057ed6d5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -136,20 +136,26 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
 
 void *bpf_map_area_alloc(size_t size, int numa_node)
 {
-	/* We definitely need __GFP_NORETRY, so OOM killer doesn't
-	 * trigger under memory pressure as we really just want to
-	 * fail instead.
+	/* We definitely need __GFP_NORETRY or __GFP_RETRY_MAYFAIL, so
+	 * OOM killer doesn't trigger under memory pressure as we really
+	 * just want to fail instead.
 	 */
-	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
+	const gfp_t flags = __GFP_NOWARN | __GFP_ZERO;
 	void *area;
 
 	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		area = kmalloc_node(size, GFP_USER | flags, numa_node);
+		/* To avoid bypassing slab alloc for lower order allocs,
+		 * __GFP_NORETRY is used instead of __GFP_RETRY_MAYFAIL.
+		 */
+		area = kmalloc_node(size, GFP_USER | __GFP_NORETRY | flags,
+				    numa_node);
 		if (area != NULL)
 			return area;
 	}
 
-	return __vmalloc_node_flags_caller(size, numa_node, GFP_KERNEL | flags,
+	return __vmalloc_node_flags_caller(size, numa_node,
+					   GFP_KERNEL | __GFP_RETRY_MAYFAIL |
+					   flags,
 					   __builtin_return_address(0));
 }