{"id":808940,"url":"http://patchwork.ozlabs.org/api/1.2/patches/808940/?format=json","web_url":"http://patchwork.ozlabs.org/project/netdev/patch/20170901182926.8981.77450.stgit@john-Precision-Tower-5810/","project":{"id":7,"url":"http://patchwork.ozlabs.org/api/1.2/projects/7/?format=json","name":"Linux network development","link_name":"netdev","list_id":"netdev.vger.kernel.org","list_email":"netdev@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<20170901182926.8981.77450.stgit@john-Precision-Tower-5810>","list_archive_url":null,"date":"2017-09-01T18:29:26","name":"[net-next] bpf: sockmap update/simplify memory accounting scheme","commit_ref":null,"pull_url":null,"state":"accepted","archived":true,"hash":"c0ce87795283db2977281ede074b1b9d047c1279","submitter":{"id":20028,"url":"http://patchwork.ozlabs.org/api/1.2/people/20028/?format=json","name":"John Fastabend","email":"john.fastabend@gmail.com"},"delegate":{"id":34,"url":"http://patchwork.ozlabs.org/api/1.2/users/34/?format=json","username":"davem","first_name":"David","last_name":"Miller","email":"davem@davemloft.net"},"mbox":"http://patchwork.ozlabs.org/project/netdev/patch/20170901182926.8981.77450.stgit@john-Precision-Tower-5810/mbox/","series":[{"id":1093,"url":"http://patchwork.ozlabs.org/api/1.2/series/1093/?format=json","web_url":"http://patchwork.ozlabs.org/project/netdev/list/?series=1093","date":"2017-09-01T18:29:26","name":"[net-next] bpf: sockmap update/simplify memory accounting scheme","version":1,"mbox":"http://patchwork.ozlabs.org/series/1093/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/808940/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/808940/checks/","tags":{},"related":[],"headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"cT0bI47z\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xkSTB3Y3wz9sPk\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat,  2 Sep 2017 04:29:50 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1752202AbdIAS3s (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 1 Sep 2017 14:29:48 -0400","from mail-pg0-f65.google.com ([74.125.83.65]:38580 \"EHLO\n\tmail-pg0-f65.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1750955AbdIAS3r (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Fri, 1 Sep 2017 14:29:47 -0400","by mail-pg0-f65.google.com with SMTP id t3so569420pgt.5\n\tfor <netdev@vger.kernel.org>; Fri, 01 Sep 2017 11:29:47 -0700 (PDT)","from [127.0.1.1] ([72.168.144.71])\n\tby smtp.gmail.com with ESMTPSA id\n\te27sm1132778pfk.71.2017.09.01.11.29.37\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tFri, 01 Sep 2017 11:29:45 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=gmail.com; s=20161025;\n\th=subject:from:to:cc:date:message-id:user-agent:mime-version\n\t:content-transfer-encoding;\n\tbh=orp6Bd5c6lEDUz0c1i/lJx/ncxMDvax0jwB9qy4cSTo=;\n\tb=cT0bI47zYAy4QO0i+oYTW7b7zyjxqvm+u86mOPlFFDeG2MfGhP33I+zpNuaV41ic3r\n\tUq6cOescvY7RQ/lY5RDvA7NuLVKRBeFR3dHSC8XMagcX7sV7EOsVElGjkAHV2Vc5Gwts\n\tjiJ7rpgh/NMoPUSJcIphlTjPLRThvn1hjiaqhzCYYRXtaOJEdo7du3Wvc7KwnvvyBXuE\n\tTTlYy5WAo9syTzIsnJ40yQwH89iX3Gv/f3NM9f+uULVwK8EwdUkhpXPfZWZkASszXP/6\n\tKQlrK+LkoNzZb3BM4F93uX2wvO8HTGGbN/WW6Gatyl45s/VMn3TtKuaY6dN4kvI6dmfT\n\tWqDQ==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:subject:from:to:cc:date:message-id:user-agent\n\t:mime-version:content-transfer-encoding;\n\tbh=orp6Bd5c6lEDUz0c1i/lJx/ncxMDvax0jwB9qy4cSTo=;\n\tb=SxbE9qK42O0M1DGDy7zLlAcg9Ji5kRU75JSyC8yhfcemzyYFt2AHtHJZuXKO+5cFBN\n\t5GcNM+0RY1+UTP1i9rkwlhoCKn2+Yjhlp4ksKiW3LZ/AV3PEOpvlRtlZ3PQjentK/JjL\n\tg74t8L7E1BNazUYlN3+uqmQNfC/75TBUhAwrSOji8XSMNf21V1wN6obUxjcZ1WMpx+7v\n\tHQG0nnCRpMMvWqc8CZ1ZN/nisWqFFh5zLT4dGuUzRHJIFM0/mD6gxOL+/3vCWaENJUc+\n\tH+I0z64+noeosSNCPEJcF6nlmCl3lE44ve3z1ciy94yu1ilmbJBbg/ZNlz5UD2z1jLrI\n\tSzEw==","X-Gm-Message-State":"AHPjjUh4wyIRJ/47iT1WqELrdcMMuVdhR5fy1EVgJ3zOj7vfhrKdhyBi\n\tjOn+55kusbf2Zk3V","X-Google-Smtp-Source":"ADKCNb6su2M+3htR0RmHVo5V+helPokO/ed8ux27o1o41imECRsntTah9jKNmsbB8H4eK+4zX1/qqg==","X-Received":"by 10.99.44.23 with SMTP id s23mr3492586pgs.212.1504290586434;\n\tFri, 01 Sep 2017 11:29:46 -0700 (PDT)","Subject":"[net-next PATCH] bpf: sockmap update/simplify memory accounting\n\tscheme","From":"John Fastabend <john.fastabend@gmail.com>","To":"davem@davemloft.net","Cc":"netdev@vger.kernel.org, daniel@iogearbox.net, ast@fb.com","Date":"Fri, 01 Sep 2017 11:29:26 -0700","Message-ID":"<20170901182926.8981.77450.stgit@john-Precision-Tower-5810>","User-Agent":"StGit/0.17.1-dirty","MIME-Version":"1.0","Content-Type":"text/plain; charset=\"utf-8\"","Content-Transfer-Encoding":"7bit","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"},"content":"Instead of tracking wmem_queued and sk_mem_charge by incrementing\nin the verdict SK_REDIRECT paths and decrementing in the tx work\npath use skb_set_owner_w and sock_writeable helpers. This solves\na few issues with the current code. First, in SK_REDIRECT inc on\nsk_wmem_queued and sk_mem_charge were being done without the peers\nsock lock being held. Under stress this can result in accounting\nerrors when tx work and/or multiple verdict decisions are working\non the peer psock.\n\nAdditionally, this cleans up the code because we can rely on the\ndefault destructor to decrement memory accounting on kfree_skb. Also\nthis will trigger sk_write_space when space becomes available on\nkfree_skb() which wasn't happening before and prevent __sk_free\nfrom being called until all in-flight packets are completed.\n\nFixes: 174a79ff9515 (\"bpf: sockmap with sk redirect support\")\nSigned-off-by: John Fastabend <john.fastabend@gmail.com>\nAcked-by: Daniel Borkmann <daniel@iogearbox.net>\n---\n kernel/bpf/sockmap.c |   18 +++++++-----------\n 1 file changed, 7 insertions(+), 11 deletions(-)","diff":"diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c\nindex db0d99d..f6ffde9 100644\n--- a/kernel/bpf/sockmap.c\n+++ b/kernel/bpf/sockmap.c\n@@ -111,7 +111,7 @@ static int smap_verdict_func(struct smap_psock *psock, struct sk_buff *skb)\n \n static void smap_do_verdict(struct smap_psock *psock, struct sk_buff *skb)\n {\n-\tstruct sock *sock;\n+\tstruct sock *sk;\n \tint rc;\n \n \t/* Because we use per cpu values to feed input from sock redirect\n@@ -123,16 +123,16 @@ static void smap_do_verdict(struct smap_psock *psock, struct sk_buff *skb)\n \trc = smap_verdict_func(psock, skb);\n \tswitch (rc) {\n \tcase SK_REDIRECT:\n-\t\tsock = do_sk_redirect_map();\n+\t\tsk = do_sk_redirect_map();\n \t\tpreempt_enable();\n-\t\tif (likely(sock)) {\n-\t\t\tstruct smap_psock *peer = smap_psock_sk(sock);\n+\t\tif (likely(sk)) {\n+\t\t\tstruct smap_psock *peer = smap_psock_sk(sk);\n \n \t\t\tif (likely(peer &&\n \t\t\t\t   test_bit(SMAP_TX_RUNNING, &peer->state) &&\n-\t\t\t\t   sk_stream_memory_free(peer->sock))) {\n-\t\t\t\tpeer->sock->sk_wmem_queued += skb->truesize;\n-\t\t\t\tsk_mem_charge(peer->sock, skb->truesize);\n+\t\t\t\t   !sock_flag(sk, SOCK_DEAD) &&\n+\t\t\t\t   sock_writeable(sk))) {\n+\t\t\t\tskb_set_owner_w(skb, sk);\n \t\t\t\tskb_queue_tail(&peer->rxqueue, skb);\n \t\t\t\tschedule_work(&peer->tx_work);\n \t\t\t\tbreak;\n@@ -282,16 +282,12 @@ static void smap_tx_work(struct work_struct *w)\n \t\t\t\t/* Hard errors break pipe and stop xmit */\n \t\t\t\tsmap_report_sk_error(psock, n ? -n : EPIPE);\n \t\t\t\tclear_bit(SMAP_TX_RUNNING, &psock->state);\n-\t\t\t\tsk_mem_uncharge(psock->sock, skb->truesize);\n-\t\t\t\tpsock->sock->sk_wmem_queued -= skb->truesize;\n \t\t\t\tkfree_skb(skb);\n \t\t\t\tgoto out;\n \t\t\t}\n \t\t\trem -= n;\n \t\t\toff += n;\n \t\t} while (rem);\n-\t\tsk_mem_uncharge(psock->sock, skb->truesize);\n-\t\tpsock->sock->sk_wmem_queued -= skb->truesize;\n \t\tkfree_skb(skb);\n \t}\n out:\n","prefixes":["net-next"]}