From patchwork Wed Jan 17 12:19:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sowmini Varadhan X-Patchwork-Id: 862252 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="mkcm3aiL"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zM68J0wbSz9sNV for ; Wed, 17 Jan 2018 23:38:40 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753023AbeAQMha (ORCPT ); Wed, 17 Jan 2018 07:37:30 -0500 Received: from userp2120.oracle.com ([156.151.31.85]:45292 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752999AbeAQMhW (ORCPT ); Wed, 17 Jan 2018 07:37:22 -0500 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w0HCbH15078952; Wed, 17 Jan 2018 12:37:17 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2017-10-26; bh=WQOwEzP8EimSGc0T9xNWPK2JwGZLOd4mRRVHJ7zOVjo=; b=mkcm3aiLW+UT0ovaanWF8sNUKcnd4MQRmY0g93uxMjPD9IMh33A8yCegJ9J4bEX5GjZ4 EjCBDMQQ6yPnZN9A7PiHlaAo6lv27KRud4SvyT+Ux8nSbhWpLQt+W9PamW2FlkRRLndf zJHBrbVwaEd6D+zXvLD59IHs8ytw6dllItyhWxgfihFfvFO9hT7f2lDgUvs140uExSmV 0oKrkbow5OrD8LJkCz3BqFe4jHy9FdzdHODgnejLfW3bCYP1GNZ6mXlrHV4q3ZcW786y 0JF1kVYXD/qvhliaPYxEK/YF89/u3tnum2hJFOMKYaBzNWz9DYdmqdwoyxdu7DxBDqNw GQ== Received: from userv0021.oracle.com 
(userv0021.oracle.com [156.151.31.71]) by userp2120.oracle.com with ESMTP id 2fj44ngkut-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 17 Jan 2018 12:37:17 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w0HCbFY4024794 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Wed, 17 Jan 2018 12:37:15 GMT Received: from abhmp0008.oracle.com (abhmp0008.oracle.com [141.146.116.14]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w0HCbE50022530; Wed, 17 Jan 2018 12:37:14 GMT Received: from ipftiger1.us.oracle.com (/10.208.179.35) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 17 Jan 2018 04:37:14 -0800 From: Sowmini Varadhan To: netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com Cc: davem@davemloft.net, rds-devel@oss.oracle.com, sowmini.varadhan@oracle.com, santosh.shilimkar@oracle.com Subject: [PATCH RFC net-next 1/6] sock: MSG_PEEK support for sk_error_queue Date: Wed, 17 Jan 2018 04:19:59 -0800 Message-Id: <05d060dc1169649d84c37ad51b0f8fe54a2a3185.1516147540.git.sowmini.varadhan@oracle.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8776 signatures=668653 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1801170183 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Allow the application to use MSG_PEEK with sk_error_queue so that it can peek at a message and then re-read it with a larger buffer in cases where MSG_TRUNC may be encountered.
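For illustration only, the peek and re-read pattern described here might look roughly like this from userspace. This is a sketch, not part of the patch: errqueue_read_full() and next_peek_size() are hypothetical names, the doubling policy is an arbitrary choice, and the code assumes a kernel where MSG_PEEK is honoured for MSG_ERRQUEUE reads (without this change, sock_recv_errqueue() frees the skb, so the "peek" would consume the message). It also assumes a single reader of the error queue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical policy: double the buffer, capped at max; 0 means give up. */
size_t next_peek_size(size_t cur, size_t max)
{
	assert(cur > 0);
	if (cur >= max)
		return 0;
	return (cur > max / 2) ? max : cur * 2;
}

/*
 * Peek at the head of the error queue, growing the buffer while the
 * kernel reports MSG_TRUNC, then consume the message with a normal read.
 * Returns the payload length and a malloc()ed buffer in *out, or -1 with
 * errno set. Assumes no other thread reads the error queue in between.
 */
ssize_t errqueue_read_full(int fd, void *control, size_t controllen,
			   size_t start, size_t max, void **out)
{
	size_t len = start;
	char *buf = NULL;

	for (;;) {
		struct msghdr msg = { 0 };
		struct iovec iov;
		ssize_t n;
		int saved;
		char *tmp = realloc(buf, len);

		if (!tmp) {
			free(buf);
			errno = ENOMEM;
			return -1;
		}
		buf = tmp;
		iov.iov_base = buf;
		iov.iov_len = len;
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;
		msg.msg_control = control;
		msg.msg_controllen = controllen;

		n = recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_PEEK | MSG_DONTWAIT);
		if (n >= 0 && (msg.msg_flags & MSG_TRUNC)) {
			/* Message left on the queue; retry with more room. */
			len = next_peek_size(len, max);
			if (len)
				continue;
			free(buf);
			errno = EMSGSIZE;
			return -1;
		}
		if (n >= 0) {
			/* It fits: read again without MSG_PEEK to dequeue it. */
			msg.msg_controllen = controllen;
			msg.msg_flags = 0;
			n = recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);
		}
		if (n < 0) {
			saved = errno;
			free(buf);
			errno = saved;
			return -1;
		}
		*out = buf;
		return n;
	}
}
```

Note that sock_recv_errqueue() returns the truncated length rather than the full skb length, so the caller cannot size the buffer exactly from one peek; hence the grow-and-retry loop.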
Signed-off-by: Sowmini Varadhan --- drivers/net/tun.c | 2 +- include/net/sock.h | 2 +- net/core/sock.c | 7 +++++-- net/packet/af_packet.c | 3 ++- 4 files changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 2fba3be..cfd0e0f 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2313,7 +2313,7 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len, } if (flags & MSG_ERRQUEUE) { ret = sock_recv_errqueue(sock->sk, m, total_len, - SOL_PACKET, TUN_TX_TIMESTAMP); + SOL_PACKET, TUN_TX_TIMESTAMP, flags); goto out; } ret = tun_do_read(tun, tfile, &m->msg_iter, flags & MSG_DONTWAIT, ptr); diff --git a/include/net/sock.h b/include/net/sock.h index 73b7830..f0b6990 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2343,7 +2343,7 @@ static inline bool sk_listener(const struct sock *sk) int sock_get_timestamp(struct sock *, struct timeval __user *); int sock_get_timestampns(struct sock *, struct timespec __user *); int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len, int level, - int type); + int type, int flags); bool sk_ns_capable(const struct sock *sk, struct user_namespace *user_ns, int cap); diff --git a/net/core/sock.c b/net/core/sock.c index 72d14b2..4f52677 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2887,7 +2887,7 @@ void sock_enable_timestamp(struct sock *sk, int flag) } int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len, - int level, int type) + int level, int type, int flags) { struct sock_exterr_skb *serr; struct sk_buff *skb; @@ -2916,7 +2916,10 @@ int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len, err = copied; out_free_skb: - kfree_skb(skb); + if (likely(!(flags & MSG_PEEK))) + kfree_skb(skb); + else + skb_queue_head(&sk->sk_error_queue, skb); out: return err; } diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index ee7aa0b..4314f31 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ 
-3294,7 +3294,8 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, if (flags & MSG_ERRQUEUE) { err = sock_recv_errqueue(sk, msg, len, - SOL_PACKET, PACKET_TX_TIMESTAMP); + SOL_PACKET, PACKET_TX_TIMESTAMP, + flags); goto out; } From patchwork Wed Jan 17 12:20:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sowmini Varadhan X-Patchwork-Id: 862247 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="S3RHzy75"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zM6715FdGz9sNx for ; Wed, 17 Jan 2018 23:37:33 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753032AbeAQMhb (ORCPT ); Wed, 17 Jan 2018 07:37:31 -0500 Received: from aserp2120.oracle.com ([141.146.126.78]:53156 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752993AbeAQMhV (ORCPT ); Wed, 17 Jan 2018 07:37:21 -0500 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w0HCbG8h015696; Wed, 17 Jan 2018 12:37:17 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2017-10-26; bh=C3veAv3r5kD4px2Bpz7en3/fAh6ZjhM4DGG9b8wQrck=; b=S3RHzy75QUlYADf5Q9OULXy8ZieBOfbu2MWzVS0lIxd7kS9eWO8xhK2Bivv+yGAsMLtn TAsuKKyv2pQ7ug3QTjSDZ/ef49xX2O4l6GlcuD1iE8wQVIRt979z6Bg31tCMch+lnUbW 
vZdKK2Qw2DTsTPlTwD8QXpH/H3wADrZT57UzgbQlYuYUdfv0Fr1lFoenGGxgfpC7Nwai 03HLB73889iebLG8i4KRCK+hpIGctawcd7nDZumJm1P2kY9n7SYq+TicWDSaBAqcH9yZ C18QkZmVntoxZDUjuTDyx/BO8nG1jW+bjHh+cfR7b1tJxf9QNvhHkBh56fgTSbByXC/t Fw== Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by aserp2120.oracle.com with ESMTP id 2fj3wp8spf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 17 Jan 2018 12:37:17 +0000 Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w0HCbEY4024790 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Wed, 17 Jan 2018 12:37:15 GMT Received: from abhmp0008.oracle.com (abhmp0008.oracle.com [141.146.116.14]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w0HCbE5A016101; Wed, 17 Jan 2018 12:37:14 GMT Received: from ipftiger1.us.oracle.com (/10.208.179.35) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 17 Jan 2018 04:37:14 -0800 From: Sowmini Varadhan To: netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com Cc: davem@davemloft.net, rds-devel@oss.oracle.com, sowmini.varadhan@oracle.com, santosh.shilimkar@oracle.com Subject: [PATCH RFC net-next 2/6] skbuff: export mm_[un]account_pinned_pages for other modules Date: Wed, 17 Jan 2018 04:20:00 -0800 Message-Id: <554ae642aaadb76510912312137847924d06fce9.1516147540.git.sowmini.varadhan@oracle.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8776 signatures=668653 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1801170183 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org RDS would like to use 
the helper functions for managing pinned pages added by Commit a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages") Signed-off-by: Sowmini Varadhan --- include/linux/skbuff.h | 3 +++ net/core/skbuff.c | 6 ++++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index b8e0da6..8e2730a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -466,6 +466,9 @@ struct ubuf_info { #define skb_uarg(SKB) ((struct ubuf_info *)(skb_shinfo(SKB)->destructor_arg)) +int mm_account_pinned_pages(struct mmpin *mmp, size_t size); +void mm_unaccount_pinned_pages(struct mmpin *mmp); + struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size); struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size, struct ubuf_info *uarg); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 01e8285..272a513 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -890,7 +890,7 @@ struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src) } EXPORT_SYMBOL_GPL(skb_morph); -static int mm_account_pinned_pages(struct mmpin *mmp, size_t size) +int mm_account_pinned_pages(struct mmpin *mmp, size_t size) { unsigned long max_pg, num_pg, new_pg, old_pg; struct user_struct *user; @@ -919,14 +919,16 @@ static int mm_account_pinned_pages(struct mmpin *mmp, size_t size) return 0; } +EXPORT_SYMBOL_GPL(mm_account_pinned_pages); -static void mm_unaccount_pinned_pages(struct mmpin *mmp) +void mm_unaccount_pinned_pages(struct mmpin *mmp) { if (mmp->user) { atomic_long_sub(mmp->num_pg, &mmp->user->locked_vm); free_uid(mmp->user); } } +EXPORT_SYMBOL_GPL(mm_unaccount_pinned_pages); struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size) { From patchwork Wed Jan 17 12:20:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sowmini Varadhan X-Patchwork-Id: 862246 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: 
patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="deZixzF4"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zM66s4RRBz9sNV for ; Wed, 17 Jan 2018 23:37:25 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753009AbeAQMhX (ORCPT ); Wed, 17 Jan 2018 07:37:23 -0500 Received: from aserp2120.oracle.com ([141.146.126.78]:53140 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752978AbeAQMhU (ORCPT ); Wed, 17 Jan 2018 07:37:20 -0500 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w0HCbHDW015713; Wed, 17 Jan 2018 12:37:17 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2017-10-26; bh=O5zsZluM/4KILqG8Y4zFJqQjBa640mwpHeaQdNpdr0A=; b=deZixzF44cZGWV0BZ86xaYBZxFy4eJmK5tVvuh9u2VHP2js+EwV+57JERFZLoGu8oIp6 p1c6ZUFtZDAkXzlQBsKB+NF239kwTRa1fT8o+y3qox3NmhY7kk5ukp24JVsNEO/SEyuC cwq1ALSa7pWjCRVlJUGeAGkZdHnXnGlXhTzRA5Ax3J97QbnAKzSiv+uHi5zNHGGn5CZf NMXlrnbeFAnfSMkDXRxZx1FHqAsprD+eLMpqqhaSHQgQN17QWuA4uG7OIzbVsfKWD0nX Cyg9sCd18TmTaV3qCqzV0KaWqog0C/Gn/N7ascaVmCZHuKJTMyBJz1jW5uyCk4eQDkRS mQ== Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by aserp2120.oracle.com with ESMTP id 2fj3wp8spg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 17 Jan 2018 12:37:17 +0000 Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by 
userv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w0HCbFDw010976 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Wed, 17 Jan 2018 12:37:15 GMT Received: from abhmp0008.oracle.com (abhmp0008.oracle.com [141.146.116.14]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w0HCbFpf018519; Wed, 17 Jan 2018 12:37:15 GMT Received: from ipftiger1.us.oracle.com (/10.208.179.35) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 17 Jan 2018 04:37:15 -0800 From: Sowmini Varadhan To: netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com Cc: davem@davemloft.net, rds-devel@oss.oracle.com, sowmini.varadhan@oracle.com, santosh.shilimkar@oracle.com Subject: [PATCH RFC net-next 3/6] rds: hold a sock ref from rds_message to the rds_sock Date: Wed, 17 Jan 2018 04:20:01 -0800 Message-Id: <668969fa176f2f0edfb14d96bfd9537b1473fb9b.1516147540.git.sowmini.varadhan@oracle.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8776 signatures=668653 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1801170183 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The existing model holds a reference from the rds_sock to the rds_message, but the rds_message does not itself hold a sock_put() on the rds_sock. Instead the m_rs field in the rds_message is assigned when the message is queued on the sock, and nulled when the message is dequeued from the sock. We want to be able to notify userspace when the rds_message is actually freed (from rds_message_purge(), after the refcounts to the rds_message go to 0). 
These notifications will signal that it is safe for uspace to free/reuse any pages that may have been pinned down for zerocopy, and are sent up on the PF_RDS socket. In order to be able to send the notifications on the rds_sock, we need to retain the m_rs assignment in the rds_message with the necessary refcount book-keeping. Signed-off-by: Sowmini Varadhan --- net/rds/message.c | 8 +++++++- net/rds/send.c | 7 +------ 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/net/rds/message.c b/net/rds/message.c index 4318cc9..ef3daaf 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -58,7 +58,7 @@ void rds_message_addref(struct rds_message *rm) */ static void rds_message_purge(struct rds_message *rm) { - unsigned long i; + unsigned long i, flags; if (unlikely(test_bit(RDS_MSG_PAGEVEC, &rm->m_flags))) return; @@ -69,6 +69,12 @@ static void rds_message_purge(struct rds_message *rm) __free_page(sg_page(&rm->data.op_sg[i])); } rm->data.op_nents = 0; + spin_lock_irqsave(&rm->m_rs_lock, flags); + if (rm->m_rs) { + sock_put(rds_rs_to_sk(rm->m_rs)); + rm->m_rs = NULL; + } + spin_unlock_irqrestore(&rm->m_rs_lock, flags); if (rm->rdma.op_active) rds_rdma_free_op(&rm->rdma); diff --git a/net/rds/send.c b/net/rds/send.c index d3e32d1..5ac0925 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -649,7 +649,6 @@ static void rds_send_remove_from_sock(struct list_head *messages, int status) rm->rdma.op_notifier = NULL; } was_on_sock = 1; - rm->m_rs = NULL; } spin_unlock(&rs->rs_lock); @@ -756,9 +755,6 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest) */ if (!test_and_clear_bit(RDS_MSG_ON_CONN, &rm->m_flags)) { spin_unlock_irqrestore(&cp->cp_lock, flags); - spin_lock_irqsave(&rm->m_rs_lock, flags); - rm->m_rs = NULL; - spin_unlock_irqrestore(&rm->m_rs_lock, flags); continue; } list_del_init(&rm->m_conn_item); @@ -774,7 +770,6 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest) __rds_send_complete(rs, rm, RDS_RDMA_CANCELED); 
spin_unlock(&rs->rs_lock); - rm->m_rs = NULL; spin_unlock_irqrestore(&rm->m_rs_lock, flags); rds_message_put(rm); @@ -798,7 +793,6 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest) __rds_send_complete(rs, rm, RDS_RDMA_CANCELED); spin_unlock(&rs->rs_lock); - rm->m_rs = NULL; spin_unlock_irqrestore(&rm->m_rs_lock, flags); rds_message_put(rm); @@ -849,6 +843,7 @@ static int rds_send_queue_rm(struct rds_sock *rs, struct rds_connection *conn, list_add_tail(&rm->m_sock_item, &rs->rs_send_queue); set_bit(RDS_MSG_ON_SOCK, &rm->m_flags); rds_message_addref(rm); + sock_hold(rds_rs_to_sk(rs)); rm->m_rs = rs; /* The code ordering is a little weird, but we're From patchwork Wed Jan 17 12:20:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sowmini Varadhan X-Patchwork-Id: 862250 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="CTHNPHTb"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zM67B4QtJz9s0g for ; Wed, 17 Jan 2018 23:37:42 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753062AbeAQMhe (ORCPT ); Wed, 17 Jan 2018 07:37:34 -0500 Received: from aserp2120.oracle.com ([141.146.126.78]:53146 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752983AbeAQMhU (ORCPT ); Wed, 17 Jan 2018 07:37:20 -0500 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id 
w0HCbHAp015708; Wed, 17 Jan 2018 12:37:17 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2017-10-26; bh=isQZ+kWXjOTsk9/nv3aAzEIQnAqx2nHTPTCWLs8+pWc=; b=CTHNPHTbZiXzyFPgBQoDLPFwDHda9ZkT1BOwipyO15atqhe+2nEPzCZ2M9rlo3IBsO+4 y9WPj/L0sL+YUMryWyqX3UdFbHE0fRcvq/6KYVRRsFuVMz6zSspFK6b/gqKcqSR7s2p/ Ywha0QpgzwGGUFh/7VQO38ESiX9ZXpWOqES7Os4Y8JyWNZQCCNAUDYxnU4tHRyV0PLWb KSwufR7ZQ9ZJY9gNhbD/qX4KBpUg6F3zuz2DA/1JF7QOK525et59rVNpimIwrY3tgYnT g1yOUwJl5zKWfF7oNquXnTAHLaku35/4OfUZbDm/abvFmm6LWzzJp9Deaw9gWAlLZMGu mg== Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by aserp2120.oracle.com with ESMTP id 2fj3wp8spe-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 17 Jan 2018 12:37:17 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w0HCbFpV005348 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Wed, 17 Jan 2018 12:37:15 GMT Received: from abhmp0008.oracle.com (abhmp0008.oracle.com [141.146.116.14]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w0HCbFDt022539; Wed, 17 Jan 2018 12:37:15 GMT Received: from ipftiger1.us.oracle.com (/10.208.179.35) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 17 Jan 2018 04:37:15 -0800 From: Sowmini Varadhan To: netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com Cc: davem@davemloft.net, rds-devel@oss.oracle.com, sowmini.varadhan@oracle.com, santosh.shilimkar@oracle.com Subject: [PATCH RFC net-next 4/6] sock: permit SO_ZEROCOPY on PF_RDS socket Date: Wed, 17 Jan 2018 04:20:02 -0800 Message-Id: <1a25fd67586301a5d8e8fa91152e0cd07bc5ee0f.1516147540.git.sowmini.varadhan@oracle.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8776 
signatures=668653 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=866 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1801170183 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org allow the application to set SO_ZEROCOPY on the underlying sk of a PF_RDS socket Signed-off-by: Sowmini Varadhan --- net/core/sock.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index 4f52677..f0f44b0 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1049,6 +1049,13 @@ int sock_setsockopt(struct socket *sock, int level, int optname, break; case SO_ZEROCOPY: + if (sk->sk_family == PF_RDS) { + if (val < 0 || val > 1) + ret = -EINVAL; + else + sock_valbool_flag(sk, SOCK_ZEROCOPY, valbool); + break; + } if (sk->sk_family != PF_INET && sk->sk_family != PF_INET6) ret = -ENOTSUPP; else if (sk->sk_protocol != IPPROTO_TCP) From patchwork Wed Jan 17 12:20:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sowmini Varadhan X-Patchwork-Id: 862251 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="IuZ3z5AD"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zM6876HVtz9sNV for ; Wed, 17 Jan 2018 23:38:31 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand 
id S1753051AbeAQMhc (ORCPT ); Wed, 17 Jan 2018 07:37:32 -0500 Received: from aserp2130.oracle.com ([141.146.126.79]:59702 "EHLO aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752988AbeAQMhV (ORCPT ); Wed, 17 Jan 2018 07:37:21 -0500 Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w0HCbHgp048774; Wed, 17 Jan 2018 12:37:18 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2017-10-26; bh=4/0rbpDZHQ2+RZpfP9c9j1CS3E8nKGA8m8KhsICiTgk=; b=IuZ3z5ADsEEJ1NGsLTY86lHU8FeW3x7rmBKRTnpu1Nsc8eUD93gsRaFe7z4o0R1tCERY JxjQJ81y5eCOpnR62h5RX3iUbaeHDkzREyhMIPmtlMk5xZxPlL40JDXxfHzXE7/dp5QF CIAMBpB81YAHoNkQC770HL60JTFuvCU+FeNe8ZyNRGaWivYnHzqRo4JDwkD08iC3kXwA nCC9KbSD8jVwcDy9RFBquTBTlUYgiRbdqfqZnUtr1tcxSiLbfVfB4mui3nVvxPH/2LXA H+mJDZ6TzRCn2xco1qRA8x00T4p0dRMUpVUHU5KNuRPFCRd62wetCw/OAMuyg4pGl7MA Yw== Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by aserp2130.oracle.com with ESMTP id 2fj2gy15wt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 17 Jan 2018 12:37:18 +0000 Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by aserv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w0HCbGmQ005379 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Wed, 17 Jan 2018 12:37:16 GMT Received: from abhmp0008.oracle.com (abhmp0008.oracle.com [141.146.116.14]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w0HCbG4Y030109; Wed, 17 Jan 2018 12:37:16 GMT Received: from ipftiger1.us.oracle.com (/10.208.179.35) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 17 Jan 2018 04:37:15 -0800 From: Sowmini Varadhan To: netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com Cc: davem@davemloft.net, rds-devel@oss.oracle.com, sowmini.varadhan@oracle.com, 
santosh.shilimkar@oracle.com Subject: [PATCH RFC net-next 5/6] rds: support for zcopy completion notification Date: Wed, 17 Jan 2018 04:20:03 -0800 Message-Id: <9ff41bc8f61a112138287b5029369a9910477811.1516147540.git.sowmini.varadhan@oracle.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8776 signatures=668653 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1801170183 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org RDS removes a datagram from the retransmit queue when an ACK is received. The ACK indicates that the receiver has queued the RDS datagram, so the sender can safely forget the datagram. If the datagram to be removed had pinned pages set up, add an entry to the rs->rs_znotify_queue so that the notification will be sent up via rds_rm_zerocopy_callback() when the rds_message is eventually freed by rds_message_purge().
Signed-off-by: Sowmini Varadhan --- net/rds/af_rds.c | 3 ++ net/rds/message.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++- net/rds/rds.h | 13 +++++++++- net/rds/recv.c | 3 ++ net/rds/send.c | 7 +++++ 5 files changed, 91 insertions(+), 2 deletions(-) diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c index b405f77..23126db 100644 --- a/net/rds/af_rds.c +++ b/net/rds/af_rds.c @@ -183,6 +183,8 @@ static unsigned int rds_poll(struct file *file, struct socket *sock, mask |= (POLLIN | POLLRDNORM); if (rs->rs_snd_bytes < rds_sk_sndbuf(rs)) mask |= (POLLOUT | POLLWRNORM); + if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue)) + mask |= POLLERR; read_unlock_irqrestore(&rs->rs_recv_lock, flags); /* clear state any time we wake a seen-congested socket */ @@ -511,6 +513,7 @@ static int __rds_create(struct socket *sock, struct sock *sk, int protocol) INIT_LIST_HEAD(&rs->rs_send_queue); INIT_LIST_HEAD(&rs->rs_recv_queue); INIT_LIST_HEAD(&rs->rs_notify_queue); + INIT_LIST_HEAD(&rs->rs_znotify_queue); INIT_LIST_HEAD(&rs->rs_cong_list); spin_lock_init(&rs->rs_rdma_lock); rs->rs_rdma_keys = RB_ROOT; diff --git a/net/rds/message.c b/net/rds/message.c index ef3daaf..25c74b3 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -33,6 +33,9 @@ #include #include #include +#include +#include +#include #include "rds.h" @@ -53,6 +56,64 @@ void rds_message_addref(struct rds_message *rm) } EXPORT_SYMBOL_GPL(rds_message_addref); +static void rds_rm_zerocopy_callback(struct rds_sock *rs) +{ + struct sock *sk = rds_rs_to_sk(rs); + struct sk_buff *skb; + struct sock_exterr_skb *serr; + struct sk_buff_head *q; + unsigned long flags; + struct sk_buff *tail; + u32 *ptr; + int ncookies = 0, i; + struct rds_znotifier *znotif, *ztmp; + LIST_HEAD(tmp_list); + + spin_lock_irqsave(&rs->rs_lock, flags); + list_splice(&rs->rs_znotify_queue, &tmp_list); + INIT_LIST_HEAD(&rs->rs_znotify_queue); + spin_unlock_irqrestore(&rs->rs_lock, flags); + + list_for_each_entry_safe(znotif, ztmp, 
&tmp_list, z_list) + ncookies++; + if (ncookies == 0) + return; + skb = alloc_skb(ncookies * sizeof(u32), GFP_ATOMIC); + if (!skb) { + spin_lock_irqsave(&rs->rs_lock, flags); + list_splice(&tmp_list, &rs->rs_znotify_queue); + spin_unlock_irqrestore(&rs->rs_lock, flags); + return; + } + serr = SKB_EXT_ERR(skb); + serr->ee.ee_errno = 0; + serr->ee.ee_origin = SO_EE_ORIGIN_ZEROCOPY; + serr->ee.ee_data = ncookies; + serr->ee.ee_info = 0; + serr->ee.ee_code |= SO_EE_CODE_ZEROCOPY_COPIED; + ptr = skb_put(skb, ncookies * sizeof(u32)); + + i = 0; + list_for_each_entry_safe(znotif, ztmp, &tmp_list, z_list) { + list_del(&znotif->z_list); + ptr[i++] = znotif->z_cookie; + mm_unaccount_pinned_pages(&znotif->z_mmp); + kfree(znotif); + } + WARN_ON(!list_empty(&tmp_list)); + q = &sk->sk_error_queue; + spin_lock_irqsave(&q->lock, flags); + tail = skb_peek_tail(q); + if (!tail || + SKB_EXT_ERR(tail)->ee.ee_origin != SO_EE_ORIGIN_ZEROCOPY) { + __skb_queue_tail(q, skb); + skb = NULL; + } + spin_unlock_irqrestore(&q->lock, flags); + sk->sk_error_report(sk); + consume_skb(skb); +} + /* * This relies on dma_map_sg() not touching sg[].page during merging. 
  */
@@ -66,11 +127,15 @@ static void rds_message_purge(struct rds_message *rm)
 	for (i = 0; i < rm->data.op_nents; i++) {
 		rdsdebug("putting data page %p\n", (void *)sg_page(&rm->data.op_sg[i]));
 		/* XXX will have to put_page for page refs */
-		__free_page(sg_page(&rm->data.op_sg[i]));
+		if (!rm->data.op_zcopy)
+			__free_page(sg_page(&rm->data.op_sg[i]));
+		else
+			put_page(sg_page(&rm->data.op_sg[i]));
 	}
 	rm->data.op_nents = 0;
 	spin_lock_irqsave(&rm->m_rs_lock, flags);
 	if (rm->m_rs) {
+		rds_rm_zerocopy_callback(rm->m_rs);
 		sock_put(rds_rs_to_sk(rm->m_rs));
 		rm->m_rs = NULL;
 	}
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 374ae83..de5015a 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -356,6 +356,12 @@ static inline u32 rds_rdma_cookie_offset(rds_rdma_cookie_t cookie)
 #define RDS_MSG_PAGEVEC		7
 #define RDS_MSG_FLUSH		8
 
+struct rds_znotifier {
+	struct list_head	z_list;
+	u32			z_cookie;
+	struct mmpin		z_mmp;
+};
+
 struct rds_message {
 	refcount_t		m_refcount;
 	struct list_head	m_sock_item;
@@ -431,11 +437,14 @@ struct rds_message {
 		} rdma;
 		struct rm_data_op {
 			unsigned int		op_active:1;
-			unsigned int		op_notify:1;
+			unsigned int		op_notify:1,
+						op_zcopy:1,
+						op_pad_to_32:30;
 			unsigned int		op_nents;
 			unsigned int		op_count;
 			unsigned int		op_dmasg;
 			unsigned int		op_dmaoff;
+			struct rds_znotifier	*op_mmp_znotifier;
 			struct scatterlist	*op_sg;
 		} data;
 	};
@@ -588,6 +597,8 @@ struct rds_sock {
 	/* Socket receive path trace points*/
 	u8			rs_rx_traces;
 	u8			rs_rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX];
+
+	struct list_head	rs_znotify_queue; /* zerocopy completion */
 };
 
 static inline struct rds_sock *rds_sk_to_rs(const struct sock *sk)
diff --git a/net/rds/recv.c b/net/rds/recv.c
index b25bcfe..043f667 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -594,6 +594,9 @@ int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 
 	if (msg_flags & MSG_OOB)
 		goto out;
+	if (msg_flags & MSG_ERRQUEUE)
+		return sock_recv_errqueue(sk, msg, size, SOL_IP, IP_RECVERR,
+					  msg_flags);
 
 	while (1) {
 		/* If there are pending notifications, do those - and nothing else */
diff --git a/net/rds/send.c b/net/rds/send.c
index 5ac0925..5c38ce3 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -635,7 +635,14 @@ static void rds_send_remove_from_sock(struct list_head *messages, int status)
 		if (test_and_clear_bit(RDS_MSG_ON_SOCK, &rm->m_flags)) {
 			struct rm_rdma_op *ro = &rm->rdma;
 			struct rds_notifier *notifier;
+			struct rds_znotifier *znotifier;
 
+			if (rm->data.op_zcopy) {
+				znotifier = rm->data.op_mmp_znotifier;
+				list_add_tail(&znotifier->z_list,
+					      &rs->rs_znotify_queue);
+				rm->data.op_mmp_znotifier = NULL;
+			}
 			list_del_init(&rm->m_sock_item);
 			rds_send_sndbuf_remove(rs, rm);
 

From patchwork Wed Jan 17 12:20:04 2018
X-Patchwork-Submitter: Sowmini Varadhan
X-Patchwork-Id: 862249
X-Patchwork-Delegate: davem@davemloft.net
From: Sowmini Varadhan
To: netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com
Cc: davem@davemloft.net, rds-devel@oss.oracle.com,
 sowmini.varadhan@oracle.com, santosh.shilimkar@oracle.com
Subject: [PATCH RFC net-next 6/6] rds: zerocopy Tx support.
Date: Wed, 17 Jan 2018 04:20:04 -0800
Message-Id: <33ac553cf0839d78a45b8bda9cdd918526ee2328.1516147540.git.sowmini.varadhan@oracle.com>

If the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and the
SO_ZEROCOPY socket option has been set on the PF_RDS socket,
application pages sent down with rds_sendmsg() are pinned. The pinning
uses the accounting infrastructure added by commit a91dbff551a6
("sock: ulimit on MSG_ZEROCOPY pages").

The payload bytes in the message must not be modified for as long as
the message pages remain pinned. A multi-threaded application using
this infrastructure may thus need to be notified about send completion
so that it can free or reuse the buffers passed to rds_sendmsg().
The send-completion notification identifies each message buffer by a
cookie that the application must specify as ancillary data to
rds_sendmsg(). The ancillary data in this case has
cmsg_level == SOL_RDS and cmsg_type == RDS_CMSG_ZCOPY_COOKIE.
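
For illustration only, the ancillary data described above could be built
in userspace roughly as follows. This is a sketch, not part of the patch:
example_attach_zcopy_cookie() is a hypothetical helper, EX_SOL_RDS
mirrors the kernel's SOL_RDS value (276), and
EX_RDS_CMSG_ZCOPY_COOKIE uses the value 10 proposed in this RFC, which
may differ from what installed headers define.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed constants, defined locally so the sketch does not depend on
 * installed headers: SOL_RDS is 276 in the kernel; 10 is the cmsg type
 * proposed by this RFC. */
#define EX_SOL_RDS               276
#define EX_RDS_CMSG_ZCOPY_COOKIE 10

/* Hypothetical helper: attach a 32-bit zerocopy cookie to msg as
 * ancillary data, using caller-provided control storage cbuf.
 * Returns 0 on success, -1 if cbuf is too small. */
int example_attach_zcopy_cookie(struct msghdr *msg, void *cbuf,
				size_t cbuf_len, uint32_t cookie)
{
	struct cmsghdr *cm;

	assert(msg != NULL && cbuf != NULL);
	if (cbuf_len < CMSG_SPACE(sizeof(cookie)))
		return -1;

	memset(cbuf, 0, cbuf_len);
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(cookie));

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = EX_SOL_RDS;
	cm->cmsg_type = EX_RDS_CMSG_ZCOPY_COOKIE;
	/* rds_cmsg_zcopy() rejects anything shorter than CMSG_LEN(4). */
	cm->cmsg_len = CMSG_LEN(sizeof(cookie));
	memcpy(CMSG_DATA(cm), &cookie, sizeof(cookie));
	return 0;
}
```

The caller would then set msg_iov to the payload buffers and call
sendmsg(fd, &msg, MSG_ZEROCOPY) on a PF_RDS socket that has SO_ZEROCOPY
enabled, leaving the buffers untouched until the cookie is reported back.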
Signed-off-by: Sowmini Varadhan
---
 include/uapi/linux/rds.h |  1 +
 net/rds/message.c        | 44 +++++++++++++++++++++++++++++++++++++++++++-
 net/rds/rds.h            |  3 ++-
 net/rds/send.c           | 27 ++++++++++++++++++++++++---
 4 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index e71d449..09b8cc6 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -102,6 +102,7 @@
 #define RDS_CMSG_ATOMIC_CSWP		7
 #define RDS_CMSG_MASKED_ATOMIC_FADD	8
 #define RDS_CMSG_MASKED_ATOMIC_CSWP	9
+#define RDS_CMSG_ZCOPY_COOKIE		10
 #define RDS_CMSG_RXPATH_LATENCY		11
 
 #define RDS_INFO_FIRST			10000
diff --git a/net/rds/message.c b/net/rds/message.c
index 25c74b3..111d5a3 100644
--- a/net/rds/message.c
+++ b/net/rds/message.c
@@ -337,12 +337,14 @@ struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned in
 	return rm;
 }
 
-int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from)
+int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from,
+			       struct rds_sock *rs, bool zcopy)
 {
 	unsigned long to_copy, nbytes;
 	unsigned long sg_off;
 	struct scatterlist *sg;
 	int ret = 0;
+	int length = iov_iter_count(from);
 
 	rm->m_inc.i_hdr.h_len = cpu_to_be32(iov_iter_count(from));
 
@@ -352,6 +354,46 @@ int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from)
 	sg = rm->data.op_sg;
 	sg_off = 0; /* Dear gcc, sg->page will be null from kzalloc. */
 
+	if (zcopy) {
+		int total_copied = 0;
+		size_t zsize = sizeof(struct rds_znotifier);
+
+		rm->data.op_zcopy = 1;
+		rm->data.op_mmp_znotifier = kzalloc(zsize, GFP_KERNEL);
+		if (!rm->data.op_mmp_znotifier)
+			return -ENOMEM;
+		if (mm_account_pinned_pages(&rm->data.op_mmp_znotifier->z_mmp,
+					    length)) {
+			kfree(rm->data.op_mmp_znotifier);
+			rm->data.op_mmp_znotifier = NULL;
+			return -ENOMEM;
+		}
+		while (iov_iter_count(from)) {
+			struct page *pages;
+			size_t start;
+			ssize_t copied;
+
+			copied = iov_iter_get_pages(from, &pages, PAGE_SIZE,
+						    1, &start);
+			if (copied < 0) {
+				/* XXX revert pinning */
+				kfree(rm->data.op_mmp_znotifier);
+				rm->data.op_mmp_znotifier = NULL;
+				return -EFAULT;
+			}
+			total_copied += copied;
+			iov_iter_advance(from, copied);
+			length -= copied;
+			sg_set_page(sg, pages, copied, start);
+			rm->data.op_nents++;
+			sg++;
+		}
+		WARN_ON_ONCE(length != 0);
+		return ret;
+	} /* zcopy */
+
+	rm->data.op_zcopy = 0;
+
 	while (iov_iter_count(from)) {
 		if (!sg_page(sg)) {
 			ret = rds_page_remainder_alloc(sg, iov_iter_count(from),
diff --git a/net/rds/rds.h b/net/rds/rds.h
index de5015a..884d61c 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -781,7 +781,8 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
 /* message.c */
 struct rds_message *rds_message_alloc(unsigned int nents, gfp_t gfp);
 struct scatterlist *rds_message_alloc_sgs(struct rds_message *rm, int nents);
-int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from);
+int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from,
+			       struct rds_sock *rs, bool zcopy);
 struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned int total_len);
 void rds_message_populate_header(struct rds_header *hdr, __be16 sport,
 				 __be16 dport, u64 seq);
diff --git a/net/rds/send.c b/net/rds/send.c
index 5c38ce3..e085730 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -908,6 +908,7 @@ static int rds_rm_size(struct msghdr *msg, int data_len)
 
 		case RDS_CMSG_RDMA_DEST:
 		case RDS_CMSG_RDMA_MAP:
+		case RDS_CMSG_ZCOPY_COOKIE:
 			cmsg_groups |= 2;
 			/* these are valid but do no add any size */
 			break;
@@ -935,6 +936,18 @@ static int rds_rm_size(struct msghdr *msg, int data_len)
 	return size;
 }
 
+static int rds_cmsg_zcopy(struct rds_sock *rs, struct rds_message *rm,
+			  struct cmsghdr *cmsg)
+{
+	unsigned int *cookie;
+
+	if (cmsg->cmsg_len < CMSG_LEN(sizeof(*cookie)))
+		return -EINVAL;
+	cookie = CMSG_DATA(cmsg);
+	rm->data.op_mmp_znotifier->z_cookie = *cookie;
+	return 0;
+}
+
 static int rds_cmsg_send(struct rds_sock *rs, struct rds_message *rm,
 			 struct msghdr *msg, int *allocated_mr)
 {
@@ -977,6 +990,10 @@ static int rds_cmsg_send(struct rds_sock *rs, struct rds_message *rm,
 			ret = rds_cmsg_atomic(rs, rm, cmsg);
 			break;
 
+		case RDS_CMSG_ZCOPY_COOKIE:
+			ret = rds_cmsg_zcopy(rs, rm, cmsg);
+			break;
+
 		default:
 			return -EINVAL;
 		}
@@ -1047,10 +1064,12 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
 	long timeo = sock_sndtimeo(sk, nonblock);
 	struct rds_conn_path *cpath;
 	size_t total_payload_len = payload_len, rdma_payload_len = 0;
+	bool zcopy = ((msg->msg_flags & MSG_ZEROCOPY) &&
+		      sock_flag(rds_rs_to_sk(rs), SOCK_ZEROCOPY));
 
 	/* Mirror Linux UDP mirror of BSD error message compatibility */
 	/* XXX: Perhaps MSG_MORE someday */
-	if (msg->msg_flags & ~(MSG_DONTWAIT | MSG_CMSG_COMPAT)) {
+	if (msg->msg_flags & ~(MSG_DONTWAIT | MSG_CMSG_COMPAT | MSG_ZEROCOPY)) {
 		ret = -EOPNOTSUPP;
 		goto out;
 	}
@@ -1107,12 +1126,14 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
 
 	/* Attach data to the rm */
 	if (payload_len) {
-		rm->data.op_sg = rds_message_alloc_sgs(rm, ceil(payload_len, PAGE_SIZE));
+		int num_sgs = ceil(payload_len, PAGE_SIZE);
+
+		rm->data.op_sg = rds_message_alloc_sgs(rm, num_sgs);
 		if (!rm->data.op_sg) {
 			ret = -ENOMEM;
 			goto out;
 		}
-		ret = rds_message_copy_from_user(rm, &msg->msg_iter);
+		ret = rds_message_copy_from_user(rm, &msg->msg_iter, rs, zcopy);
 		if (ret)
 			goto out;
 	}