From patchwork Thu Feb 15 18:49:32 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sowmini Varadhan X-Patchwork-Id: 874085 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="SCBzt8kq"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zj5R834Vxz9sRW for ; Fri, 16 Feb 2018 06:08:52 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1033808AbeBOTIt (ORCPT ); Thu, 15 Feb 2018 14:08:49 -0500 Received: from aserp2120.oracle.com ([141.146.126.78]:56208 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1033770AbeBOTIr (ORCPT ); Thu, 15 Feb 2018 14:08:47 -0500 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w1FJ7OMK171223; Thu, 15 Feb 2018 19:08:45 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2017-10-26; bh=Iboq2S/3VgevH5G4W2eTzeTG2ZbuWd+Q+39fv6e5BAM=; b=SCBzt8kq/DR6ZUPZyDIx5QeRjl7A+Xm5tgKNFv690Me9qmdLKfMFNtZ1yuCi3eAhajce PsAUrXpkvl8cxoVO5ITlv8zGdBPzXwMkbOqHQR2CCOhYlRT8xPM/7mB0Mi53Nm5R6UXt FT8X/I/29BJ1WEKMqJ7WBKNGE1hCIHcifo3gFVTQW5TXFIEpREms7huSv9Mch05S2ZaK sAsQJJTAagoUBvz96EdWoaEHEyphW0Z8tY9iL0JfA09J6fjVYHlcdxyBZSgXEi9LvTqY Kd+SY3LQ/foWjct+fpQ/3DuGuv9mhbsyLnk4bMNfDrX00ZZ4VvlMuZ48Ss54YzYli38/ bg== Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by aserp2120.oracle.com with ESMTP id 2g5f1s8j5n-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 15 Feb 2018 19:08:44 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w1FJ8i4D001626 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 15 Feb 2018 19:08:44 GMT Received: from abhmp0009.oracle.com (abhmp0009.oracle.com [141.146.116.15]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w1FJ8itC008658; Thu, 15 Feb 2018 19:08:44 GMT Received: from ipftiger1.us.oracle.com (/10.208.179.35) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 15 Feb 2018 11:08:43 -0800 From: Sowmini Varadhan To: sowmini.varadhan@oracle.com, netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com Cc: davem@davemloft.net, rds-devel@oss.oracle.com, santosh.shilimkar@oracle.com Subject: [PATCH v3 net-next 1/7] skbuff: export mm_[un]account_pinned_pages for other modules Date: Thu, 15 Feb 2018 10:49:32 -0800 Message-Id: <195217cf6ebdfa3e413a72f2da0532486d638f6c.1518718761.git.sowmini.varadhan@oracle.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8806 signatures=668672 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1802150229 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org RDS would like to use the helper functions for managing pinned pages added by Commit a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages") Signed-off-by: Sowmini Varadhan Acked-by: Willem de Bruijn --- include/linux/skbuff.h | 3 +++ net/core/skbuff.c | 6 ++++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 5ebc0f8..b1cc38a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -466,6 +466,9 @@ struct ubuf_info { #define skb_uarg(SKB) ((struct ubuf_info *)(skb_shinfo(SKB)->destructor_arg)) +int mm_account_pinned_pages(struct mmpin *mmp, size_t size); +void mm_unaccount_pinned_pages(struct mmpin *mmp); + struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size); struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size, struct ubuf_info *uarg); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 09bd89c..1a7485a 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -890,7 +890,7 @@ struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src) } EXPORT_SYMBOL_GPL(skb_morph); -static int mm_account_pinned_pages(struct mmpin *mmp, size_t size) +int mm_account_pinned_pages(struct mmpin *mmp, size_t size) { unsigned long max_pg, num_pg, new_pg, old_pg; struct user_struct *user; @@ -919,14 +919,16 @@ static int mm_account_pinned_pages(struct mmpin *mmp, size_t size) return 0; } +EXPORT_SYMBOL_GPL(mm_account_pinned_pages); -static void mm_unaccount_pinned_pages(struct mmpin *mmp) +void mm_unaccount_pinned_pages(struct mmpin *mmp) { if (mmp->user) { atomic_long_sub(mmp->num_pg, &mmp->user->locked_vm); free_uid(mmp->user); } } +EXPORT_SYMBOL_GPL(mm_unaccount_pinned_pages); struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size) { From patchwork Thu Feb 15 18:49:33 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sowmini Varadhan X-Patchwork-Id: 874086 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="UBw79eSD"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zj5RK24MCz9sRW for ; Fri, 16 Feb 2018 06:09:01 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161191AbeBOTI6 (ORCPT ); Thu, 15 Feb 2018 14:08:58 -0500 Received: from aserp2120.oracle.com ([141.146.126.78]:56240 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1033793AbeBOTIs (ORCPT ); Thu, 15 Feb 2018 14:08:48 -0500 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w1FJ6nko170790; Thu, 15 Feb 2018 19:08:46 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2017-10-26; bh=oOYiVLY8EWc8QWmS5y5BlMM9KfDkUlzDGd1YuuLVCq0=; b=UBw79eSDsHFiJ1Jo9I8GdLwgUGEyMfPuSQ0wYyqDCn6uMgyQNv5aivJYWYKGtZe6cvoY 8grlqNta9BMuLQPQvIFLN1m+umEGLmRU1Km6g5BPHLjB5IcNl+/ANpBNnHgDlrhj115G J50UyUXDE7BJST3kV6MLraEGz+G/UIVuFI/+32C0wxDnL2bxGC0YMEIPiIeBVD3CdC9f +aWELe5tiIXc8xDO2tPYtFlK2CoZ9bRK5keqsnz/7PJK5YVf2kJG2DymYgufMkzsgtVV fqqo2PE5Tup4Xf4K8HmhZHUqPry4r7dWAbG9oUfUGFiAn2LwpPDfzUqQw+S2MFSUVcI4 zw== Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by aserp2120.oracle.com with ESMTP id 2g5f1s8j5u-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 15 Feb 2018 19:08:46 +0000 Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w1FJ8i17015134 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 15 Feb 2018 19:08:44 GMT Received: from abhmp0009.oracle.com (abhmp0009.oracle.com [141.146.116.15]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w1FJ8iKv030094; Thu, 15 Feb 2018 19:08:44 GMT Received: from ipftiger1.us.oracle.com (/10.208.179.35) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 15 Feb 2018 11:08:44 -0800 From: Sowmini Varadhan To: sowmini.varadhan@oracle.com, netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com Cc: davem@davemloft.net, rds-devel@oss.oracle.com, santosh.shilimkar@oracle.com Subject: [PATCH v3 net-next 2/7] rds: hold a sock ref from rds_message to the rds_sock Date: Thu, 15 Feb 2018 10:49:33 -0800 Message-Id: X-Mailer: git-send-email 1.7.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8806 signatures=668672 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=952 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1802150229 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The existing model holds a reference from the rds_sock to the rds_message, but the rds_message does not itself hold a sock_put() on the rds_sock. Instead the m_rs field in the rds_message is assigned when the message is queued on the sock, and nulled when the message is dequeued from the sock. We want to be able to notify userspace when the rds_message is actually freed (from rds_message_purge(), after the refcounts to the rds_message go to 0). At the time that rds_message_purge() is called, the message is no longer on the rds_sock retransmit queue. Thus the explicit reference for the m_rs is needed to send a notification that will signal to userspace that it is now safe to free/reuse any pages that may have been pinned down for zerocopy. This patch manages the m_rs assignment in the rds_message with the necessary refcount book-keeping. Signed-off-by: Sowmini Varadhan Acked-by: Santosh Shilimkar --- net/rds/message.c | 8 +++++++- net/rds/send.c | 7 +------ 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/net/rds/message.c b/net/rds/message.c index 4318cc9..ef3daaf 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -58,7 +58,7 @@ void rds_message_addref(struct rds_message *rm) */ static void rds_message_purge(struct rds_message *rm) { - unsigned long i; + unsigned long i, flags; if (unlikely(test_bit(RDS_MSG_PAGEVEC, &rm->m_flags))) return; @@ -69,6 +69,12 @@ static void rds_message_purge(struct rds_message *rm) __free_page(sg_page(&rm->data.op_sg[i])); } rm->data.op_nents = 0; + spin_lock_irqsave(&rm->m_rs_lock, flags); + if (rm->m_rs) { + sock_put(rds_rs_to_sk(rm->m_rs)); + rm->m_rs = NULL; + } + spin_unlock_irqrestore(&rm->m_rs_lock, flags); if (rm->rdma.op_active) rds_rdma_free_op(&rm->rdma); diff --git a/net/rds/send.c b/net/rds/send.c index b1b0022..e8f3ff4 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -649,7 +649,6 @@ static void rds_send_remove_from_sock(struct list_head *messages, int status) rm->rdma.op_notifier = NULL; } was_on_sock = 1; - rm->m_rs = NULL; } spin_unlock(&rs->rs_lock); @@ -756,9 +755,6 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest) */ if (!test_and_clear_bit(RDS_MSG_ON_CONN, &rm->m_flags)) { spin_unlock_irqrestore(&cp->cp_lock, flags); - spin_lock_irqsave(&rm->m_rs_lock, flags); - rm->m_rs = NULL; - spin_unlock_irqrestore(&rm->m_rs_lock, flags); continue; } list_del_init(&rm->m_conn_item); @@ -774,7 +770,6 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest) __rds_send_complete(rs, rm, RDS_RDMA_CANCELED); spin_unlock(&rs->rs_lock); - rm->m_rs = NULL; spin_unlock_irqrestore(&rm->m_rs_lock, flags); rds_message_put(rm); @@ -798,7 +793,6 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest) __rds_send_complete(rs, rm, RDS_RDMA_CANCELED); spin_unlock(&rs->rs_lock); - rm->m_rs = NULL; spin_unlock_irqrestore(&rm->m_rs_lock, flags); rds_message_put(rm); @@ -849,6 +843,7 @@ static int rds_send_queue_rm(struct rds_sock *rs, struct rds_connection *conn, list_add_tail(&rm->m_sock_item, &rs->rs_send_queue); set_bit(RDS_MSG_ON_SOCK, &rm->m_flags); rds_message_addref(rm); + sock_hold(rds_rs_to_sk(rs)); rm->m_rs = rs; /* The code ordering is a little weird, but we're From patchwork Thu Feb 15 18:49:34 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sowmini Varadhan X-Patchwork-Id: 874084 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="TS74Al04"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zj5RB25klz9t5R for ; Fri, 16 Feb 2018 06:08:53 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161052AbeBOTIv (ORCPT ); Thu, 15 Feb 2018 14:08:51 -0500 Received: from aserp2120.oracle.com ([141.146.126.78]:56230 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1033791AbeBOTIs (ORCPT ); Thu, 15 Feb 2018 14:08:48 -0500 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w1FJ6kBJ170716; Thu, 15 Feb 2018 19:08:46 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2017-10-26; bh=Sjy9T6q+ENExk/ZCxNoHkwe3ZefluOwm7owTyTefzQQ=; b=TS74Al04+FE0ihvCMhftuFNCPvuvtD7UlLHPFxl1DLU8TZSXLkmULDYvBOJvCauQ9UBq zzM+G56fgd6Qkao1csWvrT2R1PCCJC6GrXCLlMjSQRsiHEGyllvmFfC9V5jrzXKmXn/z No2AY6o6NdDsoNzhwpjupUiAimgZHm/ImDh0VjX75wDNdPKM2sM9Xecf0BTCz8VnHhxp qZBsR5CDcM2HbRC/997I22VdpekPCp/DVlKxTYDczSnFIJi5CUKPs+OCV6M6/+XBRdjM QO6WNmVpcxPibmmqPTNzcGm9jgzZR9Q24eBn9Dg8vzdduRbv0SpHYzTsrwqDP/4x6LcW gg== Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by aserp2120.oracle.com with ESMTP id 2g5f1s8j5w-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 15 Feb 2018 19:08:46 +0000 Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w1FJ8jGj001664 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 15 Feb 2018 19:08:45 GMT Received: from abhmp0009.oracle.com (abhmp0009.oracle.com [141.146.116.15]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w1FJ8inh003154; Thu, 15 Feb 2018 19:08:45 GMT Received: from ipftiger1.us.oracle.com (/10.208.179.35) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 15 Feb 2018 11:08:44 -0800 From: Sowmini Varadhan To: sowmini.varadhan@oracle.com, netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com Cc: davem@davemloft.net, rds-devel@oss.oracle.com, santosh.shilimkar@oracle.com Subject: [PATCH v3 net-next 3/7] sock: permit SO_ZEROCOPY on PF_RDS socket Date: Thu, 15 Feb 2018 10:49:34 -0800 Message-Id: <03e35c00a592f39592254de31b32afb6160007ca.1518718761.git.sowmini.varadhan@oracle.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8806 signatures=668672 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=869 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1802150229 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org allow the application to set SO_ZEROCOPY on the underlying sk of a PF_RDS socket Signed-off-by: Sowmini Varadhan Acked-by: Willem de Bruijn --- net/core/sock.c | 25 ++++++++++++++----------- 1 files changed, 14 insertions(+), 11 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index e90d461..a1fa4a5 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1049,18 +1049,21 @@ int sock_setsockopt(struct socket *sock, int level, int optname, break; case SO_ZEROCOPY: - if (sk->sk_family != PF_INET && sk->sk_family != PF_INET6) + if (sk->sk_family == PF_INET || sk->sk_family == PF_INET6) { + if (sk->sk_protocol != IPPROTO_TCP) + ret = -ENOTSUPP; + else if (sk->sk_state != TCP_CLOSE) + ret = -EBUSY; + } else if (sk->sk_family != PF_RDS) { ret = -ENOTSUPP; - else if (sk->sk_protocol != IPPROTO_TCP) - ret = -ENOTSUPP; - else if (sk->sk_state != TCP_CLOSE) - ret = -EBUSY; - else if (val < 0 || val > 1) - ret = -EINVAL; - else - sock_valbool_flag(sk, SOCK_ZEROCOPY, valbool); - break; - + } + if (!ret) { + if (val < 0 || val > 1) + ret = -EINVAL; + else + sock_valbool_flag(sk, SOCK_ZEROCOPY, valbool); + break; + } default: ret = -ENOPROTOOPT; break; From patchwork Thu Feb 15 18:49:35 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sowmini Varadhan X-Patchwork-Id: 874091 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="Dtcu32Is"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zj5YB3hqGz9sRW for ; Fri, 16 Feb 2018 06:14:06 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1163300AbeBOTOE (ORCPT ); Thu, 15 Feb 2018 14:14:04 -0500 Received: from aserp2130.oracle.com ([141.146.126.79]:37266 "EHLO aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1033768AbeBOTNz (ORCPT ); Thu, 15 Feb 2018 14:13:55 -0500 Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w1FJCJ5j003441; Thu, 15 Feb 2018 19:13:47 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2017-10-26; bh=6r9Vc6XBQr8RYTaR8i1clOH0frm4005RthOWnHB5fxs=; b=Dtcu32Isb35/LJQrY5ndJHtbefgJMchWTcwaw+00T+IUGOE5Qm387v5umCm5q3a2chzm PMzKy2IYh34cT5k4ispZdwzmLXAfd8WvQ+ojx4VPV6VwO6YUJ/byzpMOGdkaTC2Kj995 0JUnRhC4slBKNlbxEwGLjLRQYH5zVoNvLkmCiWs8y6mdCb/59IhuTsutxPTRfswS0iRL cLS/I3frsnm8n/GhXtiqGhTjtTP+26VRDLnsyeuEPPVRb0d6z+oafAkt7M76jCslQIfD ZpjQgfBfIiON7TgOiBVFNvrlYoTCfooivGkIyAjhQ6CeD/Fjo7QeAVWNpvX2auCKnRHi Wg== Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by aserp2130.oracle.com with ESMTP id 2g5famran3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 15 Feb 2018 19:13:47 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w1FJ8jTN005183 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 15 Feb 2018 19:08:45 GMT Received: from abhmp0009.oracle.com (abhmp0009.oracle.com [141.146.116.15]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w1FJ8jMO008668; Thu, 15 Feb 2018 19:08:45 GMT Received: from ipftiger1.us.oracle.com (/10.208.179.35) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 15 Feb 2018 11:08:45 -0800 From: Sowmini Varadhan To: sowmini.varadhan@oracle.com, netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com Cc: davem@davemloft.net, rds-devel@oss.oracle.com, santosh.shilimkar@oracle.com Subject: [PATCH v3 net-next 4/7] rds: support for zcopy completion notification Date: Thu, 15 Feb 2018 10:49:35 -0800 Message-Id: <5b693dc83bbe115a020af81faa4439d0fdf44abf.1518718761.git.sowmini.varadhan@oracle.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8806 signatures=668672 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1802150230 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org RDS removes a datagram (rds_message) from the retransmit queue when an ACK is received. The ACK indicates that the receiver has queued the RDS datagram, so that the sender can safely forget the datagram. When all references to the rds_message are quiesced, rds_message_purge is called to release resources used by the rds_message If the datagram to be removed had pinned pages set up, add an entry to the rs->rs_znotify_queue so that the notifcation will be sent up via rds_rm_zerocopy_callback() when the rds_message is eventually freed by rds_message_purge. rds_rm_zerocopy_callback() attempts to batch the number of cookies sent with each notification to a max of SO_EE_ORIGIN_MAX_ZCOOKIES. This is achieved by checking the tail skb in the sk_error_queue: if this has room for one more cookie, the cookie from the current notification is added; else a new skb is added to the sk_error_queue. Every invocation of rds_rm_zerocopy_callback() will trigger a ->sk_error_report to notify the application. Signed-off-by: Sowmini Varadhan Acked-by: Santosh Shilimkar Acked-by: Willem de Bruijn --- v2: - make sure to always sock_put m_rs even if there is no znotifier. - major rewrite of notification, resulting in much simplification. v3: - fix fragile use of skb->cb[], do not set ee_code incorrectly. include/uapi/linux/errqueue.h | 2 + net/rds/af_rds.c | 2 + net/rds/message.c | 83 +++++++++++++++++++++++++++++++++++++--- net/rds/rds.h | 14 +++++++ net/rds/recv.c | 2 + 5 files changed, 96 insertions(+), 7 deletions(-) diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h index dc64cfa..28812ed 100644 --- a/include/uapi/linux/errqueue.h +++ b/include/uapi/linux/errqueue.h @@ -20,11 +20,13 @@ struct sock_extended_err { #define SO_EE_ORIGIN_ICMP6 3 #define SO_EE_ORIGIN_TXSTATUS 4 #define SO_EE_ORIGIN_ZEROCOPY 5 +#define SO_EE_ORIGIN_ZCOOKIE 6 #define SO_EE_ORIGIN_TIMESTAMPING SO_EE_ORIGIN_TXSTATUS #define SO_EE_OFFENDER(ee) ((struct sockaddr*)((ee)+1)) #define SO_EE_CODE_ZEROCOPY_COPIED 1 +#define SO_EE_ORIGIN_MAX_ZCOOKIES 8 /** * struct scm_timestamping - timestamps exposed through cmsg diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c index 0a8eefd..a937f18 100644 --- a/net/rds/af_rds.c +++ b/net/rds/af_rds.c @@ -182,6 +182,8 @@ static __poll_t rds_poll(struct file *file, struct socket *sock, mask |= (EPOLLIN | EPOLLRDNORM); if (rs->rs_snd_bytes < rds_sk_sndbuf(rs)) mask |= (EPOLLOUT | EPOLLWRNORM); + if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue)) + mask |= POLLERR; read_unlock_irqrestore(&rs->rs_recv_lock, flags); /* clear state any time we wake a seen-congested socket */ diff --git a/net/rds/message.c b/net/rds/message.c index ef3daaf..bf1a656 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -33,6 +33,9 @@ #include #include #include +#include +#include +#include #include "rds.h" @@ -53,29 +56,95 @@ void rds_message_addref(struct rds_message *rm) } EXPORT_SYMBOL_GPL(rds_message_addref); +static inline bool skb_zcookie_add(struct sk_buff *skb, u32 cookie) +{ + struct sock_exterr_skb *serr = SKB_EXT_ERR(skb); + int ncookies; + u32 *ptr; + + if (serr->ee.ee_origin != SO_EE_ORIGIN_ZCOOKIE) + return false; + ncookies = serr->ee.ee_data; + if (ncookies == SO_EE_ORIGIN_MAX_ZCOOKIES) + return false; + ptr = skb_put(skb, sizeof(u32)); + *ptr = cookie; + serr->ee.ee_data = ++ncookies; + return true; +} + +static void rds_rm_zerocopy_callback(struct rds_sock *rs, + struct rds_znotifier *znotif) +{ + struct sock *sk = rds_rs_to_sk(rs); + struct sk_buff *skb, *tail; + struct sock_exterr_skb *serr; + unsigned long flags; + struct sk_buff_head *q; + u32 cookie = znotif->z_cookie; + + q = &sk->sk_error_queue; + spin_lock_irqsave(&q->lock, flags); + tail = skb_peek_tail(q); + + if (tail && skb_zcookie_add(tail, cookie)) { + spin_unlock_irqrestore(&q->lock, flags); + mm_unaccount_pinned_pages(&znotif->z_mmp); + consume_skb(rds_skb_from_znotifier(znotif)); + sk->sk_error_report(sk); + return; + } + + skb = rds_skb_from_znotifier(znotif); + serr = SKB_EXT_ERR(skb); + memset(&serr->ee, 0, sizeof(serr->ee)); + serr->ee.ee_errno = 0; + serr->ee.ee_origin = SO_EE_ORIGIN_ZCOOKIE; + serr->ee.ee_info = 0; + WARN_ON(!skb_zcookie_add(skb, cookie)); + + __skb_queue_tail(q, skb); + + spin_unlock_irqrestore(&q->lock, flags); + sk->sk_error_report(sk); + + mm_unaccount_pinned_pages(&znotif->z_mmp); +} + /* * This relies on dma_map_sg() not touching sg[].page during merging. */ static void rds_message_purge(struct rds_message *rm) { unsigned long i, flags; + bool zcopy = false; if (unlikely(test_bit(RDS_MSG_PAGEVEC, &rm->m_flags))) return; - for (i = 0; i < rm->data.op_nents; i++) { - rdsdebug("putting data page %p\n", (void *)sg_page(&rm->data.op_sg[i])); - /* XXX will have to put_page for page refs */ - __free_page(sg_page(&rm->data.op_sg[i])); - } - rm->data.op_nents = 0; spin_lock_irqsave(&rm->m_rs_lock, flags); if (rm->m_rs) { - sock_put(rds_rs_to_sk(rm->m_rs)); + struct rds_sock *rs = rm->m_rs; + + if (rm->data.op_mmp_znotifier) { + zcopy = true; + rds_rm_zerocopy_callback(rs, rm->data.op_mmp_znotifier); + rm->data.op_mmp_znotifier = NULL; + } + sock_put(rds_rs_to_sk(rs)); rm->m_rs = NULL; } spin_unlock_irqrestore(&rm->m_rs_lock, flags); + for (i = 0; i < rm->data.op_nents; i++) { + /* XXX will have to put_page for page refs */ + if (!zcopy) + __free_page(sg_page(&rm->data.op_sg[i])); + else + put_page(sg_page(&rm->data.op_sg[i])); + } + rm->data.op_nents = 0; + if (rm->rdma.op_active) rds_rdma_free_op(&rm->rdma); if (rm->rdma.op_rdma_mr) diff --git a/net/rds/rds.h b/net/rds/rds.h index 7301b9b..24576bc 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -356,6 +356,19 @@ static inline u32 rds_rdma_cookie_offset(rds_rdma_cookie_t cookie) #define RDS_MSG_PAGEVEC 7 #define RDS_MSG_FLUSH 8 +struct rds_znotifier { + struct list_head z_list; + struct mmpin z_mmp; + u32 z_cookie; +}; + +#define RDS_ZCOPY_SKB(__skb) ((struct rds_znotifier *)&((__skb)->cb[0])) + +static inline struct sk_buff *rds_skb_from_znotifier(struct rds_znotifier *z) +{ + return container_of((void *)z, struct sk_buff, cb); +} + struct rds_message { refcount_t m_refcount; struct list_head m_sock_item; @@ -436,6 +449,7 @@ struct rds_message { unsigned int op_count; unsigned int op_dmasg; unsigned int op_dmaoff; + struct rds_znotifier *op_mmp_znotifier; struct scatterlist *op_sg; } data; }; diff --git a/net/rds/recv.c b/net/rds/recv.c index b25bcfe..b080961 100644 --- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -594,6 +594,8 @@ int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, if (msg_flags & MSG_OOB) goto out; + if (msg_flags & MSG_ERRQUEUE) + return sock_recv_errqueue(sk, msg, size, SOL_IP, IP_RECVERR); while (1) { /* If there are pending notifications, do those - and nothing else */ From patchwork Thu Feb 15 18:49:36 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sowmini Varadhan X-Patchwork-Id: 874092 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="jsaFFkHD"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zj5YH0LlQz9sRW for ; Fri, 16 Feb 2018 06:14:11 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1163382AbeBOTOJ (ORCPT ); Thu, 15 Feb 2018 14:14:09 -0500 Received: from aserp2130.oracle.com ([141.146.126.79]:37268 "EHLO aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1033764AbeBOTNz (ORCPT ); Thu, 15 Feb 2018 14:13:55 -0500 Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w1FJCKCw003452; Thu, 15 Feb 2018 19:13:47 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2017-10-26; bh=WTGlMzf+WNBPEsrVKaI84FLR+WR2t73xi9cXa2QMi58=; b=jsaFFkHDKPZV3mGQLa0NpJZruEHD9pAt1g49i75lNQfVYyUsYhHUV+yM+GYBn+iqx/L2 rJiyK+Q5bFJRboGIF5VEXP7tY9CKsUVeo70MlyB8upXDj1YO86X+VU6yvs33z+5bG5E9 75is3uRWWixKQCNdYgdBwPIQemEkBpL1rlLSfzPaH9OY1xDbiXFuyYkkB84nLXR8jFcO fjDtrp8xoTavpHuUKEouSWAj9idzal2v6Sn0GcrErv9LUzNEYjcPhves/MNlEKFCHR2v W/HHCyWHbQDPnrEMHMwwxtmBirokv82N+vEAeAuQ0qNk5WY7bl83F6lM3JfSZLh0rK5M cg== Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by aserp2130.oracle.com with ESMTP id 2g5famran7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 15 Feb 2018 19:13:47 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w1FJ8kel015184 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 15 Feb 2018 19:08:46 GMT Received: from abhmp0009.oracle.com (abhmp0009.oracle.com [141.146.116.15]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w1FJ8jQf008675; Thu, 15 Feb 2018 19:08:45 GMT Received: from ipftiger1.us.oracle.com (/10.208.179.35) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 15 Feb 2018 11:08:45 -0800 From: Sowmini Varadhan To: sowmini.varadhan@oracle.com, netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com Cc: davem@davemloft.net, rds-devel@oss.oracle.com, santosh.shilimkar@oracle.com Subject: [PATCH v3 net-next 5/7] rds: zerocopy Tx support. Date: Thu, 15 Feb 2018 10:49:36 -0800 Message-Id: X-Mailer: git-send-email 1.7.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8806 signatures=668672 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1802150230 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org If the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and, if the SO_ZEROCOPY socket option has been set on the PF_RDS socket, application pages sent down with rds_sendmsg() are pinned. The pinning uses the accounting infrastructure added by Commit a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages") The payload bytes in the message may not be modified for the duration that the message has been pinned. A multi-threaded application using this infrastructure may thus need to be notified about send-completion so that it can free/reuse the buffers passed to rds_sendmsg(). Notification of send-completion will identify each message-buffer by a cookie that the application must specify as ancillary data to rds_sendmsg(). The ancillary data in this case has cmsg_level == SOL_RDS and cmsg_type == RDS_CMSG_ZCOPY_COOKIE. Signed-off-by: Sowmini Varadhan Acked-by: Santosh Shilimkar Acked-by: Willem de Bruijn --- v2: - remove unused data_len argument to rds_rm_size; - unmap as necessary if we fail in the middle of zerocopy setup v3: remove needless bzero of skb->cb[], consolidate err cleanup include/uapi/linux/rds.h | 1 + net/rds/message.c | 51 +++++++++++++++++++++++++++++++++++++++++++++- net/rds/rds.h | 3 +- net/rds/send.c | 44 ++++++++++++++++++++++++++++++++++----- 4 files changed, 91 insertions(+), 8 deletions(-) diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h index e71d449..12e3bca 100644 --- a/include/uapi/linux/rds.h +++ b/include/uapi/linux/rds.h @@ -103,6 +103,7 @@ #define RDS_CMSG_MASKED_ATOMIC_FADD 8 #define RDS_CMSG_MASKED_ATOMIC_CSWP 9 #define RDS_CMSG_RXPATH_LATENCY 11 +#define RDS_CMSG_ZCOPY_COOKIE 12 #define RDS_INFO_FIRST 10000 #define RDS_INFO_COUNTERS 10000 diff --git a/net/rds/message.c b/net/rds/message.c index bf1a656..6518345 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -341,12 +341,14 @@ struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned in return rm; } -int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from) +int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from, + bool zcopy) { unsigned long to_copy, nbytes; unsigned long sg_off; struct scatterlist *sg; int ret = 0; + int length = iov_iter_count(from); rm->m_inc.i_hdr.h_len = cpu_to_be32(iov_iter_count(from)); @@ -356,6 +358,53 @@ int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from) sg = rm->data.op_sg; sg_off = 0; /* Dear gcc, sg->page will be null from kzalloc. */ + if (zcopy) { + int total_copied = 0; + struct sk_buff *skb; + + skb = alloc_skb(SO_EE_ORIGIN_MAX_ZCOOKIES * sizeof(u32), + GFP_KERNEL); + if (!skb) + return -ENOMEM; + rm->data.op_mmp_znotifier = RDS_ZCOPY_SKB(skb); + if (mm_account_pinned_pages(&rm->data.op_mmp_znotifier->z_mmp, + length)) { + ret = -ENOMEM; + goto err; + } + while (iov_iter_count(from)) { + struct page *pages; + size_t start; + ssize_t copied; + + copied = iov_iter_get_pages(from, &pages, PAGE_SIZE, + 1, &start); + if (copied < 0) { + struct mmpin *mmp; + int i; + + for (i = 0; i < rm->data.op_nents; i++) + put_page(sg_page(&rm->data.op_sg[i])); + mmp = &rm->data.op_mmp_znotifier->z_mmp; + mm_unaccount_pinned_pages(mmp); + ret = -EFAULT; + goto err; + } + total_copied += copied; + iov_iter_advance(from, copied); + length -= copied; + sg_set_page(sg, pages, copied, start); + rm->data.op_nents++; + sg++; + } + WARN_ON_ONCE(length != 0); + return ret; +err: + consume_skb(skb); + rm->data.op_mmp_znotifier = NULL; + return ret; + } /* zcopy */ + while (iov_iter_count(from)) { if (!sg_page(sg)) { ret = rds_page_remainder_alloc(sg, iov_iter_count(from), diff --git a/net/rds/rds.h b/net/rds/rds.h index 24576bc..31cd388 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -785,7 +785,8 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len, /* message.c */ struct rds_message *rds_message_alloc(unsigned int nents, gfp_t gfp); struct scatterlist *rds_message_alloc_sgs(struct rds_message *rm, int nents); -int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from); +int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from, + bool zcopy); struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned int total_len); void rds_message_populate_header(struct rds_header *hdr, __be16 sport, __be16 dport, u64 seq); diff --git a/net/rds/send.c b/net/rds/send.c index e8f3ff4..028ab59 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -875,12 +875,13 @@ static int rds_send_queue_rm(struct rds_sock *rs, struct rds_connection *conn, * rds_message is getting to be quite complicated, and we'd like to allocate * it all in one go. This figures out how big it needs to be up front. */ -static int rds_rm_size(struct msghdr *msg, int data_len) +static int rds_rm_size(struct msghdr *msg, int num_sgs) { struct cmsghdr *cmsg; int size = 0; int cmsg_groups = 0; int retval; + bool zcopy_cookie = false; for_each_cmsghdr(cmsg, msg) { if (!CMSG_OK(msg, cmsg)) @@ -899,6 +900,8 @@ static int rds_rm_size(struct msghdr *msg, int data_len) break; + case RDS_CMSG_ZCOPY_COOKIE: + zcopy_cookie = true; case RDS_CMSG_RDMA_DEST: case RDS_CMSG_RDMA_MAP: cmsg_groups |= 2; @@ -919,7 +922,10 @@ static int rds_rm_size(struct msghdr *msg, int data_len) } - size += ceil(data_len, PAGE_SIZE) * sizeof(struct scatterlist); + if ((msg->msg_flags & MSG_ZEROCOPY) && !zcopy_cookie) + return -EINVAL; + + size += num_sgs * sizeof(struct scatterlist); /* Ensure (DEST, MAP) are never used with (ARGS, ATOMIC) */ if (cmsg_groups == 3) @@ -928,6 +934,18 @@ static int rds_rm_size(struct msghdr *msg, int data_len) return size; } +static int rds_cmsg_zcopy(struct rds_sock *rs, struct rds_message *rm, + struct cmsghdr *cmsg) +{ + u32 *cookie; + + if (cmsg->cmsg_len < CMSG_LEN(sizeof(*cookie))) + return -EINVAL; + cookie = CMSG_DATA(cmsg); + rm->data.op_mmp_znotifier->z_cookie = *cookie; + return 0; +} + static int rds_cmsg_send(struct rds_sock *rs, struct rds_message *rm, struct msghdr *msg, int *allocated_mr) { @@ -970,6 +988,10 @@ static int rds_cmsg_send(struct rds_sock *rs, struct rds_message *rm, ret = rds_cmsg_atomic(rs, rm, cmsg); break; + case RDS_CMSG_ZCOPY_COOKIE: + ret = rds_cmsg_zcopy(rs, rm, cmsg); + break; + default: return -EINVAL; } @@ -1040,10 +1062,13 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len) long timeo = sock_sndtimeo(sk, nonblock); struct rds_conn_path *cpath; size_t total_payload_len = payload_len, rdma_payload_len = 0; + bool zcopy = ((msg->msg_flags & MSG_ZEROCOPY) && + sock_flag(rds_rs_to_sk(rs), SOCK_ZEROCOPY)); + int num_sgs = ceil(payload_len, PAGE_SIZE); /* Mirror Linux UDP mirror of BSD error message compatibility */ /* XXX: Perhaps MSG_MORE someday */ - if (msg->msg_flags & ~(MSG_DONTWAIT | MSG_CMSG_COMPAT)) { + if (msg->msg_flags & ~(MSG_DONTWAIT | MSG_CMSG_COMPAT | MSG_ZEROCOPY)) { ret = -EOPNOTSUPP; goto out; } @@ -1087,8 +1112,15 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len) goto out; } + if (zcopy) { + if (rs->rs_transport->t_type != RDS_TRANS_TCP) { + ret = -EOPNOTSUPP; + goto out; + } + num_sgs = iov_iter_npages(&msg->msg_iter, INT_MAX); + } /* size of rm including all sgs */ - ret = rds_rm_size(msg, payload_len); + ret = rds_rm_size(msg, num_sgs); if (ret < 0) goto out; @@ -1100,12 +1132,12 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len) /* Attach data to the rm */ if (payload_len) { - rm->data.op_sg = rds_message_alloc_sgs(rm, ceil(payload_len, PAGE_SIZE)); + rm->data.op_sg = rds_message_alloc_sgs(rm, num_sgs); if (!rm->data.op_sg) { ret = -ENOMEM; goto out; } - ret = rds_message_copy_from_user(rm, &msg->msg_iter); + ret = rds_message_copy_from_user(rm, &msg->msg_iter, zcopy); if (ret) goto out; } From patchwork Thu Feb 15 18:49:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sowmini Varadhan X-Patchwork-Id: 874090 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="GNNjiAhS"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zj5Y64Jlfz9sRW for ; Fri, 16 Feb 2018 06:14:02 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161251AbeBOTN4 (ORCPT ); Thu, 15 Feb 2018 14:13:56 -0500 Received: from aserp2130.oracle.com ([141.146.126.79]:37258 "EHLO aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1033759AbeBOTNy (ORCPT ); Thu, 15 Feb 2018 14:13:54 -0500 Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w1FJCIgN003378; Thu, 15 Feb 2018 19:13:47 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2017-10-26; bh=9CEWnNnQo0JPxi2uICfJeTNHCLiMXJnYjTU+kxecNQU=; b=GNNjiAhSVoz0bcWg798ptbnqdnWDEtpVMmn1fLbhwOmWI/ALBNf41Ryu0dgWfypRiEi3 M9zB+Rp/FGgGZCZja+di6vjmoSpiWmtxy+8Dpdm//HgMwnmPliUjnd6cilVUKXb0I6NF HPPichNcQboacgJdkgvIPh38urchBd6Z9QUvYpm6CNvOkcObKU/TO1elKSh11o1r5xtU 1BA5Vvb2DL/0dyoApRsov2BCdnmurRHTkf0/UgXQuEX675jKVb2Vqg7OJIDUznmqJm86 nrKXUHSSXLw6cIvkWUnnXWTghlaj21Pqcisi3bZmASl9ht4XrU5cufni3W7OSvEzxuC9 uw== Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by aserp2130.oracle.com with ESMTP id 2g5famran8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 15 Feb 2018 19:13:47 +0000 Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w1FJ8k3I001691 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 15 Feb 2018 19:08:46 GMT Received: from abhmp0009.oracle.com (abhmp0009.oracle.com [141.146.116.15]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w1FJ8kVS003160; Thu, 15 Feb 2018 19:08:46 GMT Received: from ipftiger1.us.oracle.com (/10.208.179.35) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 15 Feb 2018 11:08:45 -0800 From: Sowmini Varadhan To: sowmini.varadhan@oracle.com, netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com Cc: davem@davemloft.net, rds-devel@oss.oracle.com, santosh.shilimkar@oracle.com Subject: [PATCH v3 net-next 6/7] selftests/net: add support for PF_RDS sockets Date: Thu, 15 Feb 2018 10:49:37 -0800 Message-Id: <51219b78ae2e5dad3c53fd55ac3b00fdb90b8fbd.1518718761.git.sowmini.varadhan@oracle.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8806 signatures=668672 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=882 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1802150230 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add support for basic PF_RDS client-server testing in msg_zerocopy Signed-off-by: Sowmini Varadhan Acked-by: Willem de Bruijn --- tools/testing/selftests/net/msg_zerocopy.c | 65 +++++++++++++++++++++++++++- 1 files changed, 64 insertions(+), 1 deletions(-) diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c index e11fe84..7a5b353 100644 --- a/tools/testing/selftests/net/msg_zerocopy.c +++ b/tools/testing/selftests/net/msg_zerocopy.c @@ -14,6 +14,9 @@ * - SOCK_DGRAM * - SOCK_RAW * + * PF_RDS + * - SOCK_SEQPACKET + * * Start this program on two connected hosts, one in send mode and * the other with option '-r' to put it in receiver mode. * @@ -53,6 +56,7 @@ #include #include #include +#include #ifndef SO_EE_ORIGIN_ZEROCOPY #define SO_EE_ORIGIN_ZEROCOPY 5 @@ -300,10 +304,15 @@ static int do_setup_tx(int domain, int type, int protocol) if (cfg_zerocopy) do_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, 1); - if (domain != PF_PACKET) + if (domain != PF_PACKET && domain != PF_RDS) if (connect(fd, (void *) &cfg_dst_addr, cfg_alen)) error(1, errno, "connect"); + if (domain == PF_RDS) { + if (bind(fd, (void *) &cfg_src_addr, cfg_alen)) + error(1, errno, "bind"); + } + return fd; } @@ -444,6 +453,13 @@ static void do_tx(int domain, int type, int protocol) msg.msg_iovlen++; } + if (domain == PF_RDS) { + msg.msg_name = &cfg_dst_addr; + msg.msg_namelen = (cfg_dst_addr.ss_family == AF_INET ? + sizeof(struct sockaddr_in) : + sizeof(struct sockaddr_in6)); + } + iov[2].iov_base = payload; iov[2].iov_len = cfg_payload_len; msg.msg_iovlen++; @@ -555,6 +571,40 @@ static void do_flush_datagram(int fd, int type) bytes += cfg_payload_len; } + +static void do_recvmsg(int fd) +{ + int ret, off = 0; + char *buf; + struct iovec iov; + struct msghdr msg; + struct sockaddr_storage din; + + buf = calloc(cfg_payload_len, sizeof(char)); + iov.iov_base = buf; + iov.iov_len = cfg_payload_len; + + memset(&msg, 0, sizeof(msg)); + msg.msg_name = &din; + msg.msg_namelen = sizeof(din); + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + + ret = recvmsg(fd, &msg, MSG_TRUNC); + + if (ret == -1) + error(1, errno, "recv"); + if (ret != cfg_payload_len) + error(1, 0, "recv: ret=%u != %u", ret, cfg_payload_len); + + if (memcmp(buf + off, payload, ret)) + error(1, 0, "recv: data mismatch"); + + free(buf); + packets++; + bytes += cfg_payload_len; +} + static void do_rx(int domain, int type, int protocol) { uint64_t tstop; @@ -566,6 +616,8 @@ static void do_rx(int domain, int type, int protocol) do { if (type == SOCK_STREAM) do_flush_tcp(fd); + else if (domain == PF_RDS) + do_recvmsg(fd); else do_flush_datagram(fd, type); @@ -610,6 +662,7 @@ static void parse_opts(int argc, char **argv) 40 /* max tcp options */; int c; char *daddr = NULL, *saddr = NULL; + char *cfg_test; cfg_payload_len = max_payload_len; @@ -667,6 +720,14 @@ static void parse_opts(int argc, char **argv) break; } } + + cfg_test = argv[argc - 1]; + if (strcmp(cfg_test, "rds") == 0) { + if (!daddr) + error(1, 0, "-D required for PF_RDS\n"); + if (!cfg_rx && !saddr) + error(1, 0, "-S required for PF_RDS\n"); + } setup_sockaddr(cfg_family, daddr, &cfg_dst_addr); setup_sockaddr(cfg_family, saddr, &cfg_src_addr); @@ -699,6 +760,8 @@ int main(int argc, char **argv) do_test(cfg_family, SOCK_STREAM, 0); else if (!strcmp(cfg_test, "udp")) do_test(cfg_family, SOCK_DGRAM, 0); + else if (!strcmp(cfg_test, "rds")) + do_test(PF_RDS, SOCK_SEQPACKET, 0); else error(1, 0, "unknown cfg_test %s", cfg_test); From patchwork Thu Feb 15 18:49:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sowmini Varadhan X-Patchwork-Id: 874089 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="Mq5NPJNQ"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zj5Y318V5z9sRW for ; Fri, 16 Feb 2018 06:13:59 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1163236AbeBOTN5 (ORCPT ); Thu, 15 Feb 2018 14:13:57 -0500 Received: from aserp2130.oracle.com ([141.146.126.79]:37260 "EHLO aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1033760AbeBOTNy (ORCPT ); Thu, 15 Feb 2018 14:13:54 -0500 Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w1FJCIox003396; Thu, 15 Feb 2018 19:13:48 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2017-10-26; bh=WL0akCZTyPtoAtgmxXjaR3Cq5f0rXf7xpOXRkJ9p1sg=; b=Mq5NPJNQ4BRowACBMWExNXeXvXCO6rYslA3IzabEhdvq7yjICbGT5mnF0wArIp3J4eHO tYNfUvFq/DzK/HL5uMIaTtudXZBV0Q38wL2VtD1iaj2U95601Q+miWX0Z+7Qw8YrAz6v H2QNz7dkX0POyc9b1pGJJD+k6baJdZPjdShwr8SGauM717cAc7cwVxWNKrroX4HFaSI7 conA2S0ntYLbjUSEIIARGtqFu1hV2eM3qORLgqRAYUcKK8pxYNyqXZmYFg+iZSZp82XX dKmEiNr2vNsygD8prCGF1eUuDJYjFYBOM/iKNgKHhtfnz9OHiQRSOP6aLVtqV5QOShL7 /Q== Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by aserp2130.oracle.com with ESMTP id 2g5famran9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 15 Feb 2018 19:13:48 +0000 Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w1FJ8kYi015205 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 15 Feb 2018 19:08:46 GMT Received: from abhmp0009.oracle.com (abhmp0009.oracle.com [141.146.116.15]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w1FJ8k4v003165; Thu, 15 Feb 2018 19:08:46 GMT Received: from ipftiger1.us.oracle.com (/10.208.179.35) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 15 Feb 2018 11:08:46 -0800 From: Sowmini Varadhan To: sowmini.varadhan@oracle.com, netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com Cc: davem@davemloft.net, rds-devel@oss.oracle.com, santosh.shilimkar@oracle.com Subject: [PATCH v3 net-next 7/7] selftests/net: add zerocopy support for PF_RDS test case Date: Thu, 15 Feb 2018 10:49:38 -0800 Message-Id: <09f5e45c76278cf11d978d670ece28da6090a930.1518718761.git.sowmini.varadhan@oracle.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8806 signatures=668672 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=762 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1802150230 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Send a cookie with sendmsg() on PF_RDS sockets, and process the returned batched cookies in do_recv_completion() Signed-off-by: Sowmini Varadhan Acked-by: Willem de Bruijn --- v2: - restructured do_recv_completion to avoid excessive code re-indent - on-stack allocation of cmsghdr for cookie in do_sendmsg - Additional verification: Verify ncookies <= MAX_.., verify ret == ncookies * sizeof(uint32_t) tools/testing/selftests/net/msg_zerocopy.c | 68 ++++++++++++++++++++++++++-- 1 files changed, 64 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c index 7a5b353..5cc2a53 100644 --- a/tools/testing/selftests/net/msg_zerocopy.c +++ b/tools/testing/selftests/net/msg_zerocopy.c @@ -168,17 +168,39 @@ static int do_accept(int fd) return fd; } -static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy) +static void add_zcopy_cookie(struct msghdr *msg, uint32_t cookie) +{ + struct cmsghdr *cm; + + if (!msg->msg_control) + error(1, errno, "NULL cookie"); + cm = (void *)msg->msg_control; + cm->cmsg_len = CMSG_LEN(sizeof(cookie)); + cm->cmsg_level = SOL_RDS; + cm->cmsg_type = RDS_CMSG_ZCOPY_COOKIE; + memcpy(CMSG_DATA(cm), &cookie, sizeof(cookie)); +} + +static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy, int domain) { int ret, len, i, flags; + static uint32_t cookie; + char ckbuf[CMSG_SPACE(sizeof(cookie))]; len = 0; for (i = 0; i < msg->msg_iovlen; i++) len += msg->msg_iov[i].iov_len; flags = MSG_DONTWAIT; - if (do_zerocopy) + if (do_zerocopy) { flags |= MSG_ZEROCOPY; + if (domain == PF_RDS) { + memset(&msg->msg_control, 0, sizeof(msg->msg_control)); + msg->msg_controllen = CMSG_SPACE(sizeof(cookie)); + msg->msg_control = (struct cmsghdr *)ckbuf; + add_zcopy_cookie(msg, ++cookie); + } + } ret = sendmsg(fd, msg, flags); if (ret == -1 && errno == EAGAIN) @@ -194,6 +216,10 @@ static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy) if (do_zerocopy && ret) expected_completions++; } + if (do_zerocopy && domain == PF_RDS) { + msg->msg_control = NULL; + msg->msg_controllen = 0; + } return true; } @@ -220,7 +246,9 @@ static void do_sendmsg_corked(int fd, struct msghdr *msg) msg->msg_iov[0].iov_len = payload_len + extra_len; extra_len = 0; - do_sendmsg(fd, msg, do_zerocopy); + do_sendmsg(fd, msg, do_zerocopy, + (cfg_dst_addr.ss_family == AF_INET ? + PF_INET : PF_INET6)); } do_setsockopt(fd, IPPROTO_UDP, UDP_CORK, 0); @@ -316,6 +344,26 @@ static int do_setup_tx(int domain, int type, int protocol) return fd; } +static int do_process_zerocopy_cookies(struct sock_extended_err *serr, + uint32_t *ckbuf, size_t nbytes) +{ + int ncookies, i; + + if (serr->ee_errno != 0) + error(1, 0, "serr: wrong error code: %u", serr->ee_errno); + ncookies = serr->ee_data; + if (ncookies > SO_EE_ORIGIN_MAX_ZCOOKIES) + error(1, 0, "Returned %d cookies, max expected %d\n", + ncookies, SO_EE_ORIGIN_MAX_ZCOOKIES); + if (nbytes != ncookies * sizeof(uint32_t)) + error(1, 0, "Expected %d cookies, got %ld\n", + ncookies, nbytes/sizeof(uint32_t)); + for (i = 0; i < ncookies; i++) + if (cfg_verbose >= 2) + fprintf(stderr, "%d\n", ckbuf[i]); + return ncookies; +} + static bool do_recv_completion(int fd) { struct sock_extended_err *serr; @@ -324,10 +372,17 @@ static bool do_recv_completion(int fd) uint32_t hi, lo, range; int ret, zerocopy; char control[100]; + uint32_t ckbuf[SO_EE_ORIGIN_MAX_ZCOOKIES]; + struct iovec iov; msg.msg_control = control; msg.msg_controllen = sizeof(control); + iov.iov_base = ckbuf; + iov.iov_len = (SO_EE_ORIGIN_MAX_ZCOOKIES * sizeof(ckbuf[0])); + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + ret = recvmsg(fd, &msg, MSG_ERRQUEUE); if (ret == -1 && errno == EAGAIN) return false; @@ -346,6 +401,11 @@ static bool do_recv_completion(int fd) cm->cmsg_level, cm->cmsg_type); serr = (void *) CMSG_DATA(cm); + + if (serr->ee_origin == SO_EE_ORIGIN_ZCOOKIE) { + completions += do_process_zerocopy_cookies(serr, ckbuf, ret); + return true; + } if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) error(1, 0, "serr: wrong origin: %u", serr->ee_origin); if (serr->ee_errno != 0) @@ -470,7 +530,7 @@ static void do_tx(int domain, int type, int protocol) if (cfg_cork) do_sendmsg_corked(fd, &msg); else - do_sendmsg(fd, &msg, cfg_zerocopy); + do_sendmsg(fd, &msg, cfg_zerocopy, domain); while (!do_poll(fd, POLLOUT)) { if (cfg_zerocopy)