From patchwork Sun Mar 24 15:51:53 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Michael S. Tsirkin"
X-Patchwork-Id: 230465
Date: Sun, 24 Mar 2013 17:51:53 +0200
From: "Michael S. Tsirkin"
To: Roland Dreier
Message-ID: <20130324155153.GA8597@redhat.com>
Cc: qemu-devel@nongnu.org, "linux-rdma@vger.kernel.org", Yishai Hadas,
 LKML, "Michael R. Hines", Hal Rosenstock, Jason Gunthorpe, Sean Hefty,
 Christoph Lameter
Subject: [Qemu-devel] [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

At the moment registering an MR breaks COW. This breaks memory
overcommit for users such as KVM: we have a lot of COW pages, e.g.
instances of the zero page or pages shared using KSM.

If the application does not care that the adapter sees stale data (for
example, it tracks writes, reregisters and resends), it can use a new
IBV_ACCESS_GIFT flag to prevent registration from breaking COW.

The semantics are similar to those of SPLICE_F_GIFT, hence the name.

Signed-off-by: Michael S. Tsirkin
---

Please review and consider for 3.10.

Changes from v1:
	rename APP_READONLY to _GIFT: similar to vmsplice's F_GIFT.
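
For readers who want to see how the access mask maps onto the
get_user_pages() arguments, here is a minimal user-space sketch. It is an
illustration only, not part of the patch. The enum values are copied from
include/rdma/ib_verbs.h (IB_ACCESS_LOCAL_WRITE and IB_ACCESS_REMOTE_WRITE
fall outside the diff context but carry these values in the same header).
The names struct gup_args and gup_args_for() are hypothetical and exist
only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from enum ib_access_flags in include/rdma/ib_verbs.h. */
enum ib_access_flags {
	IB_ACCESS_LOCAL_WRITE   = 1,
	IB_ACCESS_REMOTE_WRITE  = (1<<1),
	IB_ACCESS_REMOTE_READ   = (1<<2),
	IB_ACCESS_REMOTE_ATOMIC = (1<<3),
	IB_ACCESS_MW_BIND       = (1<<4),
	IB_ZERO_BASED           = (1<<5),
	IB_ACCESS_GIFT          = (1<<6)	/* proposed by this patch */
};

/* Hypothetical: the "write" and "force" arguments of get_user_pages(). */
struct gup_args {
	bool write;
	bool force;
};

/*
 * Hypothetical helper mirroring the logic the patch puts in ib_umem_get():
 *   gift     = access & IB_ACCESS_GIFT;
 *   writable = access & ~(IB_ACCESS_REMOTE_READ | IB_ACCESS_GIFT);
 *   get_user_pages(..., !gift, !writable, ...);
 */
static struct gup_args gup_args_for(int access)
{
	bool gift = access & IB_ACCESS_GIFT;
	bool writable = access & ~(IB_ACCESS_REMOTE_READ | IB_ACCESS_GIFT);
	struct gup_args args = { .write = !gift, .force = !writable };

	return args;
}
```

With this sketch, IB_ACCESS_REMOTE_READ alone yields write=1, force=1,
which is the existing behaviour that breaks COW. Adding IB_ACCESS_GIFT
yields write=0, so the pages are pinned without a write fault and COW
sharing is preserved.
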

 drivers/infiniband/core/umem.c | 21 ++++++++++++---------
 include/rdma/ib_verbs.h        |  9 ++++++++-
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index a841123..5dee86d 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -89,6 +89,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	int ret;
 	int off;
 	int i;
+	bool gift, writable;
 	DEFINE_DMA_ATTRS(attrs);
 
 	if (dmasync)
@@ -96,6 +97,15 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	if (!can_do_mlock())
 		return ERR_PTR(-EPERM);
+	/*
+	 * We ask for writable memory if any access flags other than
+	 * "remote read" or "gift" are set.  "Local write" and "remote write"
+	 * obviously require write access.  "Remote atomic" can do
+	 * things like fetch and add, which will modify memory, and
+	 * "MW bind" can change permissions by binding a window.
+	 */
+	gift = access & IB_ACCESS_GIFT;
+	writable = access & ~(IB_ACCESS_REMOTE_READ | IB_ACCESS_GIFT);
 
 	umem = kmalloc(sizeof *umem, GFP_KERNEL);
 	if (!umem)
@@ -105,14 +115,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	umem->length = size;
 	umem->offset = addr & ~PAGE_MASK;
 	umem->page_size = PAGE_SIZE;
-	/*
-	 * We ask for writable memory if any access flags other than
-	 * "remote read" are set.  "Local write" and "remote write"
-	 * obviously require write access.  "Remote atomic" can do
-	 * things like fetch and add, which will modify memory, and
-	 * "MW bind" can change permissions by binding a window.
-	 */
-	umem->writable = !!(access & ~IB_ACCESS_REMOTE_READ);
+	umem->writable = writable;
 
 	/* We assume the memory is from hugetlb until proved otherwise */
 	umem->hugetlb = 1;
@@ -152,7 +155,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		ret = get_user_pages(current, current->mm, cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
-				     1, !umem->writable, page_list, vma_list);
+				     !gift, !umem->writable, page_list, vma_list);
 
 		if (ret < 0)
 			goto out;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..2e6e13c 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -871,7 +871,14 @@ enum ib_access_flags {
 	IB_ACCESS_REMOTE_READ = (1<<2),
 	IB_ACCESS_REMOTE_ATOMIC = (1<<3),
 	IB_ACCESS_MW_BIND = (1<<4),
-	IB_ZERO_BASED = (1<<5)
+	IB_ZERO_BASED = (1<<5),
+	/*
+	 * IB_ACCESS_GIFT: This memory is a gift to the adapter.  If memory is
+	 * modified after registration, the local version and data seen by the
+	 * adapter through this region rkey may differ.
+	 * Only legal with IB_ACCESS_REMOTE_READ or no permissions.
+	 */
+	IB_ACCESS_GIFT = (1<<6)
 };
 
 struct ib_phys_buf {