From patchwork Tue Jan 25 13:57:44 2022
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 1584043
Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No.
3798903 Subject: [RFC PATCH 1/7] cifs: Transition from ->readpages() to ->readahead() From: David Howells To: smfrench@gmail.com, nspmangalore@gmail.com Cc: Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-cachefs@redhat.com, dhowells@redhat.com, jlayton@kernel.org, linux-cifs@vger.kernel.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org Date: Tue, 25 Jan 2022 13:57:44 +0000 Message-ID: <164311906472.2806745.605202239282432844.stgit@warthog.procyon.org.uk> In-Reply-To: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk> References: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: X-Mailing-List: linux-cifs@vger.kernel.org Transition the cifs filesystem from using the old ->readpages() method to using the new ->readahead() method. For the moment, this removes any invocation of fscache to read data from the local cache, leaving that to another patch. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Matthew Wilcox cc: Jeff Layton cc: linux-cifs@vger.kernel.org cc: linux-cachefs@redhat.com --- fs/cifs/file.c | 169 +++++++++++--------------------------------------------- 1 file changed, 33 insertions(+), 136 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 015fd415e5ee..1cce7e5b2334 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -4269,8 +4269,6 @@ cifs_readv_complete(struct work_struct *work) for (i = 0; i < rdata->nr_pages; i++) { struct page *page = rdata->pages[i]; - lru_cache_add(page); - if (rdata->result == 0 || (rdata->result == -EAGAIN && got_bytes)) { flush_dcache_page(page); @@ -4340,7 +4338,6 @@ readpages_fill_pages(struct TCP_Server_Info *server, * fill them until the writes are flushed. */ zero_user(page, 0, PAGE_SIZE); - lru_cache_add(page); flush_dcache_page(page); SetPageUptodate(page); unlock_page(page); @@ -4350,7 +4347,6 @@ readpages_fill_pages(struct TCP_Server_Info *server, continue; } else { /* no need to hold page hostage */ - lru_cache_add(page); unlock_page(page); put_page(page); rdata->pages[i] = NULL; @@ -4393,92 +4389,16 @@ cifs_readpages_copy_into_pages(struct TCP_Server_Info *server, return readpages_fill_pages(server, rdata, iter, iter->count); } -static int -readpages_get_pages(struct address_space *mapping, struct list_head *page_list, - unsigned int rsize, struct list_head *tmplist, - unsigned int *nr_pages, loff_t *offset, unsigned int *bytes) -{ - struct page *page, *tpage; - unsigned int expected_index; - int rc; - gfp_t gfp = readahead_gfp_mask(mapping); - - INIT_LIST_HEAD(tmplist); - - page = lru_to_page(page_list); - - /* - * Lock the page and put it in the cache. Since no one else - * should have access to this page, we're safe to simply set - * PG_locked without checking it first. - */ - __SetPageLocked(page); - rc = add_to_page_cache_locked(page, mapping, - page->index, gfp); - - /* give up if we can't stick it in the cache */ - if (rc) { - __ClearPageLocked(page); - return rc; - } - - /* move first page to the tmplist */ - *offset = (loff_t)page->index << PAGE_SHIFT; - *bytes = PAGE_SIZE; - *nr_pages = 1; - list_move_tail(&page->lru, tmplist); - - /* now try and add more pages onto the request */ - expected_index = page->index + 1; - list_for_each_entry_safe_reverse(page, tpage, page_list, lru) { - /* discontinuity ? */ - if (page->index != expected_index) - break; - - /* would this page push the read over the rsize? 
*/ - if (*bytes + PAGE_SIZE > rsize) - break; - - __SetPageLocked(page); - rc = add_to_page_cache_locked(page, mapping, page->index, gfp); - if (rc) { - __ClearPageLocked(page); - break; - } - list_move_tail(&page->lru, tmplist); - (*bytes) += PAGE_SIZE; - expected_index++; - (*nr_pages)++; - } - return rc; -} - -static int cifs_readpages(struct file *file, struct address_space *mapping, - struct list_head *page_list, unsigned num_pages) +static void cifs_readahead(struct readahead_control *ractl) { int rc; - int err = 0; - struct list_head tmplist; - struct cifsFileInfo *open_file = file->private_data; - struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(file); + struct cifsFileInfo *open_file = ractl->file->private_data; + struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(ractl->file); struct TCP_Server_Info *server; pid_t pid; unsigned int xid; xid = get_xid(); - /* - * Reads as many pages as possible from fscache. Returns -ENOBUFS - * immediately if the cookie is negative - * - * After this point, every page in the list might have PG_fscache set, - * so we will need to clean that up off of every page we don't use. - */ - rc = cifs_readpages_from_fscache(mapping->host, mapping, page_list, - &num_pages); - if (rc == 0) { - free_xid(xid); - return rc; - } if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) pid = open_file->pid; @@ -4489,39 +4409,32 @@ static int cifs_readpages(struct file *file, struct address_space *mapping, server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n", - __func__, file, mapping, num_pages); + __func__, ractl->file, ractl->mapping, readahead_count(ractl)); /* - * Start with the page at end of list and move it to private - * list. Do the same with any following pages until we hit - * the rsize limit, hit an index discontinuity, or run out of - * pages. Issue the async read and then start the loop again - * until the list is empty. - * - * Note that list order is important. The page_list is in - * the order of declining indexes. When we put the pages in - * the rdata->pages, then we want them in increasing order. + * Chop the readahead request up into rsize-sized read requests. */ - while (!list_empty(page_list) && !err) { - unsigned int i, nr_pages, bytes, rsize; - loff_t offset; - struct page *page, *tpage; + while (readahead_count(ractl) - ractl->_batch_count) { + unsigned int i, nr_pages, got, rsize; + struct page *page; struct cifs_readdata *rdata; struct cifs_credits credits_on_stack; struct cifs_credits *credits = &credits_on_stack; if (open_file->invalidHandle) { rc = cifs_reopen_file(open_file, true); - if (rc == -EAGAIN) - continue; - else if (rc) + if (rc) { + if (rc == -EAGAIN) + continue; break; + } } rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->rsize, &rsize, credits); if (rc) break; + nr_pages = min_t(size_t, rsize / PAGE_SIZE, readahead_count(ractl)); /* * Give up immediately if rsize is too small to read an entire @@ -4529,16 +4442,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping, * reach this point however since we set ra_pages to 0 when the * rsize is smaller than a cache page. 
*/ - if (unlikely(rsize < PAGE_SIZE)) { - add_credits_and_wake_if(server, credits, 0); - free_xid(xid); - return 0; - } - - nr_pages = 0; - err = readpages_get_pages(mapping, page_list, rsize, &tmplist, - &nr_pages, &offset, &bytes); - if (!nr_pages) { + if (unlikely(!nr_pages)) { add_credits_and_wake_if(server, credits, 0); break; } @@ -4546,36 +4450,31 @@ static int cifs_readpages(struct file *file, struct address_space *mapping, rdata = cifs_readdata_alloc(nr_pages, cifs_readv_complete); if (!rdata) { /* best to give up if we're out of mem */ - list_for_each_entry_safe(page, tpage, &tmplist, lru) { - list_del(&page->lru); - lru_cache_add(page); - unlock_page(page); - put_page(page); - } - rc = -ENOMEM; add_credits_and_wake_if(server, credits, 0); break; } - rdata->cfile = cifsFileInfo_get(open_file); - rdata->server = server; - rdata->mapping = mapping; - rdata->offset = offset; - rdata->bytes = bytes; - rdata->pid = pid; - rdata->pagesz = PAGE_SIZE; - rdata->tailsz = PAGE_SIZE; + got = __readahead_batch(ractl, rdata->pages, nr_pages); + if (got != nr_pages) { + pr_warn("__readahead_batch() returned %u/%u\n", + got, nr_pages); + nr_pages = got; + } + + rdata->nr_pages = nr_pages; + rdata->bytes = readahead_batch_length(ractl); + rdata->cfile = cifsFileInfo_get(open_file); + rdata->server = server; + rdata->mapping = ractl->mapping; + rdata->offset = readahead_pos(ractl); + rdata->pid = pid; + rdata->pagesz = PAGE_SIZE; + rdata->tailsz = PAGE_SIZE; rdata->read_into_pages = cifs_readpages_read_into_pages; rdata->copy_into_pages = cifs_readpages_copy_into_pages; - rdata->credits = credits_on_stack; - - list_for_each_entry_safe(page, tpage, &tmplist, lru) { - list_del(&page->lru); - rdata->pages[rdata->nr_pages++] = page; - } + rdata->credits = credits_on_stack; rc = adjust_credits(server, &rdata->credits, rdata->bytes); - if (!rc) { if (rdata->cfile->invalidHandle) rc = -EAGAIN; @@ -4587,7 +4486,6 @@ static int cifs_readpages(struct file *file, struct address_space *mapping, add_credits_and_wake_if(server, &rdata->credits, 0); for (i = 0; i < rdata->nr_pages; i++) { page = rdata->pages[i]; - lru_cache_add(page); unlock_page(page); put_page(page); } @@ -4600,7 +4498,6 @@ static int cifs_readpages(struct file *file, struct address_space *mapping, } free_xid(xid); - return rc; } /* @@ -4905,7 +4802,7 @@ void cifs_oplock_break(struct work_struct *work) * In the non-cached mode (mount with cache=none), we shunt off direct read and write requests * so this method should never be called. * - * Direct IO is not yet supported in the cached mode. + * Direct IO is not yet supported in the cached mode. 
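 *
 * [Editorial illustration, not part of the patch: the ->readahead()
 *  method adopted above is handed pages that the core has already locked
 *  and added to the page cache, which is why all of the LRU and
 *  add_to_page_cache_locked() bookkeeping could be deleted.  For a
 *  filesystem that reads one page at a time, the whole conversion reduces
 *  to a loop like the sketch below ("myfs" and
 *  myfs_read_page_from_server() are invented names):
 *
 *	static void myfs_readahead(struct readahead_control *ractl)
 *	{
 *		struct page *page;
 *
 *		while ((page = readahead_page(ractl))) {
 *			int err = myfs_read_page_from_server(page);
 *
 *			if (!err)
 *				SetPageUptodate(page);
 *			unlock_page(page);
 *			put_page(page);
 *		}
 *	}
 *
 *  cifs instead batches pages with __readahead_batch() so that a single
 *  wire read can cover up to rsize bytes.]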
 */
static ssize_t
cifs_direct_io(struct kiocb *iocb, struct iov_iter *iter)

@@ -4987,7 +4884,7 @@ static int cifs_set_page_dirty(struct page *page)
 
 const struct address_space_operations cifs_addr_ops = {
 	.readpage = cifs_readpage,
-	.readpages = cifs_readpages,
+	.readahead = cifs_readahead,
 	.writepage = cifs_writepage,
 	.writepages = cifs_writepages,
 	.write_begin = cifs_write_begin,

From patchwork Tue Jan 25 13:57:52 2022
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 1584044
Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903
Subject: [RFC PATCH 2/7] cifs: Miscellaneous bits
From: David Howells
To: smfrench@gmail.com, nspmangalore@gmail.com
Cc: dhowells@redhat.com, jlayton@kernel.org, linux-cifs@vger.kernel.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org
Date: Tue, 25 Jan 2022 13:57:52 +0000
Message-ID: <164311907254.2806745.4350376870116513772.stgit@warthog.procyon.org.uk>
In-Reply-To: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk>
References: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk>

---
 fs/cifs/connect.c | 2 +-
 fs/cifs/file.c    | 8 +++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 11a22a30ee14..ed210d774a21 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -162,7 +162,7 @@ static void cifs_resolve_server(struct work_struct *work)
 	mutex_unlock(&server->srv_mutex);
 }
 
-/**
+/*
  * Mark all sessions and tcons for reconnect.
  *
  * @server needs to be previously set to CifsNeedReconnect.

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 1cce7e5b2334..24722fe75def 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4205,13 +4205,19 @@ cifs_page_mkwrite(struct vm_fault *vmf)
 {
 	struct page *page = vmf->page;
 
+	/* Wait for the page to be written to the cache before we allow it to
+	 * be modified.  We then assume the entire page will need writing back.
+	 */
 #ifdef CONFIG_CIFS_FSCACHE
 	if (PageFsCache(page) &&
 	    wait_on_page_fscache_killable(page) < 0)
 		return VM_FAULT_RETRY;
 #endif
 
-	lock_page(page);
+	wait_on_page_writeback(page);
+
+	if (lock_page_killable(page) < 0)
+		return VM_FAULT_RETRY;
 	return VM_FAULT_LOCKED;
 }

From patchwork Tue Jan 25 13:57:59 2022
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 1584045
Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903
Subject: [RFC PATCH 3/7] cifs: Change the I/O paths to use an iterator rather than a page list
From: David Howells
To: smfrench@gmail.com, nspmangalore@gmail.com
Cc: dhowells@redhat.com, jlayton@kernel.org, linux-cifs@vger.kernel.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org
Date: Tue, 25 Jan 2022 13:57:59 +0000
Message-ID: <164311907995.2806745.400147335497304099.stgit@warthog.procyon.org.uk>
In-Reply-To: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk>
References: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk>

---
 fs/cifs/cifsencrypt.c |  40 +++--
 fs/cifs/cifsfs.c      |   2 
 fs/cifs/cifsfs.h      |   3 
 fs/cifs/cifsglob.h    |  28 +---
 fs/cifs/cifsproto.h   |  10 +
 fs/cifs/cifssmb.c     | 224 +++++++++++++++++++-----------
 fs/cifs/connect.c     |  16 ++
 fs/cifs/misc.c        |  19 ---
 fs/cifs/smb2ops.c     | 365 ++++++++++++++++++++++++-------------------------
 fs/cifs/smb2pdu.c     |  12 --
 fs/cifs/transport.c   |  37 +----
 11 files changed, 379 insertions(+), 377 deletions(-)

Reviewed-by: Shyam Prasad N

diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index 0912d8bbbac1..69bbf3d6c4d4 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -24,12 +24,27 @@
 #include "../smbfs_common/arc4.h"
 #include
 
+static ssize_t cifs_signature_scan(struct iov_iter *i, const void *p,
+				   size_t len, size_t off, void *priv)
+{
+	struct shash_desc *shash = priv;
+	int rc;
+
+	rc = crypto_shash_update(shash, p, len);
+	if (rc) {
+		cifs_dbg(VFS, "%s: Could not update with payload\n", __func__);
+		return rc;
+	}
+
+	return len;
+}
+
 int __cifs_calc_signature(struct smb_rqst *rqst,
 			  struct TCP_Server_Info *server, char *signature,
 			  struct shash_desc *shash)
 {
 	int i;
-	int rc;
+	ssize_t rc;
 	struct kvec *iov = rqst->rq_iov;
 	int n_vec = rqst->rq_nvec;
 	int is_smb2 = server->vals->header_preamble_size == 0;
@@ -62,25 +77,10 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
 		}
 	}
 
-	/* now hash over the rq_pages array */
-	for (i = 0; i < rqst->rq_npages; i++) {
-		void *kaddr;
-		unsigned int len, offset;
-
-		rqst_page_get_length(rqst, i, &len, &offset);
-
-		kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;
-
-		rc = crypto_shash_update(shash, kaddr, len);
-		if (rc) {
-			cifs_dbg(VFS, "%s: Could not update with payload\n",
-				 __func__);
-			kunmap(rqst->rq_pages[i]);
-			return rc;
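
[Editorial aside, not part of the patch: iov_iter_scan() comes from the
iov_iter prerequisite patches this series builds on; it walks every mapped
segment of an iterator and hands each one to a callback, which is what lets
the kmap()-per-page loop being deleted here collapse into the small
cifs_signature_scan() callback added above.  As a minimal, hypothetical
usage example (names invented), a CRC over an iterator's contents could be
taken like so:

	#include <linux/crc32.h>
	#include <linux/uio.h>

	static ssize_t crc_scan_step(struct iov_iter *i, const void *p,
				     size_t len, size_t off, void *priv)
	{
		u32 *crc = priv;

		*crc = crc32_le(*crc, p, len);
		return len;	/* non-negative: continue the scan */
	}

	static u32 crc_of_iter(struct iov_iter *iter)
	{
		size_t count = iov_iter_count(iter);
		u32 crc = ~0;

		/* The scan advances the iterator, so rewind afterwards,
		 * just as init_sg() does with iov_iter_revert() below. */
		iov_iter_scan(iter, count, crc_scan_step, &crc);
		iov_iter_revert(iter, count);
		return ~crc;
	}
]
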
- } - - kunmap(rqst->rq_pages[i]); - } + rc = iov_iter_scan(&rqst->rq_iter, iov_iter_count(&rqst->rq_iter), + cifs_signature_scan, shash); + if (rc < 0) + return rc; rc = crypto_shash_final(shash, signature); if (rc) diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c index 199edac0cb59..a56cb9c8c5ff 100644 --- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -935,7 +935,7 @@ cifs_loose_read_iter(struct kiocb *iocb, struct iov_iter *iter) ssize_t rc; struct inode *inode = file_inode(iocb->ki_filp); - if (iocb->ki_filp->f_flags & O_DIRECT) + if (iocb->ki_flags & IOCB_DIRECT) return cifs_user_readv(iocb, iter); rc = cifs_revalidate_mapping(inode); diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index 15a5c5db038b..1c77bbc0815f 100644 --- a/fs/cifs/cifsfs.h +++ b/fs/cifs/cifsfs.h @@ -110,6 +110,9 @@ extern int cifs_file_strict_mmap(struct file * , struct vm_area_struct *); extern const struct file_operations cifs_dir_ops; extern int cifs_dir_open(struct inode *inode, struct file *file); extern int cifs_readdir(struct file *file, struct dir_context *ctx); +extern void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len); +extern void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len); +extern void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len); /* Functions related to dir entries */ extern const struct dentry_operations cifs_dentry_ops; diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index 0a4085ced40f..3a4fed645636 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -191,11 +191,8 @@ struct cifs_cred { struct smb_rqst { struct kvec *rq_iov; /* array of kvecs */ unsigned int rq_nvec; /* number of kvecs in array */ - struct page **rq_pages; /* pointer to array of page ptrs */ - unsigned int rq_offset; /* the offset to the 1st page */ - unsigned int rq_npages; /* number pages in array */ - unsigned int rq_pagesz; /* page size to use */ - unsigned int rq_tailsz; /* length of last page */ + struct iov_iter rq_iter; /* Data iterator */ + struct xarray rq_buffer; /* Page buffer for encryption */ }; struct mid_q_entry; @@ -1323,28 +1320,18 @@ struct cifs_readdata { struct address_space *mapping; struct cifs_aio_ctx *ctx; __u64 offset; + ssize_t got_bytes; unsigned int bytes; - unsigned int got_bytes; pid_t pid; int result; struct work_struct work; - int (*read_into_pages)(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - unsigned int len); - int (*copy_into_pages)(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - struct iov_iter *iter); + struct iov_iter iter; struct kvec iov[2]; struct TCP_Server_Info *server; #ifdef CONFIG_CIFS_SMB_DIRECT struct smbd_mr *mr; #endif - unsigned int pagesz; - unsigned int page_offset; - unsigned int tailsz; struct cifs_credits credits; - unsigned int nr_pages; - struct page **pages; }; /* asynchronous write support */ @@ -1356,6 +1343,8 @@ struct cifs_writedata { struct work_struct work; struct cifsFileInfo *cfile; struct cifs_aio_ctx *ctx; + struct iov_iter iter; + struct bio_vec *bv; __u64 offset; pid_t pid; unsigned int bytes; @@ -1364,12 +1353,7 @@ struct cifs_writedata { #ifdef CONFIG_CIFS_SMB_DIRECT struct smbd_mr *mr; #endif - unsigned int pagesz; - unsigned int page_offset; - unsigned int tailsz; struct cifs_credits credits; - unsigned int nr_pages; - struct page **pages; }; /* diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index d3701295402d..1b143f0a03c0 100644 --- a/fs/cifs/cifsproto.h +++ b/fs/cifs/cifsproto.h @@ -242,6 +242,9 @@ 
extern int cifs_read_page_from_socket(struct TCP_Server_Info *server, unsigned int page_offset, unsigned int to_read); extern int cifs_setup_cifs_sb(struct cifs_sb_info *cifs_sb); +extern int cifs_read_iter_from_socket(struct TCP_Server_Info *server, + struct iov_iter *iter, + unsigned int to_read); extern int cifs_match_super(struct super_block *, void *); extern int cifs_mount(struct cifs_sb_info *cifs_sb, struct smb3_fs_context *ctx); extern void cifs_umount(struct cifs_sb_info *); @@ -575,10 +578,7 @@ int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid); int cifs_async_writev(struct cifs_writedata *wdata, void (*release)(struct kref *kref)); void cifs_writev_complete(struct work_struct *work); -struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages, - work_func_t complete); -struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages, - work_func_t complete); +struct cifs_writedata *cifs_writedata_alloc(work_func_t complete); void cifs_writedata_release(struct kref *refcount); int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon, struct cifs_sb_info *cifs_sb, @@ -602,8 +602,6 @@ int cifs_alloc_hash(const char *name, struct crypto_shash **shash, struct sdesc **sdesc); void cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc); -extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page, - unsigned int *len, unsigned int *offset); struct cifs_chan * cifs_ses_find_chan(struct cifs_ses *ses, struct TCP_Server_Info *server); int cifs_try_adding_channels(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses); diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index 071e2f21a7db..38e7276352e2 100644 --- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -24,6 +24,7 @@ #include #include #include "cifspdu.h" +#include "cifsfs.h" #include "cifsglob.h" #include "cifsacl.h" #include "cifsproto.h" @@ -1388,11 +1389,11 @@ int cifs_discard_remaining_data(struct TCP_Server_Info *server) { unsigned int rfclen = server->pdu_size; - int remaining = rfclen + server->vals->header_preamble_size - + size_t remaining = rfclen + server->vals->header_preamble_size - server->total_read; while (remaining > 0) { - int length; + ssize_t length; length = cifs_discard_from_socket(server, min_t(size_t, remaining, @@ -1539,10 +1540,15 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid) return cifs_readv_discard(server, mid); } - length = rdata->read_into_pages(server, rdata, data_len); - if (length < 0) - return length; - +#ifdef CONFIG_CIFS_SMB_DIRECT + if (rdata->mr) + length = data_len; /* An RDMA read is already done. 
*/ + else +#endif + length = cifs_read_iter_from_socket(server, &rdata->iter, + data_len); + if (length > 0) + rdata->got_bytes += length; server->total_read += length; cifs_dbg(FYI, "total_read=%u buflen=%u remaining=%u\n", @@ -1566,11 +1572,7 @@ cifs_readv_callback(struct mid_q_entry *mid) struct TCP_Server_Info *server = tcon->ses->server; struct smb_rqst rqst = { .rq_iov = rdata->iov, .rq_nvec = 2, - .rq_pages = rdata->pages, - .rq_offset = rdata->page_offset, - .rq_npages = rdata->nr_pages, - .rq_pagesz = rdata->pagesz, - .rq_tailsz = rdata->tailsz }; + .rq_iter = rdata->iter }; struct cifs_credits credits = { .value = 1, .instance = 0 }; cifs_dbg(FYI, "%s: mid=%llu state=%d result=%d bytes=%u\n", @@ -1925,10 +1927,93 @@ cifs_writedata_release(struct kref *refcount) if (wdata->cfile) cifsFileInfo_put(wdata->cfile); - kvfree(wdata->pages); kfree(wdata); } +/* + * Completion of write to server. + */ +void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len) +{ + struct address_space *mapping = inode->i_mapping; + struct folio *folio; + pgoff_t end; + + XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); + + rcu_read_lock(); + + end = (start + len - 1) / PAGE_SIZE; + xas_for_each(&xas, folio, end) { + if (!folio_test_writeback(folio)) { + pr_err("bad %x @%llx page %lx %lx\n", + len, start, folio_index(folio), end); + BUG(); + } + + folio_detach_private(folio); + folio_end_writeback(folio); + } + + rcu_read_unlock(); +} + +/* + * Failure of write to server. + */ +void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len) +{ + struct address_space *mapping = inode->i_mapping; + struct folio *folio; + pgoff_t end; + + XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); + + rcu_read_lock(); + + end = (start + len - 1) / PAGE_SIZE; + xas_for_each(&xas, folio, end) { + if (!folio_test_writeback(folio)) { + pr_err("bad %x @%llx page %lx %lx\n", + len, start, folio_index(folio), end); + BUG(); + } + + folio_set_error(folio); + folio_end_writeback(folio); + } + + rcu_read_unlock(); +} + +/* + * Redirty pages after a temporary failure. + */ +void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len) +{ + struct address_space *mapping = inode->i_mapping; + struct folio *folio; + pgoff_t end; + + XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); + + rcu_read_lock(); + + end = (start + len - 1) / PAGE_SIZE; + xas_for_each(&xas, folio, end) { + if (!folio_test_writeback(folio)) { + pr_err("bad %x @%llx page %lx %lx\n", + len, start, folio_index(folio), end); + BUG(); + } + + filemap_dirty_folio(folio->mapping, folio); + folio_end_writeback(folio); + } + + rcu_read_unlock(); +} + /* * Write failed with a retryable error. Resend the write request. It's also * possible that the page was redirtied so re-clean the page. 
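
[Editorial aside, not part of the patch: the requeue path below slices
rsize-sized chunks out of the original write request by copying the
iterator and then narrowing the copy.  struct iov_iter is a value type, so
a plain struct assignment yields an independent cursor over the same
backing pages.  A minimal, hypothetical helper showing the idiom:

	#include <linux/uio.h>

	/* Give *dst a view of [offset, offset + len) within *src,
	 * leaving *src untouched. */
	static void iter_subrange(const struct iov_iter *src,
				  struct iov_iter *dst,
				  size_t offset, size_t len)
	{
		*dst = *src;		/* struct copy shares the pages */
		iov_iter_advance(dst, offset);
		iov_iter_truncate(dst, len);
	}

This is exactly the wdata2->iter = wdata->iter, iov_iter_advance(),
iov_iter_truncate() sequence used by cifs_writev_requeue() below.]
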
@@ -1936,51 +2021,56 @@ cifs_writedata_release(struct kref *refcount) static void cifs_writev_requeue(struct cifs_writedata *wdata) { - int i, rc = 0; + int rc = 0; struct inode *inode = d_inode(wdata->cfile->dentry); struct TCP_Server_Info *server; - unsigned int rest_len; + unsigned int rest_len = wdata->bytes; + loff_t fpos = wdata->offset; server = tlink_tcon(wdata->cfile->tlink)->ses->server; - i = 0; - rest_len = wdata->bytes; do { struct cifs_writedata *wdata2; - unsigned int j, nr_pages, wsize, tailsz, cur_len; + unsigned int wsize, cur_len; wsize = server->ops->wp_retry_size(inode); if (wsize < rest_len) { - nr_pages = wsize / PAGE_SIZE; - if (!nr_pages) { + if (wsize < PAGE_SIZE) { rc = -ENOTSUPP; break; } - cur_len = nr_pages * PAGE_SIZE; - tailsz = PAGE_SIZE; + cur_len = min(round_down(wsize, PAGE_SIZE), rest_len); } else { - nr_pages = DIV_ROUND_UP(rest_len, PAGE_SIZE); cur_len = rest_len; - tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE; } - wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete); + wdata2 = cifs_writedata_alloc(cifs_writev_complete); if (!wdata2) { rc = -ENOMEM; break; } - for (j = 0; j < nr_pages; j++) { - wdata2->pages[j] = wdata->pages[i + j]; - lock_page(wdata2->pages[j]); - clear_page_dirty_for_io(wdata2->pages[j]); - } - wdata2->sync_mode = wdata->sync_mode; - wdata2->nr_pages = nr_pages; - wdata2->offset = page_offset(wdata2->pages[0]); - wdata2->pagesz = PAGE_SIZE; - wdata2->tailsz = tailsz; - wdata2->bytes = cur_len; + wdata2->offset = fpos; + wdata2->bytes = cur_len; + wdata2->iter = wdata->iter; + + iov_iter_advance(&wdata2->iter, fpos - wdata->offset); + iov_iter_truncate(&wdata2->iter, wdata2->bytes); + +#if 0 + if (iov_iter_is_xarray(&wdata2->iter)) { + /* TODO: Check for pages having been redirtied and + * clean them. We can do this by walking the xarray. + * If it's not an xarray, then it's a DIO and we + * shouldn't be mucking around with the page bits. 
+ */ + for (j = 0; j < nr_pages; j++) { + wdata2->pages[j] = wdata->pages[i + j]; + lock_page(wdata2->pages[j]); + clear_page_dirty_for_io(wdata2->pages[j]); + } + } +#endif rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &wdata2->cfile); @@ -1995,33 +2085,25 @@ cifs_writev_requeue(struct cifs_writedata *wdata) cifs_writedata_release); } - for (j = 0; j < nr_pages; j++) { - unlock_page(wdata2->pages[j]); - if (rc != 0 && !is_retryable_error(rc)) { - SetPageError(wdata2->pages[j]); - end_page_writeback(wdata2->pages[j]); - put_page(wdata2->pages[j]); - } - } + if (iov_iter_is_xarray(&wdata2->iter)) + cifs_pages_written_back(inode, wdata2->offset, wdata2->bytes); kref_put(&wdata2->refcount, cifs_writedata_release); if (rc) { if (is_retryable_error(rc)) continue; - i += nr_pages; + fpos += cur_len; + rest_len -= cur_len; break; } + fpos += cur_len; rest_len -= cur_len; - i += nr_pages; - } while (i < wdata->nr_pages); + } while (rest_len > 0); - /* cleanup remaining pages from the original wdata */ - for (; i < wdata->nr_pages; i++) { - SetPageError(wdata->pages[i]); - end_page_writeback(wdata->pages[i]); - put_page(wdata->pages[i]); - } + /* Clean up remaining pages from the original wdata */ + if (iov_iter_is_xarray(&wdata->iter)) + cifs_pages_written_back(inode, fpos, rest_len); if (rc != 0 && !is_retryable_error(rc)) mapping_set_error(inode->i_mapping, rc); @@ -2034,7 +2116,6 @@ cifs_writev_complete(struct work_struct *work) struct cifs_writedata *wdata = container_of(work, struct cifs_writedata, work); struct inode *inode = d_inode(wdata->cfile->dentry); - int i = 0; if (wdata->result == 0) { spin_lock(&inode->i_lock); @@ -2045,40 +2126,25 @@ cifs_writev_complete(struct work_struct *work) } else if (wdata->sync_mode == WB_SYNC_ALL && wdata->result == -EAGAIN) return cifs_writev_requeue(wdata); - for (i = 0; i < wdata->nr_pages; i++) { - struct page *page = wdata->pages[i]; - if (wdata->result == -EAGAIN) - __set_page_dirty_nobuffers(page); - else if (wdata->result < 0) - SetPageError(page); - end_page_writeback(page); - cifs_readpage_to_fscache(inode, page); - put_page(page); - } + if (wdata->result == -EAGAIN) + cifs_pages_write_redirty(inode, wdata->offset, wdata->bytes); + else if (wdata->result < 0) + cifs_pages_write_failed(inode, wdata->offset, wdata->bytes); + else + cifs_pages_written_back(inode, wdata->offset, wdata->bytes); + if (wdata->result != -EAGAIN) mapping_set_error(inode->i_mapping, wdata->result); kref_put(&wdata->refcount, cifs_writedata_release); } struct cifs_writedata * -cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete) -{ - struct page **pages = - kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS); - if (pages) - return cifs_writedata_direct_alloc(pages, complete); - - return NULL; -} - -struct cifs_writedata * -cifs_writedata_direct_alloc(struct page **pages, work_func_t complete) +cifs_writedata_alloc(work_func_t complete) { struct cifs_writedata *wdata; wdata = kzalloc(sizeof(*wdata), GFP_NOFS); if (wdata != NULL) { - wdata->pages = pages; kref_init(&wdata->refcount); INIT_LIST_HEAD(&wdata->list); init_completion(&wdata->done); @@ -2186,11 +2252,7 @@ cifs_async_writev(struct cifs_writedata *wdata, rqst.rq_iov = iov; rqst.rq_nvec = 2; - rqst.rq_pages = wdata->pages; - rqst.rq_offset = wdata->page_offset; - rqst.rq_npages = wdata->nr_pages; - rqst.rq_pagesz = wdata->pagesz; - rqst.rq_tailsz = wdata->tailsz; + rqst.rq_iter = wdata->iter; cifs_dbg(FYI, "async write at %llu %u bytes\n", wdata->offset, wdata->bytes); diff --git 
a/fs/cifs/connect.c b/fs/cifs/connect.c index ed210d774a21..d0851c9881b3 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -704,6 +704,22 @@ cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page, return cifs_readv_from_socket(server, &smb_msg); } +int +cifs_read_iter_from_socket(struct TCP_Server_Info *server, struct iov_iter *iter, + unsigned int to_read) +{ + struct msghdr smb_msg; + int ret; + + smb_msg.msg_iter = *iter; + if (smb_msg.msg_iter.count > to_read) + smb_msg.msg_iter.count = to_read; + ret = cifs_readv_from_socket(server, &smb_msg); + if (ret > 0) + iov_iter_advance(iter, ret); + return ret; +} + static bool is_smb_response(struct TCP_Server_Info *server, unsigned char type) { diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index 56598f7dbe00..f5fe5720456a 100644 --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@ -1122,25 +1122,6 @@ cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc) *shash = NULL; } -/** - * rqst_page_get_length - obtain the length and offset for a page in smb_rqst - * @rqst: The request descriptor - * @page: The index of the page to query - * @len: Where to store the length for this page: - * @offset: Where to store the offset for this page - */ -void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page, - unsigned int *len, unsigned int *offset) -{ - *len = rqst->rq_pagesz; - *offset = (page == 0) ? rqst->rq_offset : 0; - - if (rqst->rq_npages == 1 || page == rqst->rq_npages-1) - *len = rqst->rq_tailsz; - else if (page == 0) - *len = rqst->rq_pagesz - rqst->rq_offset; -} - void extract_unc_hostname(const char *unc, const char **h, size_t *len) { const char *end; diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index af5d0830bc8a..e1649ac194db 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -4406,15 +4406,30 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, unsigned int orig_len, static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf, unsigned int buflen) { - void *addr; + struct page *page; + /* * VMAP_STACK (at least) puts stack into the vmalloc address space */ if (is_vmalloc_addr(buf)) - addr = vmalloc_to_page(buf); + page = vmalloc_to_page(buf); else - addr = virt_to_page(buf); - sg_set_page(sg, addr, buflen, offset_in_page(buf)); + page = virt_to_page(buf); + sg_set_page(sg, page, buflen, offset_in_page(buf)); +} + +struct cifs_init_sg_priv { + struct scatterlist *sg; + unsigned int idx; +}; + +static ssize_t cifs_init_sg_scan(struct iov_iter *i, const void *p, + size_t len, size_t off, void *_priv) +{ + struct cifs_init_sg_priv *priv = _priv; + + smb2_sg_set_buf(&priv->sg[priv->idx++], p, len); + return len; } /* Assumes the first rqst has a transform header as the first iov. 
@@ -4426,43 +4441,46 @@ static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf, static struct scatterlist * init_sg(int num_rqst, struct smb_rqst *rqst, u8 *sign) { + struct cifs_init_sg_priv priv; unsigned int sg_len; - struct scatterlist *sg; unsigned int i; unsigned int j; - unsigned int idx = 0; + ssize_t rc; int skip; sg_len = 1; - for (i = 0; i < num_rqst; i++) - sg_len += rqst[i].rq_nvec + rqst[i].rq_npages; + for (i = 0; i < num_rqst; i++) { + unsigned int np = iov_iter_npages(&rqst[i].rq_iter, INT_MAX); + sg_len += rqst[i].rq_nvec + np; + } - sg = kmalloc_array(sg_len, sizeof(struct scatterlist), GFP_KERNEL); - if (!sg) + priv.idx = 0; + priv.sg = kmalloc_array(sg_len, sizeof(struct scatterlist), GFP_KERNEL); + if (!priv.sg) return NULL; - sg_init_table(sg, sg_len); + sg_init_table(priv.sg, sg_len); for (i = 0; i < num_rqst; i++) { + struct iov_iter *iter = &rqst[i].rq_iter; + size_t count = iov_iter_count(iter); + for (j = 0; j < rqst[i].rq_nvec; j++) { /* * The first rqst has a transform header where the * first 20 bytes are not part of the encrypted blob */ skip = (i == 0) && (j == 0) ? 20 : 0; - smb2_sg_set_buf(&sg[idx++], + smb2_sg_set_buf(&priv.sg[priv.idx++], rqst[i].rq_iov[j].iov_base + skip, rqst[i].rq_iov[j].iov_len - skip); - } - - for (j = 0; j < rqst[i].rq_npages; j++) { - unsigned int len, offset; - - rqst_page_get_length(&rqst[i], j, &len, &offset); - sg_set_page(&sg[idx++], rqst[i].rq_pages[j], len, offset); } + + rc = iov_iter_scan(iter, count, cifs_init_sg_scan, &priv); + iov_iter_revert(iter, count); + WARN_ON(rc < 0); } - smb2_sg_set_buf(&sg[idx], sign, SMB2_SIGNATURE_SIZE); - return sg; + smb2_sg_set_buf(&priv.sg[priv.idx], sign, SMB2_SIGNATURE_SIZE); + return priv.sg; } static int @@ -4599,18 +4617,30 @@ crypt_message(struct TCP_Server_Info *server, int num_rqst, return rc; } +/* + * Clear a read buffer, discarding the folios which have XA_MARK_0 set. 
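 *
 * [Editorial illustration, not part of the patch: the xarray here is a
 *  simple index-to-page map serving as a bounce buffer.  Filling one
 *  follows the pattern used by smb3_init_transform_rq() and
 *  receive_encrypted_read() later in this patch; a minimal, hypothetical
 *  version:
 *
 *	static int bounce_buffer_alloc(struct xarray *buf, size_t size)
 *	{
 *		pgoff_t i, npages = DIV_ROUND_UP(size, PAGE_SIZE);
 *
 *		xa_init(buf);
 *		for (i = 0; i < npages; i++) {
 *			struct page *page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
 *			void *old;
 *
 *			if (!page)
 *				return -ENOMEM;	// caller clears the xarray
 *			old = xa_store(buf, i, page, GFP_KERNEL);
 *			if (xa_is_err(old)) {
 *				put_page(page);
 *				return xa_err(old);
 *			}
 *		}
 *		return 0;
 *	}
 *
 *  When the buffer is finished with, the pages are released and the map
 *  destroyed by a walk like the cifs_clear_xarray_buffer() that follows.]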
+ */ +static void cifs_clear_xarray_buffer(struct xarray *buffer) +{ + struct folio *folio; + XA_STATE(xas, buffer, 0); + + rcu_read_lock(); + xas_for_each_marked(&xas, folio, ULONG_MAX, XA_MARK_0) { + folio_put(folio); + } + rcu_read_unlock(); + xa_destroy(buffer); +} + void smb3_free_compound_rqst(int num_rqst, struct smb_rqst *rqst) { - int i, j; + int i; - for (i = 0; i < num_rqst; i++) { - if (rqst[i].rq_pages) { - for (j = rqst[i].rq_npages - 1; j >= 0; j--) - put_page(rqst[i].rq_pages[j]); - kfree(rqst[i].rq_pages); - } - } + for (i = 0; i < num_rqst; i++) + if (!xa_empty(&rqst[i].rq_buffer)) + cifs_clear_xarray_buffer(&rqst[i].rq_buffer); } /* @@ -4630,50 +4660,51 @@ static int smb3_init_transform_rq(struct TCP_Server_Info *server, int num_rqst, struct smb_rqst *new_rq, struct smb_rqst *old_rq) { - struct page **pages; struct smb2_transform_hdr *tr_hdr = new_rq[0].rq_iov[0].iov_base; - unsigned int npages; + struct page *page; unsigned int orig_len = 0; int i, j; int rc = -ENOMEM; for (i = 1; i < num_rqst; i++) { - npages = old_rq[i - 1].rq_npages; - pages = kmalloc_array(npages, sizeof(struct page *), - GFP_KERNEL); - if (!pages) - goto err_free; - - new_rq[i].rq_pages = pages; - new_rq[i].rq_npages = npages; - new_rq[i].rq_offset = old_rq[i - 1].rq_offset; - new_rq[i].rq_pagesz = old_rq[i - 1].rq_pagesz; - new_rq[i].rq_tailsz = old_rq[i - 1].rq_tailsz; - new_rq[i].rq_iov = old_rq[i - 1].rq_iov; - new_rq[i].rq_nvec = old_rq[i - 1].rq_nvec; - - orig_len += smb_rqst_len(server, &old_rq[i - 1]); - - for (j = 0; j < npages; j++) { - pages[j] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!pages[j]) - goto err_free; - } - - /* copy pages form the old */ - for (j = 0; j < npages; j++) { - char *dst, *src; - unsigned int offset, len; - - rqst_page_get_length(&new_rq[i], j, &len, &offset); - - dst = (char *) kmap(new_rq[i].rq_pages[j]) + offset; - src = (char *) kmap(old_rq[i - 1].rq_pages[j]) + offset; + struct smb_rqst *old = &old_rq[i - 1]; + struct smb_rqst *new = &new_rq[i]; + struct xarray *buffer = &new->rq_buffer; + unsigned int npages; + size_t size = iov_iter_count(&old->rq_iter), seg, copied = 0; + + xa_init(buffer); + + if (size > 0) { + npages = DIV_ROUND_UP(size, PAGE_SIZE); + for (j = 0; j < npages; j++) { + void *o; + + rc = -ENOMEM; + page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); + if (!page) + goto err_free; + page->index = j; + o = xa_store(buffer, j, page, GFP_KERNEL); + if (xa_is_err(o)) { + rc = xa_err(o); + put_page(page); + goto err_free; + } - memcpy(dst, src, len); - kunmap(new_rq[i].rq_pages[j]); - kunmap(old_rq[i - 1].rq_pages[j]); + seg = min(size - copied, PAGE_SIZE); + if (copy_page_from_iter(page, 0, seg, &old->rq_iter) != seg) { + rc = -EFAULT; + goto err_free; + } + copied += seg; + } + iov_iter_xarray(&new->rq_iter, iov_iter_rw(&old->rq_iter), + buffer, 0, size); } + new->rq_iov = old->rq_iov; + new->rq_nvec = old->rq_nvec; + orig_len += smb_rqst_len(server, new); } /* fill the 1st iov with a transform header */ @@ -4701,12 +4732,12 @@ smb3_is_transform_hdr(void *buf) static int decrypt_raw_data(struct TCP_Server_Info *server, char *buf, - unsigned int buf_data_size, struct page **pages, - unsigned int npages, unsigned int page_data_size, + unsigned int buf_data_size, struct iov_iter *iter, bool is_offloaded) { struct kvec iov[2]; struct smb_rqst rqst = {NULL}; + size_t iter_size = 0; int rc; iov[0].iov_base = buf; @@ -4716,10 +4747,10 @@ decrypt_raw_data(struct TCP_Server_Info *server, char *buf, rqst.rq_iov = iov; rqst.rq_nvec = 2; - rqst.rq_pages = 
pages; - rqst.rq_npages = npages; - rqst.rq_pagesz = PAGE_SIZE; - rqst.rq_tailsz = (page_data_size % PAGE_SIZE) ? : PAGE_SIZE; + if (iter) { + rqst.rq_iter = *iter; + iter_size = iov_iter_count(iter); + } rc = crypt_message(server, 1, &rqst, 0); cifs_dbg(FYI, "Decrypt message returned %d\n", rc); @@ -4730,73 +4761,37 @@ decrypt_raw_data(struct TCP_Server_Info *server, char *buf, memmove(buf, iov[1].iov_base, buf_data_size); if (!is_offloaded) - server->total_read = buf_data_size + page_data_size; + server->total_read = buf_data_size + iter_size; return rc; } static int -read_data_into_pages(struct TCP_Server_Info *server, struct page **pages, - unsigned int npages, unsigned int len) +cifs_copy_pages_to_iter(struct xarray *pages, unsigned int data_size, + unsigned int skip, struct iov_iter *iter) { - int i; - int length; + struct page *page; + unsigned long index; - for (i = 0; i < npages; i++) { - struct page *page = pages[i]; - size_t n; + xa_for_each(pages, index, page) { + size_t n, len = min_t(unsigned int, PAGE_SIZE - skip, data_size); - n = len; - if (len >= PAGE_SIZE) { - /* enough data to fill the page */ - n = PAGE_SIZE; - len -= n; - } else { - zero_user(page, len, PAGE_SIZE - len); - len = 0; + n = copy_page_to_iter(page, skip, len, iter); + if (n != len) { + cifs_dbg(VFS, "%s: something went wrong\n", __func__); + return -EIO; } - length = cifs_read_page_from_socket(server, page, 0, n); - if (length < 0) - return length; - server->total_read += length; + data_size -= n; + skip = 0; } return 0; } -static int -init_read_bvec(struct page **pages, unsigned int npages, unsigned int data_size, - unsigned int cur_off, struct bio_vec **page_vec) -{ - struct bio_vec *bvec; - int i; - - bvec = kcalloc(npages, sizeof(struct bio_vec), GFP_KERNEL); - if (!bvec) - return -ENOMEM; - - for (i = 0; i < npages; i++) { - bvec[i].bv_page = pages[i]; - bvec[i].bv_offset = (i == 0) ? cur_off : 0; - bvec[i].bv_len = min_t(unsigned int, PAGE_SIZE, data_size); - data_size -= bvec[i].bv_len; - } - - if (data_size != 0) { - cifs_dbg(VFS, "%s: something went wrong\n", __func__); - kfree(bvec); - return -EIO; - } - - *page_vec = bvec; - return 0; -} - static int handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, - char *buf, unsigned int buf_len, struct page **pages, - unsigned int npages, unsigned int page_data_size, - bool is_offloaded) + char *buf, unsigned int buf_len, struct xarray *pages, + unsigned int pages_len, bool is_offloaded) { unsigned int data_offset; unsigned int data_len; @@ -4805,9 +4800,6 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, unsigned int pad_len; struct cifs_readdata *rdata = mid->callback_data; struct smb2_hdr *shdr = (struct smb2_hdr *)buf; - struct bio_vec *bvec = NULL; - struct iov_iter iter; - struct kvec iov; int length; bool use_rdma_mr = false; @@ -4896,7 +4888,7 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, return 0; } - if (data_len > page_data_size - pad_len) { + if (data_len > pages_len - pad_len) { /* data_len is corrupt -- discard frame */ rdata->result = -EIO; if (is_offloaded) @@ -4906,8 +4898,9 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, return 0; } - rdata->result = init_read_bvec(pages, npages, page_data_size, - cur_off, &bvec); + /* Copy the data to the output I/O iterator. 
*/ + rdata->result = cifs_copy_pages_to_iter(pages, pages_len, + cur_off, &rdata->iter); if (rdata->result != 0) { if (is_offloaded) mid->mid_state = MID_RESPONSE_MALFORMED; @@ -4915,14 +4908,15 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, dequeue_mid(mid, rdata->result); return 0; } + rdata->got_bytes = pages_len; - iov_iter_bvec(&iter, WRITE, bvec, npages, data_len); } else if (buf_len >= data_offset + data_len) { /* read response payload is in buf */ - WARN_ONCE(npages > 0, "read data can be either in buf or in pages"); - iov.iov_base = buf + data_offset; - iov.iov_len = data_len; - iov_iter_kvec(&iter, WRITE, &iov, 1, data_len); + WARN_ONCE(pages && !xa_empty(pages), + "read data can be either in buf or in pages"); + length = copy_to_iter(buf + data_offset, data_len, &rdata->iter); + if (length < 0) + return length; } else { /* read response payload cannot be in both buf and pages */ WARN_ONCE(1, "buf can not contain only a part of read data"); @@ -4934,13 +4928,6 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, return 0; } - length = rdata->copy_into_pages(server, rdata, &iter); - - kfree(bvec); - - if (length < 0) - return length; - if (is_offloaded) mid->mid_state = MID_RESPONSE_RECEIVED; else @@ -4951,9 +4938,8 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, struct smb2_decrypt_work { struct work_struct decrypt; struct TCP_Server_Info *server; - struct page **ppages; + struct xarray buffer; char *buf; - unsigned int npages; unsigned int len; }; @@ -4962,11 +4948,13 @@ static void smb2_decrypt_offload(struct work_struct *work) { struct smb2_decrypt_work *dw = container_of(work, struct smb2_decrypt_work, decrypt); - int i, rc; + int rc; struct mid_q_entry *mid; + struct iov_iter iter; + iov_iter_xarray(&iter, READ, &dw->buffer, 0, dw->len); rc = decrypt_raw_data(dw->server, dw->buf, dw->server->vals->read_rsp_size, - dw->ppages, dw->npages, dw->len, true); + &iter, true); if (rc) { cifs_dbg(VFS, "error decrypting rc=%d\n", rc); goto free_pages; @@ -4980,7 +4968,7 @@ static void smb2_decrypt_offload(struct work_struct *work) mid->decrypted = true; rc = handle_read_data(dw->server, mid, dw->buf, dw->server->vals->read_rsp_size, - dw->ppages, dw->npages, dw->len, + &dw->buffer, dw->len, true); if (rc >= 0) { #ifdef CONFIG_CIFS_STATS2 @@ -5012,10 +5000,7 @@ static void smb2_decrypt_offload(struct work_struct *work) } free_pages: - for (i = dw->npages-1; i >= 0; i--) - put_page(dw->ppages[i]); - - kfree(dw->ppages); + cifs_clear_xarray_buffer(&dw->buffer); cifs_small_buf_release(dw->buf); kfree(dw); } @@ -5025,47 +5010,66 @@ static int receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid, int *num_mids) { + struct page *page; char *buf = server->smallbuf; struct smb2_transform_hdr *tr_hdr = (struct smb2_transform_hdr *)buf; - unsigned int npages; - struct page **pages; - unsigned int len; + struct iov_iter iter; + unsigned int len, npages; unsigned int buflen = server->pdu_size; int rc; int i = 0; struct smb2_decrypt_work *dw; + dw = kzalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL); + if (!dw) + return -ENOMEM; + xa_init(&dw->buffer); + INIT_WORK(&dw->decrypt, smb2_decrypt_offload); + dw->server = server; + *num_mids = 1; len = min_t(unsigned int, buflen, server->vals->read_rsp_size + sizeof(struct smb2_transform_hdr)) - HEADER_SIZE(server) + 1; rc = cifs_read_from_socket(server, buf + HEADER_SIZE(server) - 1, len); if (rc < 0) - return rc; + goto free_dw; 
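
[Editorial aside, not part of the patch: iov_iter_xarray() is what lets the
receive path below read straight into the xarray-backed buffer with no
intermediate struct page array.  A minimal, hypothetical receive helper in
the same style, using plain kernel socket calls rather than cifs's own
socket wrappers:

	#include <linux/net.h>
	#include <linux/uio.h>

	static int recv_into_xarray(struct socket *sock, struct xarray *pages,
				    size_t size)
	{
		struct msghdr msg = { .msg_flags = MSG_WAITALL };

		/* READ: the iterator is a destination for received data */
		iov_iter_xarray(&msg.msg_iter, READ, pages, 0, size);
		return sock_recvmsg(sock, &msg, MSG_WAITALL);
	}

Note how the code below similarly zeroes any excess with iov_iter_zero()
and then rewinds and clips the iterator before decryption.]
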
server->total_read += rc; len = le32_to_cpu(tr_hdr->OriginalMessageSize) - server->vals->read_rsp_size; + dw->len = len; npages = DIV_ROUND_UP(len, PAGE_SIZE); - pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL); - if (!pages) { - rc = -ENOMEM; - goto discard_data; - } - + rc = -ENOMEM; for (; i < npages; i++) { - pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!pages[i]) { - rc = -ENOMEM; + void *old; + + page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); + if (!page) { + goto discard_data; + } + page->index = i; + old = xa_store(&dw->buffer, i, page, GFP_KERNEL); + if (xa_is_err(old)) { + rc = xa_err(old); + put_page(page); goto discard_data; } } - /* read read data into pages */ - rc = read_data_into_pages(server, pages, npages, len); - if (rc) - goto free_pages; + iov_iter_xarray(&iter, READ, &dw->buffer, 0, npages * PAGE_SIZE); + + /* Read the data into the buffer and clear excess bufferage. */ + rc = cifs_read_iter_from_socket(server, &iter, dw->len); + if (rc < 0) + goto discard_data; + + server->total_read += rc; + if (rc < npages * PAGE_SIZE) + iov_iter_zero(npages * PAGE_SIZE - rc, &iter); + iov_iter_revert(&iter, npages * PAGE_SIZE); + iov_iter_truncate(&iter, dw->len); rc = cifs_discard_remaining_data(server); if (rc) @@ -5078,39 +5082,28 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid, if ((server->min_offload) && (server->in_flight > 1) && (server->pdu_size >= server->min_offload)) { - dw = kmalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL); - if (dw == NULL) - goto non_offloaded_decrypt; - dw->buf = server->smallbuf; server->smallbuf = (char *)cifs_small_buf_get(); - INIT_WORK(&dw->decrypt, smb2_decrypt_offload); - - dw->npages = npages; - dw->server = server; - dw->ppages = pages; - dw->len = len; queue_work(decrypt_wq, &dw->decrypt); *num_mids = 0; /* worker thread takes care of finding mid */ return -1; } -non_offloaded_decrypt: rc = decrypt_raw_data(server, buf, server->vals->read_rsp_size, - pages, npages, len, false); + &iter, false); if (rc) goto free_pages; *mid = smb2_find_mid(server, buf); - if (*mid == NULL) + if (*mid == NULL) { cifs_dbg(FYI, "mid not found\n"); - else { + } else { cifs_dbg(FYI, "mid found\n"); (*mid)->decrypted = true; rc = handle_read_data(server, *mid, buf, server->vals->read_rsp_size, - pages, npages, len, false); + &dw->buffer, dw->len, false); if (rc >= 0) { if (server->ops->is_network_name_deleted) { server->ops->is_network_name_deleted(buf, @@ -5120,9 +5113,9 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid, } free_pages: - for (i = i - 1; i >= 0; i--) - put_page(pages[i]); - kfree(pages); + cifs_clear_xarray_buffer(&dw->buffer); +free_dw: + kfree(dw); return rc; discard_data: cifs_discard_remaining_data(server); @@ -5160,7 +5153,7 @@ receive_encrypted_standard(struct TCP_Server_Info *server, server->total_read += length; buf_size = pdu_length - sizeof(struct smb2_transform_hdr); - length = decrypt_raw_data(server, buf, buf_size, NULL, 0, 0, false); + length = decrypt_raw_data(server, buf, buf_size, NULL, false); if (length) return length; @@ -5259,7 +5252,7 @@ smb3_handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid) char *buf = server->large_buf ? 
server->bigbuf : server->smallbuf; return handle_read_data(server, mid, buf, server->pdu_size, - NULL, 0, 0, false); + NULL, 0, false); } static int diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 7e7909b1ae11..ebbea7526ee2 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -4118,11 +4118,7 @@ smb2_readv_callback(struct mid_q_entry *mid) struct cifs_credits credits = { .value = 0, .instance = 0 }; struct smb_rqst rqst = { .rq_iov = &rdata->iov[1], .rq_nvec = 1, - .rq_pages = rdata->pages, - .rq_offset = rdata->page_offset, - .rq_npages = rdata->nr_pages, - .rq_pagesz = rdata->pagesz, - .rq_tailsz = rdata->tailsz }; + .rq_iter = rdata->iter }; WARN_ONCE(rdata->server != mid->server, "rdata server %p != mid server %p", @@ -4522,11 +4518,7 @@ smb2_async_writev(struct cifs_writedata *wdata, rqst.rq_iov = iov; rqst.rq_nvec = 1; - rqst.rq_pages = wdata->pages; - rqst.rq_offset = wdata->page_offset; - rqst.rq_npages = wdata->nr_pages; - rqst.rq_pagesz = wdata->pagesz; - rqst.rq_tailsz = wdata->tailsz; + rqst.rq_iter = wdata->iter; #ifdef CONFIG_CIFS_SMB_DIRECT if (wdata->mr) { iov[0].iov_len += sizeof(struct smbd_buffer_descriptor_v1); diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c index 8540f7c13eae..cb19c43c0009 100644 --- a/fs/cifs/transport.c +++ b/fs/cifs/transport.c @@ -276,26 +276,7 @@ smb_rqst_len(struct TCP_Server_Info *server, struct smb_rqst *rqst) for (i = 0; i < nvec; i++) buflen += iov[i].iov_len; - /* - * Add in the page array if there is one. The caller needs to make - * sure rq_offset and rq_tailsz are set correctly. If a buffer of - * multiple pages ends at page boundary, rq_tailsz needs to be set to - * PAGE_SIZE. - */ - if (rqst->rq_npages) { - if (rqst->rq_npages == 1) - buflen += rqst->rq_tailsz; - else { - /* - * If there is more than one page, calculate the - * buffer length based on rq_offset and rq_tailsz - */ - buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) - - rqst->rq_offset; - buflen += rqst->rq_tailsz; - } - } - + buflen += iov_iter_count(&rqst->rq_iter); return buflen; } @@ -382,23 +363,15 @@ __smb_send_rqst(struct TCP_Server_Info *server, int num_rqst, total_len += sent; - /* now walk the page array and send each page in it */ - for (i = 0; i < rqst[j].rq_npages; i++) { - struct bio_vec bvec; - - bvec.bv_page = rqst[j].rq_pages[i]; - rqst_page_get_length(&rqst[j], i, &bvec.bv_len, - &bvec.bv_offset); - - iov_iter_bvec(&smb_msg.msg_iter, WRITE, - &bvec, 1, bvec.bv_len); + if (iov_iter_count(&rqst[j].rq_iter) > 0) { + smb_msg.msg_iter = rqst[j].rq_iter; rc = smb_send_kvec(server, &smb_msg, &sent); if (rc < 0) break; - total_len += sent; } - } + +} unmask: sigprocmask(SIG_SETMASK, &oldmask, NULL); From patchwork Tue Jan 25 13:58:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 1584046 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=guUmTEQN; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-cifs-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by bilbo.ozlabs.org (Postfix) with ESMTP id 4JjpT34Vmfz9t5H for ; Wed, 26 Jan 2022 
01:02:31 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1386292AbiAYOCM (ORCPT ); Tue, 25 Jan 2022 09:02:12 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:42036 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1386887AbiAYN6n (ORCPT ); Tue, 25 Jan 2022 08:58:43 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643119107; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=j930tURWJrqWngUWd4DU/5vAYbSK3uD9wDJcU0irB28=; b=guUmTEQNYImZ6LhxeQldE0sLV2bd/8J5gQ5q0BIinNg3G05M7fJTm2+AngeBBxSz2iJ24e zIISqxQPLT1QTDs6fDn+0W6wVVpyNZ/3lJ5OP+cr+ID99hvlDqQSjSL0pEnkoGyjOI9E40 82RwpqX/B/3TaUg2k/m0+RPpEjvq2WU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-96-H9Rc9HtZMYC8XyQfoaDnbw-1; Tue, 25 Jan 2022 08:58:23 -0500 X-MC-Unique: H9Rc9HtZMYC8XyQfoaDnbw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id C1072801ADA; Tue, 25 Jan 2022 13:58:21 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.5]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1FDA7703AB; Tue, 25 Jan 2022 13:58:19 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 4/7] cifs: Make cifs_writepages() hand an iterator down From: David Howells To: smfrench@gmail.com, nspmangalore@gmail.com Cc: dhowells@redhat.com, jlayton@kernel.org, linux-cifs@vger.kernel.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org Date: Tue, 25 Jan 2022 13:58:19 +0000 Message-ID: <164311909927.2806745.2328370924720971057.stgit@warthog.procyon.org.uk> In-Reply-To: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk> References: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: linux-cifs@vger.kernel.org --- fs/cifs/file.c | 725 +++++++++++++++++++++++--------------------------------- 1 file changed, 304 insertions(+), 421 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 24722fe75def..f40e5b938d43 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -2262,294 +2262,334 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to) return rc; } -static struct cifs_writedata * -wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping, - pgoff_t end, pgoff_t *index, - unsigned int *found_pages) +/* + * Extend the region to be written back to include subsequent contiguously + * dirty pages if possible, but don't sleep while doing so. 
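The gathering pass that follows walks the mapping's xarray under the RCU read lock, taking speculative folio references and trylocking so that it never has to sleep. One observation on the version as posted: "stop" is set true at the top of each iteration and never cleared on the accept path, so the loop would always end after one folio; presumably a "stop = false;" is intended once a folio has been accepted and "len" extended. A minimal sketch of the pattern, with an illustrative name (gather_contiguous_dirty is not part of the patch) and the flag replaced by direct breaks:

#include <linux/pagemap.h>
#include <linux/pagevec.h>
#include <linux/xarray.h>

/* Take refs on a run of contiguous, dirty, not-under-writeback folios
 * without sleeping; they are left locked in @pvec for the caller.
 */
static void gather_contiguous_dirty(struct address_space *mapping,
				    pgoff_t index, struct pagevec *pvec)
{
	struct folio *folio;
	XA_STATE(xas, &mapping->i_pages, index);

	rcu_read_lock();
	xas_for_each(&xas, folio, ULONG_MAX) {
		if (xas_retry(&xas, folio))
			continue;
		if (xa_is_value(folio) || folio_index(folio) != index)
			break;			/* hole or discontiguity */
		if (!folio_try_get_rcu(folio)) {
			xas_reset(&xas);	/* raced with a free */
			continue;
		}
		if (unlikely(folio != xas_reload(&xas))) {
			folio_put(folio);	/* moved or split under us */
			break;
		}
		if (!folio_trylock(folio)) {
			folio_put(folio);	/* taking it would sleep */
			break;
		}
		if (!folio_test_dirty(folio) || folio_test_writeback(folio)) {
			folio_unlock(folio);
			folio_put(folio);
			break;
		}
		index += folio_nr_pages(folio);
		if (!pagevec_add(pvec, &folio->page))
			break;			/* batch is full */
	}
	rcu_read_unlock();
}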
+ */ +static void cifs_extend_writeback(struct address_space *mapping, + long *_count, + loff_t start, + loff_t max_len, + unsigned int *_len) { - struct cifs_writedata *wdata; - - wdata = cifs_writedata_alloc((unsigned int)tofind, - cifs_writev_complete); - if (!wdata) - return NULL; + struct pagevec pvec; + struct folio *folio; + unsigned int psize; + loff_t len = *_len; + pgoff_t index = (start + len) / PAGE_SIZE; + bool stop = true; + unsigned int i; - *found_pages = find_get_pages_range_tag(mapping, index, end, - PAGECACHE_TAG_DIRTY, tofind, wdata->pages); - return wdata; -} + XA_STATE(xas, &mapping->i_pages, index); + pagevec_init(&pvec); -static unsigned int -wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages, - struct address_space *mapping, - struct writeback_control *wbc, - pgoff_t end, pgoff_t *index, pgoff_t *next, bool *done) -{ - unsigned int nr_pages = 0, i; - struct page *page; - - for (i = 0; i < found_pages; i++) { - page = wdata->pages[i]; - /* - * At this point we hold neither the i_pages lock nor the - * page lock: the page may be truncated or invalidated - * (changing page->mapping to NULL), or even swizzled - * back from swapper_space to tmpfs file mapping + do { + /* Firstly, we gather up a batch of contiguous dirty pages + * under the RCU read lock - but we can't clear the dirty flags + * there if any of those pages are mapped. */ + rcu_read_lock(); - if (nr_pages == 0) - lock_page(page); - else if (!trylock_page(page)) - break; + xas_for_each(&xas, folio, ULONG_MAX) { + stop = true; + if (xas_retry(&xas, folio)) + continue; + if (xa_is_value(folio)) + break; + if (folio_index(folio) != index) + break; - if (unlikely(page->mapping != mapping)) { - unlock_page(page); - break; - } + if (!folio_try_get_rcu(folio)) { + xas_reset(&xas); + continue; + } - if (!wbc->range_cyclic && page->index > end) { - *done = true; - unlock_page(page); - break; - } + /* Has the page moved or been split? */ + if (unlikely(folio != xas_reload(&xas))) { + folio_put(folio); + break; + } - if (*next && (page->index != *next)) { - /* Not next consecutive page */ - unlock_page(page); - break; - } + if (!folio_trylock(folio)) { + folio_put(folio); + break; + } + if (!folio_test_dirty(folio) || folio_test_writeback(folio)) { + folio_unlock(folio); + folio_put(folio); + break; + } - if (wbc->sync_mode != WB_SYNC_NONE) - wait_on_page_writeback(page); + psize = folio_size(folio); + len += psize; + if (len >= max_len || *_count <= 0) + stop = true; - if (PageWriteback(page) || - !clear_page_dirty_for_io(page)) { - unlock_page(page); - break; + index += folio_nr_pages(folio); + if (!pagevec_add(&pvec, &folio->page)) + break; + if (stop) + break; } - /* - * This actually clears the dirty bit in the radix tree. - * See cifs_writepage() for more commentary. + if (!stop) + xas_pause(&xas); + rcu_read_unlock(); + + /* Now, if we obtained any pages, we can shift them to being + * writable and mark them for caching. 
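Why is it safe to unlock these folios before the write RPC has even been built? Because the dirty flag is exchanged for the writeback flag while the folio lock is still held: once PG_writeback is set, truncation and reclaim will wait for folio_end_writeback(), so the lock itself is no longer needed. The invariant in miniature (start_wb_and_unlock is an illustrative name, not in the patch):

#include <linux/pagemap.h>

/* Trade PG_dirty for PG_writeback while the folio lock is held, then
 * drop the lock: PG_writeback keeps the folio pinned down until the
 * write completion calls folio_end_writeback().
 */
static void start_wb_and_unlock(struct folio *folio, long *count)
{
	if (!folio_clear_dirty_for_io(folio))
		BUG();	/* checked dirty under the lock */
	if (folio_start_writeback(folio))
		BUG();	/* checked !writeback under the lock */
	*count -= folio_nr_pages(folio);
	folio_unlock(folio);
}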
*/ - set_page_writeback(page); - if (page_offset(page) >= i_size_read(mapping->host)) { - *done = true; - unlock_page(page); - end_page_writeback(page); + if (!pagevec_count(&pvec)) break; - } - - wdata->pages[i] = page; - *next = page->index + 1; - ++nr_pages; - } - - /* reset index to refind any pages skipped */ - if (nr_pages == 0) - *index = wdata->pages[0]->index + 1; - /* put any pages we aren't going to use */ - for (i = nr_pages; i < found_pages; i++) { - put_page(wdata->pages[i]); - wdata->pages[i] = NULL; - } - - return nr_pages; -} + for (i = 0; i < pagevec_count(&pvec); i++) { + folio = page_folio(pvec.pages[i]); + if (!folio_clear_dirty_for_io(folio)) + BUG(); + if (folio_start_writeback(folio)) + BUG(); -static int -wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages, - struct address_space *mapping, struct writeback_control *wbc) -{ - int rc; - - wdata->sync_mode = wbc->sync_mode; - wdata->nr_pages = nr_pages; - wdata->offset = page_offset(wdata->pages[0]); - wdata->pagesz = PAGE_SIZE; - wdata->tailsz = min(i_size_read(mapping->host) - - page_offset(wdata->pages[nr_pages - 1]), - (loff_t)PAGE_SIZE); - wdata->bytes = ((nr_pages - 1) * PAGE_SIZE) + wdata->tailsz; - wdata->pid = wdata->cfile->pid; - - rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes); - if (rc) - return rc; + *_count -= folio_nr_pages(folio); + folio_unlock(folio); + } - if (wdata->cfile->invalidHandle) - rc = -EAGAIN; - else - rc = wdata->server->ops->async_writev(wdata, - cifs_writedata_release); + pagevec_release(&pvec); + cond_resched(); + } while (!stop); - return rc; + *_len = len; } -static int cifs_writepages(struct address_space *mapping, - struct writeback_control *wbc) +/* + * Write back the locked page and any subsequent non-locked dirty pages. 
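The function below is the heart of the conversion: instead of collecting struct page pointers into wdata->pages[], it describes the span of pagecache directly with an ITER_XARRAY iterator and hands that to the transport. Roughly, under the assumption that rq_iter is the field this series adds to struct smb_rqst in place of rq_pages/rq_offset/rq_npages/rq_pagesz/rq_tailsz (rqst_attach_pagecache is an illustrative wrapper):

#include <linux/pagemap.h>
#include <linux/uio.h>

/* Point an SMB request payload at [start, start + len) of a mapping.
 * No page array is allocated; the iterator references the xarray.
 */
static void rqst_attach_pagecache(struct smb_rqst *rqst,
				  struct address_space *mapping,
				  loff_t start, size_t len)
{
	iov_iter_xarray(&rqst->rq_iter, WRITE, &mapping->i_pages,
			start, len);
}

The payload length then falls out as iov_iter_count(&rqst->rq_iter), which is exactly what the smb_rqst_len() change earlier in this posting reduces to.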
+ */ +static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping, + struct writeback_control *wbc, + struct folio *folio, + loff_t start, loff_t end) { struct inode *inode = mapping->host; - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); struct TCP_Server_Info *server; - bool done = false, scanned = false, range_whole = false; - pgoff_t end, index; struct cifs_writedata *wdata; + struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_credits credits_on_stack; + struct cifs_credits *credits = &credits_on_stack; struct cifsFileInfo *cfile = NULL; - int rc = 0; - int saved_rc = 0; - unsigned int xid; + unsigned int xid, wsize, len, max_len; + loff_t i_size = i_size_read(inode); + long count = wbc->nr_to_write; + int rc; - /* - * If wsize is smaller than the page cache size, default to writing - * one page at a time via cifs_writepage - */ - if (cifs_sb->ctx->wsize < PAGE_SIZE) - return generic_writepages(mapping, wbc); + if (folio_start_writeback(folio)) + BUG(); + + count -= folio_nr_pages(folio); + len = folio_size(folio); xid = get_xid(); - if (wbc->range_cyclic) { - index = mapping->writeback_index; /* Start from prev offset */ - end = -1; - } else { - index = wbc->range_start >> PAGE_SHIFT; - end = wbc->range_end >> PAGE_SHIFT; - if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) - range_whole = true; - scanned = true; - } server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses); -retry: - while (!done && index <= end) { - unsigned int i, nr_pages, found_pages, wsize; - pgoff_t next = 0, tofind, saved_index = index; - struct cifs_credits credits_on_stack; - struct cifs_credits *credits = &credits_on_stack; - int get_file_rc = 0; + rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile); + if (rc) { + cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", rc); + goto err_xid; + } - if (cfile) - cifsFileInfo_put(cfile); + rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize, + &wsize, credits); + if (rc != 0) + goto err_close; - rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile); + wdata = cifs_writedata_alloc(cifs_writev_complete); + if (!wdata) { + rc = -ENOMEM; + goto err_uncredit; + } - /* in case of an error store it to return later */ - if (rc) - get_file_rc = rc; + wdata->sync_mode = wbc->sync_mode; + wdata->offset = folio_pos(folio); + wdata->pid = cfile->pid; + wdata->credits = credits_on_stack; + wdata->cfile = cfile; + wdata->server = server; + cfile = NULL; + + /* Find all consecutive lockable dirty pages, stopping when we find a + * page that is not immediately lockable, is not dirty or is missing, + * or we reach the end of the range. + */ + if (start < i_size) { + /* Trim the write to the EOF; the extra data is ignored. Also + * put an upper limit on the size of a single storedata op. 
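Concretely, the write size is clamped by three independent limits: the negotiated wsize, the end of the requested writeback range, and the file size. A worked example of the min chain that follows, pulled out as an illustrative helper (clamp_write_len is not in the patch):

#include <linux/kernel.h>

/* Example: wsize = 65536, end - start + 1 = 1048576 and
 * i_size - start = 10240 gives 10240; dirty data beyond EOF is never
 * put on the wire.
 */
static unsigned long long clamp_write_len(unsigned int wsize, loff_t start,
					  loff_t end, loff_t i_size)
{
	unsigned long long max_len = wsize;

	max_len = min_t(unsigned long long, max_len, end - start + 1);
	max_len = min_t(unsigned long long, max_len, i_size - start);
	return max_len;
}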
+ */ + max_len = wsize; + max_len = min_t(unsigned long long, max_len, end - start + 1); + max_len = min_t(unsigned long long, max_len, i_size - start); - rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize, - &wsize, credits); - if (rc != 0) { - done = true; - break; - } + if (len < max_len) + cifs_extend_writeback(mapping, &count, start, + max_len, &len); + len = min_t(loff_t, len, max_len); + } - tofind = min((wsize / PAGE_SIZE) - 1, end - index) + 1; + wdata->bytes = len; - wdata = wdata_alloc_and_fillpages(tofind, mapping, end, &index, - &found_pages); - if (!wdata) { - rc = -ENOMEM; - done = true; - add_credits_and_wake_if(server, credits, 0); - break; - } + /* We now have a contiguous set of dirty pages, each with writeback + * set; the first page is still locked at this point, but all the rest + * have been unlocked. + */ + folio_unlock(folio); - if (found_pages == 0) { - kref_put(&wdata->refcount, cifs_writedata_release); - add_credits_and_wake_if(server, credits, 0); - break; - } + if (start < i_size) { + iov_iter_xarray(&wdata->iter, WRITE, &mapping->i_pages, start, len); - nr_pages = wdata_prepare_pages(wdata, found_pages, mapping, wbc, - end, &index, &next, &done); + rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes); + if (rc) + goto err_wdata; - /* nothing to write? */ - if (nr_pages == 0) { - kref_put(&wdata->refcount, cifs_writedata_release); - add_credits_and_wake_if(server, credits, 0); - continue; - } + if (wdata->cfile->invalidHandle) + rc = -EAGAIN; + else + rc = wdata->server->ops->async_writev(wdata, + cifs_writedata_release); + } else { + /* The dirty region was entirely beyond the EOF. */ + cifs_pages_written_back(inode, start, len); + rc = 0; + } - wdata->credits = credits_on_stack; - wdata->cfile = cfile; - wdata->server = server; - cfile = NULL; +err_wdata: + kref_put(&wdata->refcount, cifs_writedata_release); +err_uncredit: + add_credits_and_wake_if(server, credits, 0); +err_close: + if (cfile) + cifsFileInfo_put(cfile); +err_xid: + free_xid(xid); + if (rc == 0) { + wbc->nr_to_write = count; + } else if (is_retryable_error(rc)) { + cifs_pages_write_redirty(inode, start, len); + } else { + cifs_pages_write_failed(inode, start, len); + mapping_set_error(mapping, rc); + } + /* Indication to update ctime and mtime as close is deferred */ + set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags); + return rc; +} - if (!wdata->cfile) { - cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", - get_file_rc); - if (is_retryable_error(get_file_rc)) - rc = get_file_rc; - else - rc = -EBADF; - } else - rc = wdata_send_pages(wdata, nr_pages, mapping, wbc); +/* + * write a region of pages back to the server + */ +static int cifs_writepages_region(struct address_space *mapping, + struct writeback_control *wbc, + loff_t start, loff_t end, loff_t *_next) +{ + struct folio *folio; + struct page *head_page; + ssize_t ret; + int n; - for (i = 0; i < nr_pages; ++i) - unlock_page(wdata->pages[i]); + do { + pgoff_t index = start / PAGE_SIZE; - /* send failure -- clean up the mess */ - if (rc != 0) { - add_credits_and_wake_if(server, &wdata->credits, 0); - for (i = 0; i < nr_pages; ++i) { - if (is_retryable_error(rc)) - redirty_page_for_writepage(wbc, - wdata->pages[i]); - else - SetPageError(wdata->pages[i]); - end_page_writeback(wdata->pages[i]); - put_page(wdata->pages[i]); + n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE, + PAGECACHE_TAG_DIRTY, 1, &head_page); + if (!n) + break; + + folio = page_folio(head_page); + start = 
folio_pos(folio); /* May regress with THPs */ + + /* At this point we hold neither the i_pages lock nor the + * page lock: the page may be truncated or invalidated + * (changing page->mapping to NULL), or even swizzled + * back from swapper_space to tmpfs file mapping + */ + if (wbc->sync_mode != WB_SYNC_NONE) { + ret = folio_lock_killable(folio); + if (ret < 0) { + folio_put(folio); + return ret; + } + } else { + if (!folio_trylock(folio)) { + folio_put(folio); + return 0; } - if (!is_retryable_error(rc)) - mapping_set_error(mapping, rc); } - kref_put(&wdata->refcount, cifs_writedata_release); - if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN) { - index = saved_index; + if (folio_mapping(folio) != mapping || + !folio_test_dirty(folio)) { + start += folio_size(folio); + folio_unlock(folio); + folio_put(folio); continue; } - /* Return immediately if we received a signal during writing */ - if (is_interrupt_error(rc)) { - done = true; - break; + if (folio_test_writeback(folio)) { + folio_unlock(folio); + if (wbc->sync_mode != WB_SYNC_NONE) + folio_wait_writeback(folio); + folio_put(folio); + continue; } - if (rc != 0 && saved_rc == 0) - saved_rc = rc; + if (!folio_clear_dirty_for_io(folio)) + BUG(); - wbc->nr_to_write -= nr_pages; - if (wbc->nr_to_write <= 0) - done = true; + ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end); + folio_put(folio); + if (ret < 0) + return ret; - index = next; - } + start += ret; + cond_resched(); + } while (wbc->nr_to_write > 0); - if (!scanned && !done) { - /* - * We hit the last page and there is more work to be done: wrap - * back to the start of the file - */ - scanned = true; - index = 0; - goto retry; - } + *_next = start; + return 0; +} - if (saved_rc != 0) - rc = saved_rc; +/* + * Write some of the pending data back to the server + */ +static int cifs_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + loff_t start, next; + int ret; - if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0)) - mapping->writeback_index = index; + /* We have to be careful as we can end up racing with setattr() + * truncating the pagecache since the caller doesn't take a lock here + * to prevent it. 
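The race in question: between find_get_pages_range_tag() reporting a dirty page and cifs_writepages_region() locking it, setattr() may have truncated the file and stripped the folio from the mapping. That is why everything is re-validated after the lock and nothing is trusted from the tagged lookup, as the checks above show. The check in isolation (folio_still_ours is an illustrative name):

#include <linux/pagemap.h>

/* Re-validate a folio after locking it: truncation or invalidation may
 * have run between the tagged xarray lookup and taking the lock.
 */
static bool folio_still_ours(struct folio *folio,
			     struct address_space *mapping)
{
	return folio_mapping(folio) == mapping && folio_test_dirty(folio);
}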
+ */ - if (cfile) - cifsFileInfo_put(cfile); - free_xid(xid); - /* Indication to update ctime and mtime as close is deferred */ - set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags); - return rc; + if (wbc->range_cyclic) { + start = mapping->writeback_index * PAGE_SIZE; + ret = cifs_writepages_region(mapping, wbc, start, LLONG_MAX, &next); + if (ret == 0) { + mapping->writeback_index = next / PAGE_SIZE; + if (start > 0 && wbc->nr_to_write > 0) { + ret = cifs_writepages_region(mapping, wbc, 0, + start, &next); + if (ret == 0) + mapping->writeback_index = + next / PAGE_SIZE; + } + } + } else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) { + ret = cifs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next); + if (wbc->nr_to_write > 0 && ret == 0) + mapping->writeback_index = next / PAGE_SIZE; + } else { + ret = cifs_writepages_region(mapping, wbc, + wbc->range_start, wbc->range_end, &next); + } + + return ret; } static int @@ -2608,6 +2648,7 @@ static int cifs_write_end(struct file *file, struct address_space *mapping, struct inode *inode = mapping->host; struct cifsFileInfo *cfile = file->private_data; struct cifs_sb_info *cifs_sb = CIFS_SB(cfile->dentry->d_sb); + struct folio *folio = page_folio(page); __u32 pid; if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) @@ -2618,14 +2659,14 @@ static int cifs_write_end(struct file *file, struct address_space *mapping, cifs_dbg(FYI, "write_end for page %p from pos %lld with %d bytes\n", page, pos, copied); - if (PageChecked(page)) { + if (folio_test_checked(folio)) { if (copied == len) - SetPageUptodate(page); - ClearPageChecked(page); - } else if (!PageUptodate(page) && copied == PAGE_SIZE) - SetPageUptodate(page); + folio_mark_uptodate(folio); + folio_clear_checked(folio); + } else if (!folio_test_uptodate(folio) && copied == PAGE_SIZE) + folio_mark_uptodate(folio); - if (!PageUptodate(page)) { + if (!folio_test_uptodate(folio)) { char *page_data; unsigned offset = pos & (PAGE_SIZE - 1); unsigned int xid; @@ -2782,57 +2823,13 @@ int cifs_flush(struct file *file, fl_owner_t id) return rc; } -static int -cifs_write_allocate_pages(struct page **pages, unsigned long num_pages) -{ - int rc = 0; - unsigned long i; - - for (i = 0; i < num_pages; i++) { - pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!pages[i]) { - /* - * save number of pages we have already allocated and - * return with ENOMEM error - */ - num_pages = i; - rc = -ENOMEM; - break; - } - } - - if (rc) { - for (i = 0; i < num_pages; i++) - put_page(pages[i]); - } - return rc; -} - -static inline -size_t get_numpages(const size_t wsize, const size_t len, size_t *cur_len) -{ - size_t num_pages; - size_t clen; - - clen = min_t(const size_t, len, wsize); - num_pages = DIV_ROUND_UP(clen, PAGE_SIZE); - - if (cur_len) - *cur_len = clen; - - return num_pages; -} - static void cifs_uncached_writedata_release(struct kref *refcount) { - int i; struct cifs_writedata *wdata = container_of(refcount, struct cifs_writedata, refcount); kref_put(&wdata->ctx->refcount, cifs_aio_ctx_release); - for (i = 0; i < wdata->nr_pages; i++) - put_page(wdata->pages[i]); cifs_writedata_release(refcount); } @@ -2858,48 +2855,6 @@ cifs_uncached_writev_complete(struct work_struct *work) kref_put(&wdata->refcount, cifs_uncached_writedata_release); } -static int -wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from, - size_t *len, unsigned long *num_pages) -{ - size_t save_len, copied, bytes, cur_len = *len; - unsigned long i, nr_pages = *num_pages; - - save_len = cur_len; - for 
(i = 0; i < nr_pages; i++) { - bytes = min_t(const size_t, cur_len, PAGE_SIZE); - copied = copy_page_from_iter(wdata->pages[i], 0, bytes, from); - cur_len -= copied; - /* - * If we didn't copy as much as we expected, then that - * may mean we trod into an unmapped area. Stop copying - * at that point. On the next pass through the big - * loop, we'll likely end up getting a zero-length - * write and bailing out of it. - */ - if (copied < bytes) - break; - } - cur_len = save_len - cur_len; - *len = cur_len; - - /* - * If we have no data to send, then that probably means that - * the copy above failed altogether. That's most likely because - * the address in the iovec was bogus. Return -EFAULT and let - * the caller free anything we allocated and bail out. - */ - if (!cur_len) - return -EFAULT; - - /* - * i + 1 now represents the number of pages we actually used in - * the copy phase above. - */ - *num_pages = i + 1; - return 0; -} - static int cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list, struct cifs_aio_ctx *ctx) @@ -2978,14 +2933,11 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, { int rc = 0; size_t cur_len; - unsigned long nr_pages, num_pages, i; struct cifs_writedata *wdata; struct iov_iter saved_from = *from; loff_t saved_offset = offset; pid_t pid; struct TCP_Server_Info *server; - struct page **pagevec; - size_t start; unsigned int xid; if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) @@ -3016,93 +2968,22 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, cur_len = min_t(const size_t, len, wsize); - if (ctx->direct_io) { - ssize_t result; - - result = iov_iter_get_pages_alloc( - from, &pagevec, cur_len, &start); - if (result < 0) { - cifs_dbg(VFS, - "direct_writev couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n", - result, iov_iter_type(from), - from->iov_offset, from->count); - dump_stack(); - - rc = result; - add_credits_and_wake_if(server, credits, 0); - break; - } - cur_len = (size_t)result; - iov_iter_advance(from, cur_len); - - nr_pages = - (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE; - - wdata = cifs_writedata_direct_alloc(pagevec, - cifs_uncached_writev_complete); - if (!wdata) { - rc = -ENOMEM; - add_credits_and_wake_if(server, credits, 0); - break; - } - - - wdata->page_offset = start; - wdata->tailsz = - nr_pages > 1 ? - cur_len - (PAGE_SIZE - start) - - (nr_pages - 2) * PAGE_SIZE : - cur_len; - } else { - nr_pages = get_numpages(wsize, len, &cur_len); - wdata = cifs_writedata_alloc(nr_pages, - cifs_uncached_writev_complete); - if (!wdata) { - rc = -ENOMEM; - add_credits_and_wake_if(server, credits, 0); - break; - } - - rc = cifs_write_allocate_pages(wdata->pages, nr_pages); - if (rc) { - kvfree(wdata->pages); - kfree(wdata); - add_credits_and_wake_if(server, credits, 0); - break; - } - - num_pages = nr_pages; - rc = wdata_fill_from_iovec( - wdata, from, &cur_len, &num_pages); - if (rc) { - for (i = 0; i < nr_pages; i++) - put_page(wdata->pages[i]); - kvfree(wdata->pages); - kfree(wdata); - add_credits_and_wake_if(server, credits, 0); - break; - } - - /* - * Bring nr_pages down to the number of pages we - * actually used, and free any pages that we didn't use. 
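All of this copy-and-account bookkeeping disappears further down: __cifs_writev() is switched to a single extract_iter_to_iter() call. That helper is introduced alongside this series rather than being an existing kernel API; in spirit it does what setup_aio_ctx_iter() in misc.c did, namely pin the pages behind a user-backed iterator and re-express them as a bvec iterator that remains valid in an async worker. A reduced, single-extraction sketch under that assumption (iter_to_bvec_iter is an illustrative name; a full version would loop until @len is covered):

#include <linux/bvec.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/uio.h>

/* Pin the pages behind @src and re-express up to @len bytes of them as
 * a bvec iterator in @dst.  Returns the number of pages pinned; the
 * caller frees *_bv and drops the page refs when the I/O completes.
 */
static ssize_t iter_to_bvec_iter(struct iov_iter *src, size_t len,
				 struct iov_iter *dst, struct bio_vec **_bv)
{
	struct page **pages;
	struct bio_vec *bv;
	size_t start, remain;
	unsigned int npages, i;
	ssize_t got;

	got = iov_iter_get_pages_alloc(src, &pages, len, &start);
	if (got < 0)
		return got;
	iov_iter_advance(src, got);

	npages = DIV_ROUND_UP(got + start, PAGE_SIZE);
	bv = kcalloc(npages, sizeof(*bv), GFP_KERNEL);
	if (!bv) {
		for (i = 0; i < npages; i++)
			put_page(pages[i]);
		kvfree(pages);
		return -ENOMEM;
	}

	for (i = 0, remain = got; i < npages; i++) {
		size_t seg = min_t(size_t, remain, PAGE_SIZE - start);

		bv[i].bv_page	= pages[i];
		bv[i].bv_offset	= start;
		bv[i].bv_len	= seg;
		remain -= seg;
		start = 0;
	}
	kvfree(pages);

	iov_iter_bvec(dst, iov_iter_rw(src), bv, npages, got);
	*_bv = bv;
	return npages;
}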
- */ - for ( ; nr_pages > num_pages; nr_pages--) - put_page(wdata->pages[nr_pages - 1]); - - wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE); + wdata = cifs_writedata_alloc(cifs_uncached_writev_complete); + if (!wdata) { + rc = -ENOMEM; + add_credits_and_wake_if(server, credits, 0); + break; } wdata->sync_mode = WB_SYNC_ALL; - wdata->nr_pages = nr_pages; - wdata->offset = (__u64)offset; - wdata->cfile = cifsFileInfo_get(open_file); - wdata->server = server; - wdata->pid = pid; - wdata->bytes = cur_len; - wdata->pagesz = PAGE_SIZE; - wdata->credits = credits_on_stack; - wdata->ctx = ctx; + wdata->offset = (__u64)offset; + wdata->cfile = cifsFileInfo_get(open_file); + wdata->server = server; + wdata->pid = pid; + wdata->bytes = cur_len; + wdata->credits = credits_on_stack; + wdata->iter = *from; + wdata->ctx = ctx; kref_get(&ctx->refcount); rc = adjust_credits(server, &wdata->credits, wdata->bytes); @@ -3228,7 +3109,6 @@ static ssize_t __cifs_writev( struct cifs_sb_info *cifs_sb; struct cifs_aio_ctx *ctx; struct iov_iter saved_from = *from; - size_t len = iov_iter_count(from); int rc; /* @@ -3262,18 +3142,21 @@ static ssize_t __cifs_writev( ctx->iocb = iocb; ctx->pos = iocb->ki_pos; + ctx->direct_io = direct; - if (direct) { - ctx->direct_io = true; - ctx->iter = *from; - ctx->len = len; - } else { - rc = setup_aio_ctx_iter(ctx, from, WRITE); - if (rc) { - kref_put(&ctx->refcount, cifs_aio_ctx_release); - return rc; - } + /* + * Duplicate the iterator as it may contain references to the calling + * process's virtual memory layout which won't be available in an async + * worker thread. This also takes a ref on every folio involved and + * attaches them to ctx->bv[]. + */ + rc = extract_iter_to_iter(from, ctx->len, &ctx->iter, &ctx->bv); + if (rc < 0) { + kref_put(&ctx->refcount, cifs_aio_ctx_release); + return rc; } + ctx->npages = rc; + ctx->len = iov_iter_count(&ctx->iter); /* grab a lock here due to read response handlers can access ctx */ mutex_lock(&ctx->aio_mutex); From patchwork Tue Jan 25 13:58:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 1584047 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=J+eMYUs6; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-cifs-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by bilbo.ozlabs.org (Postfix) with ESMTP id 4JjpTB475Fz9t5H for ; Wed, 26 Jan 2022 01:02:38 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1386469AbiAYOCM (ORCPT ); Tue, 25 Jan 2022 09:02:12 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:45554 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1386933AbiAYN64 (ORCPT ); Tue, 25 Jan 2022 08:58:56 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643119132; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=ccbPkGHDvnm/jO+h2W7/WQj7IG4rRzK7Qeqv7RS6pTU=; b=J+eMYUs6nClpJWWpBvG03qouWW+BbDz9QDUqhEEzUY9t/IYt5xJlsjWHO5aBUdSr9MMyuH 6wq+HoG+FzS0Ye2F4j3NKlZQsen9w4GJdG0+dC8n0iWEyt9IaNQb2sT1qK3PohFns76gkR gJQ7dNTOcgZzs9QyS3Hvfv0cxt1HHZ4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-277-SWP8B1yAM7S1D9XQZ4nrXQ-1; Tue, 25 Jan 2022 08:58:49 -0500 X-MC-Unique: SWP8B1yAM7S1D9XQZ4nrXQ-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id C58EA51081; Tue, 25 Jan 2022 13:58:47 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.5]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5198A66E34; Tue, 25 Jan 2022 13:58:28 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 5/7] cifs: Make cifs_readahead() pass an iterator down From: David Howells To: smfrench@gmail.com, nspmangalore@gmail.com Cc: dhowells@redhat.com, jlayton@kernel.org, linux-cifs@vger.kernel.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org Date: Tue, 25 Jan 2022 13:58:27 +0000 Message-ID: <164311910701.2806745.8126438935821941351.stgit@warthog.procyon.org.uk> In-Reply-To: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk> References: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Precedence: bulk List-ID: X-Mailing-List: linux-cifs@vger.kernel.org --- fs/cifs/file.c | 214 +++++++++++++------------------------------------------- 1 file changed, 50 insertions(+), 164 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index f40e5b938d43..b57f9b492227 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -3301,14 +3301,12 @@ cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from) return written; } -static struct cifs_readdata * -cifs_readdata_direct_alloc(struct page **pages, work_func_t complete) +static struct cifs_readdata *cifs_readdata_alloc(work_func_t complete) { struct cifs_readdata *rdata; rdata = kzalloc(sizeof(*rdata), GFP_KERNEL); - if (rdata != NULL) { - rdata->pages = pages; + if (rdata) { kref_init(&rdata->refcount); INIT_LIST_HEAD(&rdata->list); init_completion(&rdata->done); @@ -3318,22 +3316,6 @@ cifs_readdata_direct_alloc(struct page **pages, work_func_t complete) return rdata; } -static struct cifs_readdata * -cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete) -{ - struct page **pages = - kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); - struct cifs_readdata *ret = NULL; - - if (pages) { - ret = cifs_readdata_direct_alloc(pages, complete); - if (!ret) - kfree(pages); - } - - return ret; -} - void cifs_readdata_release(struct kref *refcount) { @@ -4147,145 +4129,60 @@ int cifs_file_mmap(struct file *file, struct vm_area_struct *vma) return rc; } -static void -cifs_readv_complete(struct work_struct *work) +/* + * Unlock a bunch of folios in the pagecache. 
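Two places in this series end up relying on the same short-transfer idiom for xarray-backed iterators: read what the server sent, zero whatever is left of the touched pages, then rewind and clip the iterator to the meaningful length. The encrypted receive path earlier in the posting does all three steps; the readahead completion below only needs the zeroing. A reusable sketch, assuming the iterator has already been advanced by the bytes actually received (iter_finish_short_read is an illustrative name):

#include <linux/uio.h>

/* After a transfer of @got out of @size bytes into an iterator that
 * has been advanced by @got: zero the tail so every touched page is
 * fully initialised, rewind to the start, then clip to the @wanted
 * protocol length so the excess bufferage is dropped.
 */
static void iter_finish_short_read(struct iov_iter *iter, size_t size,
				   size_t got, size_t wanted)
{
	if (got < size)
		iov_iter_zero(size - got, iter);
	iov_iter_revert(iter, size);
	iov_iter_truncate(iter, wanted);
}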
+ */ +static void cifs_unlock_folios(struct address_space *mapping, pgoff_t first, pgoff_t last) { - unsigned int i, got_bytes; - struct cifs_readdata *rdata = container_of(work, - struct cifs_readdata, work); - - got_bytes = rdata->got_bytes; - for (i = 0; i < rdata->nr_pages; i++) { - struct page *page = rdata->pages[i]; - - if (rdata->result == 0 || - (rdata->result == -EAGAIN && got_bytes)) { - flush_dcache_page(page); - SetPageUptodate(page); - } else - SetPageError(page); - - unlock_page(page); - - if (rdata->result == 0 || - (rdata->result == -EAGAIN && got_bytes)) - cifs_readpage_to_fscache(rdata->mapping->host, page); - - got_bytes -= min_t(unsigned int, PAGE_SIZE, got_bytes); - - put_page(page); - rdata->pages[i] = NULL; - } - kref_put(&rdata->refcount, cifs_readdata_release); + struct folio *folio; + XA_STATE(xas, &mapping->i_pages, first); + + rcu_read_lock(); + xas_for_each(&xas, folio, last) { + folio_unlock(folio); + } + rcu_read_unlock(); } -static int -readpages_fill_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, struct iov_iter *iter, - unsigned int len) +static void cifs_readahead_complete(struct work_struct *work) { - int result = 0; - unsigned int i; - u64 eof; - pgoff_t eof_index; - unsigned int nr_pages = rdata->nr_pages; - unsigned int page_offset = rdata->page_offset; + struct cifs_readdata *rdata = container_of(work, + struct cifs_readdata, work); + struct folio *folio; + pgoff_t last; + bool good = rdata->result == 0 || (rdata->result == -EAGAIN && rdata->got_bytes); - /* determine the eof that the server (probably) has */ - eof = CIFS_I(rdata->mapping->host)->server_eof; - eof_index = eof ? (eof - 1) >> PAGE_SHIFT : 0; - cifs_dbg(FYI, "eof=%llu eof_index=%lu\n", eof, eof_index); + XA_STATE(xas, &rdata->mapping->i_pages, rdata->offset / PAGE_SIZE); - rdata->got_bytes = 0; - rdata->tailsz = PAGE_SIZE; - for (i = 0; i < nr_pages; i++) { - struct page *page = rdata->pages[i]; - unsigned int to_read = rdata->pagesz; - size_t n; +#if 0 + if (good) + cifs_readpage_to_fscache(rdata->mapping->host, page); +#endif - if (i == 0) - to_read -= page_offset; - else - page_offset = 0; + if (iov_iter_count(&rdata->iter) > 0) + iov_iter_zero(iov_iter_count(&rdata->iter), &rdata->iter); - n = to_read; + last = round_down(rdata->offset + rdata->got_bytes - 1, PAGE_SIZE); - if (len >= to_read) { - len -= to_read; - } else if (len > 0) { - /* enough for partial page, fill and zero the rest */ - zero_user(page, len + page_offset, to_read - len); - n = rdata->tailsz = len; - len = 0; - } else if (page->index > eof_index) { - /* - * The VFS will not try to do readahead past the - * i_size, but it's possible that we have outstanding - * writes with gaps in the middle and the i_size hasn't - * caught up yet. Populate those with zeroed out pages - * to prevent the VFS from repeatedly attempting to - * fill them until the writes are flushed. 
- */ - zero_user(page, 0, PAGE_SIZE); - flush_dcache_page(page); - SetPageUptodate(page); - unlock_page(page); - put_page(page); - rdata->pages[i] = NULL; - rdata->nr_pages--; - continue; - } else { - /* no need to hold page hostage */ - unlock_page(page); - put_page(page); - rdata->pages[i] = NULL; - rdata->nr_pages--; - continue; + xas_for_each(&xas, folio, last) { + if (good) { + flush_dcache_folio(folio); + folio_mark_uptodate(folio); } - - if (iter) - result = copy_page_from_iter( - page, page_offset, n, iter); -#ifdef CONFIG_CIFS_SMB_DIRECT - else if (rdata->mr) - result = n; -#endif - else - result = cifs_read_page_from_socket( - server, page, page_offset, n); - if (result < 0) - break; - - rdata->got_bytes += result; + folio_unlock(folio); } - return rdata->got_bytes > 0 && result != -ECONNABORTED ? - rdata->got_bytes : result; -} - -static int -cifs_readpages_read_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, unsigned int len) -{ - return readpages_fill_pages(server, rdata, NULL, len); -} - -static int -cifs_readpages_copy_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - struct iov_iter *iter) -{ - return readpages_fill_pages(server, rdata, iter, iter->count); + kref_put(&rdata->refcount, cifs_readdata_release); } static void cifs_readahead(struct readahead_control *ractl) { - int rc; struct cifsFileInfo *open_file = ractl->file->private_data; struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(ractl->file); struct TCP_Server_Info *server; - pid_t pid; unsigned int xid; + pid_t pid; + int rc = 0; xid = get_xid(); @@ -4294,7 +4191,6 @@ static void cifs_readahead(struct readahead_control *ractl) else pid = current->tgid; - rc = 0; server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n", @@ -4304,8 +4200,7 @@ static void cifs_readahead(struct readahead_control *ractl) * Chop the readahead request up into rsize-sized read requests. 
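The loop that follows consumes the request through the readahead_control API: readahead_pos() gives the file position of the next batch, readahead_folio() pops folios off the request (they arrive locked and stay locked until the read completion runs), and an ITER_XARRAY iterator is then built over the consumed span. The consumption step in isolation (readahead_take_batch is an illustrative name):

#include <linux/pagemap.h>
#include <linux/uio.h>

/* Consume the next @nr_pages pages of a readahead request and describe
 * them with an xarray iterator.  readahead_folio() hands the folios
 * over still locked; unlocking is left to the read completion.
 */
static void readahead_take_batch(struct readahead_control *ractl,
				 unsigned int nr_pages,
				 struct iov_iter *iter)
{
	loff_t pos = readahead_pos(ractl);	/* before consuming! */
	unsigned int i;

	for (i = 0; i < nr_pages; i++)
		if (!readahead_folio(ractl))
			BUG();	/* the VM put these pages here */

	iov_iter_xarray(iter, READ, &ractl->mapping->i_pages, pos,
			nr_pages * (size_t)PAGE_SIZE);
}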
*/ while (readahead_count(ractl) - ractl->_batch_count) { - unsigned int i, nr_pages, got, rsize; - struct page *page; + unsigned int i, nr_pages, rsize; struct cifs_readdata *rdata; struct cifs_credits credits_on_stack; struct cifs_credits *credits = &credits_on_stack; @@ -4336,33 +4231,28 @@ static void cifs_readahead(struct readahead_control *ractl) break; } - rdata = cifs_readdata_alloc(nr_pages, cifs_readv_complete); + rdata = cifs_readdata_alloc(cifs_readahead_complete); if (!rdata) { /* best to give up if we're out of mem */ add_credits_and_wake_if(server, credits, 0); break; } - got = __readahead_batch(ractl, rdata->pages, nr_pages); - if (got != nr_pages) { - pr_warn("__readahead_batch() returned %u/%u\n", - got, nr_pages); - nr_pages = got; - } - - rdata->nr_pages = nr_pages; - rdata->bytes = readahead_batch_length(ractl); + rdata->offset = readahead_pos(ractl); + rdata->bytes = nr_pages * PAGE_SIZE; rdata->cfile = cifsFileInfo_get(open_file); rdata->server = server; rdata->mapping = ractl->mapping; - rdata->offset = readahead_pos(ractl); rdata->pid = pid; - rdata->pagesz = PAGE_SIZE; - rdata->tailsz = PAGE_SIZE; - rdata->read_into_pages = cifs_readpages_read_into_pages; - rdata->copy_into_pages = cifs_readpages_copy_into_pages; rdata->credits = credits_on_stack; + for (i = 0; i < nr_pages; i++) + if (!readahead_folio(ractl)) + BUG(); + + iov_iter_xarray(&rdata->iter, READ, &rdata->mapping->i_pages, + rdata->offset, rdata->bytes); + rc = adjust_credits(server, &rdata->credits, rdata->bytes); if (!rc) { if (rdata->cfile->invalidHandle) @@ -4373,17 +4263,13 @@ static void cifs_readahead(struct readahead_control *ractl) if (rc) { add_credits_and_wake_if(server, &rdata->credits, 0); - for (i = 0; i < rdata->nr_pages; i++) { - page = rdata->pages[i]; - unlock_page(page); - put_page(page); - } + cifs_unlock_folios(rdata->mapping, + rdata->offset / PAGE_SIZE, + (rdata->offset + rdata->bytes - 1) / PAGE_SIZE); /* Fallback to the readpage in error/reconnect cases */ kref_put(&rdata->refcount, cifs_readdata_release); break; } - - kref_put(&rdata->refcount, cifs_readdata_release); } free_xid(xid); From patchwork Tue Jan 25 13:58:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 1584048 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=OFTeizpq; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-cifs-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by bilbo.ozlabs.org (Postfix) with ESMTP id 4JjpTq5wCtz9t5H for ; Wed, 26 Jan 2022 01:03:11 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1386689AbiAYOCi (ORCPT ); Tue, 25 Jan 2022 09:02:38 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:32354 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1385700AbiAYOAI (ORCPT ); Tue, 25 Jan 2022 09:00:08 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643119196; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: 
to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DMnxoC/tm/Le0PTs9L7jXpIOpWSLwNekY7CodNmRy78=; b=OFTeizpqfBpgsQapjQ/ygm8CARrv4lAsmxTSE2sjp2sHHocfYkiogqOtCohDLiaKYLTaRz NHNTBon31stGqic61K2wq96RrbGmrxn8TVjDNMr3BT8wrhZl88N2tqpXzYTGrTcQttv3ZI Ba03jnCsKkaOuqw8UmL4D8jusfMTQE0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-532-KpvIqEq4N8uo2lrv_c68dg-1; Tue, 25 Jan 2022 08:59:53 -0500 X-MC-Unique: KpvIqEq4N8uo2lrv_c68dg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 16FBA1083F60; Tue, 25 Jan 2022 13:59:52 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.5]) by smtp.corp.redhat.com (Postfix) with ESMTP id DA63D7E125; Tue, 25 Jan 2022 13:58:53 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 6/7] cifs: Get direct I/O and unbuffered I/O working with iterators From: David Howells To: smfrench@gmail.com, nspmangalore@gmail.com Cc: dhowells@redhat.com, jlayton@kernel.org, linux-cifs@vger.kernel.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org Date: Tue, 25 Jan 2022 13:58:53 +0000 Message-ID: <164311913303.2806745.13306912789898168904.stgit@warthog.procyon.org.uk> In-Reply-To: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk> References: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: linux-cifs@vger.kernel.org --- fs/cifs/cifsproto.h | 1 fs/cifs/file.c | 299 ++++++--------------------------------------------- fs/cifs/misc.c | 90 --------------- 3 files changed, 35 insertions(+), 355 deletions(-) diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index 1b143f0a03c0..fb6bcda46266 100644 --- a/fs/cifs/cifsproto.h +++ b/fs/cifs/cifsproto.h @@ -595,7 +595,6 @@ enum securityEnum cifs_select_sectype(struct TCP_Server_Info *, enum securityEnum); struct cifs_aio_ctx *cifs_aio_ctx_alloc(void); void cifs_aio_ctx_release(struct kref *refcount); -int setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw); void smb2_cached_lease_break(struct work_struct *work); int cifs_alloc_hash(const char *name, struct crypto_shash **shash, diff --git a/fs/cifs/file.c b/fs/cifs/file.c index b57f9b492227..f9b9a1562e17 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -3321,6 +3321,9 @@ cifs_readdata_release(struct kref *refcount) { struct cifs_readdata *rdata = container_of(refcount, struct cifs_readdata, refcount); + + if (rdata->ctx) + kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release); #ifdef CONFIG_CIFS_SMB_DIRECT if (rdata->mr) { smbd_deregister_mr(rdata->mr); @@ -3330,85 +3333,9 @@ cifs_readdata_release(struct kref *refcount) if (rdata->cfile) cifsFileInfo_put(rdata->cfile); - kvfree(rdata->pages); kfree(rdata); } -static int -cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages) -{ - int rc = 0; - struct 
page *page; - unsigned int i; - - for (i = 0; i < nr_pages; i++) { - page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!page) { - rc = -ENOMEM; - break; - } - rdata->pages[i] = page; - } - - if (rc) { - unsigned int nr_page_failed = i; - - for (i = 0; i < nr_page_failed; i++) { - put_page(rdata->pages[i]); - rdata->pages[i] = NULL; - } - } - return rc; -} - -static void -cifs_uncached_readdata_release(struct kref *refcount) -{ - struct cifs_readdata *rdata = container_of(refcount, - struct cifs_readdata, refcount); - unsigned int i; - - kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release); - for (i = 0; i < rdata->nr_pages; i++) { - put_page(rdata->pages[i]); - } - cifs_readdata_release(refcount); -} - -/** - * cifs_readdata_to_iov - copy data from pages in response to an iovec - * @rdata: the readdata response with list of pages holding data - * @iter: destination for our data - * - * This function copies data from a list of pages in a readdata response into - * an array of iovecs. It will first calculate where the data should go - * based on the info in the readdata and then copy the data into that spot. - */ -static int -cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter) -{ - size_t remaining = rdata->got_bytes; - unsigned int i; - - for (i = 0; i < rdata->nr_pages; i++) { - struct page *page = rdata->pages[i]; - size_t copy = min_t(size_t, remaining, PAGE_SIZE); - size_t written; - - if (unlikely(iov_iter_is_pipe(iter))) { - void *addr = kmap_atomic(page); - - written = copy_to_iter(addr, copy, iter); - kunmap_atomic(addr); - } else - written = copy_page_to_iter(page, 0, copy, iter); - remaining -= written; - if (written < copy && iov_iter_count(iter) > 0) - break; - } - return remaining ? -EFAULT : 0; -} - static void collect_uncached_read_data(struct cifs_aio_ctx *ctx); static void @@ -3420,81 +3347,7 @@ cifs_uncached_readv_complete(struct work_struct *work) complete(&rdata->done); collect_uncached_read_data(rdata->ctx); /* the below call can possibly free the last ref to aio ctx */ - kref_put(&rdata->refcount, cifs_uncached_readdata_release); -} - -static int -uncached_fill_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, struct iov_iter *iter, - unsigned int len) -{ - int result = 0; - unsigned int i; - unsigned int nr_pages = rdata->nr_pages; - unsigned int page_offset = rdata->page_offset; - - rdata->got_bytes = 0; - rdata->tailsz = PAGE_SIZE; - for (i = 0; i < nr_pages; i++) { - struct page *page = rdata->pages[i]; - size_t n; - unsigned int segment_size = rdata->pagesz; - - if (i == 0) - segment_size -= page_offset; - else - page_offset = 0; - - - if (len <= 0) { - /* no need to hold page hostage */ - rdata->pages[i] = NULL; - rdata->nr_pages--; - put_page(page); - continue; - } - - n = len; - if (len >= segment_size) - /* enough data to fill the page */ - n = segment_size; - else - rdata->tailsz = len; - len -= n; - - if (iter) - result = copy_page_from_iter( - page, page_offset, n, iter); -#ifdef CONFIG_CIFS_SMB_DIRECT - else if (rdata->mr) - result = n; -#endif - else - result = cifs_read_page_from_socket( - server, page, page_offset, n); - if (result < 0) - break; - - rdata->got_bytes += result; - } - - return rdata->got_bytes > 0 && result != -ECONNABORTED ? 
- rdata->got_bytes : result; -} - -static int -cifs_uncached_read_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, unsigned int len) -{ - return uncached_fill_pages(server, rdata, NULL, len); -} - -static int -cifs_uncached_copy_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - struct iov_iter *iter) -{ - return uncached_fill_pages(server, rdata, iter, iter->count); + kref_put(&rdata->refcount, cifs_readdata_release); } static int cifs_resend_rdata(struct cifs_readdata *rdata, @@ -3565,7 +3418,7 @@ static int cifs_resend_rdata(struct cifs_readdata *rdata, } while (rc == -EAGAIN); fail: - kref_put(&rdata->refcount, cifs_uncached_readdata_release); + kref_put(&rdata->refcount, cifs_readdata_release); return rc; } @@ -3575,16 +3428,13 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, struct cifs_aio_ctx *ctx) { struct cifs_readdata *rdata; - unsigned int npages, rsize; + unsigned int rsize; struct cifs_credits credits_on_stack; struct cifs_credits *credits = &credits_on_stack; size_t cur_len; int rc; pid_t pid; struct TCP_Server_Info *server; - struct page **pagevec; - size_t start; - struct iov_iter direct_iov = ctx->iter; server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); @@ -3593,9 +3443,6 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, else pid = current->tgid; - if (ctx->direct_io) - iov_iter_advance(&direct_iov, offset - ctx->pos); - do { if (open_file->invalidHandle) { rc = cifs_reopen_file(open_file, true); @@ -3612,77 +3459,26 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, cur_len = min_t(const size_t, len, rsize); - if (ctx->direct_io) { - ssize_t result; - - result = iov_iter_get_pages_alloc( - &direct_iov, &pagevec, - cur_len, &start); - if (result < 0) { - cifs_dbg(VFS, - "Couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n", - result, iov_iter_type(&direct_iov), - direct_iov.iov_offset, - direct_iov.count); - dump_stack(); - - rc = result; - add_credits_and_wake_if(server, credits, 0); - break; - } - cur_len = (size_t)result; - iov_iter_advance(&direct_iov, cur_len); - - rdata = cifs_readdata_direct_alloc( - pagevec, cifs_uncached_readv_complete); - if (!rdata) { - add_credits_and_wake_if(server, credits, 0); - rc = -ENOMEM; - break; - } - - npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE; - rdata->page_offset = start; - rdata->tailsz = npages > 1 ? 
- cur_len-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE : - cur_len; - - } else { - - npages = DIV_ROUND_UP(cur_len, PAGE_SIZE); - /* allocate a readdata struct */ - rdata = cifs_readdata_alloc(npages, - cifs_uncached_readv_complete); - if (!rdata) { - add_credits_and_wake_if(server, credits, 0); - rc = -ENOMEM; - break; - } - - rc = cifs_read_allocate_pages(rdata, npages); - if (rc) { - kvfree(rdata->pages); - kfree(rdata); - add_credits_and_wake_if(server, credits, 0); - break; - } - - rdata->tailsz = PAGE_SIZE; + rdata = cifs_readdata_alloc(cifs_uncached_readv_complete); + if (!rdata) { + add_credits_and_wake_if(server, credits, 0); + rc = -ENOMEM; + break; } - rdata->server = server; - rdata->cfile = cifsFileInfo_get(open_file); - rdata->nr_pages = npages; - rdata->offset = offset; - rdata->bytes = cur_len; - rdata->pid = pid; - rdata->pagesz = PAGE_SIZE; - rdata->read_into_pages = cifs_uncached_read_into_pages; - rdata->copy_into_pages = cifs_uncached_copy_into_pages; - rdata->credits = credits_on_stack; - rdata->ctx = ctx; + rdata->server = server; + rdata->cfile = cifsFileInfo_get(open_file); + rdata->offset = offset; + rdata->bytes = cur_len; + rdata->pid = pid; + rdata->credits = credits_on_stack; + rdata->ctx = ctx; kref_get(&ctx->refcount); + rdata->iter = ctx->iter; + iov_iter_advance(&rdata->iter, offset - ctx->pos); + iov_iter_truncate(&rdata->iter, cur_len); + rc = adjust_credits(server, &rdata->credits, rdata->bytes); if (!rc) { @@ -3694,12 +3490,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, if (rc) { add_credits_and_wake_if(server, &rdata->credits, 0); - kref_put(&rdata->refcount, - cifs_uncached_readdata_release); - if (rc == -EAGAIN) { - iov_iter_revert(&direct_iov, cur_len); + kref_put(&rdata->refcount, cifs_readdata_release); + if (rc == -EAGAIN) continue; - } break; } @@ -3746,22 +3539,6 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) list_del_init(&rdata->list); INIT_LIST_HEAD(&tmp_list); - /* - * Got a part of data and then reconnect has - * happened -- fill the buffer and continue - * reading. 
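The copy-back path being deleted here existed because each rdata owned private pages and the data then had to be moved into the caller's iovec. With iterators, cifs_send_async_read() instead gives every rdata its own positioned copy of ctx->iter up front, as shown above, so concurrent reads land directly in the right subrange and there is nothing to copy back on completion:

	/* Each rdata gets an independent view of the destination buffer,
	 * advanced to its own offset and clipped to its own length, so
	 * parallel reads never share iterator state.
	 */
	rdata->iter = ctx->iter;
	iov_iter_advance(&rdata->iter, offset - ctx->pos);
	iov_iter_truncate(&rdata->iter, cur_len);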
- */ - if (got_bytes && got_bytes < rdata->bytes) { - rc = 0; - if (!ctx->direct_io) - rc = cifs_readdata_to_iov(rdata, to); - if (rc) { - kref_put(&rdata->refcount, - cifs_uncached_readdata_release); - continue; - } - } - if (ctx->direct_io) { /* * Re-use rdata as this is a @@ -3778,7 +3555,7 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) &tmp_list, ctx); kref_put(&rdata->refcount, - cifs_uncached_readdata_release); + cifs_readdata_release); } list_splice(&tmp_list, &ctx->list); @@ -3786,8 +3563,6 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) goto again; } else if (rdata->result) rc = rdata->result; - else if (!ctx->direct_io) - rc = cifs_readdata_to_iov(rdata, to); /* if there was a short read -- discard anything left */ if (rdata->got_bytes && rdata->got_bytes < rdata->bytes) @@ -3796,7 +3571,7 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) ctx->total_len += rdata->got_bytes; } list_del_init(&rdata->list); - kref_put(&rdata->refcount, cifs_uncached_readdata_release); + kref_put(&rdata->refcount, cifs_readdata_release); } if (!ctx->direct_io) @@ -3856,7 +3631,10 @@ static ssize_t __cifs_readv( if (!ctx) return -ENOMEM; - ctx->cfile = cifsFileInfo_get(cfile); + ctx->pos = offset; + ctx->direct_io = direct; + ctx->len = len; + ctx->cfile = cifsFileInfo_get(cfile); if (!is_sync_kiocb(iocb)) ctx->iocb = iocb; @@ -3864,19 +3642,12 @@ static ssize_t __cifs_readv( if (iter_is_iovec(to)) ctx->should_dirty = true; - if (direct) { - ctx->pos = offset; - ctx->direct_io = true; - ctx->iter = *to; - ctx->len = len; - } else { - rc = setup_aio_ctx_iter(ctx, to, READ); - if (rc) { - kref_put(&ctx->refcount, cifs_aio_ctx_release); - return rc; - } - len = ctx->len; + rc = extract_iter_to_iter(to, len, &ctx->iter, &ctx->bv); + if (rc < 0) { + kref_put(&ctx->refcount, cifs_aio_ctx_release); + return rc; } + ctx->npages = rc; /* grab a lock here due to read response handlers can access ctx */ mutex_lock(&ctx->aio_mutex); diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index f5fe5720456a..6bbc314ab84c 100644 --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@ -974,96 +974,6 @@ cifs_aio_ctx_release(struct kref *refcount) kfree(ctx); } -#define CIFS_AIO_KMALLOC_LIMIT (1024 * 1024) - -int -setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw) -{ - ssize_t rc; - unsigned int cur_npages; - unsigned int npages = 0; - unsigned int i; - size_t len; - size_t count = iov_iter_count(iter); - unsigned int saved_len; - size_t start; - unsigned int max_pages = iov_iter_npages(iter, INT_MAX); - struct page **pages = NULL; - struct bio_vec *bv = NULL; - - if (iov_iter_is_kvec(iter)) { - memcpy(&ctx->iter, iter, sizeof(*iter)); - ctx->len = count; - iov_iter_advance(iter, count); - return 0; - } - - if (array_size(max_pages, sizeof(*bv)) <= CIFS_AIO_KMALLOC_LIMIT) - bv = kmalloc_array(max_pages, sizeof(*bv), GFP_KERNEL); - - if (!bv) { - bv = vmalloc(array_size(max_pages, sizeof(*bv))); - if (!bv) - return -ENOMEM; - } - - if (array_size(max_pages, sizeof(*pages)) <= CIFS_AIO_KMALLOC_LIMIT) - pages = kmalloc_array(max_pages, sizeof(*pages), GFP_KERNEL); - - if (!pages) { - pages = vmalloc(array_size(max_pages, sizeof(*pages))); - if (!pages) { - kvfree(bv); - return -ENOMEM; - } - } - - saved_len = count; - - while (count && npages < max_pages) { - rc = iov_iter_get_pages(iter, pages, count, max_pages, &start); - if (rc < 0) { - cifs_dbg(VFS, "Couldn't get user pages (rc=%zd)\n", rc); - break; - } - - if (rc > count) { - cifs_dbg(VFS, "get pages rc=%zd more than %zu\n", rc, - count); 
- break; - } - - iov_iter_advance(iter, rc); - count -= rc; - rc += start; - cur_npages = DIV_ROUND_UP(rc, PAGE_SIZE); - - if (npages + cur_npages > max_pages) { - cifs_dbg(VFS, "out of vec array capacity (%u vs %u)\n", - npages + cur_npages, max_pages); - break; - } - - for (i = 0; i < cur_npages; i++) { - len = rc > PAGE_SIZE ? PAGE_SIZE : rc; - bv[npages + i].bv_page = pages[i]; - bv[npages + i].bv_offset = start; - bv[npages + i].bv_len = len - start; - rc -= len; - start = 0; - } - - npages += cur_npages; - } - - kvfree(pages); - ctx->bv = bv; - ctx->len = saved_len - count; - ctx->npages = npages; - iov_iter_bvec(&ctx->iter, rw, ctx->bv, npages, ctx->len); - return 0; -} - /** * cifs_alloc_hash - allocate hash and hash context together * @name: The name of the crypto hash algo From patchwork Tue Jan 25 13:59:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 1584049 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=TTRy8nni; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-cifs-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by bilbo.ozlabs.org (Postfix) with ESMTP id 4JjpTy00HPz9t5H for ; Wed, 26 Jan 2022 01:03:17 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1385700AbiAYODF (ORCPT ); Tue, 25 Jan 2022 09:03:05 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:26808 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1385744AbiAYOAZ (ORCPT ); Tue, 25 Jan 2022 09:00:25 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643119218; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ssmuK407DvTsyk/H/EO57MZ7G3jp8P3zJlHztI7Z1Ak=; b=TTRy8nnisq2OLzn/3Sffz8NMthYclXE/JxNLIokk6JalK1TT7rLV0u+NZCWKVopTApvSzs cXwLO4IgczETubZqRlIeAuShNwFlcutkPTxlS3+bGXwcfbr8kSkFYFzdmSmRxuClVrXorM mkLY7bqRcrW/mDmoPGXkzAOIfFJSAPI= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-25-T63jkJRWP3u8JZItuA1TNg-1; Tue, 25 Jan 2022 09:00:14 -0500 X-MC-Unique: T63jkJRWP3u8JZItuA1TNg-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1A3958144E7; Tue, 25 Jan 2022 14:00:12 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.5]) by smtp.corp.redhat.com (Postfix) with ESMTP id 30471703B8; Tue, 25 Jan 2022 13:59:58 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. 
From patchwork Tue Jan 25 13:59:57 2022 Subject: [RFC PATCH 7/7] cifs: Use netfslib to handle reads From: David Howells To: smfrench@gmail.com, nspmangalore@gmail.com Cc: dhowells@redhat.com, jlayton@kernel.org, linux-cifs@vger.kernel.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org Date: Tue, 25 Jan 2022 13:59:57 +0000 Message-ID: <164311919732.2806745.2743328800847071763.stgit@warthog.procyon.org.uk> In-Reply-To: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk> References: <164311902471.2806745.10187041199819525677.stgit@warthog.procyon.org.uk> --- fs/cifs/Kconfig | 1 fs/cifs/cifsfs.c | 6 fs/cifs/cifsfs.h | 3 fs/cifs/cifsglob.h | 6 fs/cifs/cifssmb.c | 9 - fs/cifs/file.c | 824 ++++++++---------------------------------- fs/cifs/fscache.c | 31 -- fs/cifs/fscache.h | 52 --- fs/cifs/inode.c | 17 + fs/cifs/smb2pdu.c | 15 + fs/netfs/read_helper.c | 7 11 files changed, 182 insertions(+), 789 deletions(-) diff --git a/fs/cifs/Kconfig b/fs/cifs/Kconfig index 3b7e3b9e4fd2..c47e2d3a101f 100644 --- a/fs/cifs/Kconfig +++ b/fs/cifs/Kconfig @@ -2,6 +2,7 @@ config CIFS tristate "SMB3 and CIFS support (advanced network filesystem)" depends on INET + select NETFS_SUPPORT select NLS select CRYPTO select CRYPTO_MD5 diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c index a56cb9c8c5ff..bd06df3bb24b 100644 --- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -936,7 +936,7 @@ cifs_loose_read_iter(struct kiocb *iocb, struct iov_iter *iter) struct inode *inode = file_inode(iocb->ki_filp); if (iocb->ki_flags & IOCB_DIRECT) - return cifs_user_readv(iocb, iter); + return netfs_direct_read_iter(iocb, iter); rc = cifs_revalidate_mapping(inode); if (rc) @@ -1314,7 +1314,7 @@ const struct file_operations cifs_file_strict_ops = { }; const struct file_operations cifs_file_direct_ops = { - .read_iter = cifs_direct_readv, + .read_iter = netfs_direct_read_iter, .write_iter = cifs_direct_writev, .open = cifs_open, .release = cifs_close, @@ -1370,7 +1370,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = { }; const struct file_operations cifs_file_direct_nobrl_ops = { - .read_iter = cifs_direct_readv, + .read_iter = netfs_direct_read_iter, .write_iter = cifs_direct_writev, .open = cifs_open, .release = cifs_close, diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index 1c77bbc0815f..c7d5c268fc47 100644 --- a/fs/cifs/cifsfs.h +++ b/fs/cifs/cifsfs.h @@ -85,6 +85,7 @@ extern const struct inode_operations cifs_dfs_referral_inode_operations; /* Functions related to files and directories */ +extern const struct netfs_request_ops cifs_req_ops; extern const struct file_operations cifs_file_ops; extern const struct file_operations cifs_file_direct_ops; /* if directio mnt */ extern const struct file_operations cifs_file_strict_ops; /* if strictio mnt */ @@ -94,8 +95,6 @@ extern const struct file_operations cifs_file_strict_nobrl_ops; extern int cifs_open(struct inode *inode, struct file *file); extern int cifs_close(struct inode *inode, struct file *file); extern int cifs_closedir(struct inode *inode, struct file *file); -extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to); -extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to); extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to); extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter
*from); extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from); diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index 3a4fed645636..938e4e9827ed 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -1313,18 +1313,14 @@ struct cifs_aio_ctx { /* asynchronous read support */ struct cifs_readdata { + struct netfs_read_subrequest *subreq; struct kref refcount; - struct list_head list; - struct completion done; struct cifsFileInfo *cfile; - struct address_space *mapping; - struct cifs_aio_ctx *ctx; __u64 offset; ssize_t got_bytes; unsigned int bytes; pid_t pid; int result; - struct work_struct work; struct iov_iter iter; struct kvec iov[2]; struct TCP_Server_Info *server; diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index 38e7276352e2..c9fb77a8b31b 100644 --- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -23,6 +23,7 @@ #include #include #include +#include #include "cifspdu.h" #include "cifsfs.h" #include "cifsglob.h" @@ -1609,7 +1610,13 @@ cifs_readv_callback(struct mid_q_entry *mid) rdata->result = -EIO; } - queue_work(cifsiod_wq, &rdata->work); + if (rdata->result == 0 || rdata->result == -EAGAIN) + iov_iter_advance(&rdata->subreq->iter, rdata->got_bytes); + netfs_subreq_terminated(rdata->subreq, + (rdata->result == 0 || rdata->result == -EAGAIN) ? + rdata->got_bytes : rdata->result, + false); + kref_put(&rdata->refcount, cifs_readdata_release); DeleteMidQEntry(mid); add_credits(server, &credits, 0); } diff --git a/fs/cifs/file.c b/fs/cifs/file.c index f9b9a1562e17..36559de02e37 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include "cifsfs.h" #include "cifspdu.h" @@ -3306,12 +3307,8 @@ static struct cifs_readdata *cifs_readdata_alloc(work_func_t complete) struct cifs_readdata *rdata; rdata = kzalloc(sizeof(*rdata), GFP_KERNEL); - if (rdata) { + if (rdata) kref_init(&rdata->refcount); - INIT_LIST_HEAD(&rdata->list); - init_completion(&rdata->done); - INIT_WORK(&rdata->work, complete); - } return rdata; } @@ -3322,8 +3319,6 @@ cifs_readdata_release(struct kref *refcount) struct cifs_readdata *rdata = container_of(refcount, struct cifs_readdata, refcount); - if (rdata->ctx) - kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release); #ifdef CONFIG_CIFS_SMB_DIRECT if (rdata->mr) { smbd_deregister_mr(rdata->mr); @@ -3336,370 +3331,6 @@ cifs_readdata_release(struct kref *refcount) kfree(rdata); } -static void collect_uncached_read_data(struct cifs_aio_ctx *ctx); - -static void -cifs_uncached_readv_complete(struct work_struct *work) -{ - struct cifs_readdata *rdata = container_of(work, - struct cifs_readdata, work); - - complete(&rdata->done); - collect_uncached_read_data(rdata->ctx); - /* the below call can possibly free the last ref to aio ctx */ - kref_put(&rdata->refcount, cifs_readdata_release); -} - -static int cifs_resend_rdata(struct cifs_readdata *rdata, - struct list_head *rdata_list, - struct cifs_aio_ctx *ctx) -{ - unsigned int rsize; - struct cifs_credits credits; - int rc; - struct TCP_Server_Info *server; - - /* XXX: should we pick a new channel here? */ - server = rdata->server; - - do { - if (rdata->cfile->invalidHandle) { - rc = cifs_reopen_file(rdata->cfile, true); - if (rc == -EAGAIN) - continue; - else if (rc) - break; - } - - /* - * Wait for credits to resend this rdata. 
- * Note: we are attempting to resend the whole rdata not in - * segments - */ - do { - rc = server->ops->wait_mtu_credits(server, rdata->bytes, - &rsize, &credits); - - if (rc) - goto fail; - - if (rsize < rdata->bytes) { - add_credits_and_wake_if(server, &credits, 0); - msleep(1000); - } - } while (rsize < rdata->bytes); - rdata->credits = credits; - - rc = adjust_credits(server, &rdata->credits, rdata->bytes); - if (!rc) { - if (rdata->cfile->invalidHandle) - rc = -EAGAIN; - else { -#ifdef CONFIG_CIFS_SMB_DIRECT - if (rdata->mr) { - rdata->mr->need_invalidate = true; - smbd_deregister_mr(rdata->mr); - rdata->mr = NULL; - } -#endif - rc = server->ops->async_readv(rdata); - } - } - - /* If the read was successfully sent, we are done */ - if (!rc) { - /* Add to aio pending list */ - list_add_tail(&rdata->list, rdata_list); - return 0; - } - - /* Roll back credits and retry if needed */ - add_credits_and_wake_if(server, &rdata->credits, 0); - } while (rc == -EAGAIN); - -fail: - kref_put(&rdata->refcount, cifs_readdata_release); - return rc; -} - -static int -cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, - struct cifs_sb_info *cifs_sb, struct list_head *rdata_list, - struct cifs_aio_ctx *ctx) -{ - struct cifs_readdata *rdata; - unsigned int rsize; - struct cifs_credits credits_on_stack; - struct cifs_credits *credits = &credits_on_stack; - size_t cur_len; - int rc; - pid_t pid; - struct TCP_Server_Info *server; - - server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); - - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) - pid = open_file->pid; - else - pid = current->tgid; - - do { - if (open_file->invalidHandle) { - rc = cifs_reopen_file(open_file, true); - if (rc == -EAGAIN) - continue; - else if (rc) - break; - } - - rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->rsize, - &rsize, credits); - if (rc) - break; - - cur_len = min_t(const size_t, len, rsize); - - rdata = cifs_readdata_alloc(cifs_uncached_readv_complete); - if (!rdata) { - add_credits_and_wake_if(server, credits, 0); - rc = -ENOMEM; - break; - } - - rdata->server = server; - rdata->cfile = cifsFileInfo_get(open_file); - rdata->offset = offset; - rdata->bytes = cur_len; - rdata->pid = pid; - rdata->credits = credits_on_stack; - rdata->ctx = ctx; - kref_get(&ctx->refcount); - - rdata->iter = ctx->iter; - iov_iter_advance(&rdata->iter, offset - ctx->pos); - iov_iter_truncate(&rdata->iter, cur_len); - - rc = adjust_credits(server, &rdata->credits, rdata->bytes); - - if (!rc) { - if (rdata->cfile->invalidHandle) - rc = -EAGAIN; - else - rc = server->ops->async_readv(rdata); - } - - if (rc) { - add_credits_and_wake_if(server, &rdata->credits, 0); - kref_put(&rdata->refcount, cifs_readdata_release); - if (rc == -EAGAIN) - continue; - break; - } - - list_add_tail(&rdata->list, rdata_list); - offset += cur_len; - len -= cur_len; - } while (len > 0); - - return rc; -} - -static void -collect_uncached_read_data(struct cifs_aio_ctx *ctx) -{ - struct cifs_readdata *rdata, *tmp; - struct iov_iter *to = &ctx->iter; - struct cifs_sb_info *cifs_sb; - int rc; - - cifs_sb = CIFS_SB(ctx->cfile->dentry->d_sb); - - mutex_lock(&ctx->aio_mutex); - - if (list_empty(&ctx->list)) { - mutex_unlock(&ctx->aio_mutex); - return; - } - - rc = ctx->rc; - /* the loop below should proceed in the order of increasing offsets */ -again: - list_for_each_entry_safe(rdata, tmp, &ctx->list, list) { - if (!rc) { - if (!try_wait_for_completion(&rdata->done)) { - mutex_unlock(&ctx->aio_mutex); - return; - } - - if 
(rdata->result == -EAGAIN) { - /* resend call if it's a retryable error */ - struct list_head tmp_list; - unsigned int got_bytes = rdata->got_bytes; - - list_del_init(&rdata->list); - INIT_LIST_HEAD(&tmp_list); - - if (ctx->direct_io) { - /* - * Re-use rdata as this is a - * direct I/O - */ - rc = cifs_resend_rdata( - rdata, - &tmp_list, ctx); - } else { - rc = cifs_send_async_read( - rdata->offset + got_bytes, - rdata->bytes - got_bytes, - rdata->cfile, cifs_sb, - &tmp_list, ctx); - - kref_put(&rdata->refcount, - cifs_readdata_release); - } - - list_splice(&tmp_list, &ctx->list); - - goto again; - } else if (rdata->result) - rc = rdata->result; - - /* if there was a short read -- discard anything left */ - if (rdata->got_bytes && rdata->got_bytes < rdata->bytes) - rc = -ENODATA; - - ctx->total_len += rdata->got_bytes; - } - list_del_init(&rdata->list); - kref_put(&rdata->refcount, cifs_readdata_release); - } - - if (!ctx->direct_io) - ctx->total_len = ctx->len - iov_iter_count(to); - - /* mask nodata case */ - if (rc == -ENODATA) - rc = 0; - - ctx->rc = (rc == 0) ? (ssize_t)ctx->total_len : rc; - - mutex_unlock(&ctx->aio_mutex); - - if (ctx->iocb && ctx->iocb->ki_complete) - ctx->iocb->ki_complete(ctx->iocb, ctx->rc); - else - complete(&ctx->done); -} - -static ssize_t __cifs_readv( - struct kiocb *iocb, struct iov_iter *to, bool direct) -{ - size_t len; - struct file *file = iocb->ki_filp; - struct cifs_sb_info *cifs_sb; - struct cifsFileInfo *cfile; - struct cifs_tcon *tcon; - ssize_t rc, total_read = 0; - loff_t offset = iocb->ki_pos; - struct cifs_aio_ctx *ctx; - - /* - * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC, - * fall back to data copy read path - * this could be improved by getting pages directly in ITER_KVEC - */ - if (direct && iov_iter_is_kvec(to)) { - cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n"); - direct = false; - } - - len = iov_iter_count(to); - if (!len) - return 0; - - cifs_sb = CIFS_FILE_SB(file); - cfile = file->private_data; - tcon = tlink_tcon(cfile->tlink); - - if (!tcon->ses->server->ops->async_readv) - return -ENOSYS; - - if ((file->f_flags & O_ACCMODE) == O_WRONLY) - cifs_dbg(FYI, "attempting read on write only file instance\n"); - - ctx = cifs_aio_ctx_alloc(); - if (!ctx) - return -ENOMEM; - - ctx->pos = offset; - ctx->direct_io = direct; - ctx->len = len; - ctx->cfile = cifsFileInfo_get(cfile); - - if (!is_sync_kiocb(iocb)) - ctx->iocb = iocb; - - if (iter_is_iovec(to)) - ctx->should_dirty = true; - - rc = extract_iter_to_iter(to, len, &ctx->iter, &ctx->bv); - if (rc < 0) { - kref_put(&ctx->refcount, cifs_aio_ctx_release); - return rc; - } - ctx->npages = rc; - - /* grab a lock here due to read response handlers can access ctx */ - mutex_lock(&ctx->aio_mutex); - - rc = cifs_send_async_read(offset, len, cfile, cifs_sb, &ctx->list, ctx); - - /* if at least one read request send succeeded, then reset rc */ - if (!list_empty(&ctx->list)) - rc = 0; - - mutex_unlock(&ctx->aio_mutex); - - if (rc) { - kref_put(&ctx->refcount, cifs_aio_ctx_release); - return rc; - } - - if (!is_sync_kiocb(iocb)) { - kref_put(&ctx->refcount, cifs_aio_ctx_release); - return -EIOCBQUEUED; - } - - rc = wait_for_completion_killable(&ctx->done); - if (rc) { - mutex_lock(&ctx->aio_mutex); - ctx->rc = rc = -EINTR; - total_read = ctx->total_len; - mutex_unlock(&ctx->aio_mutex); - } else { - rc = ctx->rc; - total_read = ctx->total_len; - } - - kref_put(&ctx->refcount, cifs_aio_ctx_release); - - if (total_read) { - iocb->ki_pos += total_read; - return total_read; - 
} - return rc; -} - -ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to) -{ - return __cifs_readv(iocb, to, true); -} - -ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to) -{ - return __cifs_readv(iocb, to, false); -} - ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to) { @@ -3720,12 +3351,15 @@ cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to) * pos+len-1. */ if (!CIFS_CACHE_READ(cinode)) - return cifs_user_readv(iocb, to); + return netfs_direct_read_iter(iocb, to); if (cap_unix(tcon->ses) && (CIFS_UNIX_FCNTL_CAP & le64_to_cpu(tcon->fsUnixInfo.Capability)) && - ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0)) + ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0)) { + if (iocb->ki_flags & IOCB_DIRECT) + return netfs_direct_read_iter(iocb, to); return generic_file_read_iter(iocb, to); + } /* * We need to hold the sem to be sure nobody modifies lock list @@ -3734,104 +3368,16 @@ cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to) down_read(&cinode->lock_sem); if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(to), tcon->ses->server->vals->shared_lock_type, - 0, NULL, CIFS_READ_OP)) - rc = generic_file_read_iter(iocb, to); + 0, NULL, CIFS_READ_OP)) { + if (iocb->ki_flags & IOCB_DIRECT) + rc = netfs_direct_read_iter(iocb, to); + else + rc = generic_file_read_iter(iocb, to); + } up_read(&cinode->lock_sem); return rc; } -static ssize_t -cifs_read(struct file *file, char *read_data, size_t read_size, loff_t *offset) -{ - int rc = -EACCES; - unsigned int bytes_read = 0; - unsigned int total_read; - unsigned int current_read_size; - unsigned int rsize; - struct cifs_sb_info *cifs_sb; - struct cifs_tcon *tcon; - struct TCP_Server_Info *server; - unsigned int xid; - char *cur_offset; - struct cifsFileInfo *open_file; - struct cifs_io_parms io_parms = {0}; - int buf_type = CIFS_NO_BUFFER; - __u32 pid; - - xid = get_xid(); - cifs_sb = CIFS_FILE_SB(file); - - /* FIXME: set up handlers for larger reads and/or convert to async */ - rsize = min_t(unsigned int, cifs_sb->ctx->rsize, CIFSMaxBufSize); - - if (file->private_data == NULL) { - rc = -EBADF; - free_xid(xid); - return rc; - } - open_file = file->private_data; - tcon = tlink_tcon(open_file->tlink); - server = cifs_pick_channel(tcon->ses); - - if (!server->ops->sync_read) { - free_xid(xid); - return -ENOSYS; - } - - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) - pid = open_file->pid; - else - pid = current->tgid; - - if ((file->f_flags & O_ACCMODE) == O_WRONLY) - cifs_dbg(FYI, "attempting read on write only file instance\n"); - - for (total_read = 0, cur_offset = read_data; read_size > total_read; - total_read += bytes_read, cur_offset += bytes_read) { - do { - current_read_size = min_t(uint, read_size - total_read, - rsize); - /* - * For windows me and 9x we do not want to request more - * than it negotiated since it will refuse the read - * then. 
- */ - if (!(tcon->ses->capabilities & - tcon->ses->server->vals->cap_large_files)) { - current_read_size = min_t(uint, - current_read_size, CIFSMaxBufSize); - } - if (open_file->invalidHandle) { - rc = cifs_reopen_file(open_file, true); - if (rc != 0) - break; - } - io_parms.pid = pid; - io_parms.tcon = tcon; - io_parms.offset = *offset; - io_parms.length = current_read_size; - io_parms.server = server; - rc = server->ops->sync_read(xid, &open_file->fid, &io_parms, - &bytes_read, &cur_offset, - &buf_type); - } while (rc == -EAGAIN); - - if (rc || (bytes_read == 0)) { - if (total_read) { - break; - } else { - free_xid(xid); - return rc; - } - } else { - cifs_stats_bytes_read(tcon, total_read); - *offset += bytes_read; - } - } - free_xid(xid); - return total_read; -} - /* * If the page is mmap'ed into a process' page tables, then we need to make * sure that it doesn't change while being written back. @@ -3901,224 +3447,149 @@ int cifs_file_mmap(struct file *file, struct vm_area_struct *vma) } /* - * Unlock a bunch of folios in the pagecache. + * Issue a read operation on behalf of the netfs helper functions. We're asked + * to make a read of a certain size at a point in the file. We are permitted + * to only read a portion of that, but as long as we read something, the netfs + * helper will call us again so that we can issue another read. */ -static void cifs_unlock_folios(struct address_space *mapping, pgoff_t first, pgoff_t last) -{ - struct folio *folio; - XA_STATE(xas, &mapping->i_pages, first); - - rcu_read_lock(); - xas_for_each(&xas, folio, last) { - folio_unlock(folio); - } - rcu_read_unlock(); -} - -static void cifs_readahead_complete(struct work_struct *work) -{ - struct cifs_readdata *rdata = container_of(work, - struct cifs_readdata, work); - struct folio *folio; - pgoff_t last; - bool good = rdata->result == 0 || (rdata->result == -EAGAIN && rdata->got_bytes); - - XA_STATE(xas, &rdata->mapping->i_pages, rdata->offset / PAGE_SIZE); - -#if 0 - if (good) - cifs_readpage_to_fscache(rdata->mapping->host, page); -#endif - - if (iov_iter_count(&rdata->iter) > 0) - iov_iter_zero(iov_iter_count(&rdata->iter), &rdata->iter); - - last = round_down(rdata->offset + rdata->got_bytes - 1, PAGE_SIZE); - - xas_for_each(&xas, folio, last) { - if (good) { - flush_dcache_folio(folio); - folio_mark_uptodate(folio); - } - folio_unlock(folio); - } - - kref_put(&rdata->refcount, cifs_readdata_release); -} - -static void cifs_readahead(struct readahead_control *ractl) +static void cifs_req_issue_op(struct netfs_read_subrequest *subreq) { - struct cifsFileInfo *open_file = ractl->file->private_data; - struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(ractl->file); + struct netfs_read_request *rreq = subreq->rreq; struct TCP_Server_Info *server; + struct cifs_readdata *rdata; + struct cifsFileInfo *open_file = rreq->netfs_priv; + struct cifs_sb_info *cifs_sb = CIFS_SB(rreq->inode->i_sb); + struct cifs_credits credits_on_stack, *credits = &credits_on_stack; unsigned int xid; pid_t pid; int rc = 0; + unsigned int rsize; xid = get_xid(); if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) pid = open_file->pid; else - pid = current->tgid; + pid = current->tgid; // Ummm... This may be a workqueue server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); - cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n", - __func__, ractl->file, ractl->mapping, readahead_count(ractl)); - - /* - * Chop the readahead request up into rsize-sized read requests. 
- */ - while (readahead_count(ractl) - ractl->_batch_count) { - unsigned int i, nr_pages, rsize; - struct cifs_readdata *rdata; - struct cifs_credits credits_on_stack; - struct cifs_credits *credits = &credits_on_stack; + cifs_dbg(FYI, "%s: op=%08x[%x] mapping=%p len=%zu/%zu\n", + __func__, rreq->debug_id, subreq->debug_index, rreq->mapping, + subreq->transferred, subreq->len); - if (open_file->invalidHandle) { + if (open_file->invalidHandle) { + do { rc = cifs_reopen_file(open_file, true); - if (rc) { - if (rc == -EAGAIN) - continue; - break; - } - } - - rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->rsize, - &rsize, credits); + } while (rc == -EAGAIN); if (rc) - break; - nr_pages = min_t(size_t, rsize / PAGE_SIZE, readahead_count(ractl)); - - /* - * Give up immediately if rsize is too small to read an entire - * page. The VFS will fall back to readpage. We should never - * reach this point however since we set ra_pages to 0 when the - * rsize is smaller than a cache page. - */ - if (unlikely(!nr_pages)) { - add_credits_and_wake_if(server, credits, 0); - break; - } - - rdata = cifs_readdata_alloc(cifs_readahead_complete); - if (!rdata) { - /* best to give up if we're out of mem */ - add_credits_and_wake_if(server, credits, 0); - break; - } + goto out; + } - rdata->offset = readahead_pos(ractl); - rdata->bytes = nr_pages * PAGE_SIZE; - rdata->cfile = cifsFileInfo_get(open_file); - rdata->server = server; - rdata->mapping = ractl->mapping; - rdata->pid = pid; - rdata->credits = credits_on_stack; + rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->rsize, &rsize, credits); + if (rc) + goto out; - for (i = 0; i < nr_pages; i++) - if (!readahead_folio(ractl)) - BUG(); + rdata = cifs_readdata_alloc(NULL); + if (!rdata) { + add_credits_and_wake_if(server, credits, 0); + rc = -ENOMEM; + goto out; + } - iov_iter_xarray(&rdata->iter, READ, &rdata->mapping->i_pages, - rdata->offset, rdata->bytes); + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + rdata->subreq = subreq; + rdata->cfile = cifsFileInfo_get(open_file); + rdata->server = server; + rdata->offset = subreq->start + subreq->transferred; + rdata->bytes = subreq->len - subreq->transferred; + rdata->pid = pid; + rdata->credits = credits_on_stack; + rdata->iter = subreq->iter; - rc = adjust_credits(server, &rdata->credits, rdata->bytes); - if (!rc) { - if (rdata->cfile->invalidHandle) - rc = -EAGAIN; - else - rc = server->ops->async_readv(rdata); - } + rc = adjust_credits(server, &rdata->credits, rdata->bytes); + if (!rc) { + if (rdata->cfile->invalidHandle) + rc = -EAGAIN; + else + rc = server->ops->async_readv(rdata); + } - if (rc) { - add_credits_and_wake_if(server, &rdata->credits, 0); - cifs_unlock_folios(rdata->mapping, - rdata->offset / PAGE_SIZE, - (rdata->offset + rdata->bytes - 1) / PAGE_SIZE); - /* Fallback to the readpage in error/reconnect cases */ - kref_put(&rdata->refcount, cifs_readdata_release); - break; - } + if (rc) { + add_credits_and_wake_if(server, &rdata->credits, 0); + /* Fallback to the readpage in error/reconnect cases */ + kref_put(&rdata->refcount, cifs_readdata_release); + goto out; } + kref_put(&rdata->refcount, cifs_readdata_release); + +out: free_xid(xid); + if (rc) + netfs_subreq_terminated(subreq, rc, false); +} + +static int cifs_init_rreq(struct netfs_read_request *rreq, struct file *file) +{ + rreq->netfs_priv = file->private_data; + return 0; } /* - * cifs_readpage_worker must be called with the page pinned + * Expand the size of a readahead to the size of the rsize, if at least as + * 
large as a page, allowing for the possibility that rsize is not pow-2 + * aligned. */ -static int cifs_readpage_worker(struct file *file, struct page *page, - loff_t *poffset) +static void cifs_expand_readahead(struct netfs_read_request *rreq) { - char *read_data; - int rc; + struct cifs_sb_info *cifs_sb = CIFS_SB(rreq->inode->i_sb); + unsigned int rsize = cifs_sb->ctx->rsize; + loff_t misalignment, i_size = i_size_read(rreq->inode); - /* Is the page cached? */ - rc = cifs_readpage_from_fscache(file_inode(file), page); - if (rc == 0) - goto read_complete; - - read_data = kmap(page); - /* for reads over a certain size could initiate async read ahead */ - - rc = cifs_read(file, read_data, PAGE_SIZE, poffset); - - if (rc < 0) - goto io_error; - else - cifs_dbg(FYI, "Bytes read %d\n", rc); + if (rsize < PAGE_SIZE) + return; - /* we do not want atime to be less than mtime, it broke some apps */ - file_inode(file)->i_atime = current_time(file_inode(file)); - if (timespec64_compare(&(file_inode(file)->i_atime), &(file_inode(file)->i_mtime))) - file_inode(file)->i_atime = file_inode(file)->i_mtime; + if (rsize < INT_MAX) + rsize = roundup_pow_of_two(rsize); else - file_inode(file)->i_atime = current_time(file_inode(file)); + rsize = ((unsigned int)INT_MAX + 1) / 2; - if (PAGE_SIZE > rc) - memset(read_data + rc, 0, PAGE_SIZE - rc); - - flush_dcache_page(page); - SetPageUptodate(page); - - /* send this page to the cache */ - cifs_readpage_to_fscache(file_inode(file), page); - - rc = 0; - -io_error: - kunmap(page); - unlock_page(page); + misalignment = rreq->start & (rsize - 1); + if (misalignment) { + rreq->start -= misalignment; + rreq->len += misalignment; + } -read_complete: - return rc; + rreq->len = round_up(rreq->len, rsize); + if (rreq->start < i_size && rreq->len > i_size - rreq->start) + rreq->len = i_size - rreq->start; } -static int cifs_readpage(struct file *file, struct page *page) +static void cifs_rreq_done(struct netfs_read_request *rreq) { - loff_t offset = page_file_offset(page); - int rc = -EACCES; - unsigned int xid; + struct inode *inode = rreq->inode; - xid = get_xid(); - - if (file->private_data == NULL) { - rc = -EBADF; - free_xid(xid); - return rc; - } - - cifs_dbg(FYI, "readpage %p at offset %d 0x%x\n", - page, (int)offset, (int)offset); - - rc = cifs_readpage_worker(file, page, &offset); + /* we do not want atime to be less than mtime, it broke some apps */ + inode->i_atime = current_time(inode); + if (timespec64_compare(&inode->i_atime, &inode->i_mtime)) + inode->i_atime = inode->i_mtime; + else + inode->i_atime = current_time(inode); +} - free_xid(xid); - return rc; +static void cifs_req_cleanup(struct address_space *mapping, void *netfs_priv) +{ } +const struct netfs_request_ops cifs_req_ops = { + .init_rreq = cifs_init_rreq, + .expand_readahead = cifs_expand_readahead, + .issue_op = cifs_req_issue_op, + .done = cifs_rreq_done, + .cleanup = cifs_req_cleanup, +}; + static int is_inode_writable(struct cifsInodeInfo *cifs_inode) { struct cifsFileInfo *open_file; @@ -4168,34 +3639,20 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping, loff_t pos, unsigned len, unsigned flags, struct page **pagep, void **fsdata) { - int oncethru = 0; - pgoff_t index = pos >> PAGE_SHIFT; - loff_t offset = pos & (PAGE_SIZE - 1); - loff_t page_start = pos & PAGE_MASK; - loff_t i_size; - struct page *page; - int rc = 0; + struct folio *folio; + int rc; cifs_dbg(FYI, "write_begin from %lld len %d\n", (long long)pos, len); -start: - page = 
grab_cache_page_write_begin(mapping, index, flags); - if (!page) { - rc = -ENOMEM; - goto out; - } - - if (PageUptodate(page)) - goto out; - - /* - * If we write a full page it will be up to date, no need to read from - * the server. If the write is short, we'll end up doing a sync write - * instead. + /* Prefetch area to be written into the cache if we're caching this + * file. We need to do this before we get a lock on the page in case + * there's more than one writer competing for the same cache block. */ - if (len == PAGE_SIZE) - goto out; + rc = netfs_write_begin(file, mapping, pos, len, flags, &folio, fsdata); + if (rc < 0) + return rc; +#if 0 /* * optimize away the read when we have an oplock, and we're not * expecting to use any of the data we'd be reading in. That @@ -4210,34 +3667,17 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping, offset + len, PAGE_SIZE); /* - * PageChecked means that the parts of the page - * to which we're not writing are considered up - * to date. Once the data is copied to the - * page, it can be set uptodate. + * Marking a folio checked means that the parts of the + * page to which we're not writing are considered up to + * date. Once the data is copied to the page, it can be + * set uptodate. */ - SetPageChecked(page); + folio_set_checked(folio); goto out; } } - - if ((file->f_flags & O_ACCMODE) != O_WRONLY && !oncethru) { - /* - * might as well read a page, it is fast enough. If we get - * an error, we don't need to return it. cifs_write_end will - * do a sync write instead since PG_uptodate isn't set. - */ - cifs_readpage_worker(file, page, &page_start); - put_page(page); - oncethru = 1; - goto start; - } else { - /* we could try using another file handle if there is one - - but how would we lock it to prevent close of that handle - racing with this read? In any case - this will be written out by write_end so is fine */ - } -out: - *pagep = page; +#endif + *pagep = folio_page(folio, (pos - folio_pos(folio)) / PAGE_SIZE); return rc; } @@ -4429,8 +3869,8 @@ static int cifs_set_page_dirty(struct page *page) #endif const struct address_space_operations cifs_addr_ops = { - .readpage = cifs_readpage, - .readahead = cifs_readahead, + .readpage = netfs_readpage, + .readahead = netfs_readahead, .writepage = cifs_writepage, .writepages = cifs_writepages, .write_begin = cifs_write_begin, @@ -4455,7 +3895,7 @@ const struct address_space_operations cifs_addr_ops = { * to leave cifs_readpages out of the address space operations. 
*/ const struct address_space_operations cifs_addr_ops_smallbuf = { - .readpage = cifs_readpage, + .readpage = netfs_readpage, .writepage = cifs_writepage, .writepages = cifs_writepages, .write_begin = cifs_write_begin, diff --git a/fs/cifs/fscache.c b/fs/cifs/fscache.c index a7e7e5a97b7f..bb1c3a372de4 100644 --- a/fs/cifs/fscache.c +++ b/fs/cifs/fscache.c @@ -134,34 +134,3 @@ void cifs_fscache_release_inode_cookie(struct inode *inode) cifsi->netfs_ctx.cache = NULL; } } - -/* - * Retrieve a page from FS-Cache - */ -int __cifs_readpage_from_fscache(struct inode *inode, struct page *page) -{ - cifs_dbg(FYI, "%s: (fsc:%p, p:%p, i:0x%p\n", - __func__, cifs_inode_cookie(inode), page, inode); - return -ENOBUFS; // Needs conversion to using netfslib -} - -/* - * Retrieve a set of pages from FS-Cache - */ -int __cifs_readpages_from_fscache(struct inode *inode, - struct address_space *mapping, - struct list_head *pages, - unsigned *nr_pages) -{ - cifs_dbg(FYI, "%s: (0x%p/%u/0x%p)\n", - __func__, cifs_inode_cookie(inode), *nr_pages, inode); - return -ENOBUFS; // Needs conversion to using netfslib -} - -void __cifs_readpage_to_fscache(struct inode *inode, struct page *page) -{ - cifs_dbg(FYI, "%s: (fsc: %p, p: %p, i: %p)\n", - __func__, cifs_inode_cookie(inode), page, inode); - - // Needs conversion to using netfslib -} diff --git a/fs/cifs/fscache.h b/fs/cifs/fscache.h index 9f6e42e85d14..fdc03cd7b881 100644 --- a/fs/cifs/fscache.h +++ b/fs/cifs/fscache.h @@ -58,14 +58,6 @@ void cifs_fscache_fill_coherency(struct inode *inode, } -extern int cifs_fscache_release_page(struct page *page, gfp_t gfp); -extern int __cifs_readpage_from_fscache(struct inode *, struct page *); -extern int __cifs_readpages_from_fscache(struct inode *, - struct address_space *, - struct list_head *, - unsigned *); -extern void __cifs_readpage_to_fscache(struct inode *, struct page *); - static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode) { return netfs_i_cookie(inode); @@ -80,33 +72,6 @@ static inline void cifs_invalidate_cache(struct inode *inode, unsigned int flags i_size_read(inode), flags); } -static inline int cifs_readpage_from_fscache(struct inode *inode, - struct page *page) -{ - if (cifs_inode_cookie(inode)) - return __cifs_readpage_from_fscache(inode, page); - - return -ENOBUFS; -} - -static inline int cifs_readpages_from_fscache(struct inode *inode, - struct address_space *mapping, - struct list_head *pages, - unsigned *nr_pages) -{ - if (cifs_inode_cookie(inode)) - return __cifs_readpages_from_fscache(inode, mapping, pages, - nr_pages); - return -ENOBUFS; -} - -static inline void cifs_readpage_to_fscache(struct inode *inode, - struct page *page) -{ - if (PageFsCache(page)) - __cifs_readpage_to_fscache(inode, page); -} - #else /* CONFIG_CIFS_FSCACHE */ static inline void cifs_fscache_fill_coherency(struct inode *inode, @@ -123,23 +88,6 @@ static inline void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool upd static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode) { return NULL; } static inline void cifs_invalidate_cache(struct inode *inode, unsigned int flags) {} -static inline int -cifs_readpage_from_fscache(struct inode *inode, struct page *page) -{ - return -ENOBUFS; -} - -static inline int cifs_readpages_from_fscache(struct inode *inode, - struct address_space *mapping, - struct list_head *pages, - unsigned *nr_pages) -{ - return -ENOBUFS; -} - -static inline void cifs_readpage_to_fscache(struct inode *inode, - struct page *page) {} - #endif /* 
CONFIG_CIFS_FSCACHE */ #endif /* _CIFS_FSCACHE_H */ diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c index 7d8b3ceb2af3..b6a9ded9fbb2 100644 --- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -26,6 +26,19 @@ #include "fs_context.h" #include "cifs_ioctl.h" +/* + * Set parameters for the netfs library + */ +static void cifs_set_netfs_context(struct inode *inode) +{ + struct netfs_i_context *ctx = netfs_i_context(inode); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + + netfs_i_context_init(inode, &cifs_req_ops); + ctx->rsize = cifs_sb->ctx->rsize; + ctx->wsize = cifs_sb->ctx->wsize; +} + static void cifs_set_ops(struct inode *inode) { struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); @@ -209,8 +222,10 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr) if (fattr->cf_flags & CIFS_FATTR_DFS_REFERRAL) inode->i_flags |= S_AUTOMOUNT; - if (inode->i_state & I_NEW) + if (inode->i_state & I_NEW) { + cifs_set_netfs_context(inode); cifs_set_ops(inode); + } return 0; } diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index ebbea7526ee2..0d76cffb4e75 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -23,6 +23,7 @@ #include #include #include +#include #include "cifsglob.h" #include "cifsacl.h" #include "cifsproto.h" @@ -4185,7 +4186,19 @@ smb2_readv_callback(struct mid_q_entry *mid) tcon->tid, tcon->ses->Suid, rdata->offset, rdata->got_bytes); - queue_work(cifsiod_wq, &rdata->work); + if (rdata->result == -ENODATA) { + /* We may have got an EOF error because fallocate + * failed to enlarge the file. + */ + if (rdata->subreq->start < rdata->subreq->rreq->i_size) + rdata->result = 0; + } + if (rdata->result == 0 || rdata->result == -EAGAIN) + iov_iter_advance(&rdata->subreq->iter, rdata->got_bytes); + netfs_subreq_terminated(rdata->subreq, + (rdata->result == 0 || rdata->result == -EAGAIN) ? + rdata->got_bytes : rdata->result, false); + kref_put(&rdata->refcount, cifs_readdata_release); DeleteMidQEntry(mid); add_credits(server, &credits, 0); } diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index df13c9b22ca8..1fa242140dc4 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -553,8 +553,13 @@ static void netfs_rreq_assess_dio(struct netfs_read_request *rreq) list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { if (subreq->error || subreq->transferred == 0) break; - for (i = 0; i < subreq->bv_count; i++) + for (i = 0; i < subreq->bv_count; i++) { flush_dcache_page(subreq->bv[i].bv_page); + // TODO: cifs marks pages in the destination buffer + // dirty under some circumstances after a read. Do we + // need to do that too? + set_page_dirty(subreq->bv[i].bv_page); + } transferred += subreq->transferred; if (subreq->transferred < subreq->len) break;
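Two pieces of the new read path above reduce to small, checkable arithmetic. cifs_expand_readahead() rounds rsize up to a power of two, aligns the request start down to an rsize boundary, rounds the length up to whole rsize blocks, and clamps to EOF. Meanwhile the readv callbacks fold (result, got_bytes) into the single value handed to netfs_subreq_terminated(), with -ENODATA inside i_size treated as a zero-filled success (NETFS_SREQ_CLEAR_TAIL zero-fills the short tail). The sketch below mirrors that logic in plain C for checking; the PAGE_SIZE and INT_MAX guards of the real function are omitted, and the harness values are illustrative, not taken from the patch:

	#include <errno.h>
	#include <stdio.h>

	/* Userspace stand-in for the kernel's roundup_pow_of_two(). */
	static unsigned int pow2_roundup(unsigned int v)
	{
		unsigned int p = 1;

		while (p < v)
			p <<= 1;
		return p;
	}

	/* Mirrors cifs_expand_readahead(): align start down to an rsize
	 * boundary, round len up to whole rsize blocks, clamp to EOF. */
	static void expand_readahead(long long *start, long long *len,
				     unsigned int rsize, long long i_size)
	{
		long long misalignment;

		rsize = pow2_roundup(rsize);
		misalignment = *start & (rsize - 1);
		if (misalignment) {
			*start -= misalignment;
			*len += misalignment;
		}
		*len = (*len + rsize - 1) & ~(long long)(rsize - 1); /* round_up */
		if (*start < i_size && *len > i_size - *start)
			*len = i_size - *start;
	}

	/* Mirrors how the readv callbacks fold (result, got_bytes) into the
	 * value passed to netfs_subreq_terminated(). */
	static long subreq_outcome(long result, unsigned long got_bytes,
				   long long subreq_start, long long i_size)
	{
		if (result == -ENODATA && subreq_start < i_size)
			result = 0;	/* EOF inside i_size: zero-filled success */
		return (result == 0 || result == -EAGAIN) ?
			(long)got_bytes : result;
	}

	int main(void)
	{
		long long start = 300000, len = 16384;

		expand_readahead(&start, &len, 65536, 1000000);
		printf("start=%lld len=%lld\n", start, len);	/* 262144, 65536 */
		printf("%ld\n", subreq_outcome(-ENODATA, 0, start, 1000000)); /* 0 */
		return 0;
	}

For the example, a 16 KiB readahead starting at offset 300000 with a 64 KiB rsize expands to exactly one rsize-aligned 64 KiB block at 262144, and an -ENODATA completion for a subrequest that starts below i_size is reported as a successful zero-byte transfer rather than an error.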