From patchwork Thu Apr 29 03:48:37 2021
X-Patchwork-Submitter: Shivaprasad G Bhat
X-Patchwork-Id: 1471503
Subject: [PATCH v4 1/3] spapr: nvdimm: Forward declare and move the definitions
From: Shivaprasad G Bhat
To: david@gibson.dropbear.id.au, groug@kaod.org, qemu-ppc@nongnu.org,
    ehabkost@redhat.com, marcel.apfelbaum@gmail.com, mst@redhat.com,
    imammedo@redhat.com, xiaoguangrong.eric@gmail.com,
    peter.maydell@linaro.org, eblake@redhat.com, qemu-arm@nongnu.org,
    richard.henderson@linaro.org, pbonzini@redhat.com,
    marcel.apfelbaum@gmail.com, stefanha@redhat.com,
    haozhong.zhang@intel.com, shameerali.kolothum.thodi@huawei.com,
    kwangwoo.lee@sk.com, armbru@redhat.com
Cc: qemu-devel@nongnu.org, aneesh.kumar@linux.ibm.com,
    linux-nvdimm@lists.01.org, kvm-ppc@vger.kernel.org,
    shivaprasadbhat@gmail.com, bharata@linux.vnet.ibm.com
Date: Wed, 28 Apr 2021 23:48:37 -0400
Message-ID: <161966811094.652.571342595267518155.stgit@17be908f7c1c>
In-Reply-To: <161966810162.652.13723419108625443430.stgit@17be908f7c1c>
References: <161966810162.652.13723419108625443430.stgit@17be908f7c1c>

The subsequent patches add definitions that would otherwise drag the
compilation into a cyclic header dependency. Prepare for that by forward
declaring the structs, moving the definitions, and cleaning up.

Signed-off-by: Shivaprasad G Bhat
---
 hw/ppc/spapr_nvdimm.c         | 12 ++++++++++++
 include/hw/ppc/spapr_nvdimm.h | 14 ++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
index b46c36917c..8cf3fb2ffb 100644
--- a/hw/ppc/spapr_nvdimm.c
+++ b/hw/ppc/spapr_nvdimm.c
@@ -31,6 +31,18 @@
 #include "qemu/range.h"
 #include "hw/ppc/spapr_numa.h"
 
+/*
+ * The nvdimm size should be aligned to SCM block size.
+ * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
+ * inorder to have SCM regions not to overlap with dimm memory regions.
+ * The SCM devices can have variable block sizes. For now, fixing the
+ * block size to the minimum value.
+ */
+#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
+
+/* Have an explicit check for alignment */
+QEMU_BUILD_BUG_ON(SPAPR_MINIMUM_SCM_BLOCK_SIZE % SPAPR_MEMORY_BLOCK_SIZE);
+
 bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
                            uint64_t size, Error **errp)
 {
diff --git a/include/hw/ppc/spapr_nvdimm.h b/include/hw/ppc/spapr_nvdimm.h
index 73be250e2a..764f999f54 100644
--- a/include/hw/ppc/spapr_nvdimm.h
+++ b/include/hw/ppc/spapr_nvdimm.h
@@ -11,19 +11,9 @@
 #define HW_SPAPR_NVDIMM_H
 
 #include "hw/mem/nvdimm.h"
-#include "hw/ppc/spapr.h"
 
-/*
- * The nvdimm size should be aligned to SCM block size.
- * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
- * inorder to have SCM regions not to overlap with dimm memory regions.
- * The SCM devices can have variable block sizes. For now, fixing the
- * block size to the minimum value.
- */
-#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
-
-/* Have an explicit check for alignment */
-QEMU_BUILD_BUG_ON(SPAPR_MINIMUM_SCM_BLOCK_SIZE % SPAPR_MEMORY_BLOCK_SIZE);
+typedef struct SpaprDrc SpaprDrc;
+typedef struct SpaprMachineState SpaprMachineState;
 
 int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
                            void *fdt, int *fdt_start_offset, Error **errp);
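The idea behind the patch above, for readers following along: a header that
only ever handles a struct through pointers never needs the struct's full
definition, so a forward declaration can stand in for the #include that
created the cycle. A minimal sketch of the pattern, with hypothetical file
and symbol names (not QEMU code):

    /* widget.h: Gadget is used only through pointers, so no #include "gadget.h" */
    typedef struct Gadget Gadget;          /* forward declaration */

    int widget_attach(Gadget *g);

    /* widget.c: pull in the full definition only where members are accessed */
    #include "widget.h"
    #include "gadget.h"                    /* no cycle at this level */

    int widget_attach(Gadget *g)
    {
        return g->id;                      /* needs the complete type */
    }
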
From patchwork Thu Apr 29 03:48:50 2021
X-Patchwork-Submitter: Shivaprasad G Bhat
X-Patchwork-Id: 1471504
Subject: [PATCH v4 2/3] spapr: nvdimm: Implement H_SCM_FLUSH hcall
From: Shivaprasad G Bhat
To: david@gibson.dropbear.id.au, groug@kaod.org, qemu-ppc@nongnu.org,
    ehabkost@redhat.com, marcel.apfelbaum@gmail.com, mst@redhat.com,
    imammedo@redhat.com, xiaoguangrong.eric@gmail.com,
    peter.maydell@linaro.org, eblake@redhat.com, qemu-arm@nongnu.org,
    richard.henderson@linaro.org, pbonzini@redhat.com,
    marcel.apfelbaum@gmail.com, stefanha@redhat.com,
    haozhong.zhang@intel.com, shameerali.kolothum.thodi@huawei.com,
    kwangwoo.lee@sk.com, armbru@redhat.com
Cc: qemu-devel@nongnu.org, aneesh.kumar@linux.ibm.com,
    linux-nvdimm@lists.01.org, kvm-ppc@vger.kernel.org,
    shivaprasadbhat@gmail.com, bharata@linux.vnet.ibm.com
Date: Wed, 28 Apr 2021 23:48:50 -0400
Message-ID: <161966812696.652.15987779677314301634.stgit@17be908f7c1c>
In-Reply-To: <161966810162.652.13723419108625443430.stgit@17be908f7c1c>
References: <161966810162.652.13723419108625443430.stgit@17be908f7c1c>

Add support for the H_SCM_FLUSH hcall for nvdimm devices; the next patch
makes it available to the guest. The hcall semantics require that a flush
expected to take a long time return H_BUSY together with a continue_token,
and that the guest call the hcall again with that continue_token to
retrieve the status.

So, every fresh request is put on a 'pending' list and a flush worker is
submitted to the thread pool. The thread pool completion callbacks move
the requests to a 'completed' list, from which they are cleaned up once
their status has been reported to the guest by a subsequent hcall.

These semantics make it necessary to preserve the continue_tokens and
their return status across migration, so the completed flush states are
forwarded to the destination and the pending ones are restarted on the
destination in post_load. The necessary nvdimm-flush-specific vmstate
structures are added to the spapr machine vmstate.

Signed-off-by: Shivaprasad G Bhat
---
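For clarity, the protocol described above reduces to a retry loop on the
guest side. A rough sketch of a caller, loosely modelled on the Linux
pseries hcall wrappers; the composition here is illustrative, not the
actual guest driver code:

    /* Hypothetical guest-side (kernel context) user of H_SCM_FLUSH. */
    static long papr_scm_flush(uint32_t drc_index)
    {
        unsigned long ret_buf[PLPAR_HCALL_BUFSIZE];
        uint64_t token = 0;                /* 0 starts a new flush request */
        long rc;

        do {
            rc = plpar_hcall(H_SCM_FLUSH, ret_buf, drc_index, token);
            token = ret_buf[0];            /* continue-token for the retry */
            if (H_IS_LONG_BUSY(rc)) {
                msleep(get_longbusy_msecs(rc));  /* back off as requested */
                rc = H_BUSY;
            }
        } while (rc == H_BUSY);

        return rc;                         /* H_SUCCESS, H_P2, H_HARDWARE, ... */
    }
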
 hw/ppc/spapr.c                |   6 +
 hw/ppc/spapr_nvdimm.c         | 234 +++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h        |  10 ++
 include/hw/ppc/spapr_nvdimm.h |  13 ++
 4 files changed, 262 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e4be00b732..80957f9188 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1607,6 +1607,8 @@ static void spapr_machine_reset(MachineState *machine)
         spapr->ov5_cas = spapr_ovec_clone(spapr->ov5);
     }
 
+    spapr_nvdimm_finish_flushes(spapr);
+
     /* DRC reset may cause a device to be unplugged. This will cause troubles
      * if this device is used by another device (eg, a running vhost backend
      * will crash QEMU if the DIMM holding the vring goes away). To avoid such
@@ -2003,6 +2005,7 @@ static const VMStateDescription vmstate_spapr = {
         &vmstate_spapr_cap_ccf_assist,
         &vmstate_spapr_cap_fwnmi,
         &vmstate_spapr_fwnmi,
+        &vmstate_spapr_nvdimm_states,
         NULL
     }
 };
@@ -2997,6 +3000,9 @@ static void spapr_machine_init(MachineState *machine)
     }
 
     qemu_cond_init(&spapr->fwnmi_machine_check_interlock_cond);
+
+    QLIST_INIT(&spapr->pending_flush_states);
+    QLIST_INIT(&spapr->completed_flush_states);
 }
 
 #define DEFAULT_KVM_TYPE "auto"
diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
index 8cf3fb2ffb..77eb7e1293 100644
--- a/hw/ppc/spapr_nvdimm.c
+++ b/hw/ppc/spapr_nvdimm.c
@@ -22,6 +22,7 @@
  * THE SOFTWARE.
  */
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
 #include "qapi/error.h"
 #include "hw/ppc/spapr_drc.h"
 #include "hw/ppc/spapr_nvdimm.h"
@@ -30,6 +31,7 @@
 #include "hw/ppc/fdt.h"
 #include "qemu/range.h"
 #include "hw/ppc/spapr_numa.h"
+#include "block/thread-pool.h"
 
 /*
  * The nvdimm size should be aligned to SCM block size.
@@ -371,6 +373,237 @@ static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
     return H_SUCCESS;
 }
 
+static uint64_t flush_token;
+
+static int flush_worker_cb(void *opaque)
+{
+    int ret = H_SUCCESS;
+    SpaprNVDIMMDeviceFlushState *state = opaque;
+
+    /* flush raw backing image */
+    if (qemu_fdatasync(state->backend_fd) < 0) {
+        error_report("papr_scm: Could not sync nvdimm to backend file: %s",
+                     strerror(errno));
+        ret = H_HARDWARE;
+    }
+
+    return ret;
+}
+
+static void spapr_nvdimm_flush_completion_cb(void *opaque, int hcall_ret)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    SpaprNVDIMMDeviceFlushState *state = opaque;
+
+    state->hcall_ret = hcall_ret;
+    QLIST_REMOVE(state, node);
+    QLIST_INSERT_HEAD(&spapr->completed_flush_states, state, node);
+}
+
+static const VMStateDescription vmstate_spapr_nvdimm_flush_state = {
+    .name = "spapr_nvdimm_flush_state",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(continue_token, SpaprNVDIMMDeviceFlushState),
+        VMSTATE_INT64(hcall_ret, SpaprNVDIMMDeviceFlushState),
+        VMSTATE_UINT32(drcidx, SpaprNVDIMMDeviceFlushState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool spapr_nvdimm_states_needed(void *opaque)
+{
+    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
+
+    return (!QLIST_EMPTY(&spapr->pending_flush_states) ||
+            !QLIST_EMPTY(&spapr->completed_flush_states));
+}
+
+static int spapr_nvdimm_post_load(void *opaque, int version_id)
+{
+    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
+    SpaprNVDIMMDeviceFlushState *state, *next;
+    PCDIMMDevice *dimm;
+    HostMemoryBackend *backend = NULL;
+    ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
+    SpaprDrc *drc;
+
+    QLIST_FOREACH_SAFE(state, &spapr->completed_flush_states, node, next) {
+        if (flush_token < state->continue_token) {
+            flush_token = state->continue_token;
+        }
+    }
+
+    QLIST_FOREACH_SAFE(state, &spapr->pending_flush_states, node, next) {
+        if (flush_token < state->continue_token) {
+            flush_token = state->continue_token;
+        }
+
+        drc = spapr_drc_by_index(state->drcidx);
+        dimm = PC_DIMM(drc->dev);
+        backend = MEMORY_BACKEND(dimm->hostmem);
+        state->backend_fd = memory_region_get_fd(&backend->mr);
+
+        thread_pool_submit_aio(pool, flush_worker_cb, state,
+                               spapr_nvdimm_flush_completion_cb, state);
+    }
+
+    return 0;
+}
+
+const VMStateDescription vmstate_spapr_nvdimm_states = {
+    .name = "spapr_nvdimm_states",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_nvdimm_states_needed,
+    .post_load = spapr_nvdimm_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_QLIST_V(completed_flush_states, SpaprMachineState, 1,
+                        vmstate_spapr_nvdimm_flush_state,
+                        SpaprNVDIMMDeviceFlushState, node),
+        VMSTATE_QLIST_V(pending_flush_states, SpaprMachineState, 1,
+                        vmstate_spapr_nvdimm_flush_state,
+                        SpaprNVDIMMDeviceFlushState, node),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+/*
+ * Assign a token and reserve it for the new flush state.
+ */
+static SpaprNVDIMMDeviceFlushState *spapr_nvdimm_init_new_flush_state(
+                                                SpaprMachineState *spapr)
+{
+    SpaprNVDIMMDeviceFlushState *state;
+
+    state = g_malloc0(sizeof(*state));
+
+    flush_token++;
+    /* Token zero is presumed as no job pending. Handle the overflow to zero */
+    if (flush_token == 0) {
+        flush_token++;
+    }
+    state->continue_token = flush_token;
+
+    QLIST_INSERT_HEAD(&spapr->pending_flush_states, state, node);
+
+    return state;
+}
+
+/*
+ * spapr_nvdimm_finish_flushes
+ *      Waits for all pending flush requests to complete
+ *      their execution and free the states
+ */
+void spapr_nvdimm_finish_flushes(SpaprMachineState *spapr)
+{
+    SpaprNVDIMMDeviceFlushState *state, *next;
+
+    /*
+     * Called on reset path, the main loop thread which calls
+     * the pending BHs has gotten out running in the reset path,
+     * finally reaching here. Other code path being guest
+     * h_client_architecture_support, thats early boot up.
+     */
+    while (!QLIST_EMPTY(&spapr->pending_flush_states)) {
+        aio_poll(qemu_get_aio_context(), true);
+    }
+
+    QLIST_FOREACH_SAFE(state, &spapr->completed_flush_states, node, next) {
+        QLIST_REMOVE(state, node);
+        g_free(state);
+    }
+}
+
+/*
+ * spapr_nvdimm_get_flush_status
+ *      Fetches the status of the hcall worker and returns
+ *      H_LONG_BUSY_XYZ if the worker is still running.
+ */
+static int spapr_nvdimm_get_flush_status(SpaprMachineState *spapr,
+                                         uint64_t token)
+{
+    int ret = H_LONG_BUSY_ORDER_10_MSEC;
+    SpaprNVDIMMDeviceFlushState *state, *node;
+
+    QLIST_FOREACH_SAFE(state, &spapr->pending_flush_states, node, node) {
+        if (state->continue_token == token) {
+            goto exit;
+        }
+    }
+    ret = H_P2; /* If not found in complete list too, invalid token */
+    QLIST_FOREACH_SAFE(state, &spapr->completed_flush_states, node, node) {
+        if (state->continue_token == token) {
+            ret = state->hcall_ret;
+            QLIST_REMOVE(state, node);
+            g_free(state);
+            break;
+        }
+    }
+exit:
+    return ret;
+}
+
+/*
+ * H_SCM_FLUSH
+ * Input: drc_index, continue-token
+ * Out: continue-token
+ * Return Value: H_SUCCESS, H_Parameter, H_P2, H_LONG_BUSY
+ *
+ * Given a DRC Index Flush the data to backend NVDIMM device.
+ * The hcall returns H_LONG_BUSY_XX when the flush takes longer time and
+ * the hcall needs to be issued multiple times in order to be completely
+ * serviced. The continue-token from the output to be passed in the
+ * argument list of subsequent hcalls until the hcall is completely serviced
+ * at which point H_SUCCESS or other error is returned.
+ */
+static target_ulong h_scm_flush(PowerPCCPU *cpu, SpaprMachineState *spapr,
+                                target_ulong opcode, target_ulong *args)
+{
+    int ret;
+    uint32_t drc_index = args[0];
+    uint64_t continue_token = args[1];
+    SpaprDrc *drc = spapr_drc_by_index(drc_index);
+    PCDIMMDevice *dimm;
+    HostMemoryBackend *backend = NULL;
+    SpaprNVDIMMDeviceFlushState *state;
+    ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
+
+    if (!drc || !drc->dev ||
+        spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
+        return H_PARAMETER;
+    }
+
+    if (continue_token != 0) {
+        goto get_status;
+    }
+
+    dimm = PC_DIMM(drc->dev);
+    backend = MEMORY_BACKEND(dimm->hostmem);
+
+    state = spapr_nvdimm_init_new_flush_state(spapr);
+    if (!state) {
+        return H_HARDWARE;
+    }
+
+    state->drcidx = drc_index;
+    state->backend_fd = memory_region_get_fd(&backend->mr);
+
+    thread_pool_submit_aio(pool, flush_worker_cb, state,
+                           spapr_nvdimm_flush_completion_cb, state);
+
+    continue_token = state->continue_token;
+
+get_status:
+    ret = spapr_nvdimm_get_flush_status(spapr, continue_token);
+    if (H_IS_LONG_BUSY(ret)) {
+        args[0] = continue_token;
+    }
+
+    return ret;
+}
+
 static target_ulong h_scm_unbind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                      target_ulong opcode, target_ulong *args)
 {
@@ -487,6 +720,7 @@ static void spapr_scm_register_types(void)
     spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
     spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
     spapr_register_hypercall(H_SCM_UNBIND_ALL, h_scm_unbind_all);
+    spapr_register_hypercall(H_SCM_FLUSH, h_scm_flush);
 }
 
 type_init(spapr_scm_register_types)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index bf7cab7a2c..478c031396 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -12,10 +12,12 @@
 #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
 #include "hw/ppc/xics.h"        /* For ICSState */
 #include "hw/ppc/spapr_tpm_proxy.h"
+#include "hw/ppc/spapr_nvdimm.h"
 
 struct SpaprVioBus;
 struct SpaprPhbState;
 struct SpaprNvram;
+struct SpaprNVDIMMDeviceFlushState;
 
 typedef struct SpaprEventLogEntry SpaprEventLogEntry;
 typedef struct SpaprEventSource SpaprEventSource;
@@ -245,6 +247,11 @@ struct SpaprMachineState {
     uint32_t numa_assoc_array[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE];
 
     Error *fwnmi_migration_blocker;
+
+    /* nvdimm flush states */
+    QLIST_HEAD(, SpaprNVDIMMDeviceFlushState) pending_flush_states;
+    QLIST_HEAD(, SpaprNVDIMMDeviceFlushState) completed_flush_states;
+
 };
 
 #define H_SUCCESS         0
@@ -538,8 +545,9 @@ struct SpaprMachineState {
 #define H_SCM_BIND_MEM          0x3EC
 #define H_SCM_UNBIND_MEM        0x3F0
 #define H_SCM_UNBIND_ALL        0x3FC
+#define H_SCM_FLUSH             0x44C
 
-#define MAX_HCALL_OPCODE        H_SCM_UNBIND_ALL
+#define MAX_HCALL_OPCODE        H_SCM_FLUSH
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.
diff --git a/include/hw/ppc/spapr_nvdimm.h b/include/hw/ppc/spapr_nvdimm.h
index 764f999f54..24d8e37b33 100644
--- a/include/hw/ppc/spapr_nvdimm.h
+++ b/include/hw/ppc/spapr_nvdimm.h
@@ -11,6 +11,7 @@
 #define HW_SPAPR_NVDIMM_H
 
 #include "hw/mem/nvdimm.h"
+#include "migration/vmstate.h"
 
 typedef struct SpaprDrc SpaprDrc;
 typedef struct SpaprMachineState SpaprMachineState;
@@ -21,5 +22,17 @@ void spapr_dt_persistent_memory(SpaprMachineState *spapr, void *fdt);
 bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
                            uint64_t size, Error **errp);
 void spapr_add_nvdimm(DeviceState *dev, uint64_t slot);
+void spapr_nvdimm_finish_flushes(SpaprMachineState *spapr);
+
+typedef struct SpaprNVDIMMDeviceFlushState {
+    uint64_t continue_token;
+    int64_t hcall_ret;
+    int backend_fd;
+    uint32_t drcidx;
+
+    QLIST_ENTRY(SpaprNVDIMMDeviceFlushState) node;
+} SpaprNVDIMMDeviceFlushState;
+
+extern const VMStateDescription vmstate_spapr_nvdimm_states;
 
 #endif
From patchwork Thu Apr 29 03:49:03 2021
X-Patchwork-Submitter: Shivaprasad G Bhat
X-Patchwork-Id: 1471505
Subject: [PATCH v4 3/3] nvdimm: Enable sync-dax device property for nvdimm
From: Shivaprasad G Bhat
To: david@gibson.dropbear.id.au, groug@kaod.org, qemu-ppc@nongnu.org,
    ehabkost@redhat.com, marcel.apfelbaum@gmail.com, mst@redhat.com,
    imammedo@redhat.com, xiaoguangrong.eric@gmail.com,
    peter.maydell@linaro.org, eblake@redhat.com, qemu-arm@nongnu.org,
    richard.henderson@linaro.org, pbonzini@redhat.com,
    marcel.apfelbaum@gmail.com, stefanha@redhat.com,
    haozhong.zhang@intel.com, shameerali.kolothum.thodi@huawei.com,
    kwangwoo.lee@sk.com, armbru@redhat.com
Cc: qemu-devel@nongnu.org, aneesh.kumar@linux.ibm.com,
    linux-nvdimm@lists.01.org, kvm-ppc@vger.kernel.org,
    shivaprasadbhat@gmail.com, bharata@linux.vnet.ibm.com
Date: Wed, 28 Apr 2021 23:49:03 -0400
Message-ID: <161966813983.652.5749368609701495826.stgit@17be908f7c1c>
In-Reply-To: <161966810162.652.13723419108625443430.stgit@17be908f7c1c>
References: <161966810162.652.13723419108625443430.stgit@17be908f7c1c>

Add a 'sync-dax' property to the nvdimm device. When sync-dax is 'direct',
the backend is synchronous DAX capable and no explicit flush requests are
required. When the mode is set to 'writeback', the backend is not
synchronous DAX capable and explicit flushes to the hypervisor are
required.

On PPC, where flush requests from the guest can be honoured by QEMU, the
'writeback' mode is supported and set as the default. The device tree
property "hcall-flush-required" is added to the nvdimm node, which makes
the guest issue H_SCM_FLUSH hcalls to request flushes explicitly. This is
the default behaviour when the sync-dax property is not set on the nvdimm
device. For older pSeries machines the default is 'unsafe'. For non-PPC
platforms the mode also defaults to 'unsafe'.

Signed-off-by: Shivaprasad G Bhat
---
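As a usage illustration, a pseries guest with an explicitly flushed
(writeback) nvdimm could be launched roughly as follows; the paths, sizes
and IDs here are placeholders:

    qemu-system-ppc64 -machine pseries,nvdimm=on -m 2G,slots=2,maxmem=4G \
        -object memory-backend-file,id=memnvdimm0,mem-path=/tmp/nvdimm0,size=1G \
        -device nvdimm,id=nvdimm0,memdev=memnvdimm0,label-size=128K,sync-dax=writeback

With sync-dax=direct the backend must additionally be declared synchronous
DAX capable, i.e. pmem=on on the memory-backend-file, which is available
when QEMU is built with libpmem.
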
 hw/arm/virt.c           | 28 +++++++++++++++++++++++--
 hw/i386/pc.c            | 28 +++++++++++++++++++++++--
 hw/mem/nvdimm.c         | 52 +++++++++++++++++++++++++++++++++++++++++++----
 hw/ppc/spapr.c          | 10 +++++++++
 hw/ppc/spapr_nvdimm.c   | 39 +++++++++++++++++++++++++++++++++++
 include/hw/mem/nvdimm.h | 11 ++++++++++
 include/hw/ppc/spapr.h  |  1 +
 qapi/common.json        | 20 ++++++++++++++++++
 8 files changed, 179 insertions(+), 10 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9f01d9041b..f32e3e4010 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2358,6 +2358,27 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
     return ms->possible_cpus;
 }
 
+static bool virt_nvdimm_validate(const MachineState *ms, NVDIMMDevice *nvdimm,
+                                 Error **errp)
+{
+    NvdimmSyncModes sync;
+
+    if (!ms->nvdimms_state->is_enabled) {
+        error_setg(errp, "nvdimm is not enabled: add 'nvdimm=on' to '-M'");
+        return false;
+    }
+
+    sync = object_property_get_enum(OBJECT(nvdimm), NVDIMM_SYNC_DAX_PROP,
+                                    "NvdimmSyncModes", &error_abort);
+    if (sync == NVDIMM_SYNC_MODES_WRITEBACK) {
+        error_setg(errp, "NVDIMM device " NVDIMM_SYNC_DAX_PROP
+                   "=%s mode unsupported", NvdimmSyncModes_str(sync));
+        return false;
+    }
+
+    return true;
+}
+
 static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                                  Error **errp)
 {
@@ -2376,9 +2397,10 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         return;
     }
 
-    if (is_nvdimm && !ms->nvdimms_state->is_enabled) {
-        error_setg(errp, "nvdimm is not enabled: add 'nvdimm=on' to '-M'");
-        return;
+    if (is_nvdimm) {
+        if (!virt_nvdimm_validate(ms, NVDIMM(dev), errp)) {
+            return;
+        }
     }
 
     pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), NULL, errp);
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8a84b25a03..2d5151462c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1211,6 +1211,27 @@ void pc_i8259_create(ISABus *isa_bus, qemu_irq *i8259_irqs)
     g_free(i8259);
 }
 
+static bool pc_nvdimm_validate(const MachineState *ms, NVDIMMDevice *nvdimm,
+                               Error **errp)
+{
+    NvdimmSyncModes sync;
+
+    if (!ms->nvdimms_state->is_enabled) {
+        error_setg(errp, "nvdimm is not enabled: add 'nvdimm=on' to '-M'");
+        return false;
+    }
+
+    sync = object_property_get_enum(OBJECT(nvdimm), NVDIMM_SYNC_DAX_PROP,
+                                    "NvdimmSyncModes", &error_abort);
+    if (sync == NVDIMM_SYNC_MODES_WRITEBACK) {
+        error_setg(errp, "NVDIMM device " NVDIMM_SYNC_DAX_PROP
+                   "=%s mode unsupported", NvdimmSyncModes_str(sync));
+        return false;
+    }
+
+    return true;
+}
+
 static void pc_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                                Error **errp)
 {
@@ -1233,9 +1254,10 @@ static void pc_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         return;
     }
 
-    if (is_nvdimm && !ms->nvdimms_state->is_enabled) {
-        error_setg(errp, "nvdimm is not enabled: missing 'nvdimm' in '-M'");
-        return;
+    if (is_nvdimm) {
+        if (!pc_nvdimm_validate(ms, NVDIMM(dev), errp)) {
+            return;
+        }
     }
 
     hotplug_handler_pre_plug(x86ms->acpi_dev, dev, &local_err);
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 7397b67156..56b4527362 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -96,6 +96,19 @@ static void nvdimm_set_uuid(Object *obj, Visitor *v, const char *name,
     g_free(value);
 }
 
+static int nvdimm_get_sync_mode(Object *obj, Error **errp G_GNUC_UNUSED)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+    return nvdimm->sync_dax;
+}
+
+static void nvdimm_set_sync_mode(Object *obj, int mode, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+    nvdimm->sync_dax = mode;
+}
 
 static void nvdimm_init(Object *obj)
 {
@@ -105,6 +118,13 @@ static void nvdimm_init(Object *obj)
 
     object_property_add(obj, NVDIMM_UUID_PROP, "QemuUUID", nvdimm_get_uuid,
                         nvdimm_set_uuid, NULL, NULL);
+
+    object_property_add_enum(obj, NVDIMM_SYNC_DAX_PROP, "NvdimmSyncModes",
+                             &NvdimmSyncModes_lookup, nvdimm_get_sync_mode,
+                             nvdimm_set_sync_mode);
+    object_property_set_description(obj, NVDIMM_SYNC_DAX_PROP,
+                                    "Set the Synchronus DAX mode");
+
 }
 
 static void nvdimm_finalize(Object *obj)
@@ -119,6 +139,9 @@ static void nvdimm_prepare_memory_region(NVDIMMDevice *nvdimm, Error **errp)
     PCDIMMDevice *dimm = PC_DIMM(nvdimm);
     uint64_t align, pmem_size, size;
     MemoryRegion *mr;
+    HostMemoryBackend *hostmem;
+    bool is_file_backed;
+    bool __attribute__((unused)) is_pmem = false;
 
     g_assert(!nvdimm->nvdimm_mr);
 
@@ -135,9 +158,8 @@ static void nvdimm_prepare_memory_region(NVDIMMDevice *nvdimm, Error **errp)
     nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
     pmem_size = QEMU_ALIGN_DOWN(pmem_size, align);
 
+    hostmem = dimm->hostmem;
     if (size <= nvdimm->label_size || !pmem_size) {
-        HostMemoryBackend *hostmem = dimm->hostmem;
-
         error_setg(errp, "the size of memdev %s (0x%" PRIx64 ") is too "
                    "small to contain nvdimm label (0x%" PRIx64 ") and "
                    "aligned PMEM (0x%" PRIx64 ")",
@@ -147,14 +169,36 @@ static void nvdimm_prepare_memory_region(NVDIMMDevice *nvdimm, Error **errp)
     }
 
     if (!nvdimm->unarmed && memory_region_is_rom(mr)) {
-        HostMemoryBackend *hostmem = dimm->hostmem;
-
         error_setg(errp, "'unarmed' property must be off since memdev %s "
                    "is read-only",
                    object_get_canonical_path_component(OBJECT(hostmem)));
         return;
     }
 
+    is_file_backed = (memory_region_get_fd(mr) > 0);
+    if (nvdimm->sync_dax == NVDIMM_SYNC_MODES_WRITEBACK && !is_file_backed) {
+        error_setg(errp, NVDIMM_SYNC_DAX_PROP"='%s' mode requires the "
+                   "memdev %s to be file backed",
+                   NvdimmSyncModes_str(nvdimm->sync_dax),
+                   object_get_canonical_path_component(OBJECT(hostmem)));
+        return;
+    }
+
+#ifdef CONFIG_LIBPMEM
+    if (is_file_backed) {
+        is_pmem = object_property_get_bool(OBJECT(hostmem), "pmem",
+                                           &error_abort);
+    }
+
+    if (nvdimm->sync_dax == NVDIMM_SYNC_MODES_DIRECT && !is_pmem) {
+        error_setg(errp, "NVDIMM device "NVDIMM_SYNC_DAX_PROP"=%s mode requires"
+                   " the memory backend device to be synchronous DAX capable. "
" + "Indicate it so with pmem=yes for the corresponding " + "memory-backend-file.", + NvdimmSyncModes_str(nvdimm->sync_dax)); + } +#endif + nvdimm->nvdimm_mr = g_new(MemoryRegion, 1); memory_region_init_alias(nvdimm->nvdimm_mr, OBJECT(dimm), "nvdimm-memory", mr, 0, pmem_size); diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 80957f9188..d0058bc13b 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4616,6 +4616,11 @@ static void spapr_machine_latest_class_options(MachineClass *mc) static void spapr_machine_6_0_class_options(MachineClass *mc) { /* Defaults for the latest behaviour inherited from the base class */ + static GlobalProperty compat[] = { + { "nvdimm", "sync-dax", "writeback" }, + }; + + compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat)); } DEFINE_SPAPR_MACHINE(6_0, "6.0", true); @@ -4625,8 +4630,13 @@ DEFINE_SPAPR_MACHINE(6_0, "6.0", true); */ static void spapr_machine_5_2_class_options(MachineClass *mc) { + static GlobalProperty compat[] = { + { "nvdimm", "sync-dax", "unsafe" }, + }; + spapr_machine_6_0_class_options(mc); compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len); + compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat)); } DEFINE_SPAPR_MACHINE(5_2, "5.2", false); diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c index 77eb7e1293..615439391c 100644 --- a/hw/ppc/spapr_nvdimm.c +++ b/hw/ppc/spapr_nvdimm.c @@ -50,6 +50,10 @@ bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm, { const MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev); const MachineState *ms = MACHINE(hotplug_dev); + PCDIMMDevice __attribute__((unused)) *dimm = PC_DIMM(nvdimm); + MemoryRegion __attribute__((unused)) *mr; + bool __attribute__((unused)) is_pmem = false; + NvdimmSyncModes __attribute__((unused)) sync; g_autofree char *uuidstr = NULL; QemuUUID uuid; int ret; @@ -77,6 +81,24 @@ bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm, return false; } +#ifdef CONFIG_LIBPMEM + sync = object_property_get_enum(OBJECT(nvdimm), NVDIMM_SYNC_DAX_PROP, + "NvdimmSyncModes", &error_abort); + + mr = host_memory_backend_get_memory(dimm->hostmem); + if (memory_region_get_fd(mr) > 0) { /* memor-backend-file */ + HostMemoryBackend *backend = MEMORY_BACKEND(dimm->hostmem); + is_pmem = object_property_get_bool(OBJECT(backend), "pmem", + &error_abort); + } + + if (sync == NVDIMM_SYNC_MODES_WRITEBACK && is_pmem) { + warn_report("The NVDIMM backing device being Synchronous DAX capable, " + NVDIMM_SYNC_DAX_PROP"='%s' is unnecessary as the backend " + "ensures the safety already.", NvdimmSyncModes_str(sync)); + } +#endif + uuidstr = object_property_get_str(OBJECT(nvdimm), NVDIMM_UUID_PROP, &error_abort); ret = qemu_uuid_parse(uuidstr, &uuid); @@ -124,6 +146,9 @@ static int spapr_dt_nvdimm(SpaprMachineState *spapr, void *fdt, uint64_t lsize = nvdimm->label_size; uint64_t size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP, NULL); + NvdimmSyncModes sync_dax = object_property_get_enum(OBJECT(nvdimm), + NVDIMM_SYNC_DAX_PROP, + "NvdimmSyncModes", &error_abort); drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, slot); g_assert(drc); @@ -158,6 +183,11 @@ static int spapr_dt_nvdimm(SpaprMachineState *spapr, void *fdt, "operating-system"))); _FDT(fdt_setprop(fdt, child_offset, "ibm,cache-flush-required", NULL, 0)); + if (sync_dax == NVDIMM_SYNC_MODES_WRITEBACK) { + _FDT(fdt_setprop(fdt, child_offset, "ibm,hcall-flush-required", + NULL, 0)); + } + return child_offset; } @@ -566,6 +596,8 @@ static target_ulong 
     uint64_t continue_token = args[1];
     SpaprDrc *drc = spapr_drc_by_index(drc_index);
     PCDIMMDevice *dimm;
+    NVDIMMDevice *nvdimm;
+    NvdimmSyncModes sync_dax;
     HostMemoryBackend *backend = NULL;
     SpaprNVDIMMDeviceFlushState *state;
     ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
@@ -575,6 +607,13 @@ static target_ulong h_scm_flush(PowerPCCPU *cpu, SpaprMachineState *spapr,
         return H_PARAMETER;
     }
 
+    nvdimm = NVDIMM(drc->dev);
+    sync_dax = object_property_get_enum(OBJECT(nvdimm), NVDIMM_SYNC_DAX_PROP,
+                                        "NvdimmSyncModes", &error_abort);
+    if (sync_dax != NVDIMM_SYNC_MODES_WRITEBACK) {
+        return H_UNSUPPORTED;
+    }
+
     if (continue_token != 0) {
         goto get_status;
     }
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index bcf62f825c..ef30bdeca4 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -28,6 +28,7 @@
 #include "qemu/uuid.h"
 #include "hw/acpi/aml-build.h"
 #include "qom/object.h"
+#include "qapi/qapi-types-machine.h"
 
 #define NVDIMM_DEBUG 0
 #define nvdimm_debug(fmt, ...)                                \
@@ -51,6 +52,7 @@ OBJECT_DECLARE_TYPE(NVDIMMDevice, NVDIMMClass, NVDIMM)
 #define NVDIMM_LABEL_SIZE_PROP "label-size"
 #define NVDIMM_UUID_PROP       "uuid"
 #define NVDIMM_UNARMED_PROP    "unarmed"
+#define NVDIMM_SYNC_DAX_PROP   "sync-dax"
 
 struct NVDIMMDevice {
     /* private */
@@ -85,6 +87,15 @@ struct NVDIMMDevice {
      */
     bool unarmed;
 
+    /*
+     * The 'writeback' value would indicate the guest to make explicit
+     * flush requests to hypervisor. When 'direct', the device is
+     * assumed to be synchronous DAX capable and no explicit flush
+     * is required. 'unsafe' indicates flush semantics unimplemented
+     * and the data persistence not guaranteed in power failure scenarios.
+     */
+    NvdimmSyncModes sync_dax;
+
     /*
     * The PPC64 - spapr requires each nvdimm device have a uuid.
     */
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 478c031396..ddde87e2b6 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -332,6 +332,7 @@ struct SpaprMachineState {
 #define H_P7              -60
 #define H_P8              -61
 #define H_P9              -62
+#define H_UNSUPPORTED     -67
 #define H_OVERLAP         -68
 #define H_UNSUPPORTED_FLAG -256
 #define H_MULTI_THREADS_ACTIVE -9005
diff --git a/qapi/common.json b/qapi/common.json
index 7c976296f0..bec1b45b09 100644
--- a/qapi/common.json
+++ b/qapi/common.json
@@ -197,3 +197,23 @@
 { 'enum': 'GrabToggleKeys',
   'data': [ 'ctrl-ctrl', 'alt-alt', 'shift-shift','meta-meta',
             'scrolllock', 'ctrl-scrolllock' ] }
+
+##
+# @NvdimmSyncModes:
+#
+# Indicates the mode of flush to be used to ensure persistence in case
+# of power failures.
+#
+# @unsafe: This is to indicate, the data on the backend device not be
+#          consistent in power failure scenarios.
+# @direct: This is to indicate the backend device supports synchronous DAX
+#          and no explicit flush requests from the guest is required.
+# @writeback: To be used when the backend device doesn't support synchronous
+#             DAX. The hypervisor issues flushes to the disk when requested
+#             by the guest.
+# Since: 6.0
+#
+##
+{ 'enum': 'NvdimmSyncModes',
+  'data': [ 'unsafe', 'writeback',
+            { 'name': 'direct', 'if': 'defined(CONFIG_LIBPMEM)' } ] }
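
Taken together with the H_SCM_FLUSH patch, a writeback-mode nvdimm is
advertised to a pseries guest through both flush-related device tree
properties. Schematically, the guest-visible node would carry something
like the following (node name and layout abbreviated, not verbatim
output):

    ibm,pmemory@<drc-index> {
            compatible = "ibm,pmemory";
            ibm,cache-flush-required;
            ibm,hcall-flush-required;    /* present only when sync-dax=writeback */
            ...
    };

A guest that finds "ibm,hcall-flush-required" is expected to route its
flushes through H_SCM_FLUSH rather than relying on CPU cache flush
instructions alone.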