From patchwork Sat Dec 15 00:44:37 2012
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 206598
From: Keith Busch
To: qemu-devel@nongnu.org
Date: Fri, 14 Dec 2012 17:44:37 -0700
Message-Id: <1355532277-28230-1-git-send-email-keith.busch@intel.com>
X-Mailer: git-send-email 1.7.1
Cc: Keith Busch , Keith Busch , "Michael S. Tsirkin"
Subject: [Qemu-devel] [PATCHv2] NVMe: Initial commit for NVM Express device

NVM Express is an open standard for PCI-e attached Non-Volatile Memory
storage. This commit adds an emulated device that supports the register
interface and command set defined by this standard. The standard can be
viewed at nvmexpress.org.

Cc: Michael S. Tsirkin
Cc: Keith Busch
Signed-off-by: Keith Busch
---
Thanks to everyone I've received comments from on the first attempt.
Apparently there are _rules_ my emulated device needs to follow in order to
run correctly in this environment. :)

The biggest change is that none of the routines run in their own threads
anymore; they are all timer callbacks that take the BQL. This single-threads
the processing I was trying to parallelize before, so some of the code is
simpler, albeit a little slower. The device uses the qemu block interface to
communicate with the backing storage that represents the non-volatile memory
of the nvme device. I split the code into a header and a c file.

I know this is fairly large, but I don't see a nice way to significantly
split this into multiple commits without crippling the functionality.
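To illustrate the point above, here is a minimal sketch (not code from the
patch; the MyQueue/my_queue_* names are made up for illustration) of the
timer-callback pattern: a doorbell write only schedules a vm_clock timer, and
the callback later runs from the main loop while holding the BQL. The
qemu_new_timer_ns()/qemu_mod_timer() calls are the same ones the patch uses
for its submission and completion queues.

    #include "qemu-timer.h"   /* QEMU 1.x timer API used by this patch */

    typedef struct MyQueue {
        QEMUTimer *timer;
        /* ... queue state ... */
    } MyQueue;

    /* Timer callback: invoked from the main loop, under the BQL. */
    static void my_queue_process(void *opaque)
    {
        MyQueue *q = opaque;
        /* consume submission entries, post completions, re-arm if needed */
        (void)q;
    }

    static void my_queue_init(MyQueue *q)
    {
        q->timer = qemu_new_timer_ns(vm_clock, my_queue_process, q);
    }

    /* Called from the doorbell write: defer processing ~1us out. */
    static void my_queue_kick(MyQueue *q)
    {
        if (!qemu_timer_pending(q->timer)) {
            qemu_mod_timer(q->timer, qemu_get_clock_ns(vm_clock) + 1000);
        }
    }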
There are some optional features here that could potentially be removed from the initial commit, but they don't take an appreciable amount of code space. A lot had to change from the first revision, so my apologies if I missed or misunderstood something from the previous comments. MAINTAINERS | 5 + default-configs/pci.mak | 1 + hw/Makefile.objs | 2 + hw/nvme.c | 2294 +++++++++++++++++++++++++++++++++++++++++++++++ hw/nvme.h | 678 ++++++++++++++ hw/pci_ids.h | 1 + 6 files changed, 2981 insertions(+), 0 deletions(-) create mode 100644 hw/nvme.c create mode 100644 hw/nvme.h diff --git a/MAINTAINERS b/MAINTAINERS index 2ede20d..ff5eeb5 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -563,6 +563,11 @@ S: Supported F: hw/virtio-serial* F: hw/virtio-console* +nvme +M: Keith Busch +S: Supported +F: hw/nvme* + Xilinx EDK M: Peter Crosthwaite M: Edgar E. Iglesias diff --git a/default-configs/pci.mak b/default-configs/pci.mak index ae9d1eb..eb67914 100644 --- a/default-configs/pci.mak +++ b/default-configs/pci.mak @@ -11,6 +11,7 @@ CONFIG_PCNET_PCI=y CONFIG_PCNET_COMMON=y CONFIG_LSI_SCSI_PCI=y CONFIG_MEGASAS_SCSI_PCI=y +CONFIG_NVME_PCI=y CONFIG_RTL8139_PCI=y CONFIG_E1000_PCI=y CONFIG_IDE_CORE=y diff --git a/hw/Makefile.objs b/hw/Makefile.objs index d581d8d..fa5a5b3 100644 --- a/hw/Makefile.objs +++ b/hw/Makefile.objs @@ -41,6 +41,8 @@ common-obj-$(CONFIG_I8259) += i8259_common.o i8259.o common-obj-y += fifo.o common-obj-y += pam.o +common-obj-$(CONFIG_NVME_PCI) += nvme.o + # PPC devices common-obj-$(CONFIG_PREP_PCI) += prep_pci.o common-obj-$(CONFIG_I82378) += i82378.o diff --git a/hw/nvme.c b/hw/nvme.c new file mode 100644 index 0000000..3c489da --- /dev/null +++ b/hw/nvme.c @@ -0,0 +1,2294 @@ +/* + * QEMU NVM Express Controller + * + * Copyright (c) 2012, Intel Corporation + * + * Written by Keith Busch + * + * This code is licensed under the GNU GPL v2 or later. + */ + +/** + * Reference Specs: http://www.nvmexpress.org, 1.1, 1.0d + * + * http://www.nvmexpress.org/index.php/download_file/view/102/1/ + * http://www.nvmexpress.org/index.php/download_file/view/100/1/ + */ + +/** + * Usage: + * + * Add "-device nvme[,options]" parameters when starting QEMU. + * Repeat "-device nvme" for multiple nvme controller pci devices. + * + * Options: + * + * drive= : (Required) drive id to use as backing storage + * namespaces=<1-256> : Number of namespaces. Default: 1 + * size=<1-131072> : Size of each namespace in MB. Default: 512 + * queues=<1-2047> : Number of controller IO queues. Default:64 + * entries=<1-65535> : Maximum number of queue entries. Default:2047 + * aerl=<0-255> : Number of async event requests to accept. Default:3 + * acl=<0-255> : The abort command limit. Default:3 + * mdts=<0-255> : Maximum data transfer size, see NVMe spec. Default: 5 + * cqr=<0,1> : Contiguous Queues Required. Default:1 + * stride=<0-12> : Doorbell stride.
Default:0 + * + * Alternate method: to use all the default options, execute as + * "-drive file=,if=nvme" + * + */ + +/** + * TODO: + * - Single and Multiple Message MSI + * - Scatter Gather List support + * - NVMe Subsystem Reset + * - Fused commands + * - Meta-data, interleaved and separate + * - Check/regen protection information + * - Reservations + * - Coalescing + * - Security and firmware (vendor specific) + * - Persistent features + * - Test page sizes > 4k + * - Secure erase on format + * - Submit aio discard requests + */ + +#include "bitmap.h" +#include "bitops.h" +#include "block-common.h" +#include "hw.h" +#include "msix.h" +#include "msi.h" +#include "nvme.h" +#include "pci.h" + +#define NVME_DEBUG +#ifdef NVME_DEBUG +enum { + IO_DBG, ADM_DBG, DBG, INFO, ERR +}; + +#define DBGBIT(x) (1 << x) +int debug_flags = DBGBIT(DBG) | DBGBIT(INFO) | DBGBIT(ERR); +#define NVME_LOG(level, fmt, ...) \ + do {\ + if (debug_flags & DBGBIT(level)) { \ + fprintf(stderr, "%s: " fmt "\n", __func__, ##__VA_ARGS__);\ + } \ + } while (0) +#else +#define DBGBIT(x) 0 +#define NVME_LOG(level, fmt, ...) do {} while (0) +#endif + +#define NVME_MAX_QS PCI_MSIX_FLAGS_QSIZE +#define NVME_MAX_QUEUE_ENTRIES 0xffff +#define NVME_MAX_STRIDE 12 +#define NVME_MAX_NAMESPACE_SIZE 131072 +#define NVME_MAX_NUM_NAMESPACES 256 +#define BYTES_PER_MB (1024ULL * 1024ULL) +#define NVME_OP_ABORTED 0xff +#define NVME_SPARE_THRESHOLD 20 +#define NVME_TEMPERATURE 0x143 +#define min(x, y) ((x) < (y) ? (x) : (y)) + +static int instance; +static void nvme_sq_process(void *opaque); + +static inline void _nvme_check_size(void) +{ + QEMU_BUILD_BUG_ON(sizeof(NvmeCmd) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeDeleteQ) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeCreateCq) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeCreateSq) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdentify) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeRwCmd) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) != 16); + QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512); + QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096); +} + +static void nvme_open_namespace(NvmeCtrl *n, NvmeNamespace *ns) +{ + uint16_t lba_idx; + NvmeIdNs *id_ns = &ns->id_ns; + + lba_idx = NVME_ID_NS_FLBAS_INDEX(id_ns->flbas); + NVME_LOG(DBG, "ctrl:%u nsid:%u flbas:%x nsze:%"PRIu64" ds:%u ms:%u", + n->instance, ns->id, id_ns->flbas, id_ns->nsze, id_ns->lbaf[lba_idx].ds, + id_ns->lbaf[lba_idx].ms); + + ns->util = bitmap_new(id_ns->nsze); + ns->uncorrectable = bitmap_new(id_ns->nsze); + ns->thresh_warn_issued = 0; + ns->nuse_thresh = ((double)ns->id_ns.nsze) * + (1 - ((double)NVME_SPARE_THRESHOLD) / 100.0); +} + +static void nvme_close_namespace(NvmeNamespace *ns) +{ + if (ns->util) { + g_free(ns->util); + ns->util = NULL; + } + if (ns->uncorrectable) { + g_free(ns->uncorrectable); + ns->uncorrectable = NULL; + } +} + +static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type, + uint8_t event_info, uint8_t log_page) +{ + AsyncEvent *event = (AsyncEvent *)g_malloc(sizeof(*event)); + event->result.event_type = event_type; + event->result.event_info = event_info; + event->result.log_page = log_page; + QSIMPLEQ_INSERT_TAIL(&(n->aer_queue), event, entry); + + NVME_LOG(INFO, "ctrl:%u type:%x info:%x page:%x", n->instance, + event_type, event_info, log_page); + qemu_mod_timer(n->aer_timer, 
qemu_get_clock_ns(vm_clock) + 10000); +} + +static uint64_t *nvme_setup_discontig(uint64_t prp_addr, uint16_t queue_depth, + uint16_t page_size, uint16_t entry_size) +{ + int i; + uint16_t prps_per_page = page_size / sizeof(uint64_t); + uint64_t prp[prps_per_page]; + uint16_t total_prps = DIV_ROUND_UP(queue_depth * entry_size, page_size); + uint64_t *prp_list = g_malloc0(total_prps * sizeof(*prp_list)); + + NVME_LOG(DBG, + "queue depth:%u page size:%u entry size:%u prps per page:%u total:%u\n", + queue_depth, page_size, entry_size, prps_per_page, total_prps); + + for (i = 0; i < total_prps; i++) { + if (i % prps_per_page == 0 && i < total_prps - 1) { + if (!prp_addr || prp_addr & (page_size - 1)) { + NVME_LOG(ERR, + "invalid prp list address for discontig queue:%"PRIx64"", + prp_addr); + g_free(prp_list); + return NULL; + } + cpu_physical_memory_rw(prp_addr, (uint8_t *)&prp, sizeof(prp), 1); + prp_addr = prp[prps_per_page - 1]; + } + prp_list[i] = prp[i % prps_per_page]; + NVME_LOG(DBG, "map prp:%"PRIx64"\n", prp_list[i]); + if (!prp_list[i] || prp_list[i] & (page_size - 1)) { + NVME_LOG(ERR, "invalid prp for discontig queue:%"PRIx64"", + prp_list[i]); + g_free(prp_list); + return NULL; + } + } + return prp_list; +} + +static hwaddr nvme_discontig(uint64_t *dma_addr, uint16_t page_size, + uint16_t queue_idx, uint16_t entry_size) +{ + uint16_t entries_per_page = page_size / entry_size; + uint16_t prp_index = queue_idx / entries_per_page; + uint16_t index_in_prp = queue_idx % entries_per_page; + NVME_LOG(IO_DBG, + "dma_addr:%"PRIx64" page size:%u queue index:%u entry size:%u", + *dma_addr, page_size, queue_idx, entry_size); + return dma_addr[prp_index] + index_in_prp * entry_size; +} + +static uint32_t nvme_do_prp(uint64_t prp1, uint64_t prp2, uint8_t *buf, + uint32_t len, int data_dir, NvmeCtrl *n) +{ + uint32_t trans_len = n->page_size - (prp1 % n->page_size); + trans_len = min(len, trans_len); + + if (!prp1) { + NVME_LOG(ERR, "null prp1"); + return NVME_INVALID_FIELD | NVME_DNR; + } + + NVME_LOG(IO_DBG, + "ctrl:%u page:%u prp1:%"PRIx64" prp2:%"PRIx64" buf:%p len:%u dir:%d", + n->instance, n->page_size, prp1, prp2, buf, len, data_dir); + + cpu_physical_memory_rw(prp1, buf, trans_len, data_dir); + len -= trans_len; + buf += trans_len; + if (len) { + if (!prp2) { + NVME_LOG(ERR, "null prp2"); + return NVME_INVALID_FIELD | NVME_DNR; + } + if (len > n->page_size) { + uint64_t prp_list[n->max_prp_ents], nents, prp_trans; + int i = 0; + + nents = (uint64_t)((len + n->page_size - 1) >> n->page_bits); + prp_trans = min(n->max_prp_ents, nents) * sizeof(uint64_t); + cpu_physical_memory_rw(prp2, (uint8_t *)prp_list, prp_trans, 0); + + while (len != 0) { + if (i == n->max_prp_ents - 1 && len > n->page_size) { + if (!prp_list[i] || prp_list[i] & (n->page_size - 1)) { + NVME_LOG(ERR, + "null or unaligned prp chain:%u entry %"PRIx64"", i, + prp_list[i]); + return NVME_INVALID_FIELD | NVME_DNR; + } + nents = (uint64_t)((len + n->page_size - 1) >> + n->page_bits); + prp_trans = min(n->max_prp_ents, nents) * sizeof(uint64_t); + cpu_physical_memory_rw(prp_list[i], (uint8_t *)prp_list, + prp_trans, 0); + i = 0; + } + if (!prp_list[i] || prp_list[i] & (n->page_size - 1)) { + NVME_LOG(ERR, + "null or unaligned prp list:%u entry %"PRIx64"", + i, prp_list[i]); + return NVME_INVALID_FIELD | NVME_DNR; + } + + NVME_LOG(IO_DBG, "prp[%u]:%"PRIx64"", i, prp_list[i]); + trans_len = min(len, n->page_size); + cpu_physical_memory_rw(prp_list[i], buf, trans_len, data_dir); + + len -= trans_len; + buf += trans_len; + i++; + 
} + } else { + if (prp2 & (n->page_size - 1)) { + NVME_LOG(ERR, "prp2 alignment"); + return NVME_INVALID_FIELD | NVME_DNR; + } + cpu_physical_memory_rw(prp2, buf, len, data_dir); + } + } + + return NVME_SUCCESS; +} + +static uint16_t nvme_map_prp(NvmeRequest *req, uint64_t prp1, uint64_t prp2, + uint32_t len, int data_dir, NvmeCtrl *n) +{ + hwaddr req_len, trans_len = n->page_size - (prp1 % n->page_size); + req_len = trans_len = min(len, trans_len); + int num_prps = (len >> n->page_bits) + 1; + void *addr; + + if (!prp1) { + NVME_LOG(ERR, "null prp1"); + return NVME_INVALID_FIELD | NVME_DNR; + } + + NVME_LOG(IO_DBG, + "ctrl:%u page size:%u prp1:%"PRIx64" prp2:%"PRIx64" len:%u dir:%d", + n->instance, n->page_size, prp1, prp2, len, data_dir); + + qemu_iovec_init(&req->qiov, num_prps); + addr = cpu_physical_memory_map(prp1, &trans_len, data_dir); + if (!addr || req_len != trans_len) { + NVME_LOG(ERR, + "unable to map data bytes:%"PRIu64" from address:%"PRIx64"", + trans_len, prp1); + return NVME_INTERNAL_DEV_ERROR; + } + NVME_LOG(IO_DBG, "mapped prp1:%"PRIx64" to %p len:%"PRIu64"", prp1, addr, + trans_len); + qemu_iovec_add(&req->qiov, addr, trans_len); + + len -= trans_len; + if (len) { + if (!prp2) { + NVME_LOG(ERR, "null prp2"); + return NVME_INVALID_FIELD | NVME_DNR; + } + if (len > n->page_size) { + uint64_t prp_list[n->max_prp_ents], nents, prp_trans; + int i = 0; + + nents = (uint64_t)((len + n->page_size - 1) >> n->page_bits); + prp_trans = min(n->max_prp_ents, nents) * sizeof(uint64_t); + cpu_physical_memory_rw(prp2, (uint8_t *)prp_list, prp_trans, 0); + + while (len != 0) { + if (i == n->max_prp_ents - 1 && len > n->page_size) { + if (!prp_list[i] || prp_list[i] & (n->page_size - 1)) { + NVME_LOG(ERR, + "null or unaligned prp chain:%u entry %"PRIx64"", i, + prp_list[i]); + return NVME_INVALID_FIELD | NVME_DNR; + } + nents = (uint64_t)((len + n->page_size - 1) >> + n->page_bits); + prp_trans = min(n->max_prp_ents, nents) * sizeof(uint64_t); + cpu_physical_memory_rw(prp_list[i], (uint8_t *)prp_list, + prp_trans, 0); + i = 0; + } + if (!prp_list[i] || prp_list[i] & (n->page_size - 1)) { + NVME_LOG(ERR, + "null or unaligned prp list:%u entry %"PRIx64"", + i, prp_list[i]); + return NVME_INVALID_FIELD | NVME_DNR; + } + + req_len = trans_len = min(len, n->page_size); + addr = cpu_physical_memory_map(prp_list[i], &trans_len, + data_dir); + if (!addr || req_len != trans_len) { + NVME_LOG(ERR, + "unable to map addr%"PRIu64" bytes:%"PRIx64"", + prp_list[i], trans_len); + return NVME_INTERNAL_DEV_ERROR; + } + NVME_LOG(IO_DBG, "mapped prp[%u]:%"PRIx64" to %p len:%"PRIu64"", + i, prp_list[i], addr, trans_len); + qemu_iovec_add(&req->qiov, addr, trans_len); + + len -= trans_len; + i++; + } + } else { + if (prp2 & (n->page_size - 1)) { + NVME_LOG(ERR, "prp2 alignment"); + return NVME_INVALID_FIELD | NVME_DNR; + } + req_len = trans_len = len; + addr = cpu_physical_memory_map(prp2, &trans_len, data_dir); + if (!addr || req_len != trans_len) { + NVME_LOG(ERR, + "unable to map data bytes:%"PRIu64" from address:%"PRIx64"", + trans_len, prp2); + return NVME_INTERNAL_DEV_ERROR; + } + NVME_LOG(IO_DBG, "mapped prp2:%"PRIx64" to %p len:%"PRIu64"", prp2, + addr, trans_len); + qemu_iovec_add(&req->qiov, addr, trans_len); + } + } + return NVME_SUCCESS; +} + +static void nvme_isr_notify(NvmeCtrl *n, NvmeCQueue *cq) +{ + if (cq->irq_enabled) { + if (msix_enabled(&(n->dev))) { + msix_notify(&(n->dev), cq->vector); + } else if (msi_enabled(&(n->dev))) { + msi_notify(&(n->dev), cq->vector); + } else { + 
qemu_irq_pulse(n->dev.irq[0]); + } + } +} + +static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid) +{ + return sqid < n->num_queues && n->sq[sqid] != NULL ? 0 : -1; +} + +static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid) +{ + return cqid < n->num_queues && n->cq[cqid] != NULL ? 0 : -1; +} + +static void nvme_inc_cq_tail(NvmeCQueue *cq) +{ + cq->tail++; + if (cq->tail >= cq->size) { + cq->tail = 0; + cq->phase = !cq->phase; + } +} + +static void nvme_inc_sq_head(NvmeSQueue *sq) +{ + sq->head = (sq->head + 1) % sq->size; +} + +static uint8_t nvme_cq_full(NvmeCQueue *cq) +{ + return (cq->tail + 1) % cq->size == cq->head; +} + +static uint8_t nvme_sq_empty(NvmeSQueue *sq) +{ + return sq->head == sq->tail; +} + +static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req) +{ + NVME_LOG(IO_DBG, "cq:%u sq:%u", cq->id, req->sq->cqid); + assert(cq->id == req->sq->cqid); + QSIMPLEQ_INSERT_TAIL(&(cq->req_list), req, entry); + if (!qemu_timer_pending(cq->timer)) { + qemu_mod_timer(cq->timer, qemu_get_clock_ns(vm_clock) + 1000); + } +} + +static void nvme_post_cqes(void *opaque) +{ + NvmeCQueue *cq = opaque; + NvmeSQueue *sq; + NvmeRequest *req, *next; + NvmeCtrl *n = cq->ctrl; + hwaddr addr; + uint32_t tail; + int processed = 0; + + NVME_LOG(IO_DBG, "cq:%u", cq->id); + QSIMPLEQ_FOREACH_SAFE(req, &cq->req_list, entry, next) { + if (nvme_cq_full(cq)) { + break; + } + + QSIMPLEQ_REMOVE_HEAD(&cq->req_list, entry); + sq = req->sq; + + NVME_LOG(IO_DBG, "ctrl:%u cq:%u sq:%u cq head:%u cq tail:%u", + n->instance, cq->id, sq->id, cq->head, cq->tail); + + tail = cq->tail; + req->cqe.status |= cq->phase; + req->cqe.sq_id = sq->id; + req->cqe.sq_head = sq->head; + nvme_inc_cq_tail(cq); + + if (cq->phys_contig) { + addr = cq->dma_addr + tail * n->cqe_size; + } else { + addr = nvme_discontig(cq->prp_list, tail, n->page_size, + n->cqe_size); + } + + NVME_LOG(IO_DBG, "ctrl:%u cq:%u cq tail:%u addr:%"PRIx64"", + n->instance, cq->id, cq->tail, addr); + cpu_physical_memory_rw(addr, (uint8_t *)&req->cqe, sizeof(req->cqe), 1); + ++processed; + + QSIMPLEQ_INSERT_TAIL(&(sq->req_list), req, entry); + } + if (processed) { + nvme_isr_notify(n, cq); + } +} + +static void nvme_set_error_page(NvmeCtrl *n, uint16_t sqid, uint16_t cid, + uint16_t status, uint16_t location, uint64_t lba, uint32_t nsid) +{ + NvmeErrorLog *elp; + elp = &n->elpes[n->elp_index]; + elp->error_count = n->error_count++; + elp->sqid = sqid; + elp->cid = cid; + elp->status_field = status; + elp->param_error_location = location; + elp->lba = lba; + elp->nsid = nsid; + n->elp_index = (n->elp_index + 1) % n->elpe; +} + +static void nvme_aer_process_cb(void *param) +{ + NvmeCtrl *n = param; + NvmeRequest *req; + AerResult *result; + AsyncEvent *event, *next;; + + NVME_LOG(DBG, "ctrl:%u outstanding aers:%u mask:%x queue empty:%u", + n->instance, n->outstanding_aers, n->aer_mask, + QSIMPLEQ_EMPTY(&n->aer_queue)); + + QSIMPLEQ_FOREACH_SAFE(event, &n->aer_queue, entry, next) { + if (n->outstanding_aers <= 0) { + break; + } + if (n->aer_mask & (1 << event->result.event_type)) { + continue; + } + + QSIMPLEQ_REMOVE_HEAD(&n->aer_queue, entry); + n->aer_mask |= 1 << event->result.event_type; + n->outstanding_aers--; + + req = n->aer_reqs[n->outstanding_aers]; + result = (AerResult *)&req->cqe.result; + result->event_type = event->result.event_type; + result->event_info = event->result.event_info; + result->log_page = event->result.log_page; + g_free(event); + + req->cqe.status = NVME_SUCCESS << 1; + nvme_enqueue_req_completion(n->cq[0], req); + } +} + 
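For reviewers following the completion path: nvme_post_cqes() above computes
each completion entry's guest address from the queue tail and flips the phase
tag whenever the tail wraps, which is how the host tells new entries from
stale ones. Below is a small standalone sketch (plain C, not part of the
patch; struct and function names are illustrative only) of the same
tail/phase arithmetic for a physically contiguous queue.

    #include <stdint.h>
    #include <stdio.h>

    struct cq_state {
        uint64_t dma_addr;  /* guest physical base of the completion queue */
        uint32_t size;      /* number of entries */
        uint32_t tail;
        uint8_t  phase;     /* starts at 1; OR'd into each CQE status field */
    };

    /* Pick the slot for the next completion and advance the tail,
     * flipping the phase tag on wrap-around (cf. nvme_inc_cq_tail). */
    static uint64_t cq_post(struct cq_state *cq, uint32_t cqe_size,
                            uint8_t *phase_out)
    {
        uint64_t addr = cq->dma_addr + (uint64_t)cq->tail * cqe_size;
        *phase_out = cq->phase;
        if (++cq->tail >= cq->size) {
            cq->tail = 0;
            cq->phase = !cq->phase;
        }
        return addr;
    }

    int main(void)
    {
        struct cq_state cq = { .dma_addr = 0x100000, .size = 4, .phase = 1 };
        uint8_t phase;
        for (int i = 0; i < 6; i++) {
            uint64_t addr = cq_post(&cq, 16, &phase);
            printf("cqe %d -> %#llx, phase %d\n", i,
                   (unsigned long long)addr, phase);
        }
        return 0;   /* entries 4 and 5 reuse slots 0 and 1 with phase 0 */
    }

The real code does the same computation with n->cqe_size and then DMAs the
entry to the guest with cpu_physical_memory_rw(); for queues that are not
physically contiguous the slot address comes from the PRP list via
nvme_discontig() instead.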
+static void nvme_update_stats(NvmeNamespace *ns, uint16_t nlb, int rw) +{ + uint64_t tmp; + if (!rw) { + if (++ns->host_write_commands[0] == 0) { + ++ns->host_write_commands[1]; + } + + tmp = ns->data_units_written[0]; + ns->write_data_counter += nlb + 1; + ns->data_units_written[0] += (ns->write_data_counter / 1000); + ns->write_data_counter %= 1000; + if (tmp > ns->data_units_written[0]) { + ++ns->data_units_written[1]; + } + } else { + if (++ns->host_read_commands[0] == 0) { + ++ns->host_read_commands[1]; + } + + tmp = ns->data_units_read[0]; + ns->read_data_counter += nlb + 1; + ns->data_units_read[0] += (ns->read_data_counter / 1000); + ns->read_data_counter %= 1000; + if (tmp > ns->data_units_read[0]) { + ++ns->data_units_read[1]; + } + } +} + +static void nvme_update_ns_util(NvmeNamespace *ns, uint64_t slba, uint16_t nlb) +{ + uint64_t nr; + uint64_t elba = slba + nlb; + unsigned long *addr = ns->util; + + for (nr = slba; nr <= elba; nr++) { + if (!test_and_set_bit(nr, addr)) { + assert(ns->id_ns.nuse < ns->id_ns.nsze); + ++ns->id_ns.nuse; + } + } + bitmap_clear(ns->uncorrectable, slba, nlb); +} + +static int nvme_is_unrecovered(NvmeNamespace *ns, uint64_t slba, uint16_t nlb) +{ + uint64_t nr; + uint64_t elba = slba + nlb; + unsigned long *addr = ns->uncorrectable; + for (nr = slba; nr <= elba; nr++) { + if (test_bit(nr, addr)) { + return 1; + } + } + return 0; +} + +static void nvme_dsm_dealloc(NvmeNamespace *ns, uint64_t slba, uint64_t nlb) +{ + uint64_t nr; + uint64_t elba = nlb + slba; + unsigned long *addr = ns->util; + for (nr = slba; nr < elba; nr++) { + if (test_and_clear_bit(nr, addr)) { + assert(ns->id_ns.nuse > 0); + --ns->id_ns.nuse; + } + } +} + +static uint16_t nvme_dsm(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, + NvmeRequest *req) +{ + uint16_t nr = (cmd->cdw10 & 0xff) + 1; + NvmeDsmRange range[nr]; + + if (nvme_do_prp(cmd->prp1, cmd->prp2, (uint8_t *) range, sizeof(range), + 1, n)) { + nvme_set_error_page(n, req->sq->id, cmd->cid, NVME_INVALID_FIELD, + offsetof(NvmeCmd, prp1), 0, ns->id); + return NVME_INVALID_FIELD | NVME_DNR; + } + + if (cmd->cdw11 & NVME_DSMGMT_AD) { + int i; + uint64_t slba; + uint32_t nlb; + for (i = 0; i < nr; i++) { + slba = range[i].slba; + nlb = range[i].nlb; + if (slba + nlb > ns->id_ns.ncap) { + NVME_LOG(ERR, + "range error, slba:%"PRIu64" nlb:%u size:%"PRIu64"", + slba, nlb, ns->id_ns.ncap); + nvme_set_error_page(n, req->sq->id, cmd->cid, NVME_LBA_RANGE, + offsetof(NvmeCmd, cdw10), slba + nlb, ns->id); + return NVME_LBA_RANGE | NVME_DNR; + } + nvme_dsm_dealloc(ns, slba, nlb); + } + } + return NVME_SUCCESS; +} + +static void nvme_rw_cb(void *opaque, int ret) +{ + int i; + NvmeRequest *req = opaque; + NvmeSQueue *sq = req->sq; + NvmeNamespace *ns = req->ns; + NvmeCtrl *n = sq->ctrl; + NvmeCQueue *cq = n->cq[sq->cqid]; + + NVME_LOG(IO_DBG, + "ctrl:%u sq:%u cid:%x nsid:%u nlb:%u slba:%"PRIu64" ret:%d", + n->instance, sq->id, req->cqe.cid, ns->id, req->nlb, req->slba, ret); + + nvme_update_stats(ns, req->nlb, req->rw); + if (!req->rw) { + nvme_update_ns_util(ns, req->slba, req->nlb); + } + + for (i = 0; i < req->qiov.niov; i++) { + NVME_LOG(IO_DBG, "unmap:%p len:%"PRIu64"", req->qiov.iov[i].iov_base, + req->qiov.iov[i].iov_len); + cpu_physical_memory_unmap(req->qiov.iov[i].iov_base, + req->qiov.iov[i].iov_len, req->rw, + req->qiov.iov[i].iov_len); + } + qemu_iovec_destroy(&req->qiov); + + if (!ret) { + req->cqe.status = NVME_SUCCESS << 1; + } else { + NVME_LOG(ERR, "nsid:%u nlb:%u slba:%"PRIu64" ret:%d", ns->id, req->nlb, + req->slba, 
ret); + req->cqe.status = NVME_INTERNAL_DEV_ERROR << 1; + } + nvme_enqueue_req_completion(cq, req); +} + +static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, + NvmeRequest *req) +{ + NvmeRwCmd *rw = (NvmeRwCmd *)cmd; + uint8_t lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas); + uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds; + uint64_t data_size = (rw->nlb + 1) << data_shift; + uint64_t meta_size = (rw->nlb + 1) * ns->id_ns.lbaf[lba_index].ms; + int data_dir = rw->opcode == NVME_CMD_WRITE ? 0 : 1; + uint16_t ret; + + NVME_LOG(IO_DBG, + "nsid:%u slba:%"PRIu64" nlb:%u data size:%"PRIu64" meta:%"PRIu64"", + ns->id, rw->slba, rw->nlb, data_size, meta_size); + if ((rw->slba + rw->nlb) > ns->id_ns.nsze) { + NVME_LOG(ERR, "range error, slba:%ld nlb:%u size:%ld", rw->slba, + rw->nlb, ns->id_ns.nsze); + nvme_set_error_page(n, req->sq->id, cmd->cid, NVME_LBA_RANGE, + offsetof(NvmeRwCmd, slba), rw->slba + rw->nlb, ns->id); + return NVME_LBA_RANGE | NVME_DNR; + } + if (n->id_ctrl.mdts && data_size > n->page_size * (1 << n->id_ctrl.mdts)) { + NVME_LOG(ERR, "transfer size error, mdts:%u data size:%"PRIu64"", + n->id_ctrl.mdts, data_size); + nvme_set_error_page(n, req->sq->id, cmd->cid, NVME_INVALID_FIELD, + offsetof(NvmeRwCmd, nlb), rw->slba + rw->nlb, ns->id); + return NVME_INVALID_FIELD | NVME_DNR; + } + if (data_dir && nvme_is_unrecovered(ns, rw->slba, rw->nlb)) { + NVME_LOG(ERR, + "nsid:%u slba:%"PRIx64" nlb:%u access uncorrectable range", + ns->id, rw->slba, rw->nlb); + nvme_set_error_page(n, req->sq->id, cmd->cid, NVME_UNRECOVERED_READ, + offsetof(NvmeRwCmd, slba), rw->slba + rw->nlb, ns->id); + return NVME_UNRECOVERED_READ; + } + + ret = nvme_map_prp(req, rw->prp1, rw->prp2, data_size, data_dir, n); + if (ret == NVME_SUCCESS) { + uint64_t slba = ns->start_block + (rw->slba << (data_shift - 9)); + uint32_t nlb = (rw->nlb + 1) << (data_shift - 9); + + req->slba = rw->slba; + req->nlb = rw->nlb; + req->ns = ns; + req->rw = data_dir; + + ret = NVME_NO_COMPLETE; + assert(nlb * BDRV_SECTOR_SIZE == req->qiov.size); + NVME_LOG(IO_DBG, + "req:%p iov:%p ns:%u sq:%u cid:%x nlb:%u slba:%"PRIx64" "\ + "aio slba:%"PRIu64" aio nlb:%u", req, req->qiov.iov, ns->id, + req->sq->id, req->cqe.cid, rw->nlb, rw->slba, slba, nlb); + data_dir ? 
+ bdrv_aio_readv(n->conf.bs, slba, &req->qiov, nlb, nvme_rw_cb, req) : + bdrv_aio_writev(n->conf.bs, slba, &req->qiov, nlb, nvme_rw_cb, req); + } + return ret; +} + +static uint16_t nvme_write_uncor(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, + NvmeRequest *req) +{ + NvmeRwCmd *rw = (NvmeRwCmd *)cmd; + unsigned long *addr = ns->uncorrectable; + + NVME_LOG(IO_DBG, "nsid:%u slba:%"PRIu64" nlb:%u", ns->id, rw->slba, + rw->nlb); + if ((rw->slba + rw->nlb) > ns->id_ns.nsze) { + NVME_LOG(ERR, "range error, slba:%ld nlb:%u nsze:%ld", rw->slba, + rw->nlb, ns->id_ns.nsze); + nvme_set_error_page(n, req->sq->id, cmd->cid, NVME_LBA_RANGE, + offsetof(NvmeRwCmd, slba), rw->slba + rw->nlb, ns->id); + return NVME_LBA_RANGE | NVME_DNR; + } + + nvme_update_ns_util(ns, rw->slba, rw->nlb); + bitmap_set(addr, rw->slba, rw->nlb); + return NVME_SUCCESS; +} + +static void nvme_flush_cb(void *opaque, int ret) +{ + NvmeRequest *req = opaque; + NvmeSQueue *sq = req->sq; + NvmeCtrl *n = sq->ctrl; + NvmeCQueue *cq = n->cq[sq->cqid]; + + if (!ret) { + req->cqe.status = NVME_SUCCESS << 1; + } else { + NVME_LOG(ERR, "nsid:%u nlb:%u slba:%"PRIu64" ret:%d", req->ns->id, + req->nlb, req->slba, ret); + req->cqe.status = NVME_INTERNAL_DEV_ERROR << 1; + } + nvme_enqueue_req_completion(cq, req); +} + +static uint16_t nvme_flush(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, + NvmeRequest *req) +{ + NVME_LOG(IO_DBG, "ctrl:%u nsid:%u", n->instance, ns->id); + req->ns = ns; + bdrv_aio_flush(n->conf.bs, nvme_flush_cb, req); + return NVME_NO_COMPLETE; +} + +static void nvme_cmp_cb(void *opaque, int ret) +{ + int i; + NvmeRequest *req = opaque; + NvmeSQueue *sq = req->sq; + NvmeNamespace *ns = req->ns; + NvmeCtrl *n = sq->ctrl; + NvmeCQueue *cq = n->cq[sq->cqid]; + QEMUIOVector *qiov = req->opaque; + + if (!ret) { + int cmp = 1; + for (i = 0; i < req->qiov.niov; i++) { + NVME_LOG(IO_DBG, "unmap:%p len:%"PRIu64"", + req->qiov.iov[i].iov_base, req->qiov.iov[i].iov_len); + if (memcmp(qiov->iov[i].iov_base, req->qiov.iov[i].iov_base, + qiov->iov[i].iov_len)) { + cmp = 0; + } + cpu_physical_memory_unmap(req->qiov.iov[i].iov_base, + req->qiov.iov[i].iov_len, req->rw, + req->qiov.iov[i].iov_len); + g_free(qiov->iov[i].iov_base); + } + qemu_iovec_destroy(&req->qiov); + g_free(qiov->iov); + g_free(qiov); + if (cmp) { + req->cqe.status = NVME_SUCCESS << 1; + } else { + req->cqe.status = NVME_CMP_FAILURE << 1; + } + } else { + NVME_LOG(ERR, "nsid:%u nlb:%u slba:%"PRIu64" ret:%d", ns->id, req->nlb, + req->slba, ret); + req->cqe.status = NVME_INTERNAL_DEV_ERROR << 1; + } + nvme_enqueue_req_completion(cq, req); +} + +static uint16_t nvme_compare(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, + NvmeRequest *req) +{ + NvmeRwCmd *rw = (NvmeRwCmd *)cmd; + uint8_t lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas); + uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds; + uint64_t data_size = (rw->nlb + 1) << data_shift; + uint16_t ret; + + NVME_LOG(IO_DBG, "nsid:%u slba:%"PRIu64" nlb:%u data size:%"PRIu64"", + ns->id, rw->slba, rw->nlb, data_size); + if ((rw->slba + rw->nlb) > ns->id_ns.nsze) { + NVME_LOG(ERR, "range error, nsid:%u slba:%ld nlb:%u nsze:%ld", ns->id, + rw->slba, rw->nlb, ns->id_ns.nsze); + nvme_set_error_page(n, req->sq->id, cmd->cid, NVME_LBA_RANGE, + offsetof(NvmeRwCmd, slba), rw->slba + rw->nlb, ns->id); + return NVME_LBA_RANGE | NVME_DNR; + } + if (n->id_ctrl.mdts && data_size > n->page_size * (1 << n->id_ctrl.mdts)) { + NVME_LOG(ERR, + "transfer size error, nsid:%u mdts:%u data size:%"PRIu64"", + ns->id, n->id_ctrl.mdts, data_size); 
+ nvme_set_error_page(n, req->sq->id, cmd->cid, NVME_INVALID_FIELD, + offsetof(NvmeRwCmd, nlb), rw->slba + rw->nlb, ns->id); + return NVME_INVALID_FIELD | NVME_DNR; + } + if (nvme_is_unrecovered(ns, rw->slba, rw->nlb)) { + NVME_LOG(ERR, + "nsid:%u slba:%"PRIx64" nlb:%u access uncorrectable range", + ns->id, rw->slba, rw->nlb); + nvme_set_error_page(n, req->sq->id, cmd->cid, NVME_UNRECOVERED_READ, + offsetof(NvmeRwCmd, slba), rw->slba + rw->nlb, ns->id); + return NVME_UNRECOVERED_READ; + } + + ret = nvme_map_prp(req, rw->prp1, rw->prp2, data_size, 0, n); + if (ret == NVME_SUCCESS) { + int i; + uint64_t slba = ns->start_block + (rw->slba << (data_shift - 9)); + uint32_t nlb = (rw->nlb + 1) << (data_shift - 9); + QEMUIOVector *qiov = g_malloc(sizeof(*qiov));; + struct iovec *cmp_iov = g_malloc(req->qiov.niov * sizeof(*cmp_iov)); + for (i = 0; i < req->qiov.niov; i++) { + cmp_iov[i].iov_len = req->qiov.iov[i].iov_len; + cmp_iov[i].iov_base = g_malloc(cmp_iov[i].iov_len); + } + qemu_iovec_init_external(qiov, cmp_iov, req->qiov.niov); + + req->opaque = qiov; + req->slba = rw->slba; + req->nlb = rw->nlb; + req->ns = ns; + req->rw = 0; + + assert(req->qiov.size == qiov->size); + assert(nlb * BDRV_SECTOR_SIZE == req->qiov.size); + bdrv_aio_readv(n->conf.bs, slba, qiov, nlb, nvme_cmp_cb, req); + ret = NVME_NO_COMPLETE; + } + return ret; +} + +static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) +{ + NvmeNamespace *ns; + if (cmd->nsid == 0 || cmd->nsid > n->num_namespaces) { + NVME_LOG(ERR, "invalid nsid:%u", cmd->nsid); + nvme_set_error_page(n, req->sq->id, cmd->cid, NVME_INVALID_NSID, + offsetof(NvmeCmd, nsid), 0, cmd->nsid); + return NVME_INVALID_NSID | NVME_DNR; + } + ns = &n->namespaces[cmd->nsid - 1]; + + NVME_LOG(IO_DBG, "ctrl:%u nsid:%u opcode:%x", n->instance, ns->id, + cmd->opcode); + + switch (cmd->opcode) { + case NVME_CMD_FLUSH: + return nvme_flush(n, ns, cmd, req); + + case NVME_CMD_WRITE: + case NVME_CMD_READ: + return nvme_rw(n, ns, cmd, req); + + case NVME_CMD_DSM: + return nvme_dsm(n, ns, cmd, req); + + case NVME_CMD_WRITE_UNCOR: + return nvme_write_uncor(n, ns, cmd, req); + + case NVME_CMD_COMPARE: + return nvme_compare(n, ns, cmd, req); + + default: + return NVME_INVALID_OPCODE | NVME_DNR; + } +} + +static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeCmd *cmd) +{ + NvmeDeleteQ *c = (NvmeDeleteQ *)cmd; + NvmeSQueue *sq; + + if (!c->qid || nvme_check_sqid(n, c->qid)) { + NVME_LOG(ERR, "invalid sq:%u", c->qid); + return NVME_INVALID_QID | NVME_DNR; + } + + + sq = n->sq[c->qid]; + if (!nvme_check_cqid(n, sq->cqid)) { + NvmeCQueue *cq = n->cq[sq->cqid]; + QTAILQ_REMOVE(&cq->sq_list, sq, entry); + } else { + NVME_LOG(ERR, "sq:%u does not contain valid cq:%u", sq->id, + sq->cqid); + } + n->sq[c->qid] = NULL; + + NVME_LOG(INFO, "ctrl:%u sq:%u cq:%u", n->instance, c->qid, + sq->cqid); + qemu_del_timer(sq->timer); + if (sq->prp_list) { + g_free(sq->prp_list); + } + g_free(sq); + + return NVME_SUCCESS; +} + +static void nvme_init_sq_reqs(NvmeSQueue *sq) +{ + int i; + QSIMPLEQ_INIT(&sq->req_list); + sq->io_req = g_malloc(sq->size * sizeof(*sq->io_req)); + for (i = 0; i < sq->size; i++) { + sq->io_req[i].sq = sq; + QSIMPLEQ_INSERT_TAIL(&(sq->req_list), &sq->io_req[i], entry); + } + sq->timer = qemu_new_timer_ns(vm_clock, nvme_sq_process, sq); +} + +static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd) +{ + NvmeCreateSq *c = (NvmeCreateSq *)cmd; + NvmeSQueue *sq; + NvmeCQueue *cq; + if (!c->cqid || nvme_check_cqid(n, c->cqid)) { + NVME_LOG(ERR, "invalid cq:%u", c->cqid); + 
return NVME_INVALID_CQID | NVME_DNR; + } + if (!c->sqid || (c->sqid && !nvme_check_sqid(n, c->sqid))) { + NVME_LOG(ERR, "invalid sq:%u", c->sqid); + return NVME_INVALID_QID | NVME_DNR; + } + if (!c->qsize || c->qsize > NVME_CAP_MQES(n->bar.cap)) { + NVME_LOG(ERR, "invalid size"); + return NVME_MAX_QSIZE_EXCEEDED | NVME_DNR; + } + if (!c->prp1 || c->prp1 & (n->page_size - 1)) { + NVME_LOG(ERR, "invalid prp:%"PRIx64"", c->prp1); + return NVME_INVALID_FIELD | NVME_DNR; + } + if (!(NVME_SQ_FLAGS_PC(c->sq_flags)) && NVME_CAP_CQR(n->bar.cap)) { + NVME_LOG(ERR, "invalid prp discontinuity"); + return NVME_INVALID_FIELD | NVME_DNR; + } + + sq = g_malloc0(sizeof(*sq)); + if (!(NVME_SQ_FLAGS_PC(c->sq_flags))) { + sq->prp_list = nvme_setup_discontig(c->prp1, c->qsize + 1, + n->page_size, n->sqe_size); + if (!sq->prp_list) { + g_free(sq); + return NVME_INVALID_FIELD | NVME_DNR; + } + } else { + sq->dma_addr = c->prp1; + } + sq->id = c->sqid; + sq->size = c->qsize + 1; + sq->phys_contig = NVME_SQ_FLAGS_PC(c->sq_flags); + sq->cqid = c->cqid; + sq->ctrl = n; + + switch (NVME_SQ_FLAGS_QPRIO(c->sq_flags)) { + case NVME_Q_PRIO_URGENT: + sq->arb_burst = (1 << NVME_ARB_AB(n->features.arbitration)); + break; + case NVME_Q_PRIO_HIGH: + sq->arb_burst = NVME_ARB_HPW(n->features.arbitration) + 1; + break; + case NVME_Q_PRIO_NORMAL: + sq->arb_burst = NVME_ARB_MPW(n->features.arbitration) + 1; + break; + case NVME_Q_PRIO_LOW: + default: + sq->arb_burst = NVME_ARB_LPW(n->features.arbitration) + 1; + break; + } + nvme_init_sq_reqs(sq); + + NVME_LOG(INFO, "ctrl:%u sq:%u cq:%u size:%u burst:%u dma:%"PRIx64"", + n->instance, c->sqid, c->cqid, c->qsize, sq->arb_burst, sq->dma_addr); + + n->sq[c->sqid] = sq; + cq = n->cq[c->cqid]; + QTAILQ_INSERT_TAIL(&(cq->sq_list), sq, entry); + + return NVME_SUCCESS; +} + +static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeCmd *cmd) +{ + NvmeDeleteQ *c = (NvmeDeleteQ *)cmd; + NvmeCQueue *cq; + + if (!c->qid || nvme_check_cqid(n, c->qid)) { + NVME_LOG(ERR, "invalid cq:%u", c->qid); + return NVME_INVALID_CQID | NVME_DNR; + } + + cq = n->cq[c->qid]; + if (!QTAILQ_EMPTY(&cq->sq_list)) { + NVME_LOG(ERR, "error, cq:%u sq(s) are still connected to CQ", + c->qid); + return NVME_INVALID_QUEUE_DEL; + } + + NVME_LOG(INFO, "ctrl:%u cq:%u irq vector:%u", n->instance, c->qid, + cq->vector); + + msix_vector_unuse(&n->dev, cq->vector); + n->cq[c->qid] = NULL; + if (cq->prp_list) { + g_free(cq->prp_list); + } + g_free(cq); + + return NVME_SUCCESS; +} + +static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd) +{ + NvmeCreateCq *c = (NvmeCreateCq *)cmd; + NvmeCQueue *cq; + + if (!c->cqid || (c->cqid && !nvme_check_cqid(n, c->cqid))) { + NVME_LOG(ERR, "invalid cq:%u", c->cqid); + return NVME_INVALID_CQID | NVME_DNR; + } + if (!c->qsize || c->qsize > NVME_CAP_MQES(n->bar.cap)) { + NVME_LOG(ERR, "invalid size:%u", c->qsize); + return NVME_MAX_QSIZE_EXCEEDED | NVME_DNR; + } + if (!c->prp1) { + NVME_LOG(ERR, "invalid null prp"); + return NVME_INVALID_FIELD | NVME_DNR; + } + if (c->irq_vector > n->num_queues) { + NVME_LOG(ERR, "invalid irq vector:%u", c->irq_vector); + return NVME_INVALID_IRQ_VECTOR; + } + if (!(c->cq_flags & 1) && NVME_CAP_CQR(n->bar.cap)) { + NVME_LOG(ERR, "invalid prp discontinuity"); + return NVME_INVALID_FIELD | NVME_DNR; + } + + cq = g_malloc0(sizeof(*cq)); + if (!(c->cq_flags & 1)) { + cq->prp_list = nvme_setup_discontig(c->prp1, c->qsize + 1, + n->page_size, n->cqe_size); + if (!cq->prp_list) { + g_free(cq); + return NVME_INVALID_FIELD | NVME_DNR; + } + } else { + cq->dma_addr = c->prp1; + } 
+ cq->id = c->cqid; + cq->dma_addr = c->prp1; + cq->irq_enabled = (c->cq_flags & 2) >> 1; + cq->vector = c->irq_vector; + cq->phase = 1; + cq->size = c->qsize + 1; + cq->phys_contig = c->cq_flags & 1; + cq->ctrl = n; + + QSIMPLEQ_INIT(&cq->req_list); + QTAILQ_INIT(&cq->sq_list); + n->cq[c->cqid] = cq; + msix_vector_use(&n->dev, cq->vector); + cq->timer = qemu_new_timer_ns(vm_clock, nvme_post_cqes, cq); + + NVME_LOG(INFO, + "ctrl:%u cq:%u size:%u irq vector:%u dma:%"PRIx64"", n->instance, + c->cqid, c->qsize, cq->vector, cq->dma_addr); + return NVME_SUCCESS; +} + +static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd) +{ + NvmeIdentify *c = (NvmeIdentify *)cmd; + NvmeNamespace *ns; + NVME_LOG(ADM_DBG, "ctrl:%u cns:%u nsid:%u", n->instance, c->cns, c->nsid); + if (c->cns) { + return nvme_do_prp(cmd->prp1, cmd->prp2, (uint8_t *)&n->id_ctrl, + sizeof(n->id_ctrl), 1, n); + } + + if (c->nsid == 0 || c->nsid > n->num_namespaces) { + NVME_LOG(ERR, "ctrl:%u invalid nsid:%u, namespaces:%u", + n->instance, c->nsid, n->num_namespaces); + return NVME_INVALID_NSID | NVME_DNR; + } + + ns = &n->namespaces[c->nsid - 1]; + return nvme_do_prp(cmd->prp1, cmd->prp2, (uint8_t *)&ns->id_ns, + sizeof(ns->id_ns), 1, n); +} + +static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, uint32_t *result) +{ + uint32_t dw11 = cmd->cdw11; + NVME_LOG(ADM_DBG, "ctrl:%u feature id:%x", n->instance, cmd->cdw10); + switch (cmd->cdw10) { + case NVME_ARBITRATION: + *result = n->features.arbitration; + break; + case NVME_POWER_MANAGEMENT: + *result = n->features.power_mgmt; + break; + case NVME_LBA_RANGE_TYPE: + return nvme_do_prp(cmd->prp1, cmd->prp2, + (uint8_t *)n->namespaces[cmd->nsid].lba_range, + min(sizeof(n->namespaces[cmd->nsid].lba_range), + (dw11 & 0x3f) * sizeof(NvmeRangeType)), 1, n); + case NVME_TEMPERATURE_THRESHOLD: + *result = n->features.temp_thresh; + break; + case NVME_ERROR_RECOVERY: + *result = n->features.err_rec; + break; + case NVME_VOLATILE_WRITE_CACHE: + *result = n->features.volatile_wc; + break; + case NVME_NUMBER_OF_QUEUES: + *result = n->features.num_queues; + break; + case NVME_INTERRUPT_COALESCING: + *result = n->features.int_coalescing; + break; + case NVME_INTERRUPT_VECTOR_CONF: + if ((dw11 & 0xffff) > n->num_queues) { + return NVME_INVALID_FIELD | NVME_DNR; + } + *result = n->features.int_vector_config[dw11 & 0xffff]; + break; + case NVME_WRITE_ATOMICITY: + *result = n->features.write_atomicity; + break; + case NVME_ASYNCHRONOUS_EVENT_CONF: + *result = n->features.async_config; + break; + case NVME_SOFTWARE_PROGRESS_MARKER: + *result = n->features.sw_prog_marker; + break; + default: + return NVME_INVALID_FIELD | NVME_DNR; + } + return NVME_SUCCESS; +} + +static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, uint32_t *result) +{ + uint32_t dw11 = cmd->cdw11; + NVME_LOG(ADM_DBG, "ctrl:%u feature id:%x", n->instance, cmd->cdw10); + switch (cmd->cdw10) { + case NVME_ARBITRATION: + *result = n->features.arbitration; + break; + case NVME_POWER_MANAGEMENT: + n->features.power_mgmt = dw11; + break; + case NVME_LBA_RANGE_TYPE: + return nvme_do_prp(cmd->prp1, cmd->prp2, + (uint8_t *)n->namespaces[cmd->nsid].lba_range, + min(sizeof(n->namespaces[cmd->nsid].lba_range), + (dw11 & 0x3f) * sizeof(NvmeRangeType)), 0, n); + case NVME_TEMPERATURE_THRESHOLD: + n->features.temp_thresh = dw11; + if (n->features.temp_thresh <= n->temperature && !n->temp_warn_issued) { + n->temp_warn_issued = 1; + nvme_enqueue_event(n, NVME_AER_TYPE_SMART, + NVME_AER_INFO_SMART_TEMP_THRESH, + NVME_LOG_SMART_INFO); + } 
else if (n->features.temp_thresh > n->temperature && + !(n->aer_mask & 1 << NVME_AER_TYPE_SMART)) { + n->temp_warn_issued = 0; + } + break; + case NVME_ERROR_RECOVERY: + n->features.err_rec = dw11; + break; + case NVME_VOLATILE_WRITE_CACHE: + break; + case NVME_NUMBER_OF_QUEUES: + *result = n->features.num_queues; + break; + case NVME_INTERRUPT_COALESCING: + break; + case NVME_INTERRUPT_VECTOR_CONF: + if ((dw11 & 0xffff) > n->num_queues) { + return NVME_INVALID_FIELD | NVME_DNR; + } + n->features.int_vector_config[dw11 & 0xffff] = dw11 & 0x1ffff; + break; + case NVME_WRITE_ATOMICITY: + n->features.write_atomicity = dw11; + break; + case NVME_ASYNCHRONOUS_EVENT_CONF: + n->features.async_config = dw11; + break; + case NVME_SOFTWARE_PROGRESS_MARKER: + n->features.sw_prog_marker = dw11; + break; + default: + return NVME_INVALID_FIELD | NVME_DNR; + } + return NVME_SUCCESS; +} + +static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len) +{ + uint32_t trans_len; + NvmeFwSlotInfoLog fw_log; + + trans_len = min(sizeof(fw_log), buf_len); + if (buf_len < sizeof(fw_log)) { + NVME_LOG(ERR, "not enough memory, needs %ld, has %u bytes", + sizeof(fw_log), buf_len); + } + + return nvme_do_prp(cmd->prp1, cmd->prp2, (uint8_t *)&fw_log, trans_len, + 1, n); +} + +static uint16_t nvme_error_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len) +{ + uint32_t trans_len; + trans_len = min(sizeof(*n->elpes) * n->elpe, buf_len); + if (buf_len % sizeof(*n->elpes)) { + NVME_LOG(ERR, "buffer not multiple of log page size:%"PRIu64"", + sizeof(*n->elpes)); + } + + n->aer_mask &= ~(1 << NVME_AER_TYPE_ERROR); + if (!QSIMPLEQ_EMPTY(&n->aer_queue)) { + qemu_mod_timer(n->aer_timer, qemu_get_clock_ns(vm_clock) + 10000); + } + return nvme_do_prp(cmd->prp1, cmd->prp2, (uint8_t *)n->elpes, trans_len, + 1, n); +} + +static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len) +{ + uint32_t trans_len; + time_t current_seconds; + NvmeSmartLog smart_log; + + trans_len = min(sizeof(smart_log), buf_len); + if (buf_len < sizeof(smart_log)) { + NVME_LOG(ERR, "not enough memory for log, needs %ld, has %u bytes", + sizeof(smart_log), buf_len); + } + + memset(&smart_log, 0x0, sizeof(smart_log)); + if (cmd->nsid == 0xffffffff || !(n->id_ctrl.lpa & 0x1)) { + int i; + uint64_t dur[2] = {0, 0}; + uint64_t duw[2] = {0, 0}; + uint64_t hrc[2] = {0, 0}; + uint64_t hwc[2] = {0, 0}; + uint64_t total_use = 0; + uint64_t total_size = 0; + for (i = 0; i < n->num_namespaces; ++i) { + uint64_t tmp; + NvmeNamespace *ns = &n->namespaces[i]; + if (ns == NULL) { + continue; + } + + tmp = dur[0]; + dur[0] += ns->data_units_read[0]; + dur[1] += ns->data_units_read[1]; + if (tmp > dur[0]) { + ++dur[1]; + } + + tmp = duw[0]; + duw[0] += ns->data_units_written[0]; + duw[1] += ns->data_units_written[1]; + if (tmp > duw[0]) { + ++duw[1]; + } + + tmp = hrc[0]; + hrc[0] += ns->host_read_commands[0]; + hrc[1] += ns->host_read_commands[1]; + if (tmp > hrc[0]) { + ++hrc[1]; + } + + tmp = hwc[0]; + hwc[0] += ns->host_write_commands[0]; + hwc[1] += ns->host_write_commands[1]; + if (tmp > hwc[0]) { + ++hwc[1]; + } + + total_size += ns->id_ns.nsze; + total_use += ns->id_ns.nuse; + } + + smart_log.data_units_read[0] = dur[0]; + smart_log.data_units_read[1] = dur[1]; + smart_log.data_units_written[0] = duw[0]; + smart_log.data_units_written[1] = duw[1]; + smart_log.host_read_commands[0] = hrc[0]; + smart_log.host_read_commands[1] = hrc[1]; + smart_log.host_write_commands[0] = hwc[0]; + smart_log.host_write_commands[1] = hwc[1]; + 
smart_log.available_spare = 100 - (uint32_t)((((double)total_use) / + total_size) * 100); + } else if (cmd->nsid > 0 && cmd->nsid <= n->num_namespaces && + (n->id_ctrl.lpa & 0x1)) { + NvmeNamespace *ns = &n->namespaces[cmd->nsid - 1]; + smart_log.data_units_read[0] = ns->data_units_read[0]; + smart_log.data_units_read[1] = ns->data_units_read[1]; + smart_log.data_units_written[0] = ns->data_units_written[0]; + smart_log.data_units_written[1] = ns->data_units_written[1]; + smart_log.host_read_commands[0] = ns->host_read_commands[0]; + smart_log.host_read_commands[1] = ns->host_read_commands[1]; + smart_log.host_write_commands[0] = ns->host_write_commands[0]; + smart_log.host_write_commands[1] = ns->host_write_commands[1]; + smart_log.available_spare = 100 - (uint32_t) + ((((double)ns->id_ns.nuse) / ns->id_ns.nsze) * 100); + } else { + return NVME_INVALID_NSID | NVME_DNR; + } + + smart_log.temperature[0] = n->temperature & 0xff; + smart_log.temperature[1] = (n->temperature >> 8) & 0xff; + smart_log.percentage_used = n->percentage_used; + + current_seconds = time(NULL); + smart_log.power_on_hours[0] = ((current_seconds - n->start_time) / 60) / 60; + + smart_log.available_spare_threshold = NVME_SPARE_THRESHOLD; + if (smart_log.available_spare <= NVME_SPARE_THRESHOLD) { + smart_log.critical_warning |= NVME_SMART_SPARE; + } + if (n->features.temp_thresh <= n->temperature) { + smart_log.critical_warning |= NVME_SMART_TEMPERATURE; + } + + n->aer_mask &= ~(1 << NVME_AER_TYPE_SMART); + if (!QSIMPLEQ_EMPTY(&n->aer_queue)) { + qemu_mod_timer(n->aer_timer, qemu_get_clock_ns(vm_clock) + 10000); + } + return nvme_do_prp(cmd->prp1, cmd->prp2, (uint8_t *)&smart_log, trans_len, + 1, n); +} + +static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd) +{ + uint16_t lid = cmd->cdw10 & 0xffff; + uint32_t len = ((cmd->cdw10 >> 16) & 0xff) << 2; + + NVME_LOG(ADM_DBG, "ctrl:%u log id:%u len:%u", n->instance, lid, len); + switch (lid) { + case NVME_LOG_ERROR_INFO: + return nvme_error_log_info(n, cmd, len); + case NVME_LOG_SMART_INFO: + return nvme_smart_info(n, cmd, len); + case NVME_LOG_FW_SLOT_INFO: + return nvme_fw_log_info(n, cmd, len); + default: + return NVME_INVALID_LOG_ID | NVME_DNR; + } +} + +static uint16_t nvme_format_namespace(NvmeNamespace *ns, uint8_t lba_idx, + uint8_t meta_loc, uint8_t pil, uint8_t pi, uint8_t sec_erase) +{ + uint64_t old_size; + uint8_t lbaf = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas); + + if (lba_idx > ns->id_ns.nlbaf) { + NVME_LOG(ERR, "invalid lba index:%u", lba_idx); + return NVME_INVALID_FORMAT | NVME_DNR; + } + if (pi) { + if (pil && !NVME_ID_NS_DPC_LAST_EIGHT(ns->id_ns.dpc)) { + NVME_LOG(ERR, "pi requested as last 8 bytes, dpc:%x", + ns->id_ns.dpc); + return NVME_INVALID_FORMAT | NVME_DNR; + } + if (!pil && !NVME_ID_NS_DPC_FIRST_EIGHT(ns->id_ns.dpc)) { + NVME_LOG(ERR, "pi requested as first 8 bytes, dpc:%x", + ns->id_ns.dpc); + return NVME_INVALID_FORMAT | NVME_DNR; + } + if (!((ns->id_ns.dpc & 0x7) & (1 << (pi - 1)))) { + NVME_LOG(ERR, "invalid pi type:%u, dpc:%x", + pi, ns->id_ns.dpc); + return NVME_INVALID_FORMAT | NVME_DNR; + } + } + if (meta_loc && ns->id_ns.lbaf[lba_idx].ms && + !NVME_ID_NS_MC_EXTENDED(ns->id_ns.mc)) { + NVME_LOG(ERR, "invalid meta location:%x, mc:%x", + meta_loc, ns->id_ns.mc); + return NVME_INVALID_FORMAT | NVME_DNR; + } + if (!meta_loc && ns->id_ns.lbaf[lba_idx].ms && + !NVME_ID_NS_MC_SEPARATE(ns->id_ns.mc)) { + NVME_LOG(ERR, "invalid meta location:%x, mc:%x", + meta_loc, ns->id_ns.mc); + return NVME_INVALID_FORMAT | NVME_DNR; + } + + NVME_LOG(DBG, + 
"ctrl:%u nsid:%x lba index:%u meta loc:%u pil:%u pi:%u erase:%u", + ns->ctrl->instance, ns->id, lba_idx, meta_loc, pil, pi, sec_erase); + + nvme_close_namespace(ns); + old_size = ns->id_ns.nsze * (1 << ns->id_ns.lbaf[lbaf].ds); + ns->id_ns.nuse = 0; + ns->id_ns.flbas = lba_idx | meta_loc; + ns->id_ns.nsze = old_size >> ns->id_ns.lbaf[lba_idx].ds; + ns->id_ns.ncap = ns->id_ns.nsze; + ns->id_ns.dps = pil | pi; + nvme_open_namespace(ns->ctrl, ns); + + if (sec_erase) { + /* TODO: write zeros, complete asynchronously */ + ; + } + + return NVME_SUCCESS; +} + +static uint16_t nvme_format(NvmeCtrl *n, NvmeCmd *cmd) +{ + NvmeNamespace *ns; + uint32_t dw10 = cmd->cdw10; + uint32_t nsid = cmd->nsid; + uint8_t lba_idx = dw10 & 0xf; + uint8_t meta_loc = dw10 & 0x10; + uint8_t pil = (dw10 >> 5) & 0x8; + uint8_t pi = (dw10 >> 5) & 0x7; + uint8_t sec_erase = (dw10 >> 8) & 0x7; + + if (nsid == 0xffffffff) { + uint32_t i; + uint16_t ret; + + for (i = 0; i < n->num_namespaces; ++i) { + ns = &n->namespaces[i]; + ret = nvme_format_namespace(ns, lba_idx, meta_loc, pil, pi, + sec_erase); + if (ret != NVME_SUCCESS) { + return ret; + } + } + return ret; + } + + if (nsid == 0 || nsid > n->num_namespaces) { + NVME_LOG(ERR, "invalid nsid:%u", nsid); + return NVME_INVALID_NSID | NVME_DNR; + } + + ns = &n->namespaces[cmd->nsid - 1]; + return nvme_format_namespace(ns, lba_idx, meta_loc, pil, pi, + sec_erase); +} + +static uint16_t nvme_async_req(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) +{ + if (n->outstanding_aers > n->aerl + 1) { + NVME_LOG(ERR, "exceeded async requests:%u", n->aerl); + return NVME_AER_LIMIT_EXCEEDED; + } + + NVME_LOG(ADM_DBG, "ctrl:%u cid:%x outstanding aers:%u", n->instance, + cmd->cid, n->outstanding_aers); + n->aer_reqs[n->outstanding_aers] = req; + qemu_mod_timer(n->aer_timer, qemu_get_clock_ns(vm_clock) + 10000); + n->outstanding_aers++; + + return NVME_NO_COMPLETE; +} + +static uint16_t nvme_abort_req(NvmeCtrl *n, NvmeCmd *cmd, uint32_t *result) +{ + uint32_t index = 0; + uint16_t sqid = cmd->cdw10 & 0xffff; + uint16_t cid = (cmd->cdw10 >> 16) & 0xffff; + NvmeSQueue *sq; + + *result = 1; + NVME_LOG(INFO, "sq:%u cid:%x", sqid, cid); + if (nvme_check_sqid(n, sqid) || sqid == 0) { + return NVME_SUCCESS; + } + + sq = n->sq[sqid]; + while ((sq->head + index) % sq->size != sq->tail) { + NvmeCmd abort_cmd; + hwaddr addr; + + if (sq->phys_contig) { + addr = sq->dma_addr + ((sq->head + index) % sq->size) * n->sqe_size; + } else { + addr = nvme_discontig(sq->prp_list, (sq->head + index) % sq->size, + n->page_size, n->sqe_size); + } + + cpu_physical_memory_rw(addr, (uint8_t *)&abort_cmd, + sizeof(abort_cmd), 0); + if (abort_cmd.cid == cid) { + NvmeRequest *req; + NVME_LOG(INFO, "found sq:%u cid:%x to abort", sqid, cid); + + if (QSIMPLEQ_EMPTY(&sq->req_list)) { + NVME_LOG(ERR, "sq:%u has no available requests", sqid); + break; + } + + req = QSIMPLEQ_FIRST(&sq->req_list); + QSIMPLEQ_REMOVE_HEAD(&sq->req_list, entry); + memset(&req->cqe, 0, sizeof(req->cqe)); + req->cqe.cid = cid; + req->cqe.status = NVME_CMD_ABORT_REQ << 1; + + abort_cmd.opcode = NVME_OP_ABORTED; + cpu_physical_memory_rw(addr, (uint8_t *)&abort_cmd, + sizeof(abort_cmd), 1); + *result = 0; + + nvme_enqueue_req_completion(n->cq[sq->cqid], req); + break; + } + ++index; + } + if (*result) { + NVME_LOG(ERR, "failed to find req to abort sq:%u cid:%x", sqid, cid); + } + + return NVME_SUCCESS; +} + +static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) +{ + NVME_LOG(ADM_DBG, "opcode:%x", cmd->opcode); + switch (cmd->opcode) 
{ + case NVME_ADM_CMD_DELETE_SQ: + return nvme_del_sq(n, cmd); + case NVME_ADM_CMD_CREATE_SQ: + return nvme_create_sq(n, cmd); + case NVME_ADM_CMD_GET_LOG_PAGE: + return nvme_get_log(n, cmd); + case NVME_ADM_CMD_DELETE_CQ: + return nvme_del_cq(n, cmd); + case NVME_ADM_CMD_CREATE_CQ: + return nvme_create_cq(n, cmd); + case NVME_ADM_CMD_IDENTIFY: + return nvme_identify(n, cmd); + case NVME_ADM_CMD_ABORT: + return nvme_abort_req(n, cmd, &req->cqe.result); + case NVME_ADM_CMD_SET_FEATURES: + return nvme_set_feature(n, cmd, &req->cqe.result); + case NVME_ADM_CMD_GET_FEATURES: + return nvme_get_feature(n, cmd, &req->cqe.result); + case NVME_ADM_CMD_ASYNC_EV_REQ: + return nvme_async_req(n, cmd, req); + case NVME_ADM_CMD_FORMAT_NVM: + return nvme_format(n, cmd); + case NVME_ADM_CMD_ACTIVATE_FW: + case NVME_ADM_CMD_DOWNLOAD_FW: + case NVME_ADM_CMD_SECURITY_SEND: + case NVME_ADM_CMD_SECURITY_RECV: + default: + NVME_LOG(ERR, "invalid opcode:%x", cmd->opcode); + return NVME_INVALID_OPCODE | NVME_DNR; + } +} + +static int nvme_process_sq(NvmeSQueue *sq, NvmeCQueue *cq, NvmeCtrl *n) +{ + hwaddr addr; + NvmeCmd cmd; + NvmeRequest *req; + uint16_t status; + + NVME_LOG(IO_DBG, "ctrl:%u sq:%u head:%u tail:%u", n->instance, + sq->id, sq->head, sq->tail); + if (nvme_sq_empty(sq) || QSIMPLEQ_EMPTY(&sq->req_list)) { + return 0; + } + if (sq->phys_contig) { + addr = sq->dma_addr + sq->head * n->sqe_size; + } else { + addr = nvme_discontig(sq->prp_list, sq->head, n->page_size, + n->sqe_size); + } + + cpu_physical_memory_rw(addr, (uint8_t *)&cmd, sizeof(cmd), 0); + nvme_inc_sq_head(sq); + if (cmd.opcode == NVME_OP_ABORTED) { + return 1; + } + + req = QSIMPLEQ_FIRST(&sq->req_list); + QSIMPLEQ_REMOVE_HEAD(&sq->req_list, entry); + memset(&req->cqe, 0, sizeof(req->cqe)); + req->cqe.cid = cmd.cid; + + if (sq->id == 0) { + status = nvme_admin_cmd(n, &cmd, req); + } else { + status = nvme_io_cmd(n, &cmd, req); + } + if (status != NVME_NO_COMPLETE) { + req->cqe.status = status << 1; + nvme_enqueue_req_completion(cq, req); + } + + return 1; +} + +static void nvme_sq_process(void *opaque) +{ + NvmeSQueue *sq = opaque; + NvmeCtrl *n = sq->ctrl; + NvmeCQueue *cq = n->cq[sq->cqid]; + int processed = 0; + + NVME_LOG(IO_DBG, "sq:%u cq:%u burst:%u", sq->id, cq->id, sq->arb_burst); + while (nvme_process_sq(sq, cq, n) && processed++ < sq->arb_burst) { + ; + } + sq->completed += processed; + + if (!nvme_sq_empty(sq)) { + if (!qemu_timer_pending(sq->timer)) { + qemu_mod_timer(sq->timer, qemu_get_clock_ns(vm_clock) + 1000); + } + } +} + +static void nvme_clear_ctrl(NvmeCtrl *n) +{ + int i; + AsyncEvent *event; + + for (i = 1; i < n->num_queues; i++) { + if (n->sq[i] != NULL) { + NvmeSQueue *sq = n->sq[i]; + qemu_del_timer(sq->timer); + if (sq->prp_list) { + g_free(sq->prp_list); + } + g_free(sq); + n->sq[i] = NULL; + } + } + for (i = 1; i < n->num_queues; i++) { + if (n->cq[i] != NULL) { + NvmeCQueue *cq = n->cq[i]; + qemu_del_timer(cq->timer); + msix_vector_unuse(&n->dev, cq->vector); + if (cq->prp_list) { + g_free(cq->prp_list); + } + g_free(n->cq[i]); + n->cq[i] = NULL; + } + } + + qemu_del_timer(n->sq[0]->timer); + qemu_del_timer(n->cq[0]->timer); + + n->bar.csts &= ~NVME_CSTS_READY; + n->bar.cc = 0; + n->admin_cq.vector = 0; + n->admin_cq.head = n->admin_cq.tail = 0; + n->admin_sq.head = n->admin_sq.tail = 0; + + qemu_del_timer(n->aer_timer); + while ((event = QSIMPLEQ_FIRST(&n->aer_queue)) != NULL) { + QSIMPLEQ_REMOVE_HEAD(&n->aer_queue, entry); + g_free(event); + } + n->outstanding_aers = 0; + n->aer_mask = 0; + 
n->temp_warn_issued = 0; +} + +static int nvme_start_ctrl(NvmeCtrl *n) +{ + uint32_t page_bits = NVME_CC_MPS(n->bar.cc) + 12; + uint32_t page_size = 1 << page_bits; + + if (!n->bar.asq || !n->bar.acq) { + NVME_LOG(ERR, "initialize attempted before admin queues created"); + return -1; + } + if (NVME_CC_MPS(n->bar.cc) < NVME_CAP_MPSMIN(n->bar.cap) || + NVME_CC_MPS(n->bar.cc) > NVME_CAP_MPSMAX(n->bar.cap)) { + NVME_LOG(ERR, "bad page size:%u min:%u max:%u", + NVME_CC_MPS(n->bar.cc), (uint32_t)NVME_CAP_MPSMIN(n->bar.cap), + (uint32_t)NVME_CAP_MPSMAX(n->bar.cap)); + return -1; + } + if (n->bar.asq & (page_size - 1) || n->bar.acq & (page_size - 1)) { + NVME_LOG(ERR, "asq:%"PRIx64" acq:%"PRIx64" not page aligned:%x", + n->bar.asq, n->bar.acq, page_size); + return -1; + } + if (NVME_CC_IOCQES(n->bar.cc) < NVME_CTRL_CQES_MIN(n->id_ctrl.cqes) || + NVME_CC_IOCQES(n->bar.cc) > NVME_CTRL_CQES_MAX(n->id_ctrl.cqes)) { + NVME_LOG(ERR, "bad cq entry size:%u min:%x max:%u", + NVME_CC_IOCQES(n->bar.cc), NVME_CTRL_CQES_MIN(n->id_ctrl.cqes), + NVME_CTRL_CQES_MAX(n->id_ctrl.cqes)); + return -1; + } + if (NVME_CC_IOSQES(n->bar.cc) < NVME_CTRL_SQES_MIN(n->id_ctrl.cqes) || + NVME_CC_IOSQES(n->bar.cc) > NVME_CTRL_SQES_MAX(n->id_ctrl.cqes)) { + NVME_LOG(ERR, "bad sq entry size:%u min:%x max:%u", + NVME_CC_IOSQES(n->bar.cc), NVME_CTRL_SQES_MIN(n->id_ctrl.cqes), + NVME_CTRL_SQES_MAX(n->id_ctrl.cqes)); + return -1; + } + if (NVME_AQA_ASQS(n->bar.aqa) == 0 || NVME_AQA_ASQS(n->bar.aqa) > 4095) { + NVME_LOG(ERR, "bad asq:%u", NVME_AQA_ASQS(n->bar.aqa)); + return -1; + } + if (NVME_AQA_ACQS(n->bar.aqa) == 0 || NVME_AQA_ACQS(n->bar.aqa) > 4095) { + NVME_LOG(ERR, "bad acq:%u", NVME_AQA_ACQS(n->bar.aqa)); + return -1; + } + + n->page_bits = NVME_CC_MPS(n->bar.cc) + 12; + n->page_size = 1 << n->page_bits; + n->max_prp_ents = n->page_size / sizeof(uint64_t); + n->cqe_size = 1 << NVME_CC_IOCQES(n->bar.cc); + n->sqe_size = 1 << NVME_CC_IOSQES(n->bar.cc); + n->cq[0]->size = NVME_AQA_ACQS(n->bar.aqa) + 1; + n->sq[0]->size = NVME_AQA_ASQS(n->bar.aqa) + 1; + n->cq[0]->vector = 0; + n->cq[0]->phase = 1; + n->cq[0]->irq_enabled = 1; + n->cq[0]->dma_addr = n->bar.acq; + n->sq[0]->dma_addr = n->bar.asq; + n->sq[0]->arb_burst = NVME_ARB_HPW(n->features.arbitration) + 1; + + nvme_init_sq_reqs(n->sq[0]); + QSIMPLEQ_INIT(&n->cq[0]->req_list); + n->cq[0]->timer = qemu_new_timer_ns(vm_clock, nvme_post_cqes, n->cq[0]); + + n->aer_timer = qemu_new_timer_ns(vm_clock, nvme_aer_process_cb, n); + QSIMPLEQ_INIT(&n->aer_queue); + + return 0; +} + +static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data, + unsigned size) +{ + NVME_LOG(IO_DBG, "ctrl:%u offset:%"PRIx64" data:%"PRIx64"", n->instance, + offset, data); + switch (offset) { + case 0xc: + n->bar.intms |= data & 0xffffffff; + n->bar.intmc = n->bar.intms; + break; + case 0x10: + n->bar.intms &= ~(data & 0xffffffff); + n->bar.intmc = n->bar.intms; + break; + case 0x14: + if ((data & CC_EN_MASK) && !(n->bar.cc & CC_EN_MASK)) { + NVME_LOG(ERR, "start ctrl:%u", n->instance); + n->bar.cc = data; + if (nvme_start_ctrl(n)) { + n->bar.csts = NVME_CSTS_FAILED; + } else { + n->bar.csts = NVME_CSTS_READY; + } + } else if (!(data & CC_EN_MASK) && n->bar.cc) { + NVME_LOG(ERR, "shut down ctrl:%u", n->instance); + nvme_clear_ctrl(n); + } else { + NVME_LOG(ERR, "invalid controller transition controller:%u", + n->instance); + } + break; + case 0x24: + n->bar.aqa = data & 0xffffffff; + break; + case 0x28: + n->bar.asq = data; + break; + case 0x2c: + n->bar.asq |= data << 32; + break; + case 0x30: + 
n->bar.acq = data; + break; + case 0x34: + n->bar.acq |= data << 32; + break; + }; +} + +static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size) +{ + NvmeCtrl *n = (NvmeCtrl *)opaque; + uint8_t *ptr = (uint8_t *)&n->bar; + uint64_t val = 0; + + if (addr < sizeof(n->bar)) { + memcpy(&val, ptr + addr, size); + } else { + NVME_LOG(ERR, "bad bar offset %ld", addr); + } + + return val; +} + +static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) +{ + uint32_t qid; + NVME_LOG(IO_DBG, "ctrl:%u addr:%"PRIx64" val:%u", n->instance, addr, val); + if (addr & ((1 << (2 + n->db_stride)) - 1)) { + NVME_LOG(ERR, "invalid doorbell:%"PRIx64", within stride:%u", addr, + n->db_stride); + nvme_enqueue_event(n, NVME_AER_TYPE_ERROR, + NVME_AER_INFO_ERR_INVALID_DB, NVME_LOG_ERROR_INFO); + return; + } + + if (((addr - 0x1000) >> (2 + n->db_stride)) & 1) { + NvmeCQueue *cq; + uint16_t new_head = val & 0xffff; + int start_sqs; + + qid = (addr - (0x1000 + (1 << (n->db_stride + 2)))) >> + (3 + n->db_stride); + if (nvme_check_cqid(n, qid)) { + NVME_LOG(ERR, "invalid cq:%u for addr:%"PRIx64"", qid, addr); + nvme_enqueue_event(n, NVME_AER_TYPE_ERROR, + NVME_AER_INFO_ERR_INVALID_DB, NVME_LOG_ERROR_INFO); + return; + } + + cq = n->cq[qid]; + if (new_head >= cq->size) { + NVME_LOG(ERR, "invalid head:%u size:%u", new_head, cq->size); + nvme_enqueue_event(n, NVME_AER_TYPE_ERROR, + NVME_AER_INFO_ERR_INVALID_DB, NVME_LOG_ERROR_INFO); + return; + } + + NVME_LOG(IO_DBG, + "ctrl:%u cq:%u tail:%u current head:%u new head:%u", + n->instance, qid, cq->tail, cq->head, new_head); + start_sqs = nvme_cq_full(cq) ? 1 : 0; + cq->head = new_head; + + if (start_sqs) { + NvmeSQueue *sq; + if (!qemu_timer_pending(cq->timer)) { + qemu_mod_timer(cq->timer, qemu_get_clock_ns(vm_clock) + 500); + } + QTAILQ_FOREACH(sq, &cq->sq_list, entry) { + if (!qemu_timer_pending(sq->timer)) { + qemu_mod_timer(sq->timer, qemu_get_clock_ns(vm_clock) + + 1000); + } + } + } + if (cq->tail != cq->head) { + nvme_isr_notify(n, cq); + } + } else { + uint16_t new_tail; + NvmeSQueue *sq; + + qid = (addr - 0x1000) >> (3 + n->db_stride); + if (nvme_check_sqid(n, qid)) { + NVME_LOG(ERR, "invalid sq:%u", qid); + nvme_enqueue_event(n, NVME_AER_TYPE_ERROR, + NVME_AER_INFO_ERR_INVALID_SQ, NVME_LOG_ERROR_INFO); + return; + } + + sq = n->sq[qid]; + new_tail = val & 0xffff; + if (new_tail >= sq->size) { + NVME_LOG(ERR, "invalid tail:%u size:%u", new_tail, sq->size); + nvme_enqueue_event(n, NVME_AER_TYPE_ERROR, + NVME_AER_INFO_ERR_INVALID_DB, NVME_LOG_ERROR_INFO); + return; + } + + NVME_LOG(IO_DBG, + "ctrl:%u sq:%u head:%u current tail:%u new tail:%u", + n->instance, qid, sq->head, sq->tail, new_tail); + sq->tail = new_tail; + if (!qemu_timer_pending(sq->timer)) { + qemu_mod_timer(sq->timer, qemu_get_clock_ns(vm_clock) + 1000); + } + } +} + +static void nvme_mmio_write(void *opaque, hwaddr addr, uint64_t data, + unsigned size) +{ + NvmeCtrl *n = (NvmeCtrl *)opaque; + if (addr < sizeof(n->bar)) { + nvme_write_bar(n, addr, data, size); + } else if (addr >= 0x1000) { + nvme_process_db(n, addr, data); + } else { + NVME_LOG(ERR, "bad bar offset %ld", addr); + } +} + +static const MemoryRegionOps nvme_mmio_ops = { + .read = nvme_mmio_read, + .write = nvme_mmio_write, + .endianness = DEVICE_LITTLE_ENDIAN, + .impl = { + .min_access_size = 2, + .max_access_size = 8, + }, +}; + +static void nvme_close_namespaces(NvmeCtrl *n) +{ + int i; + NVME_LOG(DBG, "controller:%u namespaces:%u", n->instance, + n->num_namespaces); + for (i = 0; i < n->num_namespaces; i++) { + 
nvme_close_namespace(&n->namespaces[i]); + } +} + +static void nvme_init_namespace(NvmeCtrl *n, NvmeNamespace *ns, int id) +{ + int i; + NvmeIdNs *id_ns = &ns->id_ns; + + NVME_LOG(DBG, "ctrl:%u nsid:%u", n->instance, id); + id_ns->ncap = id_ns->nsze = (n->ns_size * BYTES_PER_MB) >> 9; + id_ns->nuse = 0; + id_ns->nlbaf = 0x3; + id_ns->flbas = 0; + id_ns->nsfeat = 0; + id_ns->mc = 0x0; + id_ns->dpc = 0x0; + id_ns->dps = 0; + for (i = 0; i <= id_ns->nlbaf; i++) { + id_ns->lbaf[i].ds = 9 + i; + id_ns->lbaf[i].ms = 0; + } + + ns->id = id; + ns->ctrl = n; + ns->start_block = id_ns->nsze * (id - 1); + nvme_open_namespace(n, ns); +} + +static void nvme_init_namespaces(NvmeCtrl *n) +{ + int i; + NVME_LOG(DBG, "ctrl:%u namespaces:%u size:%uMB", n->instance, + n->num_namespaces, n->ns_size); + for (i = 1; i <= n->num_namespaces; i++) { + nvme_init_namespace(n, &n->namespaces[i - 1], i); + } +} + +static void nvme_init_ctrl(NvmeCtrl *n) +{ + int i; + NvmeIdCtrl *id = &n->id_ctrl; + + id->vid = PCI_VENDOR_ID_INTEL; + id->ssvid = 0x0111; + id->rab = 6; + id->ieee[0] = 0x00; + id->ieee[1] = 0x02; + id->ieee[2] = 0xb3; + id->cmic = 0; + id->mdts = n->mdts; + id->oacs = NVME_OACS_FORMAT; + id->acl = n->acl; + id->aerl = n->aerl; + id->frmw = 7 << 1; + id->lpa = 1 << 0; + id->elpe = n->elpe; + id->npss = 2; + id->sqes = 0xf << 4 | 0x6; + id->cqes = 0xf << 4 | 0x4; + id->nn = n->num_namespaces; + id->oncs = NVME_ONCS_COMPARE | NVME_ONCS_WRITE_UNCORR | NVME_ONCS_DSM; + id->fuses = 0; + id->fna = 0; + id->vwc = 1; + id->awun = 0; + id->awupf = 0; + + snprintf((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl"); + snprintf((char *)id->fr, sizeof(id->fr), "1.0"); + snprintf((char *)id->sn, sizeof(id->sn), "NVMeQx10%02x", n->instance); + + id->psd[0].mp = 0x9c4; + id->psd[0].enlat = 0x10; + id->psd[0].exlat = 0x4; + id->psd[0].rrt = 0; + id->psd[0].rrl = 0; + id->psd[0].rwt = 0; + id->psd[0].rwl = 0; + + id->psd[1].mp = 0x8fc; + id->psd[1].enlat = 0x10; + id->psd[1].exlat = 0x10; + id->psd[1].rrt = 1; + id->psd[1].rrl = 1; + id->psd[1].rwt = 1; + id->psd[1].rwl = 1; + + id->psd[2].mp = 0x2bc; + id->psd[2].enlat = 0x1e8480; + id->psd[2].exlat = 0x1e8480; + id->psd[2].rrt = 2; + id->psd[2].rrl = 2; + id->psd[2].rwt = 2; + id->psd[2].rwl = 2; + + n->features.arbitration = 0x1f0f0706; + n->features.power_mgmt = 0; + n->features.temp_thresh = 0x14d; + n->features.err_rec = 0; + n->features.volatile_wc = 0; + n->features.num_queues = (n->num_queues - 1) | + ((n->num_queues - 1) << 16); + n->features.int_coalescing = 0; + n->features.write_atomicity = 0; + n->features.async_config = 0x0; + n->features.sw_prog_marker = 0; + + for (i = 0; i < n->num_queues; i++) { + n->features.int_vector_config[i] = i; + } + + n->temperature = NVME_TEMPERATURE; + + n->bar.cap = (uint64_t)(n->max_q_ents & CAP_MQES_MASK) << CAP_MQES_SHIFT; + n->bar.cap |= (uint64_t)(n->cqr & CAP_CQR_MASK) << CAP_CQR_SHIFT; + n->bar.cap |= (uint64_t)(1 & CAP_AMS_MASK) << CAP_AMS_SHIFT; + n->bar.cap |= (uint64_t)(0xf & CAP_TO_MASK) << CAP_TO_SHIFT; + n->bar.cap |= (uint64_t)(n->db_stride & CAP_DSTRD_MASK) << CAP_DSTRD_SHIFT; + n->bar.cap |= (uint64_t)(0 & CAP_NSSRS_MASK) << CAP_NSSRS_SHIFT; + n->bar.cap |= (uint64_t)(1 & CAP_CSS_MASK) << CAP_CSS_SHIFT; + n->bar.cap |= (uint64_t)(0 & CAP_MPSMIN_MASK) << CAP_MPSMIN_SHIFT; + n->bar.cap |= (uint64_t)(0xf & CAP_MPSMAX_MASK) << CAP_MPSMAX_SHIFT; + + n->bar.vs = 0x00010001; + n->bar.intmc = n->bar.intms = 0; + NVME_LOG(DBG, "ctrl:%u cap:%016lx", n->instance, n->bar.cap); +} + +static void 
nvme_init_admin_queues(NvmeCtrl *n) +{ + NVME_LOG(DBG, "ctrl:%u", n->instance); + n->admin_sq.ctrl = n; + n->admin_cq.ctrl = n; + n->admin_sq.phys_contig = 1; + n->admin_cq.phys_contig = 1; + n->cq[0] = &n->admin_cq; + n->sq[0] = &n->admin_sq; + + QTAILQ_INIT(&n->admin_cq.sq_list); + QTAILQ_INSERT_TAIL(&(n->admin_cq.sq_list), &n->admin_sq, entry); +} + +static int nvme_init(PCIDevice *pci_dev) +{ + NvmeCtrl *n = DO_UPCAST(NvmeCtrl, dev, pci_dev); + uint8_t *pci_conf; + + NVME_LOG(DBG, "new controller B:D.f: %02x:%02x.%u", + pci_bus_num(pci_dev->bus), PCI_SLOT(pci_dev->devfn), + PCI_FUNC(pci_dev->devfn)); + if (!n->conf.bs) { + NVME_LOG(ERR, "drive property not set"); + return -1; + } + if (n->num_namespaces == 0 || n->num_namespaces > NVME_MAX_NUM_NAMESPACES) { + NVME_LOG(ERR, "requested invalid number of namespace:%u max:%u", + n->num_namespaces, NVME_MAX_NUM_NAMESPACES); + return -1; + } + if (n->num_queues < 1 || n->num_queues > NVME_MAX_QS) { + NVME_LOG(ERR, "requested invalid number of queues:%u max:%u", + n->num_queues, NVME_MAX_QS); + return -1; + } + if (n->db_stride > NVME_MAX_STRIDE) { + NVME_LOG(ERR, "requested invalid stride:%u max:%u", + n->db_stride, NVME_MAX_STRIDE); + return -1; + } + if (n->max_q_ents < 1 || n->max_q_ents > + NVME_MAX_QUEUE_ENTRIES) { + NVME_LOG(ERR, "requested invalid queue entries:%u, max:%u", + n->max_q_ents, NVME_MAX_QUEUE_ENTRIES); + return -1; + } + if (n->cqr > 1) { + NVME_LOG(ERR, + "requested invalid contiguous regions requeired:%u max:%u", + n->cqr, 1); + return -1; + } + + n->reg_size = 1 << qemu_fls(0x1004 + 2 * (n->num_queues + 1) * + (4 << n->db_stride)); + n->ns_size = ((bdrv_getlength(n->conf.bs)) / BYTES_PER_MB) / + n->num_namespaces; + NVME_LOG(DBG, "NVMe controller properties:\n" + "\tctrl:%u namespaces:%u size:%uMB queues:%u stride:%u bar size:%u\n"\ + "\tqueue entries:%u cqr:%u mdts:%u aerl:%u acl:%u elpe:%u\n"\ + "\tfile size:%"PRIu64"", instance, n->num_namespaces, n->ns_size, + n->num_queues, n->db_stride, n->reg_size, n->max_q_ents, n->cqr, + n->mdts, n->aerl, n->acl, n->elpe, bdrv_getlength(n->conf.bs)); + + pci_conf = pci_dev->config; + pci_conf[PCI_INTERRUPT_PIN] = 1; + pci_config_set_prog_interface(pci_dev->config, 0x2); + pci_config_set_class(pci_dev->config, PCI_CLASS_STORAGE_EXPRESS); + + n->instance = instance++; + n->start_time = time(NULL); + n->error_count = 1; + n->namespaces = g_malloc0(sizeof(*n->namespaces)*n->num_namespaces); + n->sq = g_malloc0(sizeof(*n->sq)*n->num_queues); + n->cq = g_malloc0(sizeof(*n->cq)*n->num_queues); + n->aer_reqs = g_malloc0((n->aerl + 1) * sizeof(*n->aer_reqs)); + n->elpes = g_malloc0((n->elpe + 1) * sizeof(*n->elpes)); + n->features.int_vector_config = g_malloc(n->num_queues * + sizeof(*n->features.int_vector_config)); + + nvme_init_admin_queues(n); + nvme_init_ctrl(n); + nvme_init_namespaces(n); + + memory_region_init_io(&n->iomem, &nvme_mmio_ops, n, "nvme-mmio", + n->reg_size); + pci_register_bar(&n->dev, 0, + PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, + &n->iomem); + + if (msix_init_exclusive_bar(&n->dev, n->num_queues, 4)) { + NVME_LOG(ERR, "ctrl:%u msix init failed", n->instance); + } else { + msix_vector_use(&n->dev, 0); + } + + NVME_LOG(DBG, "ctrl:%u initialization complete", n->instance); + return 0; +} + +static void nvme_exit(PCIDevice *pci_dev) +{ + NvmeCtrl *n = DO_UPCAST(NvmeCtrl, dev, pci_dev); + NVME_LOG(DBG, "exit ctrl:%u", n->instance); + nvme_close_namespaces(n); + g_free(n->namespaces); + g_free(n->cq); + g_free(n->sq); + g_free(n->aer_reqs); + 
g_free(n->features.int_vector_config); + msix_vector_unuse(&n->dev, 0); + memory_region_destroy(&n->iomem); +} + +static void nvme_reset(DeviceState *dev) +{ + NvmeCtrl *n = DO_UPCAST(NvmeCtrl, dev.qdev, dev); + NVME_LOG(DBG, "reset ctrl:%u", n->instance); + (void)n; +} + +static Property nvme_props[] = { + DEFINE_PROP_UINT32("namespaces", NvmeCtrl, num_namespaces, 1), + DEFINE_PROP_UINT32("queues", NvmeCtrl, num_queues, 64), + DEFINE_PROP_UINT32("entries", NvmeCtrl, max_q_ents, 0x7ff), + DEFINE_PROP_UINT8("stride", NvmeCtrl, db_stride, 0), + DEFINE_PROP_UINT8("aerl", NvmeCtrl, aerl, 3), + DEFINE_PROP_UINT8("acl", NvmeCtrl, acl, 3), + DEFINE_PROP_UINT8("elpe", NvmeCtrl, elpe, 3), + DEFINE_PROP_UINT8("mdts", NvmeCtrl, mdts, 5), + DEFINE_PROP_UINT8("cqr", NvmeCtrl, cqr, 1), + DEFINE_BLOCK_PROPERTIES(NvmeCtrl, conf), + DEFINE_PROP_END_OF_LIST(), +}; + +static const VMStateDescription nvme_vmstate = { + .name = "nvme", + .version_id = 1, + .minimum_version_id = 1, + .minimum_version_id_old = 1, + .fields = (VMStateField []) { + VMSTATE_PCI_DEVICE(dev, NvmeCtrl), + VMSTATE_END_OF_LIST() + } +}; + +static void nvme_class_init(ObjectClass *oc, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(oc); + PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc); + + pc->init = nvme_init; + pc->exit = nvme_exit; + + pc->class_id = PCI_CLASS_STORAGE_EXPRESS; + pc->vendor_id = PCI_VENDOR_ID_INTEL; + pc->device_id = 0x0111; + pc->revision = 1; + + dc->desc = "Non-Volatile Memory Express"; + dc->reset = nvme_reset; + dc->props = nvme_props; + dc->vmsd = &nvme_vmstate; +} + +static TypeInfo nvme_info = { + .name = "nvme", + .parent = TYPE_PCI_DEVICE, + .instance_size = sizeof(NvmeCtrl), + .class_init = nvme_class_init, +}; + +static void nvme_register_devices(void) +{ + type_register_static(&nvme_info); +} +type_init(nvme_register_devices); diff --git a/hw/nvme.h b/hw/nvme.h new file mode 100644 index 0000000..2dd0e71 --- /dev/null +++ b/hw/nvme.h @@ -0,0 +1,678 @@ +#ifndef _NVME_H +#define _NVME_H + +typedef struct NvmeBar { + uint64_t cap; + uint32_t vs; + uint32_t intms; + uint32_t intmc; + uint32_t cc; + uint32_t rsvd1; + uint32_t csts; + uint32_t nssrc; + uint32_t aqa; + uint64_t asq; + uint64_t acq; +} NvmeBar; + +enum NvmeCapShift { + CAP_MQES_SHIFT = 0, + CAP_CQR_SHIFT = 16, + CAP_AMS_SHIFT = 17, + CAP_TO_SHIFT = 24, + CAP_DSTRD_SHIFT = 32, + CAP_NSSRS_SHIFT = 33, + CAP_CSS_SHIFT = 37, + CAP_MPSMIN_SHIFT = 48, + CAP_MPSMAX_SHIFT = 52, +}; + +enum NvmeCapMask { + CAP_MQES_MASK = 0xffff, + CAP_CQR_MASK = 0x1, + CAP_AMS_MASK = 0x3, + CAP_TO_MASK = 0xff, + CAP_DSTRD_MASK = 0xf, + CAP_NSSRS_MASK = 0x1, + CAP_CSS_MASK = 0xff, + CAP_MPSMIN_MASK = 0xf, + CAP_MPSMAX_MASK = 0xf, +}; + +#define NVME_CAP_MQES(cap) (((cap) >> CAP_MQES_SHIFT) & CAP_MQES_MASK) +#define NVME_CAP_CQR(cap) (((cap) >> CAP_CQR_SHIFT) & CAP_CQR_MASK) +#define NVME_CAP_AMS(cap) (((cap) >> CAP_AMS_SHIFT) & CAP_AMS_MASK) +#define NVME_CAP_TO(cap) (((cap) >> CAP_TO_SHIFT) & CAP_TO_MASK) +#define NVME_CAP_DSTRD(cap) (((cap) >> CAP_DSTRD_SHIFT) & CAP_DSTRD_MASK) +#define NVME_CAP_NSSRS(cap) (((cap) >> CAP_NSSRS_SHIFT) & CAP_NSSRS_MASK) +#define NVME_CAP_CSS(cap) (((cap) >> CAP_CSS_SHIFT) & CAP_CSS_MASK) +#define NVME_CAP_MPSMIN(cap)(((cap) >> CAP_MPSMIN_SHIFT) & CAP_MPSMIN_MASK) +#define NVME_CAP_MPSMAX(cap)(((cap) >> CAP_MPSMAX_SHIFT) & CAP_MPSMAX_MASK) + +enum NvmeCcShift { + CC_EN_SHIFT = 0, + CC_CSS_SHIFT = 4, + CC_MPS_SHIFT = 7, + CC_AMS_SHIFT = 11, + CC_SHN_SHIFT = 14, + CC_IOSQES_SHIFT = 16, + CC_IOCQES_SHIFT = 20, +}; + +enum NvmeCcMask { + CC_EN_MASK 
= 0x1, + CC_CSS_MASK = 0x7, + CC_MPS_MASK = 0xf, + CC_AMS_MASK = 0x7, + CC_SHN_MASK = 0x3, + CC_IOSQES_MASK = 0xf, + CC_IOCQES_MASK = 0xf, +}; + +#define NVME_CC_EN(cc) ((cc >> CC_EN_SHIFT) & CC_EN_MASK) +#define NVME_CC_CSS(cc) ((cc >> CC_CSS_SHIFT) & CC_CSS_MASK) +#define NVME_CC_MPS(cc) ((cc >> CC_MPS_SHIFT) & CC_MPS_MASK) +#define NVME_CC_AMS(cc) ((cc >> CC_AMS_SHIFT) & CC_AMS_MASK) +#define NVME_CC_SHN(cc) ((cc >> CC_SHN_SHIFT) & CC_SHN_MASK) +#define NVME_CC_IOSQES(cc) ((cc >> CC_IOSQES_SHIFT) & CC_IOSQES_MASK) +#define NVME_CC_IOCQES(cc) ((cc >> CC_IOCQES_SHIFT) & CC_IOCQES_MASK) + +enum NvmeCstsShift { + CSTS_RDY_SHIFT = 0, + CSTS_CFS_SHIFT = 1, + CSTS_SHST_SHIFT = 2, + CSTS_NSSRO_SHIFT = 4, +}; + +enum NvmeCstsMask { + CSTS_RDY_MASK = 0x1, + CSTS_CFS_MASK = 0x1, + CSTS_SHST_MASK = 0x3, + CSTS_NSSRO_MASK = 0x1, +}; + +enum NvmeCsts { + NVME_CSTS_READY = 1 << CSTS_RDY_SHIFT, + NVME_CSTS_FAILED = 1 << CSTS_CFS_SHIFT, + NVME_CSTS_SHST_NORMAL = 0 << CSTS_SHST_SHIFT, + NVME_CSTS_SHST_PROGRESS = 1 << CSTS_SHST_SHIFT, + NVME_CSTS_SHST_COMPLETE = 2 << CSTS_SHST_SHIFT, + NVME_CSTS_NSSRO = 1 << CSTS_NSSRO_SHIFT, +}; + +#define NVME_CSTS_RDY(csts) ((csts >> CSTS_RDY_SHIFT) & CSTS_RDY_MASK) +#define NVME_CSTS_CFS(csts) ((csts >> CSTS_CFS_SHIFT) & CSTS_CFS_MASK) +#define NVME_CSTS_SHST(csts) ((csts >> CSTS_SHST_SHIFT) & CSTS_SHST_MASK) +#define NVME_CSTS_NSSRO(csts) ((csts >> CSTS_NSSRO_SHIFT) & CSTS_NSSRO_MASK) + +enum NvmeAqaShift { + AQA_ASQS_SHIFT = 0, + AQA_ACQS_SHIFT = 16, +}; + +enum NvmeAqaMask { + AQA_ASQS_MASK = 0xfff, + AQA_ACQS_MASK = 0xfff, +}; + +#define NVME_AQA_ASQS(aqa) ((aqa >> AQA_ASQS_SHIFT) & AQA_ASQS_MASK) +#define NVME_AQA_ACQS(aqa) ((aqa >> AQA_ACQS_SHIFT) & AQA_ACQS_MASK) + +typedef struct NvmeCmd { + uint8_t opcode; + uint8_t fuse; + uint16_t cid; + uint32_t nsid; + uint64_t res1; + uint64_t mptr; + uint64_t prp1; + uint64_t prp2; + uint32_t cdw10; + uint32_t cdw11; + uint32_t cdw12; + uint32_t cdw13; + uint32_t cdw14; + uint32_t cdw15; +} NvmeCmd; + +typedef struct NvmeDeleteQ { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t rsvd1[9]; + uint16_t qid; + uint16_t rsvd10; + uint32_t rsvd11[5]; +} NvmeDeleteQ; + +typedef struct NvmeCreateCq { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t rsvd1[5]; + uint64_t prp1; + uint64_t rsvd8; + uint16_t cqid; + uint16_t qsize; + uint16_t cq_flags; + uint16_t irq_vector; + uint32_t rsvd12[4]; +} NvmeCreateCq; + +typedef struct NvmeCreateSq { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t rsvd1[5]; + uint64_t prp1; + uint64_t rsvd8; + uint16_t sqid; + uint16_t qsize; + uint16_t sq_flags; + uint16_t cqid; + uint32_t rsvd12[4]; +} NvmeCreateSq; + +#define NVME_SQ_FLAGS_PC(sq_flags) (sq_flags & 0x1) +#define NVME_SQ_FLAGS_QPRIO(sq_flags) ((sq_flags >> 1) & 0x3) + +enum QueueFlags { + NVME_Q_PC = 1, + NVME_Q_PRIO_URGENT = 0, + NVME_Q_PRIO_HIGH = 1, + NVME_Q_PRIO_NORMAL = 2, + NVME_Q_PRIO_LOW = 3, +}; + +typedef struct NvmeIdentify { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t nsid; + uint64_t rsvd2[2]; + uint64_t prp1; + uint64_t prp2; + uint32_t cns; + uint32_t rsvd11[5]; +} NvmeIdentify; + +typedef struct NvmeRwCmd { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t nsid; + uint64_t rsvd2; + uint64_t mptr; + uint64_t prp1; + uint64_t prp2; + uint64_t slba; + uint16_t nlb; + uint16_t control; + uint32_t dsmgmt; + uint32_t reftag; + uint16_t apptag; + uint16_t appmask; +} NvmeRwCmd; + +typedef struct NvmeDsmCmd { + uint8_t opcode; + uint8_t flags; + uint16_t cid; 
+ uint32_t nsid; + uint64_t rsvd2[2]; + uint64_t prp1; + uint64_t prp2; + uint32_t nr; + uint32_t attributes; + uint32_t rsvd12[4]; +} NvmeDsmCmd; + +enum { + NVME_DSMGMT_IDR = 1 << 0, + NVME_DSMGMT_IDW = 1 << 1, + NVME_DSMGMT_AD = 1 << 2, +}; + +typedef struct NvmeDsmRange { + uint32_t cattr; + uint32_t nlb; + uint64_t slba; +} NvmeDsmRange; + +enum AsyncEventRequest { + NVME_AER_TYPE_ERROR = 0, + NVME_AER_TYPE_SMART = 1, + NVME_AER_TYPE_IO_SPECIFIC = 6, + NVME_AER_TYPE_VENDOR_SPECIFIC = 7, + NVME_AER_INFO_ERR_INVALID_SQ = 0, + NVME_AER_INFO_ERR_INVALID_DB = 1, + NVME_AER_INFO_ERR_DIAG_FAIL = 2, + NVME_AER_INFO_ERR_PERS_INTERNAL_ERR = 3, + NVME_AER_INFO_ERR_TRANS_INTERNAL_ERR = 4, + NVME_AER_INFO_ERR_FW_IMG_LOAD_ERR = 5, + NVME_AER_INFO_SMART_RELIABILITY = 0, + NVME_AER_INFO_SMART_TEMP_THRESH = 1, + NVME_AER_INFO_SMART_SPARE_THRESH = 2, +}; + +typedef struct AerResult { + uint8_t event_type; + uint8_t event_info; + uint8_t log_page; + uint8_t resv; +} AerResult; + +typedef struct AsyncEvent { + QSIMPLEQ_ENTRY(AsyncEvent) entry; + AerResult result; +} AsyncEvent; + +typedef struct NvmeCqe { + uint32_t result; + uint32_t rsvd; + uint16_t sq_head; + uint16_t sq_id; + uint16_t cid; + uint16_t status; +} NvmeCqe; + +typedef struct NvmeFwSlotInfoLog { + uint8_t afi; + uint8_t reserved1[7]; + uint8_t frs1[8]; + uint8_t frs2[8]; + uint8_t frs3[8]; + uint8_t frs4[8]; + uint8_t frs5[8]; + uint8_t frs6[8]; + uint8_t frs7[8]; + uint8_t reserved2[448]; +} NvmeFwSlotInfoLog; + +typedef struct NvmeErrorLog { + uint64_t error_count; + uint16_t sqid; + uint16_t cid; + uint16_t status_field; + uint16_t param_error_location; + uint64_t lba; + uint32_t nsid; + uint8_t vs; + uint8_t resv[35]; +} NvmeErrorLog; + +typedef struct NvmeSmartLog { + uint8_t critical_warning; + uint8_t temperature[2]; + uint8_t available_spare; + uint8_t available_spare_threshold; + uint8_t percentage_used; + uint8_t reserved1[26]; + uint64_t data_units_read[2]; + uint64_t data_units_written[2]; + uint64_t host_read_commands[2]; + uint64_t host_write_commands[2]; + uint64_t controller_busy_time[2]; + uint64_t power_cycles[2]; + uint64_t power_on_hours[2]; + uint64_t unsafe_shutdowns[2]; + uint64_t media_errors[2]; + uint64_t number_of_error_log_entries[2]; + uint8_t reserved2[320]; +} NvmeSmartLog; + +enum NvmeSmartWarn { + NVME_SMART_SPARE = 1 << 0, + NVME_SMART_TEMPERATURE = 1 << 1, + NVME_SMART_RELIABILITY = 1 << 2, + NVME_SMART_MEDIA_READ_ONLY = 1 << 3, + NVME_SMART_FAILED_VOLATILE_MEDIA = 1 << 4, +}; + +typedef struct NvmeRequest { + struct NvmeSQueue *sq; + struct NvmeNamespace *ns; + void *opaque; + uint64_t slba; + uint16_t rw; + uint16_t nlb; + NvmeCqe cqe; + QEMUIOVector qiov; + QSIMPLEQ_ENTRY(NvmeRequest)entry; +} NvmeRequest; + +typedef struct NvmeSQueue { + struct NvmeCtrl *ctrl; + uint8_t phys_contig; + uint8_t arb_burst; + uint16_t id; + uint16_t cqid; + uint32_t head; + uint32_t tail; + uint32_t size; + uint64_t dma_addr; + uint64_t completed; + uint64_t *prp_list; + QEMUTimer *timer; + QTAILQ_ENTRY(NvmeSQueue) entry; + NvmeRequest *io_req; + QSIMPLEQ_HEAD(sq_req_list, NvmeRequest) req_list; +} NvmeSQueue; + +typedef struct NvmeCQueue { + struct NvmeCtrl *ctrl; + uint8_t phys_contig; + uint8_t phase; + uint16_t id; + uint16_t irq_enabled; + uint32_t head; + uint32_t tail; + uint32_t vector; + uint32_t size; + uint64_t dma_addr; + uint64_t *prp_list; + QEMUTimer *timer; + QTAILQ_HEAD(sq_list, NvmeSQueue) sq_list; + QSIMPLEQ_HEAD(cq_req_list, NvmeRequest) req_list; +} NvmeCQueue; + +typedef struct NvmePSD { + uint16_t 
mp; + uint16_t reserved; + uint32_t enlat; + uint32_t exlat; + uint8_t rrt; + uint8_t rrl; + uint8_t rwt; + uint8_t rwl; + uint8_t resv[16]; +} NvmePSD; + +typedef struct NvmeIdCtrl { + uint16_t vid; + uint16_t ssvid; + uint8_t sn[20]; + uint8_t mn[40]; + uint8_t fr[8]; + uint8_t rab; + uint8_t ieee[3]; + uint8_t cmic; + uint8_t mdts; + uint8_t rsvd255[178]; + uint16_t oacs; + uint8_t acl; + uint8_t aerl; + uint8_t frmw; + uint8_t lpa; + uint8_t elpe; + uint8_t npss; + uint8_t rsvd511[248]; + uint8_t sqes; + uint8_t cqes; + uint16_t rsvd515; + uint32_t nn; + uint16_t oncs; + uint16_t fuses; + uint8_t fna; + uint8_t vwc; + uint16_t awun; + uint16_t awupf; + uint8_t rsvd703[174]; + uint8_t rsvd2047[1344]; + NvmePSD psd[32]; + uint8_t vs[1024]; +} NvmeIdCtrl; + +enum NvmeIdCtrlOacs { + NVME_OACS_SECURITY = 1 << 0, + NVME_OACS_FORMAT = 1 << 1, + NVME_OACS_FW = 1 << 2, +}; + +enum NvmeIdCtrlOncs { + NVME_ONCS_COMPARE = 1 << 0, + NVME_ONCS_WRITE_UNCORR = 1 << 1, + NVME_ONCS_DSM = 1 << 2, + NVME_ONCS_WRITE_ZEROS = 1 << 3, + NVME_ONCS_FEATURES = 1 << 4, + NVME_ONCS_RESRVATIONS = 1 << 5, +}; + +#define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf) +#define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf) +#define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf) +#define NVME_CTRL_CQES_MAX(cqes) (((cqes) >> 4) & 0xf) + +typedef struct NvmeFeatureVal { + uint32_t arbitration; + uint32_t power_mgmt; + uint32_t temp_thresh; + uint32_t err_rec; + uint32_t volatile_wc; + uint32_t num_queues; + uint32_t int_coalescing; + uint32_t *int_vector_config; + uint32_t write_atomicity; + uint32_t async_config; + uint32_t sw_prog_marker; +} NvmeFeatureVal; + +#define NVME_ARB_AB(arb) (arb & 0x7) +#define NVME_ARB_LPW(arb) ((arb >> 8) & 0xff) +#define NVME_ARB_MPW(arb) ((arb >> 16) & 0xff) +#define NVME_ARB_HPW(arb) ((arb >> 24) & 0xff) + +typedef struct NvmeRangeType { + uint8_t type; + uint8_t attributes; + uint8_t rsvd2[14]; + uint64_t slba; + uint64_t nlb; + uint8_t guid[16]; + uint8_t rsvd48[16]; +} NvmeRangeType; + +typedef struct NvmeLBAF { + uint16_t ms; + uint8_t ds; + uint8_t rp; +} NvmeLBAF; + +typedef struct NvmeIdNs { + uint64_t nsze; + uint64_t ncap; + uint64_t nuse; + uint8_t nsfeat; + uint8_t nlbaf; + uint8_t flbas; + uint8_t mc; + uint8_t dpc; + uint8_t dps; + uint8_t res30[98]; + NvmeLBAF lbaf[16]; + uint8_t res192[192]; + uint8_t vs[3712]; +} NvmeIdNs; + +#define NVME_ID_NS_NSFEAT_THIN(nsfeat) ((nsfeat & 0x1)) +#define NVME_ID_NS_FLBAS_EXTENDED(flbas) ((flbas >> 4) & 0x1) +#define NVME_ID_NS_FLBAS_INDEX(flbas) ((flbas & 0xf)) +#define NVME_ID_NS_MC_SEPARATE(mc) ((mc >> 1) & 0x1) +#define NVME_ID_NS_MC_EXTENDED(mc) ((mc & 0x1)) +#define NVME_ID_NS_DPC_LAST_EIGHT(dpc) ((dpc >> 4) & 0x1) +#define NVME_ID_NS_DPC_FIRST_EIGHT(dpc) ((dpc >> 3) & 0x1) +#define NVME_ID_NS_DPC_TYPE_3(dpc) ((dpc >> 2) & 0x1) +#define NVME_ID_NS_DPC_TYPE_2(dpc) ((dpc >> 1) & 0x1) +#define NVME_ID_NS_DPC_TYPE_1(dpc) ((dpc & 0x1)) + +enum NvmeIdNsDps { + DPS_TYPE_NONE = 0, + DPS_TYPE_0 = 1, + DPS_TYPE_1 = 2, + DPS_TYPE_2 = 3, + DPS_TYPE_3 = 4, +}; + +typedef struct NvmeNamespace { + struct NvmeCtrl *ctrl; + NvmeIdNs id_ns; + NvmeRangeType lba_range[64]; + uint32_t id; + uint64_t start_block; + uint32_t write_data_counter; + uint32_t read_data_counter; + uint64_t data_units_read[2]; + uint64_t data_units_written[2]; + uint64_t host_read_commands[2]; + uint64_t host_write_commands[2]; + uint64_t nuse_thresh; + uint8_t thresh_warn_issued; + unsigned long *util; + unsigned long *uncorrectable; +} NvmeNamespace; + +typedef struct NvmeCtrl { + 
PCIDevice dev; + MemoryRegion iomem; + NvmeBar bar; + BlockConf conf; + + time_t start_time; + int instance; + uint16_t temperature; + uint16_t page_size; + uint16_t page_bits; + uint16_t max_prp_ents; + uint16_t cqe_size; + uint16_t sqe_size; + uint8_t percentage_used; + uint8_t outstanding_aers; + uint8_t elp_index; + uint8_t error_count; + NvmeRequest **aer_reqs; + uint32_t reg_size; + + uint32_t num_namespaces; + uint32_t ns_size; + uint32_t num_queues; + uint32_t max_q_ents; + uint8_t db_stride; + uint8_t acl; + uint8_t aerl; + uint8_t mdts; + uint8_t elpe; + uint8_t cqr; + + NvmeErrorLog *elpes; + NvmeNamespace *namespaces; + NvmeFeatureVal features; + NvmeIdCtrl id_ctrl; + NvmeSQueue admin_sq; + NvmeCQueue admin_cq; + NvmeSQueue **sq; + NvmeCQueue **cq; + + QSIMPLEQ_HEAD(aer_queue, AsyncEvent) aer_queue; + QEMUTimer *aer_timer; + uint8_t aer_mask; + uint8_t temp_warn_issued; +} NvmeCtrl; + +enum NvmeStatusCodes { + NVME_SUCCESS = 0x0000, + NVME_INVALID_OPCODE = 0x0001, + NVME_INVALID_FIELD = 0x0002, + NVME_CID_CONFLICT = 0x0003, + NVME_DATA_TRAS_ERROR = 0x0004, + NVME_POWER_LOSS_ABORT = 0x0005, + NVME_INTERNAL_DEV_ERROR = 0x0006, + NVME_CMD_ABORT_REQ = 0x0007, + NVME_CMD_ABORT_SQ_DEL = 0x0008, + NVME_CMD_ABORT_FAILED_FUSE = 0x0009, + NVME_CMD_ABORT_MISSING_FUSE = 0x000a, + NVME_INVALID_NSID = 0x000b, + NVME_CMD_SEQ_ERROR = 0x000c, + NVME_LBA_RANGE = 0x0080, + NVME_CAP_EXCEEDED = 0x0081, + NVME_NS_NOT_READY = 0x0082, + NVME_NS_RESV_CONFLICT = 0x0083, + NVME_INVALID_CQID = 0x0100, + NVME_INVALID_QID = 0x0101, + NVME_MAX_QSIZE_EXCEEDED = 0x0102, + NVME_ACL_EXCEEDED = 0x0103, + NVME_RESERVED = 0x0104, + NVME_AER_LIMIT_EXCEEDED = 0x0105, + NVME_INVALID_FW_SLOT = 0x0106, + NVME_INVALID_FW_IMAGE = 0x0107, + NVME_INVALID_IRQ_VECTOR = 0x0108, + NVME_INVALID_LOG_ID = 0x0109, + NVME_INVALID_FORMAT = 0x010a, + NVME_FW_REQ_RESET = 0x010b, + NVME_INVALID_QUEUE_DEL = 0x010c, + NVME_FID_NOT_SAVEABLE = 0x010d, + NVME_FID_NOT_NSID_SPEC = 0x010f, + NVME_FW_REQ_SUSYSTEM_RESET = 0x0110, + NVME_CONFLICTING_ATTRS = 0x0180, + NVME_INVALID_PROT_INFO = 0x0181, + NVME_WRITE_TO_RO = 0x0182, + NVME_WRITE_FAULT = 0x0280, + NVME_UNRECOVERED_READ = 0x0281, + NVME_E2E_GUARD_ERROR = 0x0282, + NVME_E2E_APP_ERROR = 0x0283, + NVME_E2E_REF_ERROR = 0x0284, + NVME_CMP_FAILURE = 0x0285, + NVME_ACCESS_DENIED = 0x0286, + NVME_MORE = 0x2000, + NVME_DNR = 0x4000, + NVME_NO_COMPLETE = 0xffff, +}; + +enum NvmeAdminCommands { + NVME_ADM_CMD_DELETE_SQ = 0x00, + NVME_ADM_CMD_CREATE_SQ = 0x01, + NVME_ADM_CMD_GET_LOG_PAGE = 0x02, + NVME_ADM_CMD_DELETE_CQ = 0x04, + NVME_ADM_CMD_CREATE_CQ = 0x05, + NVME_ADM_CMD_IDENTIFY = 0x06, + NVME_ADM_CMD_ABORT = 0x08, + NVME_ADM_CMD_SET_FEATURES = 0x09, + NVME_ADM_CMD_GET_FEATURES = 0x0a, + NVME_ADM_CMD_ASYNC_EV_REQ = 0x0c, + NVME_ADM_CMD_ACTIVATE_FW = 0x10, + NVME_ADM_CMD_DOWNLOAD_FW = 0x11, + NVME_ADM_CMD_FORMAT_NVM = 0x80, + NVME_ADM_CMD_SECURITY_SEND = 0x81, + NVME_ADM_CMD_SECURITY_RECV = 0x82, +}; + +enum NvmeIoCommands { + NVME_CMD_FLUSH = 0x00, + NVME_CMD_WRITE = 0x01, + NVME_CMD_READ = 0x02, + NVME_CMD_WRITE_UNCOR = 0x04, + NVME_CMD_COMPARE = 0x05, + NVME_CMD_DSM = 0x09, +}; + +enum LogIdentifier { + NVME_LOG_ERROR_INFO = 0x01, + NVME_LOG_SMART_INFO = 0x02, + NVME_LOG_FW_SLOT_INFO = 0x03, +}; + +enum NvmeFeatureIds { + NVME_ARBITRATION = 1, + NVME_POWER_MANAGEMENT = 2, + NVME_LBA_RANGE_TYPE = 3, + NVME_TEMPERATURE_THRESHOLD = 4, + NVME_ERROR_RECOVERY = 5, + NVME_VOLATILE_WRITE_CACHE = 6, + NVME_NUMBER_OF_QUEUES = 7, + NVME_INTERRUPT_COALESCING = 8, + NVME_INTERRUPT_VECTOR_CONF = 9, + 
NVME_WRITE_ATOMICITY = 0x0a, + NVME_ASYNCHRONOUS_EVENT_CONF = 0x0b, + NVME_SOFTWARE_PROGRESS_MARKER = 0x80 +}; + +#endif diff --git a/hw/pci_ids.h b/hw/pci_ids.h index 5df7245..bb7ba02 100644 --- a/hw/pci_ids.h +++ b/hw/pci_ids.h @@ -17,6 +17,7 @@ #define PCI_CLASS_STORAGE_IDE 0x0101 #define PCI_CLASS_STORAGE_RAID 0x0104 #define PCI_CLASS_STORAGE_SATA 0x0106 +#define PCI_CLASS_STORAGE_EXPRESS 0x0108 #define PCI_CLASS_STORAGE_OTHER 0x0180 #define PCI_CLASS_NETWORK_ETHERNET 0x0200
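
[Editor's note, not part of the patch] For reviewers following nvme_process_db() above: the doorbell region starts at BAR offset 0x1000, each doorbell register is (4 << stride) bytes, and SQ tail / CQ head doorbells alternate, which is what the shift arithmetic in that function encodes. The sketch below mirrors that arithmetic in a standalone form; decode_doorbell() is a made-up helper for illustration only and is not in the patch.

/*
 * Illustrative sketch: map a doorbell MMIO offset back to a queue ID the
 * same way nvme_process_db() does, assuming doorbells start at 0x1000 and
 * each doorbell register is (4 << stride) bytes wide.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static void decode_doorbell(uint64_t addr, uint8_t stride)
{
    uint64_t off  = addr - 0x1000;   /* doorbells start at offset 0x1000 */
    uint64_t dbsz = 4u << stride;    /* size of one doorbell register    */

    if (addr & (dbsz - 1)) {         /* same test as addr & ((1 << (2 + stride)) - 1) */
        printf("addr 0x%" PRIx64 ": not doorbell aligned\n", addr);
        return;
    }
    if ((off / dbsz) & 1) {          /* odd doorbell index -> CQ head doorbell */
        uint64_t qid = (off - dbsz) / (2 * dbsz);
        printf("addr 0x%" PRIx64 ": CQ %" PRIu64 " head doorbell\n", addr, qid);
    } else {                         /* even doorbell index -> SQ tail doorbell */
        uint64_t qid = off / (2 * dbsz);
        printf("addr 0x%" PRIx64 ": SQ %" PRIu64 " tail doorbell\n", addr, qid);
    }
}

int main(void)
{
    decode_doorbell(0x1000, 0);   /* admin SQ tail                     */
    decode_doorbell(0x1004, 0);   /* admin CQ head                     */
    decode_doorbell(0x1008, 0);   /* IO SQ 1 tail                      */
    decode_doorbell(0x1020, 2);   /* IO SQ 1 tail with doorbell stride 2 */
    return 0;
}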
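
[Editor's note, not part of the patch] Likewise, the CAP register that nvme_init_ctrl() assembles is read back through the NVME_CAP_* accessors defined in hw/nvme.h. The sketch below round-trips the fields used by the device's defaults (entries=0x7ff, cqr=1, stride=0); the shift/mask definitions are copied from the patch, everything else is illustrative.

/*
 * Illustrative sketch: pack CAP the way nvme_init_ctrl() does and read it
 * back with the hw/nvme.h accessors.  Only the fields exercised here are
 * copied from the patch header.
 */
#include <assert.h>
#include <stdint.h>

#define CAP_MQES_SHIFT   0
#define CAP_CQR_SHIFT    16
#define CAP_DSTRD_SHIFT  32
#define CAP_MPSMAX_SHIFT 52
#define CAP_MQES_MASK    0xffff
#define CAP_CQR_MASK     0x1
#define CAP_DSTRD_MASK   0xf
#define CAP_MPSMAX_MASK  0xf

#define NVME_CAP_MQES(cap)   (((cap) >> CAP_MQES_SHIFT) & CAP_MQES_MASK)
#define NVME_CAP_CQR(cap)    (((cap) >> CAP_CQR_SHIFT) & CAP_CQR_MASK)
#define NVME_CAP_DSTRD(cap)  (((cap) >> CAP_DSTRD_SHIFT) & CAP_DSTRD_MASK)
#define NVME_CAP_MPSMAX(cap) (((cap) >> CAP_MPSMAX_SHIFT) & CAP_MPSMAX_MASK)

int main(void)
{
    uint64_t cap = 0;

    cap |= (uint64_t)(0x7ff & CAP_MQES_MASK) << CAP_MQES_SHIFT;   /* default "entries" property  */
    cap |= (uint64_t)(1 & CAP_CQR_MASK) << CAP_CQR_SHIFT;         /* contiguous queues required  */
    cap |= (uint64_t)(0 & CAP_DSTRD_MASK) << CAP_DSTRD_SHIFT;     /* 4-byte doorbell stride      */
    cap |= (uint64_t)(0xf & CAP_MPSMAX_MASK) << CAP_MPSMAX_SHIFT; /* max page size 2^(12 + 15)   */

    assert(NVME_CAP_MQES(cap) == 0x7ff);
    assert(NVME_CAP_CQR(cap) == 1);
    assert(NVME_CAP_DSTRD(cap) == 0);
    assert(NVME_CAP_MPSMAX(cap) == 0xf);
    return 0;
}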