From patchwork Fri Nov 2 09:01:42 2018
X-Patchwork-Submitter: Daniel Axtens
X-Patchwork-Id: 992215
From: Daniel Axtens
To: kernel-team@lists.canonical.com
Subject: [SRU xenial-aws][PATCH 1/2] xen-blkfront: don't use req->errors
Date: Fri, 2 Nov 2018 20:01:42 +1100
Message-Id: <20181102090143.2880-2-daniel.axtens@canonical.com>
In-Reply-To: <20181102090143.2880-1-daniel.axtens@canonical.com>
References: <20181102090143.2880-1-daniel.axtens@canonical.com>
From: Christoph Hellwig

BugLink: https://bugs.launchpad.net/bugs/1801305

xen-blkfront is the last user of req->errors for passing errors back to
blk-mq, and I'd like to get rid of that. In the longer run the driver
should move more of the completion processing into .complete, but this
is the minimal change to move forward for now.

Signed-off-by: Christoph Hellwig
Acked-by: Roger Pau Monné
Reviewed-by: Konrad Rzeszutek Wilk
Signed-off-by: Jens Axboe
(backported from commit 2609587c1eeb4ff484e6c9a746aa3627b9a2649c)
Signed-off-by: Daniel Axtens
---
 drivers/block/xen-blkfront.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index eb296c1b169a..bdea9cd9c709 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -90,6 +90,15 @@ struct split_bio {
 	atomic_t pending;
 };
 
+struct blkif_req {
+	int	error;
+};
+
+static inline struct blkif_req *blkif_req(struct request *rq)
+{
+	return blk_mq_rq_to_pdu(rq);
+}
+
 static DEFINE_MUTEX(blkfront_mutex);
 static const struct block_device_operations xlvbd_block_fops;
 
@@ -819,9 +828,15 @@ out_busy:
 	return BLK_MQ_RQ_QUEUE_BUSY;
 }
 
+static void blkif_complete_rq(struct request *rq)
+{
+	blk_mq_end_request(rq, blkif_req(rq)->error);
+}
+
 static struct blk_mq_ops blkfront_mq_ops = {
 	.queue_rq = blkif_queue_rq,
 	.map_queue = blk_mq_map_queue,
+	.complete = blkif_complete_rq,
 };
 
 static void blkif_set_queue_limits(struct blkfront_info *info)
@@ -873,7 +888,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 	info->tag_set.queue_depth = BLK_RING_SIZE(info);
 	info->tag_set.numa_node = NUMA_NO_NODE;
 	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
-	info->tag_set.cmd_size = 0;
+	info->tag_set.cmd_size = sizeof(struct blkif_req);
 	info->tag_set.driver_data = info;
 
 	if (blk_mq_alloc_tag_set(&info->tag_set))
@@ -1384,7 +1399,6 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	unsigned long flags;
 	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
 	struct blkfront_info *info = rinfo->dev_info;
-	int error;
 
 	if (unlikely(info->connected != BLKIF_STATE_CONNECTED)) {
 		if (info->connected != BLKIF_STATE_FREEZING)
@@ -1424,37 +1438,36 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 			continue;
 		}
 
-		error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
+		blkif_req(req)->error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
 		switch (bret->operation) {
 		case BLKIF_OP_DISCARD:
 			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
 				struct request_queue *rq = info->rq;
 				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
 					   info->gd->disk_name, op_name(bret->operation));
-				error = -EOPNOTSUPP;
+				blkif_req(req)->error = -EOPNOTSUPP;
 				info->feature_discard = 0;
 				info->feature_secdiscard = 0;
 				queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
 				queue_flag_clear(QUEUE_FLAG_SECDISCARD, rq);
 			}
-			blk_mq_complete_request(req, error);
 			break;
 		case BLKIF_OP_FLUSH_DISKCACHE:
 		case BLKIF_OP_WRITE_BARRIER:
 			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
 				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
 				       info->gd->disk_name, op_name(bret->operation));
-				error = -EOPNOTSUPP;
+				blkif_req(req)->error = -EOPNOTSUPP;
 			}
 			if (unlikely(bret->status == BLKIF_RSP_ERROR &&
 				     rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
 				printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
 				       info->gd->disk_name, op_name(bret->operation));
-				error = -EOPNOTSUPP;
+				blkif_req(req)->error = -EOPNOTSUPP;
 			}
-			if (unlikely(error)) {
-				if (error == -EOPNOTSUPP)
-					error = 0;
+			if (unlikely(blkif_req(req)->error)) {
+				if (blkif_req(req)->error == -EOPNOTSUPP)
+					blkif_req(req)->error = 0;
 				info->feature_flush = 0;
 				xlvbd_flush(info);
 			}
@@ -1465,11 +1478,12 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 				dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
 					"request: %x\n", bret->status);
 
-			blk_mq_complete_request(req, error);
 			break;
 		default:
 			BUG();
 		}
+
+		blk_mq_complete_request(req, 0);
 	}
 
 	rinfo->ring.rsp_cons = i;

From patchwork Fri Nov 2 09:01:43 2018
X-Patchwork-Submitter: Daniel Axtens
X-Patchwork-Id: 992216
From: Daniel Axtens
To: kernel-team@lists.canonical.com
Subject: [SRU xenial-aws][PATCH 2/2] xen-blkfront: resurrect request-based mode
Date: Fri, 2 Nov 2018 20:01:43 +1100
Message-Id: <20181102090143.2880-3-daniel.axtens@canonical.com>
In-Reply-To: <20181102090143.2880-1-daniel.axtens@canonical.com>
References: <20181102090143.2880-1-daniel.axtens@canonical.com>

From: Munehisa Kamata

BugLink: https://bugs.launchpad.net/bugs/1801305

This change resurrects request-based mode, which was completely dropped
in commit 907c3eb18e0b ("xen-blkfront: convert to blk-mq APIs"). To
avoid a stale queue lock, resurrect the per-device (vbd) lock in
blkfront_info, which is never freed during Xen suspend, and use it in
request-based mode. This is basically the same as what the driver was
doing until commit 11659569f720 ("xen/blkfront: split per device
io_lock"). If the driver is in blk-mq mode, just use the lock(s) in
blkfront_ring_info.

In commit b7420c1eaeac ("drivers/amazon: xen-blkfront: resurrect
request-based mode"), we accidentally didn't bring over the piece of
code which empties the request queue while saving bios. The logic was
originally introduced in commit 402b27f9f2c2 ("xen-block: implement
indirect descriptors"). It is still required for request-based mode, so
just do the same thing as before.

Note that some suspend/resume logic was moved from blkif_recover() to
blkfront_resume() in commit 7b427a59538a ("xen-blkfront: save
uncompleted reqs in blkfront_resume()"), so add the logic to
blkfront_resume().

Forward-port notes:

As part of this forward port, we are no longer using the out-of-tree
xen-blkfront. The request-based patch and its related per-device vbd
lock have now been ported on top of the in-tree xen-blkfront. For
reference:

4.9 CR for resurrecting request-based mode: https://cr.amazon.com/r/6834653/
4.9 CR for resurrecting the per-device (vbd) lock: https://cr.amazon.com/r/7475903/
4.9 CR for emptying the request queue while resuming: https://cr.amazon.com/r/7475918/

As part of the forward port, all three related patches above have been
merged into a single commit.

In the 4.14.y kernel, we realized during forward-porting and testing
that blk-mq stashes the error code for a request right after the
request structure in memory. Care was taken not to reuse this piece of
memory for stashing the error code in request-based mode, as doing so
can cause memory corruption.

Hibernation: so as not to break git bisect and the hibernation feature,
blkfront_freeze() and blkfront_resume() were modified as well to
support request-based mode.

Reported-by: Imre Palik
Reviewed-by: Eduardo Valentin
Reviewed-by: Munehisa Kamata
Reviewed-by: Anchal Agarwal
Signed-off-by: Munehisa Kamata
Signed-off-by: Vallish Vaidyeshwara
CR: https://cr.amazon.com/r/8309443
(backported from 0026-xen-blkfront-resurrect-request-based-mode.patch
 in the AWS 4.14 kernel SRPM)
Signed-off-by: Daniel Axtens
---
 drivers/block/xen-blkfront.c | 315 +++++++++++++++++++++++++++++------
 1 file changed, 260 insertions(+), 55 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index bdea9cd9c709..ce8e316c370f 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -130,6 +130,15 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
 #define BLK_MAX_RING_SIZE	\
 	__CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * XENBUS_MAX_RING_GRANTS)
 
+static unsigned int blkfront_use_blk_mq = 0;
+module_param_named(use_blk_mq, blkfront_use_blk_mq, int, S_IRUGO);
+MODULE_PARM_DESC(use_blk_mq, "Enable blk-mq (default is 0)");
+
+/*
+ * Index to the first available ring.
+ */
+#define FIRST_RING_ID (0)
+
 /*
  * ring-ref%u i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
  * characters are enough. Define to 20 to keep consistent with backend.
@@ -168,6 +177,12 @@ struct blkfront_ring_info {
  */
 struct blkfront_info {
+	/*
+	 * Per vbd lock which protects an associated blkfront_ring_info if the
+	 * driver is in request-based mode. Use this lock always instead of per
+	 * ring lock in that mode.
+	 */
+	spinlock_t io_lock;
 	struct mutex mutex;
 	struct xenbus_device *xbdev;
 	struct gendisk *gd;
@@ -238,6 +253,19 @@ static DEFINE_SPINLOCK(minor_lock);
 
 #define GREFS(_psegs)	((_psegs) * GRANTS_PER_PSEG)
 
+/* Macro to save error status */
+#define BLKIF_REQ_PUT_ERROR_STATUS(req, error, status)	\
+	do {						\
+		if (blkfront_use_blk_mq)		\
+			blkif_req(req)->error = status;	\
+		else					\
+			error = status;			\
+	} while (0)
+
+/* Macro to retrieve error status */
+#define BLKIF_REQ_GET_ERROR_STATUS(req, error)	\
+	((blkfront_use_blk_mq) ?
blkif_req(req)->error : error) + static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo); static void blkfront_gather_backend_features(struct blkfront_info *info); static void __blkif_free(struct blkfront_info *info); @@ -793,6 +821,62 @@ static inline bool blkif_request_flush_invalid(struct request *req, !(info->feature_flush & REQ_FUA))); } +static inline void blkif_complete_request(struct request *req, int error) +{ + if (blkfront_use_blk_mq) + blk_mq_complete_request(req, error); + else + __blk_end_request_all(req, error); +} + +/* + * do_blkif_request + * read a block; request is in a request queue + */ +static void do_blkif_request(struct request_queue *rq) +{ + struct blkfront_info *info = NULL; + struct request *req; + int queued; + + pr_debug("Entered do_blkif_request\n"); + + queued = 0; + + while ((req = blk_peek_request(rq)) != NULL) { + info = req->rq_disk->private_data; + + if (RING_FULL(&info->rinfo[FIRST_RING_ID].ring)) + goto wait; + + blk_start_request(req); + + if (blkif_request_flush_invalid(req, info)) { + __blk_end_request_all(req, -EOPNOTSUPP); + continue; + } + + pr_debug("do_blk req %p: cmd_flags %u, sec %lx, " + "(%u/%u) [%s]\n", + req, req->cmd_flags, (unsigned long)blk_rq_pos(req), + blk_rq_cur_sectors(req), blk_rq_sectors(req), + rq_data_dir(req) ? "write" : "read"); + + if (blkif_queue_request(req, &info->rinfo[FIRST_RING_ID])) { + blk_requeue_request(rq, req); +wait: + /* Avoid pointless unplugs. */ + blk_stop_queue(rq); + break; + } + + queued++; + } + + if(queued != 0) + flush_requests(&info->rinfo[FIRST_RING_ID]); +} + static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *qd) { @@ -882,21 +966,28 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size, struct request_queue *rq; struct blkfront_info *info = gd->private_data; - memset(&info->tag_set, 0, sizeof(info->tag_set)); - info->tag_set.ops = &blkfront_mq_ops; - info->tag_set.nr_hw_queues = info->nr_rings; - info->tag_set.queue_depth = BLK_RING_SIZE(info); - info->tag_set.numa_node = NUMA_NO_NODE; - info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE; - info->tag_set.cmd_size = sizeof(struct blkif_req); - info->tag_set.driver_data = info; - - if (blk_mq_alloc_tag_set(&info->tag_set)) - return -1; - rq = blk_mq_init_queue(&info->tag_set); - if (IS_ERR(rq)) { - blk_mq_free_tag_set(&info->tag_set); - return -1; + if (blkfront_use_blk_mq) { + memset(&info->tag_set, 0, sizeof(info->tag_set)); + info->tag_set.ops = &blkfront_mq_ops; + info->tag_set.nr_hw_queues = info->nr_rings; + info->tag_set.queue_depth = BLK_RING_SIZE(info); + info->tag_set.numa_node = NUMA_NO_NODE; + info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE; + info->tag_set.cmd_size = sizeof(struct blkif_req); + info->tag_set.driver_data = info; + + if (blk_mq_alloc_tag_set(&info->tag_set)) + return -1; + rq = blk_mq_init_queue(&info->tag_set); + if (IS_ERR(rq)) { + blk_mq_free_tag_set(&info->tag_set); + return -1; + } + } else { + spin_lock_init(&info->io_lock); + rq = blk_init_queue(do_blkif_request, &info->io_lock); + if (IS_ERR(rq)) + return -1; } rq->queuedata = info; @@ -1098,21 +1189,29 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity, static void xlvbd_release_gendisk(struct blkfront_info *info) { unsigned int minor, nr_minors, i; + unsigned long flags; if (info->rq == NULL) return; /* No more blkif_request(). 
*/ - blk_mq_stop_hw_queues(info->rq); + if (blkfront_use_blk_mq) { + blk_mq_stop_hw_queues(info->rq); - for (i = 0; i < info->nr_rings; i++) { - struct blkfront_ring_info *rinfo = &info->rinfo[i]; + for (i = 0; i < info->nr_rings; i++) { + struct blkfront_ring_info *rinfo = &info->rinfo[i]; - /* No more gnttab callback work. */ - gnttab_cancel_free_callback(&rinfo->callback); + /* No more gnttab callback work. */ + gnttab_cancel_free_callback(&rinfo->callback); - /* Flush gnttab callback work. Must be done with no locks held. */ - flush_work(&rinfo->work); + /* Flush gnttab callback work. Must be done with no locks held. */ + flush_work(&rinfo->work); + } + } else { + spin_lock_irqsave(&info->io_lock, flags); + blk_stop_queue(info->rq); + gnttab_cancel_free_callback(&info->rinfo[FIRST_RING_ID].callback); + spin_unlock_irqrestore(&info->io_lock, flags); } del_gendisk(info->gd); @@ -1122,7 +1221,8 @@ static void xlvbd_release_gendisk(struct blkfront_info *info) xlbd_release_minors(minor, nr_minors); blk_cleanup_queue(info->rq); - blk_mq_free_tag_set(&info->tag_set); + if (blkfront_use_blk_mq) + blk_mq_free_tag_set(&info->tag_set); info->rq = NULL; put_disk(info->gd); @@ -1135,17 +1235,30 @@ static inline void kick_pending_request_queues_locked(struct blkfront_ring_info if (unlikely(rinfo->dev_info->connected == BLKIF_STATE_FREEZING)) return; - if (!RING_FULL(&rinfo->ring)) + if (RING_FULL(&rinfo->ring)) + return; + + if (blkfront_use_blk_mq) { blk_mq_start_stopped_hw_queues(rinfo->dev_info->rq, true); + } else { + /* Re-enable calldowns */ + blk_start_queue(rinfo->dev_info->rq); + /* Kick things off immediately */ + do_blkif_request(rinfo->dev_info->rq); + } } static void kick_pending_request_queues(struct blkfront_ring_info *rinfo) { unsigned long flags; + struct blkfront_info *info = rinfo->dev_info; + spinlock_t *lock; - spin_lock_irqsave(&rinfo->ring_lock, flags); + lock = blkfront_use_blk_mq ? &rinfo->ring_lock : &info->io_lock; + + spin_lock_irqsave(lock, flags); kick_pending_request_queues_locked(rinfo); - spin_unlock_irqrestore(&rinfo->ring_lock, flags); + spin_unlock_irqrestore(lock, flags); } static void blkif_restart_queue(struct work_struct *work) @@ -1156,6 +1269,7 @@ static void blkif_restart_queue(struct work_struct *work) kick_pending_request_queues(rinfo); } +/* Must be called with per vbd lock held if the frontend uses request-based */ static void blkif_free_ring(struct blkfront_ring_info *rinfo) { struct grant *persistent_gnt, *n; @@ -1238,6 +1352,9 @@ free_shadow: /* No more gnttab callback work. */ gnttab_cancel_free_callback(&rinfo->callback); + if (!blkfront_use_blk_mq) + spin_unlock_irq(&info->io_lock); + /* Flush gnttab callback work. Must be done with no locks held. */ flush_work(&rinfo->work); @@ -1259,11 +1376,18 @@ free_shadow: static void blkif_free(struct blkfront_info *info, int suspend) { /* Prevent new requests being issued until we fix things up. */ + if (!blkfront_use_blk_mq) + spin_lock_irq(&info->io_lock); + info->connected = suspend ? BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED; /* No more blkif_request(). 
*/ - if (info->rq) - blk_mq_stop_hw_queues(info->rq); + if (info->rq) { + if (blkfront_use_blk_mq) + blk_mq_stop_hw_queues(info->rq); + else + blk_stop_queue(info->rq); + } __blkif_free(info); } @@ -1399,13 +1523,17 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) unsigned long flags; struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id; struct blkfront_info *info = rinfo->dev_info; + spinlock_t *lock; + int error = 0; if (unlikely(info->connected != BLKIF_STATE_CONNECTED)) { if (info->connected != BLKIF_STATE_FREEZING) return IRQ_HANDLED; } - spin_lock_irqsave(&rinfo->ring_lock, flags); + lock = blkfront_use_blk_mq ? &rinfo->ring_lock : &info->io_lock; + + spin_lock_irqsave(lock, flags); again: rp = rinfo->ring.sring->rsp_prod; rmb(); /* Ensure we see queued responses up to 'rp'. */ @@ -1438,14 +1566,18 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) continue; } - blkif_req(req)->error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO; + if (bret->status == BLKIF_RSP_OKAY) + BLKIF_REQ_PUT_ERROR_STATUS(req, error, 0); + else + BLKIF_REQ_PUT_ERROR_STATUS(req, error, -EIO); + switch (bret->operation) { case BLKIF_OP_DISCARD: if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { struct request_queue *rq = info->rq; printk(KERN_WARNING "blkfront: %s: %s op failed\n", info->gd->disk_name, op_name(bret->operation)); - blkif_req(req)->error = -EOPNOTSUPP; + BLKIF_REQ_PUT_ERROR_STATUS(req, error, -EOPNOTSUPP); info->feature_discard = 0; info->feature_secdiscard = 0; queue_flag_clear(QUEUE_FLAG_DISCARD, rq); @@ -1457,17 +1589,19 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { printk(KERN_WARNING "blkfront: %s: %s op failed\n", info->gd->disk_name, op_name(bret->operation)); - blkif_req(req)->error = -EOPNOTSUPP; + BLKIF_REQ_PUT_ERROR_STATUS(req, error, -EOPNOTSUPP); } if (unlikely(bret->status == BLKIF_RSP_ERROR && rinfo->shadow[id].req.u.rw.nr_segments == 0)) { printk(KERN_WARNING "blkfront: %s: empty %s op failed\n", info->gd->disk_name, op_name(bret->operation)); - blkif_req(req)->error = -EOPNOTSUPP; + BLKIF_REQ_PUT_ERROR_STATUS(req, error, -EOPNOTSUPP); } - if (unlikely(blkif_req(req)->error)) { - if (blkif_req(req)->error == -EOPNOTSUPP) - blkif_req(req)->error = 0; + if (unlikely(BLKIF_REQ_GET_ERROR_STATUS(req, error))) { + if (BLKIF_REQ_GET_ERROR_STATUS(req, error) + == -EOPNOTSUPP) + BLKIF_REQ_PUT_ERROR_STATUS(req, error, + 0); info->feature_flush = 0; xlvbd_flush(info); } @@ -1483,7 +1617,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) BUG(); } - blk_mq_complete_request(req, 0); + blkif_complete_request(req, BLKIF_REQ_GET_ERROR_STATUS(req, error)); } rinfo->ring.rsp_cons = i; @@ -1498,7 +1632,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) kick_pending_request_queues_locked(rinfo); - spin_unlock_irqrestore(&rinfo->ring_lock, flags); + spin_unlock_irqrestore(lock, flags); return IRQ_HANDLED; } @@ -1737,8 +1871,11 @@ static int negotiate_mq(struct blkfront_info *info) backend_max_queues = 1; info->nr_rings = min(backend_max_queues, xen_blkif_max_queues); - /* We need at least one ring. */ - if (!info->nr_rings) + /* + * We need at least one ring. Also, do not allow to have multiple rings if blk-mq is + * not used. 
+ */ + if (!info->nr_rings || !blkfront_use_blk_mq) info->nr_rings = 1; info->rinfo = kzalloc(sizeof(struct blkfront_ring_info) * info->nr_rings, GFP_KERNEL); @@ -1755,7 +1892,8 @@ static int negotiate_mq(struct blkfront_info *info) INIT_LIST_HEAD(&rinfo->grants); rinfo->dev_info = info; INIT_WORK(&rinfo->work, blkif_restart_queue); - spin_lock_init(&rinfo->ring_lock); + if (blkfront_use_blk_mq) + spin_lock_init(&rinfo->ring_lock); } return 0; } @@ -1876,6 +2014,10 @@ static int blkif_recover(struct blkfront_info *info) } xenbus_switch_state(info->xbdev, XenbusStateConnected); + /* blk_requeue_request below must be called with queue lock held */ + if (!blkfront_use_blk_mq) + spin_lock_irq(&info->io_lock); + /* Now safe for us to use the shared ring */ info->connected = BLKIF_STATE_CONNECTED; @@ -1884,19 +2026,33 @@ static int blkif_recover(struct blkfront_info *info) rinfo = &info->rinfo[r_index]; /* Kick any other new requests queued since we resumed */ - kick_pending_request_queues(rinfo); + if (blkfront_use_blk_mq) + kick_pending_request_queues(rinfo); + else + kick_pending_request_queues_locked(rinfo); } - if (frozen) + if (frozen) { + if (!blkfront_use_blk_mq) + spin_unlock_irq(&info->io_lock); return 0; + } list_for_each_entry_safe(req, n, &info->requests, queuelist) { /* Requeue pending requests (flush or discard) */ list_del_init(&req->queuelist); BUG_ON(req->nr_phys_segments > segs); - blk_mq_requeue_request(req); + if (blkfront_use_blk_mq) + blk_mq_requeue_request(req); + else + blk_requeue_request(info->rq, req); + } + + if (blkfront_use_blk_mq) { + blk_mq_kick_requeue_list(info->rq); + } else { + spin_unlock_irq(&info->io_lock); } - blk_mq_kick_requeue_list(info->rq); while ((bio = bio_list_pop(&info->bio_list)) != NULL) { /* Traverse the list of pending bios and re-queue them */ @@ -1976,10 +2132,41 @@ static int blkfront_resume(struct xenbus_device *dev) merge_bio.tail = shadow[j].request->biotail; bio_list_merge(&info->bio_list, &merge_bio); shadow[j].request->bio = NULL; - blk_mq_end_request(shadow[j].request, 0); + if (blkfront_use_blk_mq) + blk_mq_end_request(shadow[j].request, 0); + else + blk_end_request_all(shadow[j].request, 0); } } + if (!blkfront_use_blk_mq) { + struct request *req; + struct bio_list merge_bio; + + /* + * Empty the queue, this is important because we might have + * requests in the queue with more segments than what we + * can handle now. 
+ */ + spin_lock_irq(&info->io_lock); + while ((req = blk_fetch_request(info->rq)) != NULL) { + if (req->cmd_flags & (REQ_FLUSH | REQ_DISCARD) || + req->cmd_flags & REQ_FUA) { + list_add(&req->queuelist, &info->requests); + continue; + } + merge_bio.head = req->bio; + merge_bio.tail = req->biotail; + bio_list_merge(&info->bio_list, &merge_bio); + req->bio = NULL; + if (req->cmd_flags & REQ_FLUSH || + req->cmd_flags & REQ_FUA) + pr_alert("diskcache flush request found!\n"); + __blk_end_request_all(req, 0); + } + spin_unlock_irq(&info->io_lock); + } + blkif_free(info, info->connected == BLKIF_STATE_CONNECTED); err = negotiate_mq(info); @@ -1987,7 +2174,7 @@ static int blkfront_resume(struct xenbus_device *dev) return err; err = talk_to_blkback(dev, info); - if (!err) + if (!err && blkfront_use_blk_mq) blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings); /* @@ -2328,6 +2515,8 @@ static void blkback_changed(struct xenbus_device *dev, case XenbusStateClosed: if (dev->state == XenbusStateClosed) { if (info->connected == BLKIF_STATE_FREEZING) { + if (!blkfront_use_blk_mq) + spin_lock_irq(&info->io_lock); __blkif_free(info); info->connected = BLKIF_STATE_FROZEN; complete(&info->wait_backend_disconnected); @@ -2505,13 +2694,24 @@ static int blkfront_freeze(struct xenbus_device *dev) info->connected = BLKIF_STATE_FREEZING; - blk_mq_stop_hw_queues(info->rq); + if (blkfront_use_blk_mq) { + blk_mq_stop_hw_queues(info->rq); + + for (i = 0; i < info->nr_rings; i++) { + rinfo = &info->rinfo[i]; - for (i = 0; i < info->nr_rings; i++) { - struct blkfront_ring_info *rinfo = &info->rinfo[i]; + gnttab_cancel_free_callback(&rinfo->callback); + flush_work(&rinfo->work); + } + } else { + spin_lock_irq(&info->io_lock); + blk_stop_queue(info->rq); + gnttab_cancel_free_callback( + &info->rinfo[FIRST_RING_ID].callback); + spin_unlock_irq(&info->io_lock); - gnttab_cancel_free_callback(&rinfo->callback); - flush_work(&rinfo->work); + blk_sync_queue(info->rq); + flush_work(&info->rinfo[FIRST_RING_ID].work); } /* @@ -2519,6 +2719,7 @@ static int blkfront_freeze(struct xenbus_device *dev) * Ensure that there is nothing left there before disconnecting. */ for (i = 0; i < info->nr_rings; i++) { + spinlock_t *lock; bool busy; unsigned long req_timeout_ms = 25; unsigned long ring_timeout; @@ -2526,13 +2727,16 @@ static int blkfront_freeze(struct xenbus_device *dev) rinfo = &info->rinfo[i]; ring = &rinfo->ring; + lock = blkfront_use_blk_mq ? + &rinfo->ring_lock : &info->io_lock; + ring_timeout = jiffies + msecs_to_jiffies(req_timeout_ms * RING_SIZE(ring)); do { - spin_lock_irq(&rinfo->ring_lock); + spin_lock_irq(lock); busy = blkfront_ring_is_busy(ring); - spin_unlock_irq(&rinfo->ring_lock); + spin_unlock_irq(lock); /* * We may want to give the backend or interrupt handler @@ -2583,7 +2787,8 @@ static int blkfront_restore(struct xenbus_device *dev) if (err) goto out; - blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings); + if (blkfront_use_blk_mq) + blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings); out: return err;
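The forward-port note in PATCH 2/2 about blk-mq stashing the per-request error
immediately after the request structure can be illustrated with a small
standalone sketch. This is not kernel code and is not part of either patch: the
names fake_request, alloc_mq_request and fake_rq_to_pdu are hypothetical
stand-ins that only model the pointer arithmetic behind tag_set.cmd_size and
blk_mq_rq_to_pdu(), and why request-based mode must keep the error in a plain
local variable instead.

/*
 * Illustrative, standalone sketch (not kernel code): blk-mq allocates the
 * driver's per-request area ("PDU", here struct blkif_req) in the same
 * allocation as the request, directly after it, which is what
 * tag_set.cmd_size and blk_mq_rq_to_pdu() rely on in PATCH 1/2.
 */
#include <stdio.h>
#include <stdlib.h>

struct fake_request {		/* hypothetical stand-in for struct request */
	int tag;
};

struct blkif_req {		/* per-request driver data, as in PATCH 1/2 */
	int error;
};

/* stand-in for blk_mq_rq_to_pdu(): the PDU lives right after the request */
static void *fake_rq_to_pdu(struct fake_request *rq)
{
	return rq + 1;
}

/* blk-mq style: one allocation of sizeof(request) + cmd_size */
static struct fake_request *alloc_mq_request(size_t cmd_size)
{
	return calloc(1, sizeof(struct fake_request) + cmd_size);
}

int main(void)
{
	struct fake_request *rq = alloc_mq_request(sizeof(struct blkif_req));
	struct blkif_req *breq = fake_rq_to_pdu(rq);

	breq->error = -95;	/* i.e. -EOPNOTSUPP, stashed like blkif_req(req)->error */
	printf("pdu sits %zu bytes after the request, error=%d\n",
	       (size_t)((char *)breq - (char *)rq), breq->error);

	/*
	 * A request allocated without cmd_size (request-based mode) has no
	 * trailing PDU; writing through fake_rq_to_pdu() on it would scribble
	 * past the allocation -- the corruption the forward-port note avoids
	 * by routing the error through a local variable via the
	 * BLKIF_REQ_PUT/GET_ERROR_STATUS macros.
	 */
	free(rq);
	return 0;
}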