Patch Detail
get:
Show a patch.

patch:
Partially update a patch.

put:
Update a patch.
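
All three methods are plain HTTP calls against the patch resource. As a minimal sketch of the read side (assuming the third-party Python `requests` package; the endpoint and field names are taken from the response shown below), fetching a patch looks like this:

```python
# Fetch patch 227 from the Patchwork REST API and read a few fields
# out of the JSON document reproduced below. GET needs no authentication.
import requests

BASE_URL = "http://patchwork.ozlabs.org/api"

resp = requests.get(f"{BASE_URL}/patches/227/")
resp.raise_for_status()
patch = resp.json()

print(patch["name"])    # "ib/ehca: add flush CQE generation"
print(patch["state"])   # "not-applicable"
print(patch["mbox"])    # raw mbox URL, suitable for piping into `git am`
```

The request below returns the JSON document that follows.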
GET /api/patches/227/?format=api
{ "id": 227, "url": "http://patchwork.ozlabs.org/api/patches/227/?format=api", "web_url": "http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20080910162356.7294fe87@BL3D1974.boeblingen.de.ibm.com/", "project": { "id": 2, "url": "http://patchwork.ozlabs.org/api/projects/2/?format=api", "name": "Linux PPC development", "link_name": "linuxppc-dev", "list_id": "linuxppc-dev.lists.ozlabs.org", "list_email": "linuxppc-dev@lists.ozlabs.org", "web_url": "https://github.com/linuxppc/wiki/wiki", "scm_url": "https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git", "webscm_url": "https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/", "list_archive_url": "https://lore.kernel.org/linuxppc-dev/", "list_archive_url_format": "https://lore.kernel.org/linuxppc-dev/{}/", "commit_url_format": "https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id={}" }, "msgid": "<20080910162356.7294fe87@BL3D1974.boeblingen.de.ibm.com>", "list_archive_url": "https://lore.kernel.org/linuxppc-dev/20080910162356.7294fe87@BL3D1974.boeblingen.de.ibm.com/", "date": "2008-09-10T14:23:56", "name": "ib/ehca: add flush CQE generation", "commit_ref": null, "pull_url": null, "state": "not-applicable", "archived": true, "hash": "dd97126194c92c8289ef734ddb321e41c009d7ec", "submitter": { "id": 129, "url": "http://patchwork.ozlabs.org/api/people/129/?format=api", "name": "Alexander Schmidt", "email": "alexs@linux.vnet.ibm.com" }, "delegate": null, "mbox": "http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20080910162356.7294fe87@BL3D1974.boeblingen.de.ibm.com/mbox/", "series": [], "comments": "http://patchwork.ozlabs.org/api/patches/227/comments/", "check": "pending", "checks": "http://patchwork.ozlabs.org/api/patches/227/checks/", "tags": {}, "related": [], "headers": { "Return-Path": "<linuxppc-dev-bounces+patchwork=ozlabs.org@ozlabs.org>", "X-Original-To": [ "patchwork@ozlabs.org", "linuxppc-dev@ozlabs.org" ], "Delivered-To": [ "patchwork@ozlabs.org", "linuxppc-dev@ozlabs.org" ], "Received": [ "from ozlabs.org (localhost [127.0.0.1])\n\tby ozlabs.org (Postfix) with ESMTP id C309ADE4B2\n\tfor <patchwork@ozlabs.org>; Thu, 11 Sep 2008 00:24:13 +1000 (EST)", "from mtagate1.de.ibm.com (mtagate1.de.ibm.com [195.212.17.161])\n\t(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))\n\t(Client CN \"mtagate1.de.ibm.com\", Issuer \"Equifax\" (verified OK))\n\tby ozlabs.org (Postfix) with ESMTPS id EDA70DDDFF\n\tfor <linuxppc-dev@ozlabs.org>; Thu, 11 Sep 2008 00:23:54 +1000 (EST)", "from d12nrmr1607.megacenter.de.ibm.com\n\t(d12nrmr1607.megacenter.de.ibm.com [9.149.167.49])\n\tby mtagate1.de.ibm.com (8.13.1/8.13.1) with ESMTP id m8AENmZx000374\n\tfor <linuxppc-dev@ozlabs.org>; Wed, 10 Sep 2008 14:23:48 GMT", "from d12av04.megacenter.de.ibm.com (d12av04.megacenter.de.ibm.com\n\t[9.149.165.229])\n\tby d12nrmr1607.megacenter.de.ibm.com (8.13.8/8.13.8/NCO v9.1) with\n\tESMTP id m8AENm493252478\n\tfor <linuxppc-dev@ozlabs.org>; Wed, 10 Sep 2008 16:23:48 +0200", "from d12av04.megacenter.de.ibm.com (loopback [127.0.0.1])\n\tby d12av04.megacenter.de.ibm.com (8.12.11.20060308/8.13.3) with ESMTP\n\tid m8AENiAW021150\n\tfor <linuxppc-dev@ozlabs.org>; Wed, 10 Sep 2008 16:23:45 +0200", "from BL3D1974.boeblingen.de.ibm.com\n\t(dyn-9-152-217-41.boeblingen.de.ibm.com [9.152.217.41])\n\tby d12av04.megacenter.de.ibm.com (8.12.11.20060308/8.12.11) with\n\tESMTP id m8AENi2n021139; Wed, 10 Sep 2008 16:23:44 +0200" ], "Date": "Wed, 10 Sep 2008 16:23:56 +0200", "From": "Alexander Schmidt 
<alexs@linux.vnet.ibm.com>", "To": "Roland Dreier <rolandd@cisco.com>, of-ewg <ewg@lists.openfabrics.org>,\n\tof-general <general@lists.openfabrics.org>, lkml\n\t<linux-kernel@vger.kernel.org>, linuxppc-dev <linuxppc-dev@ozlabs.org>,\n\tStefan Roscher <stefan.roscher@de.ibm.com>, Hoang-Nam Nguyen\n\t<HNGUYEN@linux.vnet.ibm.com>, Joachim Fenkes <fenkes@de.ibm.com>,\n\tChristoph Raisch <raisch@de.ibm.com>", "Subject": "[PATCH] ib/ehca: add flush CQE generation", "Message-ID": "<20080910162356.7294fe87@BL3D1974.boeblingen.de.ibm.com>", "X-Mailer": "Claws Mail 3.5.0 (GTK+ 2.12.11; i386-redhat-linux-gnu)", "Mime-Version": "1.0", "X-BeenThere": "linuxppc-dev@ozlabs.org", "X-Mailman-Version": "2.1.11", "Precedence": "list", "List-Id": "Linux on PowerPC Developers Mail List <linuxppc-dev.ozlabs.org>", "List-Unsubscribe": "<https://ozlabs.org/mailman/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@ozlabs.org?subject=unsubscribe>", "List-Archive": "<http://ozlabs.org/pipermail/linuxppc-dev>", "List-Post": "<mailto:linuxppc-dev@ozlabs.org>", "List-Help": "<mailto:linuxppc-dev-request@ozlabs.org?subject=help>", "List-Subscribe": "<https://ozlabs.org/mailman/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@ozlabs.org?subject=subscribe>", "Content-Type": "text/plain; charset=\"us-ascii\"", "Content-Transfer-Encoding": "7bit", "Sender": "linuxppc-dev-bounces+patchwork=ozlabs.org@ozlabs.org", "Errors-To": "linuxppc-dev-bounces+patchwork=ozlabs.org@ozlabs.org" }, "content": "When a QP goes into error state, it is required that flush CQEs are\ndelivered to the application for any outstanding work requests. eHCA does not\ndo this in hardware, so this patch adds software flush CQE generation to the\nehca driver.\n\nWhenever a QP gets into error state, it is added to the QP error list of its\nrespective CQ. 
If the error QP list of a CQ is not empty, poll_cq()\ngenerates flush CQEs before polling the actual CQ.\n\nSigned-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>", "diff": "--- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_classes.h\n+++ infiniband.git/drivers/infiniband/hw/ehca/ehca_classes.h\n@@ -164,6 +164,13 @@ struct ehca_qmap_entry {\n \tu16 reported;\n };\n \n+struct ehca_queue_map {\n+\tstruct ehca_qmap_entry *map;\n+\tunsigned int entries;\n+\tunsigned int tail;\n+\tunsigned int left_to_poll;\n+};\n+\n struct ehca_qp {\n \tunion {\n \t\tstruct ib_qp ib_qp;\n@@ -173,8 +180,9 @@ struct ehca_qp {\n \tenum ehca_ext_qp_type ext_type;\n \tenum ib_qp_state state;\n \tstruct ipz_queue ipz_squeue;\n-\tstruct ehca_qmap_entry *sq_map;\n+\tstruct ehca_queue_map sq_map;\n \tstruct ipz_queue ipz_rqueue;\n+\tstruct ehca_queue_map rq_map;\n \tstruct h_galpas galpas;\n \tu32 qkey;\n \tu32 real_qp_num;\n@@ -204,6 +212,8 @@ struct ehca_qp {\n \tatomic_t nr_events; /* events seen */\n \twait_queue_head_t wait_completion;\n \tint mig_armed;\n+\tstruct list_head sq_err_node;\n+\tstruct list_head rq_err_node;\n };\n \n #define IS_SRQ(qp) (qp->ext_type == EQPT_SRQ)\n@@ -233,6 +243,8 @@ struct ehca_cq {\n \t/* mmap counter for resources mapped into user space */\n \tu32 mm_count_queue;\n \tu32 mm_count_galpa;\n+\tstruct list_head sqp_err_list;\n+\tstruct list_head rqp_err_list;\n };\n \n enum ehca_mr_flag {\n--- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_reqs.c\n+++ infiniband.git/drivers/infiniband/hw/ehca/ehca_reqs.c\n@@ -53,9 +53,25 @@\n /* in RC traffic, insert an empty RDMA READ every this many packets */\n #define ACK_CIRC_THRESHOLD 2000000\n \n+static u64 replace_wr_id(u64 wr_id, u16 idx)\n+{\n+\tu64 ret;\n+\n+\tret = wr_id & ~QMAP_IDX_MASK;\n+\tret |= idx & QMAP_IDX_MASK;\n+\n+\treturn ret;\n+}\n+\n+static u16 get_app_wr_id(u64 wr_id)\n+{\n+\treturn wr_id & QMAP_IDX_MASK;\n+}\n+\n static inline int ehca_write_rwqe(struct ipz_queue *ipz_rqueue,\n \t\t\t\t struct ehca_wqe *wqe_p,\n-\t\t\t\t struct ib_recv_wr *recv_wr)\n+\t\t\t\t struct ib_recv_wr *recv_wr,\n+\t\t\t\t u32 rq_map_idx)\n {\n \tu8 cnt_ds;\n \tif (unlikely((recv_wr->num_sge < 0) ||\n@@ -69,7 +85,7 @@ static inline int ehca_write_rwqe(struct\n \t/* clear wqe header until sglist */\n \tmemset(wqe_p, 0, offsetof(struct ehca_wqe, u.ud_av.sg_list));\n \n-\twqe_p->work_request_id = recv_wr->wr_id;\n+\twqe_p->work_request_id = replace_wr_id(recv_wr->wr_id, rq_map_idx);\n \twqe_p->nr_of_data_seg = recv_wr->num_sge;\n \n \tfor (cnt_ds = 0; cnt_ds < recv_wr->num_sge; cnt_ds++) {\n@@ -146,6 +162,7 @@ static inline int ehca_write_swqe(struct\n \tu64 dma_length;\n \tstruct ehca_av *my_av;\n \tu32 remote_qkey = send_wr->wr.ud.remote_qkey;\n+\tstruct ehca_qmap_entry *qmap_entry = &qp->sq_map.map[sq_map_idx];\n \n \tif (unlikely((send_wr->num_sge < 0) ||\n \t\t (send_wr->num_sge > qp->ipz_squeue.act_nr_of_sg))) {\n@@ -158,11 +175,10 @@ static inline int ehca_write_swqe(struct\n \t/* clear wqe header until sglist */\n \tmemset(wqe_p, 0, offsetof(struct ehca_wqe, u.ud_av.sg_list));\n \n-\twqe_p->work_request_id = send_wr->wr_id & ~QMAP_IDX_MASK;\n-\twqe_p->work_request_id |= sq_map_idx & QMAP_IDX_MASK;\n+\twqe_p->work_request_id = replace_wr_id(send_wr->wr_id, sq_map_idx);\n \n-\tqp->sq_map[sq_map_idx].app_wr_id = send_wr->wr_id & QMAP_IDX_MASK;\n-\tqp->sq_map[sq_map_idx].reported = 0;\n+\tqmap_entry->app_wr_id = get_app_wr_id(send_wr->wr_id);\n+\tqmap_entry->reported = 0;\n \n \tswitch (send_wr->opcode) {\n \tcase IB_WR_SEND:\n@@ 
-496,7 +512,9 @@ static int internal_post_recv(struct ehc\n \tstruct ehca_wqe *wqe_p;\n \tint wqe_cnt = 0;\n \tint ret = 0;\n+\tu32 rq_map_idx;\n \tunsigned long flags;\n+\tstruct ehca_qmap_entry *qmap_entry;\n \n \tif (unlikely(!HAS_RQ(my_qp))) {\n \t\tehca_err(dev, \"QP has no RQ ehca_qp=%p qp_num=%x ext_type=%d\",\n@@ -524,8 +542,15 @@ static int internal_post_recv(struct ehc\n \t\t\t}\n \t\t\tgoto post_recv_exit0;\n \t\t}\n+\t\t/*\n+\t\t * Get the index of the WQE in the recv queue. The same index\n+\t\t * is used for writing into the rq_map.\n+\t\t */\n+\t\trq_map_idx = start_offset / my_qp->ipz_rqueue.qe_size;\n+\n \t\t/* write a RECV WQE into the QUEUE */\n-\t\tret = ehca_write_rwqe(&my_qp->ipz_rqueue, wqe_p, cur_recv_wr);\n+\t\tret = ehca_write_rwqe(&my_qp->ipz_rqueue, wqe_p, cur_recv_wr,\n+\t\t\t\trq_map_idx);\n \t\t/*\n \t\t * if something failed,\n \t\t * reset the free entry pointer to the start value\n@@ -540,6 +565,11 @@ static int internal_post_recv(struct ehc\n \t\t\t}\n \t\t\tgoto post_recv_exit0;\n \t\t}\n+\n+\t\tqmap_entry = &my_qp->rq_map.map[rq_map_idx];\n+\t\tqmap_entry->app_wr_id = get_app_wr_id(cur_recv_wr->wr_id);\n+\t\tqmap_entry->reported = 0;\n+\n \t\twqe_cnt++;\n \t} /* eof for cur_recv_wr */\n \n@@ -596,10 +626,12 @@ static const u8 ib_wc_opcode[255] = {\n /* internal function to poll one entry of cq */\n static inline int ehca_poll_cq_one(struct ib_cq *cq, struct ib_wc *wc)\n {\n-\tint ret = 0;\n+\tint ret = 0, qmap_tail_idx;\n \tstruct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq);\n \tstruct ehca_cqe *cqe;\n \tstruct ehca_qp *my_qp;\n+\tstruct ehca_qmap_entry *qmap_entry;\n+\tstruct ehca_queue_map *qmap;\n \tint cqe_count = 0, is_error;\n \n repoll:\n@@ -674,27 +706,52 @@ repoll:\n \t\tgoto repoll;\n \twc->qp = &my_qp->ib_qp;\n \n-\tif (!(cqe->w_completion_flags & WC_SEND_RECEIVE_BIT)) {\n-\t\tstruct ehca_qmap_entry *qmap_entry;\n+\tif (is_error) {\n \t\t/*\n-\t\t * We got a send completion and need to restore the original\n-\t\t * wr_id.\n+\t\t * set left_to_poll to 0 because in error state, we will not\n+\t\t * get any additional CQEs\n \t\t */\n-\t\tqmap_entry = &my_qp->sq_map[cqe->work_request_id &\n-\t\t\t\t\t QMAP_IDX_MASK];\n+\t\tehca_add_to_err_list(my_qp, 1);\n+\t\tmy_qp->sq_map.left_to_poll = 0;\n \n-\t\tif (qmap_entry->reported) {\n-\t\t\tehca_warn(cq->device, \"Double cqe on qp_num=%#x\",\n-\t\t\t\t my_qp->real_qp_num);\n-\t\t\t/* found a double cqe, discard it and read next one */\n-\t\t\tgoto repoll;\n-\t\t}\n-\t\twc->wr_id = cqe->work_request_id & ~QMAP_IDX_MASK;\n-\t\twc->wr_id |= qmap_entry->app_wr_id;\n-\t\tqmap_entry->reported = 1;\n-\t} else\n+\t\tif (HAS_RQ(my_qp))\n+\t\t\tehca_add_to_err_list(my_qp, 0);\n+\t\tmy_qp->rq_map.left_to_poll = 0;\n+\t}\n+\n+\tqmap_tail_idx = get_app_wr_id(cqe->work_request_id);\n+\tif (!(cqe->w_completion_flags & WC_SEND_RECEIVE_BIT))\n+\t\t/* We got a send completion. */\n+\t\tqmap = &my_qp->sq_map;\n+\telse\n \t\t/* We got a receive completion. 
*/\n-\t\twc->wr_id = cqe->work_request_id;\n+\t\tqmap = &my_qp->rq_map;\n+\n+\tqmap_entry = &qmap->map[qmap_tail_idx];\n+\tif (qmap_entry->reported) {\n+\t\tehca_warn(cq->device, \"Double cqe on qp_num=%#x\",\n+\t\t\t\tmy_qp->real_qp_num);\n+\t\t/* found a double cqe, discard it and read next one */\n+\t\tgoto repoll;\n+\t}\n+\n+\twc->wr_id = replace_wr_id(cqe->work_request_id, qmap_entry->app_wr_id);\n+\tqmap_entry->reported = 1;\n+\n+\t/* this is a proper completion, we need to advance the tail pointer */\n+\tif (++qmap->tail == qmap->entries)\n+\t\tqmap->tail = 0;\n+\n+\t/* if left_to_poll is decremented to 0, add the QP to the error list */\n+\tif (qmap->left_to_poll > 0) {\n+\t\tqmap->left_to_poll--;\n+\t\tif ((my_qp->sq_map.left_to_poll == 0) &&\n+\t\t\t\t(my_qp->rq_map.left_to_poll == 0)) {\n+\t\t\tehca_add_to_err_list(my_qp, 1);\n+\t\t\tif (HAS_RQ(my_qp))\n+\t\t\t\tehca_add_to_err_list(my_qp, 0);\n+\t\t}\n+\t}\n \n \t/* eval ib_wc_opcode */\n \twc->opcode = ib_wc_opcode[cqe->optype]-1;\n@@ -733,13 +790,88 @@ poll_cq_one_exit0:\n \treturn ret;\n }\n \n+static int generate_flush_cqes(struct ehca_qp *my_qp, struct ib_cq *cq,\n+\t\t\t struct ib_wc *wc, int num_entries,\n+\t\t\t struct ipz_queue *ipz_queue, int on_sq)\n+{\n+\tint nr = 0;\n+\tstruct ehca_wqe *wqe;\n+\tu64 offset;\n+\tstruct ehca_queue_map *qmap;\n+\tstruct ehca_qmap_entry *qmap_entry;\n+\n+\tif (on_sq)\n+\t\tqmap = &my_qp->sq_map;\n+\telse\n+\t\tqmap = &my_qp->rq_map;\n+\n+\tqmap_entry = &qmap->map[qmap->tail];\n+\n+\twhile ((nr < num_entries) && (qmap_entry->reported == 0)) {\n+\t\t/* generate flush CQE */\n+\t\tmemset(wc, 0, sizeof(*wc));\n+\n+\t\toffset = qmap->tail * ipz_queue->qe_size;\n+\t\twqe = (struct ehca_wqe *)ipz_qeit_calc(ipz_queue, offset);\n+\t\tif (!wqe) {\n+\t\t\tehca_err(cq->device, \"Invalid wqe offset=%#lx on \"\n+\t\t\t\t \"qp_num=%#x\", offset, my_qp->real_qp_num);\n+\t\t\treturn nr;\n+\t\t}\n+\n+\t\twc->wr_id = replace_wr_id(wqe->work_request_id,\n+\t\t\t\t\t qmap_entry->app_wr_id);\n+\n+\t\tif (on_sq) {\n+\t\t\tswitch (wqe->optype) {\n+\t\t\tcase WQE_OPTYPE_SEND:\n+\t\t\t\twc->opcode = IB_WC_SEND;\n+\t\t\t\tbreak;\n+\t\t\tcase WQE_OPTYPE_RDMAWRITE:\n+\t\t\t\twc->opcode = IB_WC_RDMA_WRITE;\n+\t\t\t\tbreak;\n+\t\t\tcase WQE_OPTYPE_RDMAREAD:\n+\t\t\t\twc->opcode = IB_WC_RDMA_READ;\n+\t\t\t\tbreak;\n+\t\t\tdefault:\n+\t\t\t\tehca_err(cq->device, \"Invalid optype=%x\",\n+\t\t\t\t\t\twqe->optype);\n+\t\t\t\treturn nr;\n+\t\t\t}\n+\t\t} else\n+\t\t\twc->opcode = IB_WC_RECV;\n+\n+\t\tif (wqe->wr_flag & WQE_WRFLAG_IMM_DATA_PRESENT) {\n+\t\t\twc->ex.imm_data = wqe->immediate_data;\n+\t\t\twc->wc_flags |= IB_WC_WITH_IMM;\n+\t\t}\n+\n+\t\twc->status = IB_WC_WR_FLUSH_ERR;\n+\n+\t\twc->qp = &my_qp->ib_qp;\n+\n+\t\t/* mark as reported and advance tail pointer */\n+\t\tqmap_entry->reported = 1;\n+\t\tif (++qmap->tail == qmap->entries)\n+\t\t\tqmap->tail = 0;\n+\t\tqmap_entry = &qmap->map[qmap->tail];\n+\n+\t\twc++; nr++;\n+\t}\n+\n+\treturn nr;\n+\n+}\n+\n int ehca_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc)\n {\n \tstruct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq);\n \tint nr;\n+\tstruct ehca_qp *err_qp;\n \tstruct ib_wc *current_wc = wc;\n \tint ret = 0;\n \tunsigned long flags;\n+\tint entries_left = num_entries;\n \n \tif (num_entries < 1) {\n \t\tehca_err(cq->device, \"Invalid num_entries=%d ehca_cq=%p \"\n@@ -749,15 +881,40 @@ int ehca_poll_cq(struct ib_cq *cq, int n\n \t}\n \n \tspin_lock_irqsave(&my_cq->spinlock, flags);\n-\tfor (nr = 0; nr < num_entries; nr++) {\n+\n+\t/* 
generate flush cqes for send queues */\n+\tlist_for_each_entry(err_qp, &my_cq->sqp_err_list, sq_err_node) {\n+\t\tnr = generate_flush_cqes(err_qp, cq, current_wc, entries_left,\n+\t\t\t\t&err_qp->ipz_squeue, 1);\n+\t\tentries_left -= nr;\n+\t\tcurrent_wc += nr;\n+\n+\t\tif (entries_left == 0)\n+\t\t\tbreak;\n+\t}\n+\n+\t/* generate flush cqes for receive queues */\n+\tlist_for_each_entry(err_qp, &my_cq->rqp_err_list, rq_err_node) {\n+\t\tnr = generate_flush_cqes(err_qp, cq, current_wc, entries_left,\n+\t\t\t\t&err_qp->ipz_rqueue, 0);\n+\t\tentries_left -= nr;\n+\t\tcurrent_wc += nr;\n+\n+\t\tif (entries_left == 0)\n+\t\t\tbreak;\n+\t}\n+\n+\tfor (nr = 0; nr < entries_left; nr++) {\n \t\tret = ehca_poll_cq_one(cq, current_wc);\n \t\tif (ret)\n \t\t\tbreak;\n \t\tcurrent_wc++;\n \t} /* eof for nr */\n+\tentries_left -= nr;\n+\n \tspin_unlock_irqrestore(&my_cq->spinlock, flags);\n \tif (ret == -EAGAIN || !ret)\n-\t\tret = nr;\n+\t\tret = num_entries - entries_left;\n \n poll_cq_exit0:\n \treturn ret;\n--- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_cq.c\n+++ infiniband.git/drivers/infiniband/hw/ehca/ehca_cq.c\n@@ -276,6 +276,9 @@ struct ib_cq *ehca_create_cq(struct ib_d\n \tfor (i = 0; i < QP_HASHTAB_LEN; i++)\n \t\tINIT_HLIST_HEAD(&my_cq->qp_hashtab[i]);\n \n+\tINIT_LIST_HEAD(&my_cq->sqp_err_list);\n+\tINIT_LIST_HEAD(&my_cq->rqp_err_list);\n+\n \tif (context) {\n \t\tstruct ipz_queue *ipz_queue = &my_cq->ipz_queue;\n \t\tstruct ehca_create_cq_resp resp;\n--- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_qp.c\n+++ infiniband.git/drivers/infiniband/hw/ehca/ehca_qp.c\n@@ -396,6 +396,50 @@ static void ehca_determine_small_queue(s\n \tqueue->is_small = (queue->page_size != 0);\n }\n \n+/* needs to be called with cq->spinlock held */\n+void ehca_add_to_err_list(struct ehca_qp *qp, int on_sq)\n+{\n+\tstruct list_head *list, *node;\n+\n+\t/* TODO: support low latency QPs */\n+\tif (qp->ext_type == EQPT_LLQP)\n+\t\treturn;\n+\n+\tif (on_sq) {\n+\t\tlist = &qp->send_cq->sqp_err_list;\n+\t\tnode = &qp->sq_err_node;\n+\t} else {\n+\t\tlist = &qp->recv_cq->rqp_err_list;\n+\t\tnode = &qp->rq_err_node;\n+\t}\n+\n+\tif (list_empty(node))\n+\t\tlist_add_tail(node, list);\n+\n+\treturn;\n+}\n+\n+static void del_from_err_list(struct ehca_cq *cq, struct list_head *node)\n+{\n+\tunsigned long flags;\n+\n+\tspin_lock_irqsave(&cq->spinlock, flags);\n+\n+\tif (!list_empty(node))\n+\t\tlist_del_init(node);\n+\n+\tspin_unlock_irqrestore(&cq->spinlock, flags);\n+}\n+\n+static void reset_queue_map(struct ehca_queue_map *qmap)\n+{\n+\tint i;\n+\n+\tqmap->tail = 0;\n+\tfor (i = 0; i < qmap->entries; i++)\n+\t\tqmap->map[i].reported = 1;\n+}\n+\n /*\n * Create an ib_qp struct that is either a QP or an SRQ, depending on\n * the value of the is_srq parameter. 
If init_attr and srq_init_attr share\n@@ -407,12 +451,11 @@ static struct ehca_qp *internal_create_q\n \tstruct ib_srq_init_attr *srq_init_attr,\n \tstruct ib_udata *udata, int is_srq)\n {\n-\tstruct ehca_qp *my_qp;\n+\tstruct ehca_qp *my_qp, *my_srq = NULL;\n \tstruct ehca_pd *my_pd = container_of(pd, struct ehca_pd, ib_pd);\n \tstruct ehca_shca *shca = container_of(pd->device, struct ehca_shca,\n \t\t\t\t\t ib_device);\n \tstruct ib_ucontext *context = NULL;\n-\tu32 nr_qes;\n \tu64 h_ret;\n \tint is_llqp = 0, has_srq = 0;\n \tint qp_type, max_send_sge, max_recv_sge, ret;\n@@ -457,8 +500,7 @@ static struct ehca_qp *internal_create_q\n \n \t/* handle SRQ base QPs */\n \tif (init_attr->srq) {\n-\t\tstruct ehca_qp *my_srq =\n-\t\t\tcontainer_of(init_attr->srq, struct ehca_qp, ib_srq);\n+\t\tmy_srq = container_of(init_attr->srq, struct ehca_qp, ib_srq);\n \n \t\thas_srq = 1;\n \t\tparms.ext_type = EQPT_SRQBASE;\n@@ -716,15 +758,19 @@ static struct ehca_qp *internal_create_q\n \t\t\t\t \"and pages ret=%i\", ret);\n \t\t\tgoto create_qp_exit2;\n \t\t}\n-\t\tnr_qes = my_qp->ipz_squeue.queue_length /\n+\n+\t\tmy_qp->sq_map.entries = my_qp->ipz_squeue.queue_length /\n \t\t\t my_qp->ipz_squeue.qe_size;\n-\t\tmy_qp->sq_map = vmalloc(nr_qes *\n+\t\tmy_qp->sq_map.map = vmalloc(my_qp->sq_map.entries *\n \t\t\t\t\tsizeof(struct ehca_qmap_entry));\n-\t\tif (!my_qp->sq_map) {\n+\t\tif (!my_qp->sq_map.map) {\n \t\t\tehca_err(pd->device, \"Couldn't allocate squeue \"\n \t\t\t\t \"map ret=%i\", ret);\n \t\t\tgoto create_qp_exit3;\n \t\t}\n+\t\tINIT_LIST_HEAD(&my_qp->sq_err_node);\n+\t\t/* to avoid the generation of bogus flush CQEs */\n+\t\treset_queue_map(&my_qp->sq_map);\n \t}\n \n \tif (HAS_RQ(my_qp)) {\n@@ -736,6 +782,25 @@ static struct ehca_qp *internal_create_q\n \t\t\t\t \"and pages ret=%i\", ret);\n \t\t\tgoto create_qp_exit4;\n \t\t}\n+\n+\t\tmy_qp->rq_map.entries = my_qp->ipz_rqueue.queue_length /\n+\t\t\tmy_qp->ipz_rqueue.qe_size;\n+\t\tmy_qp->rq_map.map = vmalloc(my_qp->rq_map.entries *\n+\t\t\t\tsizeof(struct ehca_qmap_entry));\n+\t\tif (!my_qp->rq_map.map) {\n+\t\t\tehca_err(pd->device, \"Couldn't allocate squeue \"\n+\t\t\t\t\t\"map ret=%i\", ret);\n+\t\t\tgoto create_qp_exit5;\n+\t\t}\n+\t\tINIT_LIST_HEAD(&my_qp->rq_err_node);\n+\t\t/* to avoid the generation of bogus flush CQEs */\n+\t\treset_queue_map(&my_qp->rq_map);\n+\t} else if (init_attr->srq) {\n+\t\t/* this is a base QP, use the queue map of the SRQ */\n+\t\tmy_qp->rq_map = my_srq->rq_map;\n+\t\tINIT_LIST_HEAD(&my_qp->rq_err_node);\n+\n+\t\tmy_qp->ipz_rqueue = my_srq->ipz_rqueue;\n \t}\n \n \tif (is_srq) {\n@@ -799,7 +864,7 @@ static struct ehca_qp *internal_create_q\n \t\tif (ret) {\n \t\t\tehca_err(pd->device,\n \t\t\t\t \"Couldn't assign qp to send_cq ret=%i\", ret);\n-\t\t\tgoto create_qp_exit6;\n+\t\t\tgoto create_qp_exit7;\n \t\t}\n \t}\n \n@@ -825,25 +890,29 @@ static struct ehca_qp *internal_create_q\n \t\tif (ib_copy_to_udata(udata, &resp, sizeof resp)) {\n \t\t\tehca_err(pd->device, \"Copy to udata failed\");\n \t\t\tret = -EINVAL;\n-\t\t\tgoto create_qp_exit7;\n+\t\t\tgoto create_qp_exit8;\n \t\t}\n \t}\n \n \treturn my_qp;\n \n-create_qp_exit7:\n+create_qp_exit8:\n \tehca_cq_unassign_qp(my_qp->send_cq, my_qp->real_qp_num);\n \n-create_qp_exit6:\n+create_qp_exit7:\n \tkfree(my_qp->mod_qp_parm);\n \n+create_qp_exit6:\n+\tif (HAS_RQ(my_qp))\n+\t\tvfree(my_qp->rq_map.map);\n+\n create_qp_exit5:\n \tif (HAS_RQ(my_qp))\n \t\tipz_queue_dtor(my_pd, &my_qp->ipz_rqueue);\n \n create_qp_exit4:\n \tif 
(HAS_SQ(my_qp))\n-\t\tvfree(my_qp->sq_map);\n+\t\tvfree(my_qp->sq_map.map);\n \n create_qp_exit3:\n \tif (HAS_SQ(my_qp))\n@@ -1035,6 +1104,101 @@ static int prepare_sqe_rts(struct ehca_q\n \treturn 0;\n }\n \n+static int calc_left_cqes(u64 wqe_p, struct ipz_queue *ipz_queue,\n+\t\t\t struct ehca_queue_map *qmap)\n+{\n+\tvoid *wqe_v;\n+\tu64 q_ofs;\n+\tu32 wqe_idx;\n+\n+\t/* convert real to abs address */\n+\twqe_p = wqe_p & (~(1UL << 63));\n+\n+\twqe_v = abs_to_virt(wqe_p);\n+\n+\tif (ipz_queue_abs_to_offset(ipz_queue, wqe_p, &q_ofs)) {\n+\t\tehca_gen_err(\"Invalid offset for calculating left cqes \"\n+\t\t\t\t\"wqe_p=%#lx wqe_v=%p\\n\", wqe_p, wqe_v);\n+\t\treturn -EFAULT;\n+\t}\n+\n+\twqe_idx = q_ofs / ipz_queue->qe_size;\n+\tif (wqe_idx < qmap->tail)\n+\t\tqmap->left_to_poll = (qmap->entries - qmap->tail) + wqe_idx;\n+\telse\n+\t\tqmap->left_to_poll = wqe_idx - qmap->tail;\n+\n+\treturn 0;\n+}\n+\n+static int check_for_left_cqes(struct ehca_qp *my_qp, struct ehca_shca *shca)\n+{\n+\tu64 h_ret;\n+\tvoid *send_wqe_p, *recv_wqe_p;\n+\tint ret;\n+\tunsigned long flags;\n+\tint qp_num = my_qp->ib_qp.qp_num;\n+\n+\t/* this hcall is not supported on base QPs */\n+\tif (my_qp->ext_type != EQPT_SRQBASE) {\n+\t\t/* get send and receive wqe pointer */\n+\t\th_ret = hipz_h_disable_and_get_wqe(shca->ipz_hca_handle,\n+\t\t\t\tmy_qp->ipz_qp_handle, &my_qp->pf,\n+\t\t\t\t&send_wqe_p, &recv_wqe_p, 4);\n+\t\tif (h_ret != H_SUCCESS) {\n+\t\t\tehca_err(&shca->ib_device, \"disable_and_get_wqe() \"\n+\t\t\t\t \"failed ehca_qp=%p qp_num=%x h_ret=%li\",\n+\t\t\t\t my_qp, qp_num, h_ret);\n+\t\t\treturn ehca2ib_return_code(h_ret);\n+\t\t}\n+\n+\t\t/*\n+\t\t * acquire lock to ensure that nobody is polling the cq which\n+\t\t * could mean that the qmap->tail pointer is in an\n+\t\t * inconsistent state.\n+\t\t */\n+\t\tspin_lock_irqsave(&my_qp->send_cq->spinlock, flags);\n+\t\tret = calc_left_cqes((u64)send_wqe_p, &my_qp->ipz_squeue,\n+\t\t\t\t&my_qp->sq_map);\n+\t\tspin_unlock_irqrestore(&my_qp->send_cq->spinlock, flags);\n+\t\tif (ret)\n+\t\t\treturn ret;\n+\n+\n+\t\tspin_lock_irqsave(&my_qp->recv_cq->spinlock, flags);\n+\t\tret = calc_left_cqes((u64)recv_wqe_p, &my_qp->ipz_rqueue,\n+\t\t\t\t&my_qp->rq_map);\n+\t\tspin_unlock_irqrestore(&my_qp->recv_cq->spinlock, flags);\n+\t\tif (ret)\n+\t\t\treturn ret;\n+\t} else {\n+\t\tspin_lock_irqsave(&my_qp->send_cq->spinlock, flags);\n+\t\tmy_qp->sq_map.left_to_poll = 0;\n+\t\tspin_unlock_irqrestore(&my_qp->send_cq->spinlock, flags);\n+\n+\t\tspin_lock_irqsave(&my_qp->recv_cq->spinlock, flags);\n+\t\tmy_qp->rq_map.left_to_poll = 0;\n+\t\tspin_unlock_irqrestore(&my_qp->recv_cq->spinlock, flags);\n+\t}\n+\n+\t/* this assures flush cqes being generated only for pending wqes */\n+\tif ((my_qp->sq_map.left_to_poll == 0) &&\n+\t\t\t\t(my_qp->rq_map.left_to_poll == 0)) {\n+\t\tspin_lock_irqsave(&my_qp->send_cq->spinlock, flags);\n+\t\tehca_add_to_err_list(my_qp, 1);\n+\t\tspin_unlock_irqrestore(&my_qp->send_cq->spinlock, flags);\n+\n+\t\tif (HAS_RQ(my_qp)) {\n+\t\t\tspin_lock_irqsave(&my_qp->recv_cq->spinlock, flags);\n+\t\t\tehca_add_to_err_list(my_qp, 0);\n+\t\t\tspin_unlock_irqrestore(&my_qp->recv_cq->spinlock,\n+\t\t\t\t\tflags);\n+\t\t}\n+\t}\n+\n+\treturn 0;\n+}\n+\n /*\n * internal_modify_qp with circumvention to handle aqp0 properly\n * smi_reset2init indicates if this is an internal reset-to-init-call for\n@@ -1539,10 +1703,27 @@ static int internal_modify_qp(struct ib_\n \t\t\tgoto modify_qp_exit2;\n \t\t}\n \t}\n+\tif ((qp_new_state == IB_QPS_ERR) && (qp_cur_state 
!= IB_QPS_ERR)) {\n+\t\tret = check_for_left_cqes(my_qp, shca);\n+\t\tif (ret)\n+\t\t\tgoto modify_qp_exit2;\n+\t}\n \n \tif (statetrans == IB_QPST_ANY2RESET) {\n \t\tipz_qeit_reset(&my_qp->ipz_rqueue);\n \t\tipz_qeit_reset(&my_qp->ipz_squeue);\n+\n+\t\tif (qp_cur_state == IB_QPS_ERR) {\n+\t\t\tdel_from_err_list(my_qp->send_cq, &my_qp->sq_err_node);\n+\n+\t\t\tif (HAS_RQ(my_qp))\n+\t\t\t\tdel_from_err_list(my_qp->recv_cq,\n+\t\t\t\t\t\t &my_qp->rq_err_node);\n+\t\t}\n+\t\treset_queue_map(&my_qp->sq_map);\n+\n+\t\tif (HAS_RQ(my_qp))\n+\t\t\treset_queue_map(&my_qp->rq_map);\n \t}\n \n \tif (attr_mask & IB_QP_QKEY)\n@@ -1958,6 +2139,16 @@ static int internal_destroy_qp(struct ib\n \tidr_remove(&ehca_qp_idr, my_qp->token);\n \twrite_unlock_irqrestore(&ehca_qp_idr_lock, flags);\n \n+\t/*\n+\t * SRQs will never get into an error list and do not have a recv_cq,\n+\t * so we need to skip them here.\n+\t */\n+\tif (HAS_RQ(my_qp) && !IS_SRQ(my_qp))\n+\t\tdel_from_err_list(my_qp->recv_cq, &my_qp->rq_err_node);\n+\n+\tif (HAS_SQ(my_qp))\n+\t\tdel_from_err_list(my_qp->send_cq, &my_qp->sq_err_node);\n+\n \t/* now wait until all pending events have completed */\n \twait_event(my_qp->wait_completion, !atomic_read(&my_qp->nr_events));\n \n@@ -1983,7 +2174,7 @@ static int internal_destroy_qp(struct ib\n \tif (qp_type == IB_QPT_GSI) {\n \t\tstruct ib_event event;\n \t\tehca_info(dev, \"device %s: port %x is inactive.\",\n-\t\t\t shca->ib_device.name, port_num);\n+\t\t\t\tshca->ib_device.name, port_num);\n \t\tevent.device = &shca->ib_device;\n \t\tevent.event = IB_EVENT_PORT_ERR;\n \t\tevent.element.port_num = port_num;\n@@ -1991,11 +2182,15 @@ static int internal_destroy_qp(struct ib\n \t\tib_dispatch_event(&event);\n \t}\n \n-\tif (HAS_RQ(my_qp))\n+\tif (HAS_RQ(my_qp)) {\n \t\tipz_queue_dtor(my_pd, &my_qp->ipz_rqueue);\n+\n+\t\tvfree(my_qp->rq_map.map);\n+\t}\n \tif (HAS_SQ(my_qp)) {\n \t\tipz_queue_dtor(my_pd, &my_qp->ipz_squeue);\n-\t\tvfree(my_qp->sq_map);\n+\n+\t\tvfree(my_qp->sq_map.map);\n \t}\n \tkmem_cache_free(qp_cache, my_qp);\n \tatomic_dec(&shca->num_qps);\n--- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_iverbs.h\n+++ infiniband.git/drivers/infiniband/hw/ehca/ehca_iverbs.h\n@@ -197,6 +197,8 @@ void ehca_poll_eqs(unsigned long data);\n int ehca_calc_ipd(struct ehca_shca *shca, int port,\n \t\t enum ib_rate path_rate, u32 *ipd);\n \n+void ehca_add_to_err_list(struct ehca_qp *qp, int on_sq);\n+\n #ifdef CONFIG_PPC_64K_PAGES\n void *ehca_alloc_fw_ctrlblock(gfp_t flags);\n void ehca_free_fw_ctrlblock(void *ptr);\n", "prefixes": [] }
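
The `patch:` and `put:` methods write to the same resource. A hedged sketch of a partial update follows (the API token is a placeholder, treating `state` and `archived` as writable fields is an assumption based on the read-only response above, and updates generally require maintainer rights on the project):

```python
# Partially update patch 227: set a new state and unarchive it.
# Patchwork instances commonly expose DRF-style token authentication,
# but check your server's documentation; the token below is fabricated.
import requests

resp = requests.patch(
    "http://patchwork.ozlabs.org/api/patches/227/",
    headers={"Authorization": "Token 0123456789abcdef"},  # placeholder
    json={"state": "accepted", "archived": False},
)
resp.raise_for_status()
print(resp.json()["state"])  # echoes the updated state on success
```

A full `PUT` replaces the writable representation rather than merging fields, so `PATCH` is usually the safer choice for one-field changes such as state transitions.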