[{"id":1770866,"web_url":"http://patchwork.ozlabs.org/comment/1770866/","msgid":"<20170919025747.GM27153@umbus>","list_archive_url":null,"date":"2017-09-19T02:57:47","subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","submitter":{"id":47,"url":"http://patchwork.ozlabs.org/api/people/47/","name":"David Gibson","email":"david@gibson.dropbear.id.au"},"content":"On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:\n> Each interrupt source is associated with a two bit state machine\n> called an Event State Buffer (ESB) which is controlled by MMIO to\n> trigger events. See code for more details on the states and\n> transitions.\n> \n> The MMIO space for the ESB translation is 512GB large on baremetal\n> (powernv) systems and the BAR depends on the chip id. In our model for\n> the sPAPR machine, we choose to only map a sub memory region for the\n> provisionned IRQ numbers and to use the mapping address of chip 0 on a\n> real system. The OS will get the address of the MMIO page of the ESB\n> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.\n\nOn bare metal, are the MMIOs for each irq source mapped contiguously?\n\n> For KVM support, we should think of a way to map this QEMU memory\n> region in the host to trigger events directly.\n\nThis would rely on being able to map them without mapping those for\nany other VM or the host.  Does that mean allocating a contiguous (and\naligned) hunk of irqs for a guest?\n\nWe're going to need to be careful about irq allocation here.\nEven though GET_SOURCE_INFO allows dynamic mapping of irq numbers to\nMMIO addresses, we need the MMIO addresses to be stable and\nconsistent, because we can't have them change across migration.  We\nneed to have this consistent between in-qemu and in-KVM XIVE\nimplementations as well.\n\n> \n> Signed-off-by: Cédric Le Goater <clg@kaod.org>\n> ---\n>  hw/intc/spapr_xive.c        | 255 ++++++++++++++++++++++++++++++++++++++++++++\n>  include/hw/ppc/spapr_xive.h |   6 ++\n>  2 files changed, 261 insertions(+)\n> \n> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c\n> index 1ed7b6a286e9..8a85d64efc4c 100644\n> --- a/hw/intc/spapr_xive.c\n> +++ b/hw/intc/spapr_xive.c\n> @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)\n>  }\n>  \n>  /*\n> + * \"magic\" Event State Buffer (ESB) MMIO offsets.\n> + *\n> + * Each interrupt source has a 2-bit state machine called ESB\n> + * which can be controlled by MMIO. It's made of 2 bits, P and\n> + * Q. P indicates that an interrupt is pending (has been sent\n> + * to a queue and is waiting for an EOI). Q indicates that the\n> + * interrupt has been triggered while pending.\n> + *\n> + * This acts as a coalescing mechanism in order to guarantee\n> + * that a given interrupt only occurs at most once in a queue.\n> + *\n> + * When doing an EOI, the Q bit will indicate if the interrupt\n> + * needs to be re-triggered.\n> + *\n> + * The following offsets into the ESB MMIO allow to read or\n> + * manipulate the PQ bits. They must be used with an 8-bytes\n> + * load instruction. They all return the previous state of the\n> + * interrupt (atomically).\n> + *\n> + * Additionally, some ESB pages support doing an EOI via a\n> + * store at 0 and some ESBs support doing a trigger via a\n> + * separate trigger page.\n> + */\n> +#define XIVE_ESB_GET            0x800\n> +#define XIVE_ESB_SET_PQ_00      0xc00\n> +#define XIVE_ESB_SET_PQ_01      0xd00\n> +#define XIVE_ESB_SET_PQ_10      0xe00\n> +#define XIVE_ESB_SET_PQ_11      0xf00\n> +\n> +#define XIVE_ESB_VAL_P          0x2\n> +#define XIVE_ESB_VAL_Q          0x1\n> +\n> +#define XIVE_ESB_RESET          0x0\n> +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P\n> +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)\n> +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q\n> +\n> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)\n> +{\n> +    uint32_t byte = idx / 4;\n> +    uint32_t bit  = (idx % 4) * 2;\n> +\n> +    assert(byte < xive->sbe_size);\n> +\n> +    return (xive->sbe[byte] >> bit) & 0x3;\n> +}\n> +\n> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)\n> +{\n> +    uint32_t byte = idx / 4;\n> +    uint32_t bit  = (idx % 4) * 2;\n> +    uint8_t old, new;\n> +\n> +    assert(byte < xive->sbe_size);\n> +\n> +    old = xive->sbe[byte];\n> +\n> +    new = xive->sbe[byte] & ~(0x3 << bit);\n> +    new |= (pq & 0x3) << bit;\n> +\n> +    xive->sbe[byte] = new;\n> +\n> +    return (old >> bit) & 0x3;\n> +}\n> +\n> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)\n> +{\n> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);\n> +\n> +    switch (old_pq) {\n> +    case XIVE_ESB_RESET:\n> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);\n> +        return false;\n> +    case XIVE_ESB_PENDING:\n> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);\n> +        return false;\n> +    case XIVE_ESB_QUEUED:\n> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);\n> +        return true;\n> +    case XIVE_ESB_OFF:\n> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);\n> +        return false;\n> +    default:\n> +         g_assert_not_reached();\n> +    }\n> +}\n> +\n> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t srcno)\n> +{\n> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);\n> +\n> +    switch (old_pq) {\n> +    case XIVE_ESB_RESET:\n> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);\n> +        return true;\n> +    case XIVE_ESB_PENDING:\n> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);\n> +        return true;\n> +    case XIVE_ESB_QUEUED:\n> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);\n> +        return true;\n> +    case XIVE_ESB_OFF:\n> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);\n> +        return false;\n> +    default:\n> +         g_assert_not_reached();\n> +    }\n> +}\n> +\n> +/*\n> + * XIVE Interrupt Source MMIOs\n> + */\n> +static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t srcno)\n> +{\n> +    ICSIRQState *irq = &xive->ics->irqs[srcno];\n> +\n> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {\n> +        irq->status &= ~XICS_STATUS_SENT;\n> +    }\n> +}\n> +\n> +/* TODO: handle second page\n> + *\n> + * Some HW use a separate page for trigger. We only support the case\n> + * in which the trigger can be done in the same page as the EOI.\n> + */\n> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)\n> +{\n> +    sPAPRXive *xive = SPAPR_XIVE(opaque);\n> +    uint32_t offset = addr & 0xF00;\n> +    uint32_t srcno = addr >> xive->esb_shift;\n> +    XiveIVE *ive;\n> +    uint64_t ret = -1;\n> +\n> +    ive = spapr_xive_get_ive(xive, srcno);\n> +    if (!ive || !(ive->w & IVE_VALID))  {\n> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid LISN %d\\n\", srcno);\n> +        goto out;\n\nSince there's a whole (4k) page for each source, I wonder if we should\nactually map each one as a separate MMIO region to allow us to tweak\nthe mappings more flexibly.\n\n> +    }\n> +\n> +    switch (offset) {\n> +    case 0:\n> +        spapr_xive_source_eoi(xive, srcno);\n> +\n> +        /* return TRUE or FALSE depending on PQ value */\n> +        ret = spapr_xive_pq_eoi(xive, srcno);\n> +        break;\n> +\n> +    case XIVE_ESB_GET:\n> +        ret = spapr_xive_pq_get(xive, srcno);\n> +        break;\n> +\n> +    case XIVE_ESB_SET_PQ_00:\n> +    case XIVE_ESB_SET_PQ_01:\n> +    case XIVE_ESB_SET_PQ_10:\n> +    case XIVE_ESB_SET_PQ_11:\n> +        ret = spapr_xive_pq_set(xive, srcno, (offset >> 8) & 0x3);\n> +        break;\n> +    default:\n> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid ESB addr %d\\n\", offset);\n> +    }\n> +\n> +out:\n> +    return ret;\n> +}\n> +\n> +static void spapr_xive_esb_write(void *opaque, hwaddr addr,\n> +                           uint64_t value, unsigned size)\n> +{\n> +    sPAPRXive *xive = SPAPR_XIVE(opaque);\n> +    uint32_t offset = addr & 0xF00;\n> +    uint32_t srcno = addr >> xive->esb_shift;\n> +    XiveIVE *ive;\n> +    bool notify = false;\n> +\n> +    ive = spapr_xive_get_ive(xive, srcno);\n> +    if (!ive || !(ive->w & IVE_VALID))  {\n> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid LISN %d\\n\", srcno);\n> +        return;\n> +    }\n> +\n> +    switch (offset) {\n> +    case 0:\n> +        /* TODO: should we trigger even if the IVE is masked ? */\n> +        notify = spapr_xive_pq_trigger(xive, srcno);\n> +        break;\n> +    default:\n> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid ESB write addr %d\\n\",\n> +                      offset);\n> +        return;\n> +    }\n> +\n> +    if (notify && !(ive->w & IVE_MASKED)) {\n> +        qemu_irq_pulse(xive->qirqs[srcno]);\n> +    }\n> +}\n> +\n> +static const MemoryRegionOps spapr_xive_esb_ops = {\n> +    .read = spapr_xive_esb_read,\n> +    .write = spapr_xive_esb_write,\n> +    .endianness = DEVICE_BIG_ENDIAN,\n> +    .valid = {\n> +        .min_access_size = 8,\n> +        .max_access_size = 8,\n> +    },\n> +    .impl = {\n> +        .min_access_size = 8,\n> +        .max_access_size = 8,\n> +    },\n> +};\n> +\n> +/*\n>   * XIVE Interrupt Source\n>   */\n>  static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)\n> @@ -74,6 +286,33 @@ static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)\n>  /*\n>   * Main XIVE object\n>   */\n> +#define P9_MMIO_BASE     0x006000000000000ull\n> +\n> +/* VC BAR contains set translations for the ESBs and the EQs. */\n> +#define VC_BAR_DEFAULT   0x10000000000ull\n> +#define VC_BAR_SIZE      0x08000000000ull\n> +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */\n> +\n> +static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,\n> +                                            unsigned size)\n> +{\n> +    qemu_log_mask(LOG_UNIMP, \"%s: 0x%\" HWADDR_PRIx \" [%u]\\n\",\n> +                  __func__, offset, size);\n> +    return 0;\n> +}\n> +\n> +static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,\n> +                                         uint64_t value, unsigned size)\n> +{\n> +    qemu_log_mask(LOG_UNIMP, \"%s: 0x%\" HWADDR_PRIx \" <- 0x%\" PRIx64 \" [%u]\\n\",\n> +                  __func__, offset, value, size);\n> +}\n> +\n> +static const MemoryRegionOps spapr_xive_esb_default_ops = {\n> +    .read = spapr_xive_esb_default_read,\n> +    .write = spapr_xive_esb_default_write,\n> +    .endianness = DEVICE_BIG_ENDIAN,\n> +};\n>  \n>  void spapr_xive_reset(void *dev)\n>  {\n> @@ -144,6 +383,22 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)\n>      xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;\n>      xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));\n>  \n> +    /* VC BAR. That's the full window but we will only map the\n> +     * subregions in use. */\n> +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);\n> +    xive->esb_shift = ESB_SHIFT;\n> +\n> +    /* Install default memory region handlers to log bogus access */\n> +    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,\n> +                          NULL, \"xive.esb.full\", VC_BAR_SIZE);\n> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);\n> +\n> +    /* Install the ESB memory region in the overall one */\n> +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,\n> +                          xive, \"xive.esb\",\n> +                          (1ull << xive->esb_shift) * xive->nr_irqs);\n> +    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);\n> +\n>      qemu_register_reset(spapr_xive_reset, dev);\n>  }\n>  \n> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h\n> index eab92c4c1bb8..0f516534d76a 100644\n> --- a/include/hw/ppc/spapr_xive.h\n> +++ b/include/hw/ppc/spapr_xive.h\n> @@ -46,6 +46,12 @@ struct sPAPRXive {\n>      XiveIVE      *ivt;\n>      XiveEQ       *eqt;\n>      uint32_t     nr_eqs;\n> +\n> +    /* ESB memory region */\n> +    uint32_t     esb_shift;\n> +    hwaddr       esb_base;\n> +    MemoryRegion esb_mr;\n> +    MemoryRegion esb_iomem;\n>  };\n>  \n>  #endif /* PPC_SPAPR_XIVE_H */","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=gibson.dropbear.id.au\n\theader.i=@gibson.dropbear.id.au header.b=\"bAhB3ioJ\"; \n\tdkim-atps=neutral"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xxKf465vzz9s7m\n\tfor <incoming@patchwork.ozlabs.org>;\n\tTue, 19 Sep 2017 21:00:12 +1000 (AEST)","from localhost ([::1]:41480 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1duGGB-0003Kl-0h\n\tfor incoming@patchwork.ozlabs.org; Tue, 19 Sep 2017 07:00:11 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:51214)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <dgibson@ozlabs.org>) id 1duFtX-0001Go-9k\n\tfor qemu-devel@nongnu.org; Tue, 19 Sep 2017 06:36:52 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <dgibson@ozlabs.org>) id 1duFtT-0002sC-G6\n\tfor qemu-devel@nongnu.org; Tue, 19 Sep 2017 06:36:47 -0400","from ozlabs.org ([103.22.144.67]:43803)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <dgibson@ozlabs.org>)\n\tid 1duFtS-0002oC-PH; Tue, 19 Sep 2017 06:36:43 -0400","by ozlabs.org (Postfix, from userid 1007)\n\tid 3xxK6q1r3bz9t41; Tue, 19 Sep 2017 20:36:34 +1000 (AEST)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple;\n\td=gibson.dropbear.id.au; s=201602; t=1505817395;\n\tbh=bvLw3O6qeaDp/Gd/DWFEeEFjlgbGRxZ4onHfMQSo6V8=;\n\th=Date:From:To:Cc:Subject:References:In-Reply-To:From;\n\tb=bAhB3ioJvnB+pHEiCK1ad1/F9iLcmzruGZ7NpnB+kCn4DVPgR1BFOgnETnuYLh3B3\n\t68rswo05iMWenCGouW5Shc4Ewd4grbLr7w37d0M+D7F7Go3w+CBgx3kX1LJJ+eYOz2\n\tUSpqxpjtrkO7aUyEh06xMKM0VJdt2ly/Z5M3kQYk=","Date":"Tue, 19 Sep 2017 12:57:47 +1000","From":"David Gibson <david@gibson.dropbear.id.au>","To":"=?iso-8859-1?q?C=E9dric?= Le Goater <clg@kaod.org>","Message-ID":"<20170919025747.GM27153@umbus>","References":"<20170911171235.29331-1-clg@kaod.org>\n\t<20170911171235.29331-8-clg@kaod.org>","MIME-Version":"1.0","Content-Type":"multipart/signed; micalg=pgp-sha256;\n\tprotocol=\"application/pgp-signature\"; boundary=\"Cqq5NadOW2RfLMJ/\"","Content-Disposition":"inline","In-Reply-To":"<20170911171235.29331-8-clg@kaod.org>","User-Agent":"Mutt/1.8.3 (2017-05-23)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"103.22.144.67","Subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-ppc@nongnu.org,\n\tqemu-devel@nongnu.org, Alexander Graf <agraf@suse.de>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1771831,"web_url":"http://patchwork.ozlabs.org/comment/1771831/","msgid":"<771e897f-f304-3616-2e9c-e582f556b914@kaod.org>","list_archive_url":null,"date":"2017-09-20T13:05:18","subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","submitter":{"id":68548,"url":"http://patchwork.ozlabs.org/api/people/68548/","name":"Cédric Le Goater","email":"clg@kaod.org"},"content":">> +/*\n>> + * XIVE Interrupt Source MMIOs\n>> + */\n>> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)\n>> +{\n>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);\n>> +    uint32_t offset = addr & 0xF00;\n>> +    uint32_t srcno = addr >> xive->esb_shift;\n>> +    XiveIVE *ive;\n>> +    uint64_t ret = -1;\n>> +\n>> +    ive = spapr_xive_get_ive(xive, srcno);\n>> +    if (!ive || !(ive->w & IVE_VALID))  {\n>> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid LISN %d\\n\", srcno);\n>> +        goto out;\n> \n> Since there's a whole (4k) page for each source, I wonder if we should\n> actually map each one as a separate MMIO region to allow us to tweak\n> the mappings more flexibly\nyes we could have a subregion for each source. In that case, \nwe should also handle IVE_VALID properly. That will require \na specific XIVE allocator which was difficult to do while\nkeeping the compatibility with XICS for migration and CAS.\n\n\nC.","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xy0Xz0RxBz9s06\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 20 Sep 2017 23:13:07 +1000 (AEST)","from localhost ([::1]:47991 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dueoL-00087c-13\n\tfor incoming@patchwork.ozlabs.org; Wed, 20 Sep 2017 09:13:05 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:41234)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <clg@kaod.org>) id 1duemz-0007bR-L3\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:12:22 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <clg@kaod.org>) id 1duemP-0006gX-3q\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:11:41 -0400","from 7.mo2.mail-out.ovh.net ([188.165.48.182]:35756)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <clg@kaod.org>) id 1duemO-0006f3-Rt\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:11:05 -0400","from player157.ha.ovh.net (b9.ovh.net [213.186.33.59])\n\tby mo2.mail-out.ovh.net (Postfix) with ESMTP id D2907ACA99\n\tfor <qemu-devel@nongnu.org>; Wed, 20 Sep 2017 15:05:24 +0200 (CEST)","from zorba.kaod.org (LFbn-1-2231-173.w90-76.abo.wanadoo.fr\n\t[90.76.52.173]) (Authenticated sender: postmaster@kaod.org)\n\tby player157.ha.ovh.net (Postfix) with ESMTPSA id 6174B50008D;\n\tWed, 20 Sep 2017 15:05:18 +0200 (CEST)"],"To":"David Gibson <david@gibson.dropbear.id.au>","References":"<20170911171235.29331-1-clg@kaod.org>\n\t<20170911171235.29331-8-clg@kaod.org> <20170919025747.GM27153@umbus>","From":"=?utf-8?q?C=C3=A9dric_Le_Goater?= <clg@kaod.org>","Message-ID":"<771e897f-f304-3616-2e9c-e582f556b914@kaod.org>","Date":"Wed, 20 Sep 2017 15:05:18 +0200","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<20170919025747.GM27153@umbus>","Content-Type":"text/plain; charset=windows-1252","Content-Language":"en-US","Content-Transfer-Encoding":"7bit","X-Ovh-Tracer-Id":"7463590484540164984","X-VR-SPAMSTATE":"OK","X-VR-SPAMSCORE":"-100","X-VR-SPAMCAUSE":"gggruggvucftvghtrhhoucdtuddrfeelledriedtgdefvdcutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfqggfjpdevjffgvefmvefgnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"188.165.48.182","Subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-ppc@nongnu.org,\n\tqemu-devel@nongnu.org, Alexander Graf <agraf@suse.de>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1771832,"web_url":"http://patchwork.ozlabs.org/comment/1771832/","msgid":"<131a8102-00e4-9621-ddca-39bc685b95cf@kaod.org>","list_archive_url":null,"date":"2017-09-20T12:54:31","subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","submitter":{"id":68548,"url":"http://patchwork.ozlabs.org/api/people/68548/","name":"Cédric Le Goater","email":"clg@kaod.org"},"content":"On 09/19/2017 04:57 AM, David Gibson wrote:\n> On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:\n>> Each interrupt source is associated with a two bit state machine\n>> called an Event State Buffer (ESB) which is controlled by MMIO to\n>> trigger events. See code for more details on the states and\n>> transitions.\n>>\n>> The MMIO space for the ESB translation is 512GB large on baremetal\n>> (powernv) systems and the BAR depends on the chip id. In our model for\n>> the sPAPR machine, we choose to only map a sub memory region for the\n>> provisionned IRQ numbers and to use the mapping address of chip 0 on a\n>> real system. The OS will get the address of the MMIO page of the ESB\n>> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.\n> \n> On bare metal, are the MMIOs for each irq source mapped contiguously?\n\nyes. \n \n>> For KVM support, we should think of a way to map this QEMU memory\n>> region in the host to trigger events directly.\n> \n> This would rely on being able to map them without mapping those for\n> any other VM or the host.  Does that mean allocating a contiguous (and\n> aligned) hunk of irqs for a guest?\n\nI think so yes, the IRQ and the memory regions are tied, and also being \nable to pass the MMIO region from the host to the guest, a bit like VFIO \nfor the IOMMU regions I suppose. But I haven't dig the problem too much. \n\nThis is an important part in the overall design. \n\n> We're going to need to be careful about irq allocation here.\n> Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to\n> MMIO addresses, \n\nGET_SOURCE_INFO only retrieves the address of the MMIO region for \na 'lisn'. it is not dynamically mapped. In the KVM case, the initial\ninformation on the address would come from OPAL and then the host \nkernel would translate this information for the guest.\n\n> we need the MMIO addresses to be stable and consistent, because \n> we can't have them change across migration.  \n\nyes. I will catch my XIVE guru next week in Paris to clarify that\npart. \n\n> We need to have this consistent between in-qemu and in-KVM XIVE\n> implementations as well.\n\nyes.\n\nC.\n\n>>\n>> Signed-off-by: Cédric Le Goater <clg@kaod.org>\n>> ---\n>>  hw/intc/spapr_xive.c        | 255 ++++++++++++++++++++++++++++++++++++++++++++\n>>  include/hw/ppc/spapr_xive.h |   6 ++\n>>  2 files changed, 261 insertions(+)\n>>\n>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c\n>> index 1ed7b6a286e9..8a85d64efc4c 100644\n>> --- a/hw/intc/spapr_xive.c\n>> +++ b/hw/intc/spapr_xive.c\n>> @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)\n>>  }\n>>  \n>>  /*\n>> + * \"magic\" Event State Buffer (ESB) MMIO offsets.\n>> + *\n>> + * Each interrupt source has a 2-bit state machine called ESB\n>> + * which can be controlled by MMIO. It's made of 2 bits, P and\n>> + * Q. P indicates that an interrupt is pending (has been sent\n>> + * to a queue and is waiting for an EOI). Q indicates that the\n>> + * interrupt has been triggered while pending.\n>> + *\n>> + * This acts as a coalescing mechanism in order to guarantee\n>> + * that a given interrupt only occurs at most once in a queue.\n>> + *\n>> + * When doing an EOI, the Q bit will indicate if the interrupt\n>> + * needs to be re-triggered.\n>> + *\n>> + * The following offsets into the ESB MMIO allow to read or\n>> + * manipulate the PQ bits. They must be used with an 8-bytes\n>> + * load instruction. They all return the previous state of the\n>> + * interrupt (atomically).\n>> + *\n>> + * Additionally, some ESB pages support doing an EOI via a\n>> + * store at 0 and some ESBs support doing a trigger via a\n>> + * separate trigger page.\n>> + */\n>> +#define XIVE_ESB_GET            0x800\n>> +#define XIVE_ESB_SET_PQ_00      0xc00\n>> +#define XIVE_ESB_SET_PQ_01      0xd00\n>> +#define XIVE_ESB_SET_PQ_10      0xe00\n>> +#define XIVE_ESB_SET_PQ_11      0xf00\n>> +\n>> +#define XIVE_ESB_VAL_P          0x2\n>> +#define XIVE_ESB_VAL_Q          0x1\n>> +\n>> +#define XIVE_ESB_RESET          0x0\n>> +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P\n>> +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)\n>> +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q\n>> +\n>> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)\n>> +{\n>> +    uint32_t byte = idx / 4;\n>> +    uint32_t bit  = (idx % 4) * 2;\n>> +\n>> +    assert(byte < xive->sbe_size);\n>> +\n>> +    return (xive->sbe[byte] >> bit) & 0x3;\n>> +}\n>> +\n>> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)\n>> +{\n>> +    uint32_t byte = idx / 4;\n>> +    uint32_t bit  = (idx % 4) * 2;\n>> +    uint8_t old, new;\n>> +\n>> +    assert(byte < xive->sbe_size);\n>> +\n>> +    old = xive->sbe[byte];\n>> +\n>> +    new = xive->sbe[byte] & ~(0x3 << bit);\n>> +    new |= (pq & 0x3) << bit;\n>> +\n>> +    xive->sbe[byte] = new;\n>> +\n>> +    return (old >> bit) & 0x3;\n>> +}\n>> +\n>> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)\n>> +{\n>> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);\n>> +\n>> +    switch (old_pq) {\n>> +    case XIVE_ESB_RESET:\n>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);\n>> +        return false;\n>> +    case XIVE_ESB_PENDING:\n>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);\n>> +        return false;\n>> +    case XIVE_ESB_QUEUED:\n>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);\n>> +        return true;\n>> +    case XIVE_ESB_OFF:\n>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);\n>> +        return false;\n>> +    default:\n>> +         g_assert_not_reached();\n>> +    }\n>> +}\n>> +\n>> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t srcno)\n>> +{\n>> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);\n>> +\n>> +    switch (old_pq) {\n>> +    case XIVE_ESB_RESET:\n>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);\n>> +        return true;\n>> +    case XIVE_ESB_PENDING:\n>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);\n>> +        return true;\n>> +    case XIVE_ESB_QUEUED:\n>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);\n>> +        return true;\n>> +    case XIVE_ESB_OFF:\n>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);\n>> +        return false;\n>> +    default:\n>> +         g_assert_not_reached();\n>> +    }\n>> +}\n>> +\n>> +/*\n>> + * XIVE Interrupt Source MMIOs\n>> + */\n>> +static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t srcno)\n>> +{\n>> +    ICSIRQState *irq = &xive->ics->irqs[srcno];\n>> +\n>> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {\n>> +        irq->status &= ~XICS_STATUS_SENT;\n>> +    }\n>> +}\n>> +\n>> +/* TODO: handle second page\n>> + *\n>> + * Some HW use a separate page for trigger. We only support the case\n>> + * in which the trigger can be done in the same page as the EOI.\n>> + */\n>> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)\n>> +{\n>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);\n>> +    uint32_t offset = addr & 0xF00;\n>> +    uint32_t srcno = addr >> xive->esb_shift;\n>> +    XiveIVE *ive;\n>> +    uint64_t ret = -1;\n>> +\n>> +    ive = spapr_xive_get_ive(xive, srcno);\n>> +    if (!ive || !(ive->w & IVE_VALID))  {\n>> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid LISN %d\\n\", srcno);\n>> +        goto out;\n> \n> Since there's a whole (4k) page for each source, I wonder if we should\n> actually map each one as a separate MMIO region to allow us to tweak\n> the mappings more flexibly.\n> \n>> +    }\n>> +\n>> +    switch (offset) {\n>> +    case 0:\n>> +        spapr_xive_source_eoi(xive, srcno);\n>> +\n>> +        /* return TRUE or FALSE depending on PQ value */\n>> +        ret = spapr_xive_pq_eoi(xive, srcno);\n>> +        break;\n>> +\n>> +    case XIVE_ESB_GET:\n>> +        ret = spapr_xive_pq_get(xive, srcno);\n>> +        break;\n>> +\n>> +    case XIVE_ESB_SET_PQ_00:\n>> +    case XIVE_ESB_SET_PQ_01:\n>> +    case XIVE_ESB_SET_PQ_10:\n>> +    case XIVE_ESB_SET_PQ_11:\n>> +        ret = spapr_xive_pq_set(xive, srcno, (offset >> 8) & 0x3);\n>> +        break;\n>> +    default:\n>> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid ESB addr %d\\n\", offset);\n>> +    }\n>> +\n>> +out:\n>> +    return ret;\n>> +}\n>> +\n>> +static void spapr_xive_esb_write(void *opaque, hwaddr addr,\n>> +                           uint64_t value, unsigned size)\n>> +{\n>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);\n>> +    uint32_t offset = addr & 0xF00;\n>> +    uint32_t srcno = addr >> xive->esb_shift;\n>> +    XiveIVE *ive;\n>> +    bool notify = false;\n>> +\n>> +    ive = spapr_xive_get_ive(xive, srcno);\n>> +    if (!ive || !(ive->w & IVE_VALID))  {\n>> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid LISN %d\\n\", srcno);\n>> +        return;\n>> +    }\n>> +\n>> +    switch (offset) {\n>> +    case 0:\n>> +        /* TODO: should we trigger even if the IVE is masked ? */\n>> +        notify = spapr_xive_pq_trigger(xive, srcno);\n>> +        break;\n>> +    default:\n>> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid ESB write addr %d\\n\",\n>> +                      offset);\n>> +        return;\n>> +    }\n>> +\n>> +    if (notify && !(ive->w & IVE_MASKED)) {\n>> +        qemu_irq_pulse(xive->qirqs[srcno]);\n>> +    }\n>> +}\n>> +\n>> +static const MemoryRegionOps spapr_xive_esb_ops = {\n>> +    .read = spapr_xive_esb_read,\n>> +    .write = spapr_xive_esb_write,\n>> +    .endianness = DEVICE_BIG_ENDIAN,\n>> +    .valid = {\n>> +        .min_access_size = 8,\n>> +        .max_access_size = 8,\n>> +    },\n>> +    .impl = {\n>> +        .min_access_size = 8,\n>> +        .max_access_size = 8,\n>> +    },\n>> +};\n>> +\n>> +/*\n>>   * XIVE Interrupt Source\n>>   */\n>>  static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)\n>> @@ -74,6 +286,33 @@ static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)\n>>  /*\n>>   * Main XIVE object\n>>   */\n>> +#define P9_MMIO_BASE     0x006000000000000ull\n>> +\n>> +/* VC BAR contains set translations for the ESBs and the EQs. */\n>> +#define VC_BAR_DEFAULT   0x10000000000ull\n>> +#define VC_BAR_SIZE      0x08000000000ull\n>> +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */\n>> +\n>> +static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,\n>> +                                            unsigned size)\n>> +{\n>> +    qemu_log_mask(LOG_UNIMP, \"%s: 0x%\" HWADDR_PRIx \" [%u]\\n\",\n>> +                  __func__, offset, size);\n>> +    return 0;\n>> +}\n>> +\n>> +static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,\n>> +                                         uint64_t value, unsigned size)\n>> +{\n>> +    qemu_log_mask(LOG_UNIMP, \"%s: 0x%\" HWADDR_PRIx \" <- 0x%\" PRIx64 \" [%u]\\n\",\n>> +                  __func__, offset, value, size);\n>> +}\n>> +\n>> +static const MemoryRegionOps spapr_xive_esb_default_ops = {\n>> +    .read = spapr_xive_esb_default_read,\n>> +    .write = spapr_xive_esb_default_write,\n>> +    .endianness = DEVICE_BIG_ENDIAN,\n>> +};\n>>  \n>>  void spapr_xive_reset(void *dev)\n>>  {\n>> @@ -144,6 +383,22 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)\n>>      xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;\n>>      xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));\n>>  \n>> +    /* VC BAR. That's the full window but we will only map the\n>> +     * subregions in use. */\n>> +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);\n>> +    xive->esb_shift = ESB_SHIFT;\n>> +\n>> +    /* Install default memory region handlers to log bogus access */\n>> +    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,\n>> +                          NULL, \"xive.esb.full\", VC_BAR_SIZE);\n>> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);\n>> +\n>> +    /* Install the ESB memory region in the overall one */\n>> +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,\n>> +                          xive, \"xive.esb\",\n>> +                          (1ull << xive->esb_shift) * xive->nr_irqs);\n>> +    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);\n>> +\n>>      qemu_register_reset(spapr_xive_reset, dev);\n>>  }\n>>  \n>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h\n>> index eab92c4c1bb8..0f516534d76a 100644\n>> --- a/include/hw/ppc/spapr_xive.h\n>> +++ b/include/hw/ppc/spapr_xive.h\n>> @@ -46,6 +46,12 @@ struct sPAPRXive {\n>>      XiveIVE      *ivt;\n>>      XiveEQ       *eqt;\n>>      uint32_t     nr_eqs;\n>> +\n>> +    /* ESB memory region */\n>> +    uint32_t     esb_shift;\n>> +    hwaddr       esb_base;\n>> +    MemoryRegion esb_mr;\n>> +    MemoryRegion esb_iomem;\n>>  };\n>>  \n>>  #endif /* PPC_SPAPR_XIVE_H */\n>","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xy0Xz2Lz6z9sP1\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 20 Sep 2017 23:13:07 +1000 (AEST)","from localhost ([::1]:47989 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dueoL-000870-CN\n\tfor incoming@patchwork.ozlabs.org; Wed, 20 Sep 2017 09:13:05 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:41283)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <clg@kaod.org>) id 1duen1-0007cX-W7\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:12:21 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <clg@kaod.org>) id 1duemO-0006gC-CS\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:11:43 -0400","from 3.mo2.mail-out.ovh.net ([46.105.58.226]:46952)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <clg@kaod.org>) id 1duemO-0006f6-3H\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:11:04 -0400","from player157.ha.ovh.net (b9.ovh.net [213.186.33.59])\n\tby mo2.mail-out.ovh.net (Postfix) with ESMTP id DD3CAAC970\n\tfor <qemu-devel@nongnu.org>; Wed, 20 Sep 2017 14:54:37 +0200 (CEST)","from zorba.kaod.org (LFbn-1-2231-173.w90-76.abo.wanadoo.fr\n\t[90.76.52.173]) (Authenticated sender: postmaster@kaod.org)\n\tby player157.ha.ovh.net (Postfix) with ESMTPSA id BF32150009F;\n\tWed, 20 Sep 2017 14:54:31 +0200 (CEST)"],"To":"David Gibson <david@gibson.dropbear.id.au>","References":"<20170911171235.29331-1-clg@kaod.org>\n\t<20170911171235.29331-8-clg@kaod.org> <20170919025747.GM27153@umbus>","From":"=?utf-8?q?C=C3=A9dric_Le_Goater?= <clg@kaod.org>","Message-ID":"<131a8102-00e4-9621-ddca-39bc685b95cf@kaod.org>","Date":"Wed, 20 Sep 2017 14:54:31 +0200","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<20170919025747.GM27153@umbus>","Content-Type":"text/plain; charset=windows-1252","Content-Language":"en-US","X-Ovh-Tracer-Id":"7281476174190381944","X-VR-SPAMSTATE":"OK","X-VR-SPAMSCORE":"-100","X-VR-SPAMCAUSE":"gggruggvucftvghtrhhoucdtuddrfeelledriedtgddvlecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfqggfjpdevjffgvefmvefgnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd","Content-Transfer-Encoding":"quoted-printable","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"46.105.58.226","Subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-ppc@nongnu.org,\n\tqemu-devel@nongnu.org, Alexander Graf <agraf@suse.de>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1773498,"web_url":"http://patchwork.ozlabs.org/comment/1773498/","msgid":"<20170922105805.GN4998@umbus.fritz.box>","list_archive_url":null,"date":"2017-09-22T10:58:05","subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","submitter":{"id":47,"url":"http://patchwork.ozlabs.org/api/people/47/","name":"David Gibson","email":"david@gibson.dropbear.id.au"},"content":"On Wed, Sep 20, 2017 at 02:54:31PM +0200, Cédric Le Goater wrote:\n> On 09/19/2017 04:57 AM, David Gibson wrote:\n> > On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:\n> >> Each interrupt source is associated with a two bit state machine\n> >> called an Event State Buffer (ESB) which is controlled by MMIO to\n> >> trigger events. See code for more details on the states and\n> >> transitions.\n> >>\n> >> The MMIO space for the ESB translation is 512GB large on baremetal\n> >> (powernv) systems and the BAR depends on the chip id. In our model for\n> >> the sPAPR machine, we choose to only map a sub memory region for the\n> >> provisionned IRQ numbers and to use the mapping address of chip 0 on a\n> >> real system. The OS will get the address of the MMIO page of the ESB\n> >> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.\n> > \n> > On bare metal, are the MMIOs for each irq source mapped contiguously?\n> \n> yes. \n>  \n> >> For KVM support, we should think of a way to map this QEMU memory\n> >> region in the host to trigger events directly.\n> > \n> > This would rely on being able to map them without mapping those for\n> > any other VM or the host.  Does that mean allocating a contiguous (and\n> > aligned) hunk of irqs for a guest?\n> \n> I think so yes, the IRQ and the memory regions are tied, and also being \n> able to pass the MMIO region from the host to the guest, a bit like VFIO \n> for the IOMMU regions I suppose. But I haven't dig the problem too much. \n> \n> This is an important part in the overall design. \n> \n> > We're going to need to be careful about irq allocation here.\n> > Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to\n> > MMIO addresses, \n> \n> GET_SOURCE_INFO only retrieves the address of the MMIO region for \n> a 'lisn'. it is not dynamically mapped.\n\nOk... what's a \"lisn\"?\n\n\n> In the KVM case, the initial\n> information on the address would come from OPAL and then the host \n> kernel would translate this information for the guest.\n> \n> > we need the MMIO addresses to be stable and consistent, because \n> > we can't have them change across migration.  \n> \n> yes. I will catch my XIVE guru next week in Paris to clarify that\n> part. \n> \n> > We need to have this consistent between in-qemu and in-KVM XIVE\n> > implementations as well.\n> \n> yes.\n> \n> C.\n> \n> >>\n> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>\n> >> ---\n> >>  hw/intc/spapr_xive.c        | 255 ++++++++++++++++++++++++++++++++++++++++++++\n> >>  include/hw/ppc/spapr_xive.h |   6 ++\n> >>  2 files changed, 261 insertions(+)\n> >>\n> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c\n> >> index 1ed7b6a286e9..8a85d64efc4c 100644\n> >> --- a/hw/intc/spapr_xive.c\n> >> +++ b/hw/intc/spapr_xive.c\n> >> @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)\n> >>  }\n> >>  \n> >>  /*\n> >> + * \"magic\" Event State Buffer (ESB) MMIO offsets.\n> >> + *\n> >> + * Each interrupt source has a 2-bit state machine called ESB\n> >> + * which can be controlled by MMIO. It's made of 2 bits, P and\n> >> + * Q. P indicates that an interrupt is pending (has been sent\n> >> + * to a queue and is waiting for an EOI). Q indicates that the\n> >> + * interrupt has been triggered while pending.\n> >> + *\n> >> + * This acts as a coalescing mechanism in order to guarantee\n> >> + * that a given interrupt only occurs at most once in a queue.\n> >> + *\n> >> + * When doing an EOI, the Q bit will indicate if the interrupt\n> >> + * needs to be re-triggered.\n> >> + *\n> >> + * The following offsets into the ESB MMIO allow to read or\n> >> + * manipulate the PQ bits. They must be used with an 8-bytes\n> >> + * load instruction. They all return the previous state of the\n> >> + * interrupt (atomically).\n> >> + *\n> >> + * Additionally, some ESB pages support doing an EOI via a\n> >> + * store at 0 and some ESBs support doing a trigger via a\n> >> + * separate trigger page.\n> >> + */\n> >> +#define XIVE_ESB_GET            0x800\n> >> +#define XIVE_ESB_SET_PQ_00      0xc00\n> >> +#define XIVE_ESB_SET_PQ_01      0xd00\n> >> +#define XIVE_ESB_SET_PQ_10      0xe00\n> >> +#define XIVE_ESB_SET_PQ_11      0xf00\n> >> +\n> >> +#define XIVE_ESB_VAL_P          0x2\n> >> +#define XIVE_ESB_VAL_Q          0x1\n> >> +\n> >> +#define XIVE_ESB_RESET          0x0\n> >> +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P\n> >> +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)\n> >> +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q\n> >> +\n> >> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)\n> >> +{\n> >> +    uint32_t byte = idx / 4;\n> >> +    uint32_t bit  = (idx % 4) * 2;\n> >> +\n> >> +    assert(byte < xive->sbe_size);\n> >> +\n> >> +    return (xive->sbe[byte] >> bit) & 0x3;\n> >> +}\n> >> +\n> >> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)\n> >> +{\n> >> +    uint32_t byte = idx / 4;\n> >> +    uint32_t bit  = (idx % 4) * 2;\n> >> +    uint8_t old, new;\n> >> +\n> >> +    assert(byte < xive->sbe_size);\n> >> +\n> >> +    old = xive->sbe[byte];\n> >> +\n> >> +    new = xive->sbe[byte] & ~(0x3 << bit);\n> >> +    new |= (pq & 0x3) << bit;\n> >> +\n> >> +    xive->sbe[byte] = new;\n> >> +\n> >> +    return (old >> bit) & 0x3;\n> >> +}\n> >> +\n> >> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)\n> >> +{\n> >> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);\n> >> +\n> >> +    switch (old_pq) {\n> >> +    case XIVE_ESB_RESET:\n> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);\n> >> +        return false;\n> >> +    case XIVE_ESB_PENDING:\n> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);\n> >> +        return false;\n> >> +    case XIVE_ESB_QUEUED:\n> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);\n> >> +        return true;\n> >> +    case XIVE_ESB_OFF:\n> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);\n> >> +        return false;\n> >> +    default:\n> >> +         g_assert_not_reached();\n> >> +    }\n> >> +}\n> >> +\n> >> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t srcno)\n> >> +{\n> >> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);\n> >> +\n> >> +    switch (old_pq) {\n> >> +    case XIVE_ESB_RESET:\n> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);\n> >> +        return true;\n> >> +    case XIVE_ESB_PENDING:\n> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);\n> >> +        return true;\n> >> +    case XIVE_ESB_QUEUED:\n> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);\n> >> +        return true;\n> >> +    case XIVE_ESB_OFF:\n> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);\n> >> +        return false;\n> >> +    default:\n> >> +         g_assert_not_reached();\n> >> +    }\n> >> +}\n> >> +\n> >> +/*\n> >> + * XIVE Interrupt Source MMIOs\n> >> + */\n> >> +static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t srcno)\n> >> +{\n> >> +    ICSIRQState *irq = &xive->ics->irqs[srcno];\n> >> +\n> >> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {\n> >> +        irq->status &= ~XICS_STATUS_SENT;\n> >> +    }\n> >> +}\n> >> +\n> >> +/* TODO: handle second page\n> >> + *\n> >> + * Some HW use a separate page for trigger. We only support the case\n> >> + * in which the trigger can be done in the same page as the EOI.\n> >> + */\n> >> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)\n> >> +{\n> >> +    sPAPRXive *xive = SPAPR_XIVE(opaque);\n> >> +    uint32_t offset = addr & 0xF00;\n> >> +    uint32_t srcno = addr >> xive->esb_shift;\n> >> +    XiveIVE *ive;\n> >> +    uint64_t ret = -1;\n> >> +\n> >> +    ive = spapr_xive_get_ive(xive, srcno);\n> >> +    if (!ive || !(ive->w & IVE_VALID))  {\n> >> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid LISN %d\\n\", srcno);\n> >> +        goto out;\n> > \n> > Since there's a whole (4k) page for each source, I wonder if we should\n> > actually map each one as a separate MMIO region to allow us to tweak\n> > the mappings more flexibly.\n> > \n> >> +    }\n> >> +\n> >> +    switch (offset) {\n> >> +    case 0:\n> >> +        spapr_xive_source_eoi(xive, srcno);\n> >> +\n> >> +        /* return TRUE or FALSE depending on PQ value */\n> >> +        ret = spapr_xive_pq_eoi(xive, srcno);\n> >> +        break;\n> >> +\n> >> +    case XIVE_ESB_GET:\n> >> +        ret = spapr_xive_pq_get(xive, srcno);\n> >> +        break;\n> >> +\n> >> +    case XIVE_ESB_SET_PQ_00:\n> >> +    case XIVE_ESB_SET_PQ_01:\n> >> +    case XIVE_ESB_SET_PQ_10:\n> >> +    case XIVE_ESB_SET_PQ_11:\n> >> +        ret = spapr_xive_pq_set(xive, srcno, (offset >> 8) & 0x3);\n> >> +        break;\n> >> +    default:\n> >> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid ESB addr %d\\n\", offset);\n> >> +    }\n> >> +\n> >> +out:\n> >> +    return ret;\n> >> +}\n> >> +\n> >> +static void spapr_xive_esb_write(void *opaque, hwaddr addr,\n> >> +                           uint64_t value, unsigned size)\n> >> +{\n> >> +    sPAPRXive *xive = SPAPR_XIVE(opaque);\n> >> +    uint32_t offset = addr & 0xF00;\n> >> +    uint32_t srcno = addr >> xive->esb_shift;\n> >> +    XiveIVE *ive;\n> >> +    bool notify = false;\n> >> +\n> >> +    ive = spapr_xive_get_ive(xive, srcno);\n> >> +    if (!ive || !(ive->w & IVE_VALID))  {\n> >> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid LISN %d\\n\", srcno);\n> >> +        return;\n> >> +    }\n> >> +\n> >> +    switch (offset) {\n> >> +    case 0:\n> >> +        /* TODO: should we trigger even if the IVE is masked ? */\n> >> +        notify = spapr_xive_pq_trigger(xive, srcno);\n> >> +        break;\n> >> +    default:\n> >> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid ESB write addr %d\\n\",\n> >> +                      offset);\n> >> +        return;\n> >> +    }\n> >> +\n> >> +    if (notify && !(ive->w & IVE_MASKED)) {\n> >> +        qemu_irq_pulse(xive->qirqs[srcno]);\n> >> +    }\n> >> +}\n> >> +\n> >> +static const MemoryRegionOps spapr_xive_esb_ops = {\n> >> +    .read = spapr_xive_esb_read,\n> >> +    .write = spapr_xive_esb_write,\n> >> +    .endianness = DEVICE_BIG_ENDIAN,\n> >> +    .valid = {\n> >> +        .min_access_size = 8,\n> >> +        .max_access_size = 8,\n> >> +    },\n> >> +    .impl = {\n> >> +        .min_access_size = 8,\n> >> +        .max_access_size = 8,\n> >> +    },\n> >> +};\n> >> +\n> >> +/*\n> >>   * XIVE Interrupt Source\n> >>   */\n> >>  static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)\n> >> @@ -74,6 +286,33 @@ static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)\n> >>  /*\n> >>   * Main XIVE object\n> >>   */\n> >> +#define P9_MMIO_BASE     0x006000000000000ull\n> >> +\n> >> +/* VC BAR contains set translations for the ESBs and the EQs. */\n> >> +#define VC_BAR_DEFAULT   0x10000000000ull\n> >> +#define VC_BAR_SIZE      0x08000000000ull\n> >> +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */\n> >> +\n> >> +static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,\n> >> +                                            unsigned size)\n> >> +{\n> >> +    qemu_log_mask(LOG_UNIMP, \"%s: 0x%\" HWADDR_PRIx \" [%u]\\n\",\n> >> +                  __func__, offset, size);\n> >> +    return 0;\n> >> +}\n> >> +\n> >> +static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,\n> >> +                                         uint64_t value, unsigned size)\n> >> +{\n> >> +    qemu_log_mask(LOG_UNIMP, \"%s: 0x%\" HWADDR_PRIx \" <- 0x%\" PRIx64 \" [%u]\\n\",\n> >> +                  __func__, offset, value, size);\n> >> +}\n> >> +\n> >> +static const MemoryRegionOps spapr_xive_esb_default_ops = {\n> >> +    .read = spapr_xive_esb_default_read,\n> >> +    .write = spapr_xive_esb_default_write,\n> >> +    .endianness = DEVICE_BIG_ENDIAN,\n> >> +};\n> >>  \n> >>  void spapr_xive_reset(void *dev)\n> >>  {\n> >> @@ -144,6 +383,22 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)\n> >>      xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;\n> >>      xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));\n> >>  \n> >> +    /* VC BAR. That's the full window but we will only map the\n> >> +     * subregions in use. */\n> >> +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);\n> >> +    xive->esb_shift = ESB_SHIFT;\n> >> +\n> >> +    /* Install default memory region handlers to log bogus access */\n> >> +    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,\n> >> +                          NULL, \"xive.esb.full\", VC_BAR_SIZE);\n> >> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);\n> >> +\n> >> +    /* Install the ESB memory region in the overall one */\n> >> +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,\n> >> +                          xive, \"xive.esb\",\n> >> +                          (1ull << xive->esb_shift) * xive->nr_irqs);\n> >> +    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);\n> >> +\n> >>      qemu_register_reset(spapr_xive_reset, dev);\n> >>  }\n> >>  \n> >> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h\n> >> index eab92c4c1bb8..0f516534d76a 100644\n> >> --- a/include/hw/ppc/spapr_xive.h\n> >> +++ b/include/hw/ppc/spapr_xive.h\n> >> @@ -46,6 +46,12 @@ struct sPAPRXive {\n> >>      XiveIVE      *ivt;\n> >>      XiveEQ       *eqt;\n> >>      uint32_t     nr_eqs;\n> >> +\n> >> +    /* ESB memory region */\n> >> +    uint32_t     esb_shift;\n> >> +    hwaddr       esb_base;\n> >> +    MemoryRegion esb_mr;\n> >> +    MemoryRegion esb_iomem;\n> >>  };\n> >>  \n> >>  #endif /* PPC_SPAPR_XIVE_H */\n> > \n>","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=gibson.dropbear.id.au\n\theader.i=@gibson.dropbear.id.au header.b=\"NidhQyou\"; \n\tdkim-atps=neutral"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xzB1Q37Y7z9sNw\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 22 Sep 2017 21:23:22 +1000 (AEST)","from localhost ([::1]:58076 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dvM3E-00077H-JJ\n\tfor incoming@patchwork.ozlabs.org; Fri, 22 Sep 2017 07:23:20 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:37842)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <dgibson@ozlabs.org>) id 1dvLzq-0004i5-Bm\n\tfor qemu-devel@nongnu.org; Fri, 22 Sep 2017 07:19:52 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <dgibson@ozlabs.org>) id 1dvLzo-0007Dn-7b\n\tfor qemu-devel@nongnu.org; Fri, 22 Sep 2017 07:19:50 -0400","from ozlabs.org ([2401:3900:2:1::2]:51433)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <dgibson@ozlabs.org>)\n\tid 1dvLzn-0007AZ-2K; Fri, 22 Sep 2017 07:19:48 -0400","by ozlabs.org (Postfix, from userid 1007)\n\tid 3xz9xC0zh0z9sP1; Fri, 22 Sep 2017 21:19:43 +1000 (AEST)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple;\n\td=gibson.dropbear.id.au; s=201602; t=1506079183;\n\tbh=7bsIDvAX6PIZptL2ugNtzMeKqPs5/qgROR6gXo/9Gso=;\n\th=Date:From:To:Cc:Subject:References:In-Reply-To:From;\n\tb=NidhQyouMp2iiFmNfqxtlVcBcZlUuPb+CqU1C2Ae5aAoA4LXtGh+IETKodvwqUERA\n\twZ5sdSnVfDAGZU4RqAEwCHKiS7343FsEMLcpw1MhHxrBMiVaPQUTRM0g5s1tYOZg9v\n\tcjfOm8p/Suoi4TGcx+35ptXqriJAN5nTGLiVQ7Ic=","Date":"Fri, 22 Sep 2017 20:58:05 +1000","From":"David Gibson <david@gibson.dropbear.id.au>","To":"=?iso-8859-1?q?C=E9dric?= Le Goater <clg@kaod.org>","Message-ID":"<20170922105805.GN4998@umbus.fritz.box>","References":"<20170911171235.29331-1-clg@kaod.org>\n\t<20170911171235.29331-8-clg@kaod.org>\n\t<20170919025747.GM27153@umbus>\n\t<131a8102-00e4-9621-ddca-39bc685b95cf@kaod.org>","MIME-Version":"1.0","Content-Type":"multipart/signed; micalg=pgp-sha256;\n\tprotocol=\"application/pgp-signature\"; boundary=\"aSnC4ZPPfhCvD8sN\"","Content-Disposition":"inline","In-Reply-To":"<131a8102-00e4-9621-ddca-39bc685b95cf@kaod.org>","User-Agent":"Mutt/1.9.0 (2017-09-02)","X-detected-operating-system":"by eggs.gnu.org: Genre and OS details not\n\trecognized.","X-Received-From":"2401:3900:2:1::2","Subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-ppc@nongnu.org,\n\tqemu-devel@nongnu.org, Alexander Graf <agraf@suse.de>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1773547,"web_url":"http://patchwork.ozlabs.org/comment/1773547/","msgid":"<c8358619-6eb4-b534-4c2d-2523ba6ebd00@kaod.org>","list_archive_url":null,"date":"2017-09-22T12:26:42","subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","submitter":{"id":68548,"url":"http://patchwork.ozlabs.org/api/people/68548/","name":"Cédric Le Goater","email":"clg@kaod.org"},"content":"On 09/22/2017 12:58 PM, David Gibson wrote:\n> On Wed, Sep 20, 2017 at 02:54:31PM +0200, Cédric Le Goater wrote:\n>> On 09/19/2017 04:57 AM, David Gibson wrote:\n>>> On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:\n>>>> Each interrupt source is associated with a two bit state machine\n>>>> called an Event State Buffer (ESB) which is controlled by MMIO to\n>>>> trigger events. See code for more details on the states and\n>>>> transitions.\n>>>>\n>>>> The MMIO space for the ESB translation is 512GB large on baremetal\n>>>> (powernv) systems and the BAR depends on the chip id. In our model for\n>>>> the sPAPR machine, we choose to only map a sub memory region for the\n>>>> provisionned IRQ numbers and to use the mapping address of chip 0 on a\n>>>> real system. The OS will get the address of the MMIO page of the ESB\n>>>> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.\n>>>\n>>> On bare metal, are the MMIOs for each irq source mapped contiguously?\n>>\n>> yes. \n>>  \n>>>> For KVM support, we should think of a way to map this QEMU memory\n>>>> region in the host to trigger events directly.\n>>>\n>>> This would rely on being able to map them without mapping those for\n>>> any other VM or the host.  Does that mean allocating a contiguous (and\n>>> aligned) hunk of irqs for a guest?\n>>\n>> I think so yes, the IRQ and the memory regions are tied, and also being \n>> able to pass the MMIO region from the host to the guest, a bit like VFIO \n>> for the IOMMU regions I suppose. But I haven't dig the problem too much. \n>>\n>> This is an important part in the overall design. \n>>\n>>> We're going to need to be careful about irq allocation here.\n>>> Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to\n>>> MMIO addresses, \n>>\n>> GET_SOURCE_INFO only retrieves the address of the MMIO region for \n>> a 'lisn'. it is not dynamically mapped.\n> \n> Ok... what's a \"lisn\"?\n\nLogical Interrupt Source Number. This is the source number the OS guest \nmanipulates in the hcalls. The OS registers also an EISN, Effective Interrupt \nSource Number, associated with the LISN, which will be stored in the event \nqueue.\n\nC. \n\n> \n> \n>> In the KVM case, the initial\n>> information on the address would come from OPAL and then the host \n>> kernel would translate this information for the guest.\n>>\n>>> we need the MMIO addresses to be stable and consistent, because \n>>> we can't have them change across migration.  \n>>\n>> yes. I will catch my XIVE guru next week in Paris to clarify that\n>> part. \n>>\n>>> We need to have this consistent between in-qemu and in-KVM XIVE\n>>> implementations as well.\n>>\n>> yes.\n>>\n>> C.\n>>\n>>>>\n>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>\n>>>> ---\n>>>>  hw/intc/spapr_xive.c        | 255 ++++++++++++++++++++++++++++++++++++++++++++\n>>>>  include/hw/ppc/spapr_xive.h |   6 ++\n>>>>  2 files changed, 261 insertions(+)\n>>>>\n>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c\n>>>> index 1ed7b6a286e9..8a85d64efc4c 100644\n>>>> --- a/hw/intc/spapr_xive.c\n>>>> +++ b/hw/intc/spapr_xive.c\n>>>> @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)\n>>>>  }\n>>>>  \n>>>>  /*\n>>>> + * \"magic\" Event State Buffer (ESB) MMIO offsets.\n>>>> + *\n>>>> + * Each interrupt source has a 2-bit state machine called ESB\n>>>> + * which can be controlled by MMIO. It's made of 2 bits, P and\n>>>> + * Q. P indicates that an interrupt is pending (has been sent\n>>>> + * to a queue and is waiting for an EOI). Q indicates that the\n>>>> + * interrupt has been triggered while pending.\n>>>> + *\n>>>> + * This acts as a coalescing mechanism in order to guarantee\n>>>> + * that a given interrupt only occurs at most once in a queue.\n>>>> + *\n>>>> + * When doing an EOI, the Q bit will indicate if the interrupt\n>>>> + * needs to be re-triggered.\n>>>> + *\n>>>> + * The following offsets into the ESB MMIO allow to read or\n>>>> + * manipulate the PQ bits. They must be used with an 8-bytes\n>>>> + * load instruction. They all return the previous state of the\n>>>> + * interrupt (atomically).\n>>>> + *\n>>>> + * Additionally, some ESB pages support doing an EOI via a\n>>>> + * store at 0 and some ESBs support doing a trigger via a\n>>>> + * separate trigger page.\n>>>> + */\n>>>> +#define XIVE_ESB_GET            0x800\n>>>> +#define XIVE_ESB_SET_PQ_00      0xc00\n>>>> +#define XIVE_ESB_SET_PQ_01      0xd00\n>>>> +#define XIVE_ESB_SET_PQ_10      0xe00\n>>>> +#define XIVE_ESB_SET_PQ_11      0xf00\n>>>> +\n>>>> +#define XIVE_ESB_VAL_P          0x2\n>>>> +#define XIVE_ESB_VAL_Q          0x1\n>>>> +\n>>>> +#define XIVE_ESB_RESET          0x0\n>>>> +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P\n>>>> +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)\n>>>> +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q\n>>>> +\n>>>> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)\n>>>> +{\n>>>> +    uint32_t byte = idx / 4;\n>>>> +    uint32_t bit  = (idx % 4) * 2;\n>>>> +\n>>>> +    assert(byte < xive->sbe_size);\n>>>> +\n>>>> +    return (xive->sbe[byte] >> bit) & 0x3;\n>>>> +}\n>>>> +\n>>>> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)\n>>>> +{\n>>>> +    uint32_t byte = idx / 4;\n>>>> +    uint32_t bit  = (idx % 4) * 2;\n>>>> +    uint8_t old, new;\n>>>> +\n>>>> +    assert(byte < xive->sbe_size);\n>>>> +\n>>>> +    old = xive->sbe[byte];\n>>>> +\n>>>> +    new = xive->sbe[byte] & ~(0x3 << bit);\n>>>> +    new |= (pq & 0x3) << bit;\n>>>> +\n>>>> +    xive->sbe[byte] = new;\n>>>> +\n>>>> +    return (old >> bit) & 0x3;\n>>>> +}\n>>>> +\n>>>> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)\n>>>> +{\n>>>> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);\n>>>> +\n>>>> +    switch (old_pq) {\n>>>> +    case XIVE_ESB_RESET:\n>>>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);\n>>>> +        return false;\n>>>> +    case XIVE_ESB_PENDING:\n>>>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);\n>>>> +        return false;\n>>>> +    case XIVE_ESB_QUEUED:\n>>>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);\n>>>> +        return true;\n>>>> +    case XIVE_ESB_OFF:\n>>>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);\n>>>> +        return false;\n>>>> +    default:\n>>>> +         g_assert_not_reached();\n>>>> +    }\n>>>> +}\n>>>> +\n>>>> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t srcno)\n>>>> +{\n>>>> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);\n>>>> +\n>>>> +    switch (old_pq) {\n>>>> +    case XIVE_ESB_RESET:\n>>>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);\n>>>> +        return true;\n>>>> +    case XIVE_ESB_PENDING:\n>>>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);\n>>>> +        return true;\n>>>> +    case XIVE_ESB_QUEUED:\n>>>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);\n>>>> +        return true;\n>>>> +    case XIVE_ESB_OFF:\n>>>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);\n>>>> +        return false;\n>>>> +    default:\n>>>> +         g_assert_not_reached();\n>>>> +    }\n>>>> +}\n>>>> +\n>>>> +/*\n>>>> + * XIVE Interrupt Source MMIOs\n>>>> + */\n>>>> +static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t srcno)\n>>>> +{\n>>>> +    ICSIRQState *irq = &xive->ics->irqs[srcno];\n>>>> +\n>>>> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {\n>>>> +        irq->status &= ~XICS_STATUS_SENT;\n>>>> +    }\n>>>> +}\n>>>> +\n>>>> +/* TODO: handle second page\n>>>> + *\n>>>> + * Some HW use a separate page for trigger. We only support the case\n>>>> + * in which the trigger can be done in the same page as the EOI.\n>>>> + */\n>>>> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)\n>>>> +{\n>>>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);\n>>>> +    uint32_t offset = addr & 0xF00;\n>>>> +    uint32_t srcno = addr >> xive->esb_shift;\n>>>> +    XiveIVE *ive;\n>>>> +    uint64_t ret = -1;\n>>>> +\n>>>> +    ive = spapr_xive_get_ive(xive, srcno);\n>>>> +    if (!ive || !(ive->w & IVE_VALID))  {\n>>>> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid LISN %d\\n\", srcno);\n>>>> +        goto out;\n>>>\n>>> Since there's a whole (4k) page for each source, I wonder if we should\n>>> actually map each one as a separate MMIO region to allow us to tweak\n>>> the mappings more flexibly.\n>>>\n>>>> +    }\n>>>> +\n>>>> +    switch (offset) {\n>>>> +    case 0:\n>>>> +        spapr_xive_source_eoi(xive, srcno);\n>>>> +\n>>>> +        /* return TRUE or FALSE depending on PQ value */\n>>>> +        ret = spapr_xive_pq_eoi(xive, srcno);\n>>>> +        break;\n>>>> +\n>>>> +    case XIVE_ESB_GET:\n>>>> +        ret = spapr_xive_pq_get(xive, srcno);\n>>>> +        break;\n>>>> +\n>>>> +    case XIVE_ESB_SET_PQ_00:\n>>>> +    case XIVE_ESB_SET_PQ_01:\n>>>> +    case XIVE_ESB_SET_PQ_10:\n>>>> +    case XIVE_ESB_SET_PQ_11:\n>>>> +        ret = spapr_xive_pq_set(xive, srcno, (offset >> 8) & 0x3);\n>>>> +        break;\n>>>> +    default:\n>>>> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid ESB addr %d\\n\", offset);\n>>>> +    }\n>>>> +\n>>>> +out:\n>>>> +    return ret;\n>>>> +}\n>>>> +\n>>>> +static void spapr_xive_esb_write(void *opaque, hwaddr addr,\n>>>> +                           uint64_t value, unsigned size)\n>>>> +{\n>>>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);\n>>>> +    uint32_t offset = addr & 0xF00;\n>>>> +    uint32_t srcno = addr >> xive->esb_shift;\n>>>> +    XiveIVE *ive;\n>>>> +    bool notify = false;\n>>>> +\n>>>> +    ive = spapr_xive_get_ive(xive, srcno);\n>>>> +    if (!ive || !(ive->w & IVE_VALID))  {\n>>>> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid LISN %d\\n\", srcno);\n>>>> +        return;\n>>>> +    }\n>>>> +\n>>>> +    switch (offset) {\n>>>> +    case 0:\n>>>> +        /* TODO: should we trigger even if the IVE is masked ? */\n>>>> +        notify = spapr_xive_pq_trigger(xive, srcno);\n>>>> +        break;\n>>>> +    default:\n>>>> +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid ESB write addr %d\\n\",\n>>>> +                      offset);\n>>>> +        return;\n>>>> +    }\n>>>> +\n>>>> +    if (notify && !(ive->w & IVE_MASKED)) {\n>>>> +        qemu_irq_pulse(xive->qirqs[srcno]);\n>>>> +    }\n>>>> +}\n>>>> +\n>>>> +static const MemoryRegionOps spapr_xive_esb_ops = {\n>>>> +    .read = spapr_xive_esb_read,\n>>>> +    .write = spapr_xive_esb_write,\n>>>> +    .endianness = DEVICE_BIG_ENDIAN,\n>>>> +    .valid = {\n>>>> +        .min_access_size = 8,\n>>>> +        .max_access_size = 8,\n>>>> +    },\n>>>> +    .impl = {\n>>>> +        .min_access_size = 8,\n>>>> +        .max_access_size = 8,\n>>>> +    },\n>>>> +};\n>>>> +\n>>>> +/*\n>>>>   * XIVE Interrupt Source\n>>>>   */\n>>>>  static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)\n>>>> @@ -74,6 +286,33 @@ static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)\n>>>>  /*\n>>>>   * Main XIVE object\n>>>>   */\n>>>> +#define P9_MMIO_BASE     0x006000000000000ull\n>>>> +\n>>>> +/* VC BAR contains set translations for the ESBs and the EQs. */\n>>>> +#define VC_BAR_DEFAULT   0x10000000000ull\n>>>> +#define VC_BAR_SIZE      0x08000000000ull\n>>>> +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */\n>>>> +\n>>>> +static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,\n>>>> +                                            unsigned size)\n>>>> +{\n>>>> +    qemu_log_mask(LOG_UNIMP, \"%s: 0x%\" HWADDR_PRIx \" [%u]\\n\",\n>>>> +                  __func__, offset, size);\n>>>> +    return 0;\n>>>> +}\n>>>> +\n>>>> +static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,\n>>>> +                                         uint64_t value, unsigned size)\n>>>> +{\n>>>> +    qemu_log_mask(LOG_UNIMP, \"%s: 0x%\" HWADDR_PRIx \" <- 0x%\" PRIx64 \" [%u]\\n\",\n>>>> +                  __func__, offset, value, size);\n>>>> +}\n>>>> +\n>>>> +static const MemoryRegionOps spapr_xive_esb_default_ops = {\n>>>> +    .read = spapr_xive_esb_default_read,\n>>>> +    .write = spapr_xive_esb_default_write,\n>>>> +    .endianness = DEVICE_BIG_ENDIAN,\n>>>> +};\n>>>>  \n>>>>  void spapr_xive_reset(void *dev)\n>>>>  {\n>>>> @@ -144,6 +383,22 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)\n>>>>      xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;\n>>>>      xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));\n>>>>  \n>>>> +    /* VC BAR. That's the full window but we will only map the\n>>>> +     * subregions in use. */\n>>>> +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);\n>>>> +    xive->esb_shift = ESB_SHIFT;\n>>>> +\n>>>> +    /* Install default memory region handlers to log bogus access */\n>>>> +    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,\n>>>> +                          NULL, \"xive.esb.full\", VC_BAR_SIZE);\n>>>> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);\n>>>> +\n>>>> +    /* Install the ESB memory region in the overall one */\n>>>> +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,\n>>>> +                          xive, \"xive.esb\",\n>>>> +                          (1ull << xive->esb_shift) * xive->nr_irqs);\n>>>> +    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);\n>>>> +\n>>>>      qemu_register_reset(spapr_xive_reset, dev);\n>>>>  }\n>>>>  \n>>>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h\n>>>> index eab92c4c1bb8..0f516534d76a 100644\n>>>> --- a/include/hw/ppc/spapr_xive.h\n>>>> +++ b/include/hw/ppc/spapr_xive.h\n>>>> @@ -46,6 +46,12 @@ struct sPAPRXive {\n>>>>      XiveIVE      *ivt;\n>>>>      XiveEQ       *eqt;\n>>>>      uint32_t     nr_eqs;\n>>>> +\n>>>> +    /* ESB memory region */\n>>>> +    uint32_t     esb_shift;\n>>>> +    hwaddr       esb_base;\n>>>> +    MemoryRegion esb_mr;\n>>>> +    MemoryRegion esb_iomem;\n>>>>  };\n>>>>  \n>>>>  #endif /* PPC_SPAPR_XIVE_H */\n>>>\n>>\n>","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xzCgG1tvPz9sPk\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 22 Sep 2017 22:37:46 +1000 (AEST)","from localhost ([::1]:58646 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dvNDD-000472-MB\n\tfor incoming@patchwork.ozlabs.org; Fri, 22 Sep 2017 08:37:43 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:53597)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <clg@kaod.org>) id 1dvN2p-0003lb-Vz\n\tfor qemu-devel@nongnu.org; Fri, 22 Sep 2017 08:27:06 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <clg@kaod.org>) id 1dvN2j-0002vu-9D\n\tfor qemu-devel@nongnu.org; Fri, 22 Sep 2017 08:26:59 -0400","from 6.mo179.mail-out.ovh.net ([46.105.56.76]:52435)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <clg@kaod.org>) id 1dvN2j-0002vE-0g\n\tfor qemu-devel@nongnu.org; Fri, 22 Sep 2017 08:26:53 -0400","from player690.ha.ovh.net (b6.ovh.net [213.186.33.56])\n\tby mo179.mail-out.ovh.net (Postfix) with ESMTP id 4D4AD5B3B8\n\tfor <qemu-devel@nongnu.org>; Fri, 22 Sep 2017 14:26:51 +0200 (CEST)","from zorba.kaod.org (LFbn-1-2231-173.w90-76.abo.wanadoo.fr\n\t[90.76.52.173]) (Authenticated sender: postmaster@kaod.org)\n\tby player690.ha.ovh.net (Postfix) with ESMTPSA id AF55E540071;\n\tFri, 22 Sep 2017 14:26:42 +0200 (CEST)"],"To":"David Gibson <david@gibson.dropbear.id.au>","References":"<20170911171235.29331-1-clg@kaod.org>\n\t<20170911171235.29331-8-clg@kaod.org> <20170919025747.GM27153@umbus>\n\t<131a8102-00e4-9621-ddca-39bc685b95cf@kaod.org>\n\t<20170922105805.GN4998@umbus.fritz.box>","From":"=?utf-8?q?C=C3=A9dric_Le_Goater?= <clg@kaod.org>","Message-ID":"<c8358619-6eb4-b534-4c2d-2523ba6ebd00@kaod.org>","Date":"Fri, 22 Sep 2017 14:26:42 +0200","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<20170922105805.GN4998@umbus.fritz.box>","Content-Type":"text/plain; charset=windows-1252","Content-Language":"en-US","X-Ovh-Tracer-Id":"110901142576663416","X-VR-SPAMSTATE":"OK","X-VR-SPAMSCORE":"-100","X-VR-SPAMCAUSE":"gggruggvucftvghtrhhoucdtuddrfeelledrieeggdehgecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfqggfjpdevjffgvefmvefgnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd","Content-Transfer-Encoding":"quoted-printable","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"46.105.56.76","Subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-ppc@nongnu.org,\n\tqemu-devel@nongnu.org, Alexander Graf <agraf@suse.de>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1776857,"web_url":"http://patchwork.ozlabs.org/comment/1776857/","msgid":"<1506587243.25626.24.camel@kernel.crashing.org>","list_archive_url":null,"date":"2017-09-28T08:27:23","subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","submitter":{"id":38,"url":"http://patchwork.ozlabs.org/api/people/38/","name":"Benjamin Herrenschmidt","email":"benh@kernel.crashing.org"},"content":"On Wed, 2017-09-20 at 14:54 +0200, Cédric Le Goater wrote:\n> On 09/19/2017 04:57 AM, David Gibson wrote:\n> > On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:\n> > > Each interrupt source is associated with a two bit state machine\n> > > called an Event State Buffer (ESB) which is controlled by MMIO to\n> > > trigger events. See code for more details on the states and\n> > > transitions.\n> > > \n> > > The MMIO space for the ESB translation is 512GB large on baremetal\n> > > (powernv) systems and the BAR depends on the chip id. In our model for\n> > > the sPAPR machine, we choose to only map a sub memory region for the\n> > > provisionned IRQ numbers and to use the mapping address of chip 0 on a\n> > > real system. The OS will get the address of the MMIO page of the ESB\n> > > entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.\n> > \n> > On bare metal, are the MMIOs for each irq source mapped contiguously?\n> \n> yes. \n\nSort-of...\n\nThere are several source \"controllers\" in the system. Each PHB gets a\nrange of numbers, and the XIVE itself has about half a million of\ngeneric sources (aka 'IPIs') which we use for IPIs and virtual device\ninterrupts.\n\nEach of those \"controller\" has its own MMIO area. So the MMIOs are only\nmapped contiguously within a given controller.\n\nWith pass-through, things are a bit more complex because a given guest\nvisible source can become either an IPI (when not attached to the host\ninterrupt) or a real HW source. So we'll have to invalidate the GPA-\n>HVA mapping and remap. Tricky (but doable). I have some ideas about\nhow to plumb all that but haven't really fully thought it out.\n\n> > > For KVM support, we should think of a way to map this QEMU memory\n> > > region in the host to trigger events directly.\n> > \n> > This would rely on being able to map them without mapping those for\n> > any other VM or the host.  Does that mean allocating a contiguous (and\n> > aligned) hunk of irqs for a guest?\n> \n> I think so yes, the IRQ and the memory regions are tied, and also being \n> able to pass the MMIO region from the host to the guest, a bit like VFIO \n> for the IOMMU regions I suppose. But I haven't dig the problem too much. \n> \n> This is an important part in the overall design. \n\nThere are also MMIO regions associated with queues.\n\n> > We're going to need to be careful about irq allocation here.\n> > Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to\n> > MMIO addresses, \n\n> GET_SOURCE_INFO only retrieves the address of the MMIO region for \n> a 'lisn'.\n\nAn interrupt number as coming from the device-tree.\n\n>  it is not dynamically mapped. In the KVM case, the initial\n> information on the address would come from OPAL and then the host \n> kernel would translate this information for the guest.\n> \n> > we need the MMIO addresses to be stable and consistent, because \n> > we can't have them change across migration.  \n> \n> yes. I will catch my XIVE guru next week in Paris to clarify that\n> part. \n> \n> > We need to have this consistent between in-qemu and in-KVM XIVE\n> > implementations as well.\n> \n> yes.\n> \n> C.\n> \n> > > \n> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>\n> > > ---\n> > >  hw/intc/spapr_xive.c        | 255 ++++++++++++++++++++++++++++++++++++++++++++\n> > >  include/hw/ppc/spapr_xive.h |   6 ++\n> > >  2 files changed, 261 insertions(+)\n> > > \n> > > diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c\n> > > index 1ed7b6a286e9..8a85d64efc4c 100644\n> > > --- a/hw/intc/spapr_xive.c\n> > > +++ b/hw/intc/spapr_xive.c\n> > > @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)\n> > >  }\n> > >  \n> > >  /*\n> > > + * \"magic\" Event State Buffer (ESB) MMIO offsets.\n> > > + *\n> > > + * Each interrupt source has a 2-bit state machine called ESB\n> > > + * which can be controlled by MMIO. It's made of 2 bits, P and\n> > > + * Q. P indicates that an interrupt is pending (has been sent\n> > > + * to a queue and is waiting for an EOI). Q indicates that the\n> > > + * interrupt has been triggered while pending.\n> > > + *\n> > > + * This acts as a coalescing mechanism in order to guarantee\n> > > + * that a given interrupt only occurs at most once in a queue.\n> > > + *\n> > > + * When doing an EOI, the Q bit will indicate if the interrupt\n> > > + * needs to be re-triggered.\n> > > + *\n> > > + * The following offsets into the ESB MMIO allow to read or\n> > > + * manipulate the PQ bits. They must be used with an 8-bytes\n> > > + * load instruction. They all return the previous state of the\n> > > + * interrupt (atomically).\n> > > + *\n> > > + * Additionally, some ESB pages support doing an EOI via a\n> > > + * store at 0 and some ESBs support doing a trigger via a\n> > > + * separate trigger page.\n> > > + */\n> > > +#define XIVE_ESB_GET            0x800\n> > > +#define XIVE_ESB_SET_PQ_00      0xc00\n> > > +#define XIVE_ESB_SET_PQ_01      0xd00\n> > > +#define XIVE_ESB_SET_PQ_10      0xe00\n> > > +#define XIVE_ESB_SET_PQ_11      0xf00\n> > > +\n> > > +#define XIVE_ESB_VAL_P          0x2\n> > > +#define XIVE_ESB_VAL_Q          0x1\n> > > +\n> > > +#define XIVE_ESB_RESET          0x0\n> > > +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P\n> > > +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)\n> > > +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q\n> > > +\n> > > +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)\n> > > +{\n> > > +    uint32_t byte = idx / 4;\n> > > +    uint32_t bit  = (idx % 4) * 2;\n> > > +\n> > > +    assert(byte < xive->sbe_size);\n> > > +\n> > > +    return (xive->sbe[byte] >> bit) & 0x3;\n> > > +}\n> > > +\n> > > +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)\n> > > +{\n> > > +    uint32_t byte = idx / 4;\n> > > +    uint32_t bit  = (idx % 4) * 2;\n> > > +    uint8_t old, new;\n> > > +\n> > > +    assert(byte < xive->sbe_size);\n> > > +\n> > > +    old = xive->sbe[byte];\n> > > +\n> > > +    new = xive->sbe[byte] & ~(0x3 << bit);\n> > > +    new |= (pq & 0x3) << bit;\n> > > +\n> > > +    xive->sbe[byte] = new;\n> > > +\n> > > +    return (old >> bit) & 0x3;\n> > > +}\n> > > +\n> > > +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)\n> > > +{\n> > > +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);\n> > > +\n> > > +    switch (old_pq) {\n> > > +    case XIVE_ESB_RESET:\n> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);\n> > > +        return false;\n> > > +    case XIVE_ESB_PENDING:\n> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);\n> > > +        return false;\n> > > +    case XIVE_ESB_QUEUED:\n> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);\n> > > +        return true;\n> > > +    case XIVE_ESB_OFF:\n> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);\n> > > +        return false;\n> > > +    default:\n> > > +         g_assert_not_reached();\n> > > +    }\n> > > +}\n> > > +\n> > > +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t srcno)\n> > > +{\n> > > +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);\n> > > +\n> > > +    switch (old_pq) {\n> > > +    case XIVE_ESB_RESET:\n> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);\n> > > +        return true;\n> > > +    case XIVE_ESB_PENDING:\n> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);\n> > > +        return true;\n> > > +    case XIVE_ESB_QUEUED:\n> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);\n> > > +        return true;\n> > > +    case XIVE_ESB_OFF:\n> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);\n> > > +        return false;\n> > > +    default:\n> > > +         g_assert_not_reached();\n> > > +    }\n> > > +}\n> > > +\n> > > +/*\n> > > + * XIVE Interrupt Source MMIOs\n> > > + */\n> > > +static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t srcno)\n> > > +{\n> > > +    ICSIRQState *irq = &xive->ics->irqs[srcno];\n> > > +\n> > > +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {\n> > > +        irq->status &= ~XICS_STATUS_SENT;\n> > > +    }\n> > > +}\n> > > +\n> > > +/* TODO: handle second page\n> > > + *\n> > > + * Some HW use a separate page for trigger. We only support the case\n> > > + * in which the trigger can be done in the same page as the EOI.\n> > > + */\n> > > +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)\n> > > +{\n> > > +    sPAPRXive *xive = SPAPR_XIVE(opaque);\n> > > +    uint32_t offset = addr & 0xF00;\n> > > +    uint32_t srcno = addr >> xive->esb_shift;\n> > > +    XiveIVE *ive;\n> > > +    uint64_t ret = -1;\n> > > +\n> > > +    ive = spapr_xive_get_ive(xive, srcno);\n> > > +    if (!ive || !(ive->w & IVE_VALID))  {\n> > > +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid LISN %d\\n\", srcno);\n> > > +        goto out;\n> > \n> > Since there's a whole (4k) page for each source, I wonder if we should\n> > actually map each one as a separate MMIO region to allow us to tweak\n> > the mappings more flexibly.\n> > \n> > > +    }\n> > > +\n> > > +    switch (offset) {\n> > > +    case 0:\n> > > +        spapr_xive_source_eoi(xive, srcno);\n> > > +\n> > > +        /* return TRUE or FALSE depending on PQ value */\n> > > +        ret = spapr_xive_pq_eoi(xive, srcno);\n> > > +        break;\n> > > +\n> > > +    case XIVE_ESB_GET:\n> > > +        ret = spapr_xive_pq_get(xive, srcno);\n> > > +        break;\n> > > +\n> > > +    case XIVE_ESB_SET_PQ_00:\n> > > +    case XIVE_ESB_SET_PQ_01:\n> > > +    case XIVE_ESB_SET_PQ_10:\n> > > +    case XIVE_ESB_SET_PQ_11:\n> > > +        ret = spapr_xive_pq_set(xive, srcno, (offset >> 8) & 0x3);\n> > > +        break;\n> > > +    default:\n> > > +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid ESB addr %d\\n\", offset);\n> > > +    }\n> > > +\n> > > +out:\n> > > +    return ret;\n> > > +}\n> > > +\n> > > +static void spapr_xive_esb_write(void *opaque, hwaddr addr,\n> > > +                           uint64_t value, unsigned size)\n> > > +{\n> > > +    sPAPRXive *xive = SPAPR_XIVE(opaque);\n> > > +    uint32_t offset = addr & 0xF00;\n> > > +    uint32_t srcno = addr >> xive->esb_shift;\n> > > +    XiveIVE *ive;\n> > > +    bool notify = false;\n> > > +\n> > > +    ive = spapr_xive_get_ive(xive, srcno);\n> > > +    if (!ive || !(ive->w & IVE_VALID))  {\n> > > +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid LISN %d\\n\", srcno);\n> > > +        return;\n> > > +    }\n> > > +\n> > > +    switch (offset) {\n> > > +    case 0:\n> > > +        /* TODO: should we trigger even if the IVE is masked ? */\n> > > +        notify = spapr_xive_pq_trigger(xive, srcno);\n> > > +        break;\n> > > +    default:\n> > > +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid ESB write addr %d\\n\",\n> > > +                      offset);\n> > > +        return;\n> > > +    }\n> > > +\n> > > +    if (notify && !(ive->w & IVE_MASKED)) {\n> > > +        qemu_irq_pulse(xive->qirqs[srcno]);\n> > > +    }\n> > > +}\n> > > +\n> > > +static const MemoryRegionOps spapr_xive_esb_ops = {\n> > > +    .read = spapr_xive_esb_read,\n> > > +    .write = spapr_xive_esb_write,\n> > > +    .endianness = DEVICE_BIG_ENDIAN,\n> > > +    .valid = {\n> > > +        .min_access_size = 8,\n> > > +        .max_access_size = 8,\n> > > +    },\n> > > +    .impl = {\n> > > +        .min_access_size = 8,\n> > > +        .max_access_size = 8,\n> > > +    },\n> > > +};\n> > > +\n> > > +/*\n> > >   * XIVE Interrupt Source\n> > >   */\n> > >  static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)\n> > > @@ -74,6 +286,33 @@ static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)\n> > >  /*\n> > >   * Main XIVE object\n> > >   */\n> > > +#define P9_MMIO_BASE     0x006000000000000ull\n> > > +\n> > > +/* VC BAR contains set translations for the ESBs and the EQs. */\n> > > +#define VC_BAR_DEFAULT   0x10000000000ull\n> > > +#define VC_BAR_SIZE      0x08000000000ull\n> > > +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */\n> > > +\n> > > +static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,\n> > > +                                            unsigned size)\n> > > +{\n> > > +    qemu_log_mask(LOG_UNIMP, \"%s: 0x%\" HWADDR_PRIx \" [%u]\\n\",\n> > > +                  __func__, offset, size);\n> > > +    return 0;\n> > > +}\n> > > +\n> > > +static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,\n> > > +                                         uint64_t value, unsigned size)\n> > > +{\n> > > +    qemu_log_mask(LOG_UNIMP, \"%s: 0x%\" HWADDR_PRIx \" <- 0x%\" PRIx64 \" [%u]\\n\",\n> > > +                  __func__, offset, value, size);\n> > > +}\n> > > +\n> > > +static const MemoryRegionOps spapr_xive_esb_default_ops = {\n> > > +    .read = spapr_xive_esb_default_read,\n> > > +    .write = spapr_xive_esb_default_write,\n> > > +    .endianness = DEVICE_BIG_ENDIAN,\n> > > +};\n> > >  \n> > >  void spapr_xive_reset(void *dev)\n> > >  {\n> > > @@ -144,6 +383,22 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)\n> > >      xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;\n> > >      xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));\n> > >  \n> > > +    /* VC BAR. That's the full window but we will only map the\n> > > +     * subregions in use. */\n> > > +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);\n> > > +    xive->esb_shift = ESB_SHIFT;\n> > > +\n> > > +    /* Install default memory region handlers to log bogus access */\n> > > +    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,\n> > > +                          NULL, \"xive.esb.full\", VC_BAR_SIZE);\n> > > +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);\n> > > +\n> > > +    /* Install the ESB memory region in the overall one */\n> > > +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,\n> > > +                          xive, \"xive.esb\",\n> > > +                          (1ull << xive->esb_shift) * xive->nr_irqs);\n> > > +    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);\n> > > +\n> > >      qemu_register_reset(spapr_xive_reset, dev);\n> > >  }\n> > >  \n> > > diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h\n> > > index eab92c4c1bb8..0f516534d76a 100644\n> > > --- a/include/hw/ppc/spapr_xive.h\n> > > +++ b/include/hw/ppc/spapr_xive.h\n> > > @@ -46,6 +46,12 @@ struct sPAPRXive {\n> > >      XiveIVE      *ivt;\n> > >      XiveEQ       *eqt;\n> > >      uint32_t     nr_eqs;\n> > > +\n> > > +    /* ESB memory region */\n> > > +    uint32_t     esb_shift;\n> > > +    hwaddr       esb_base;\n> > > +    MemoryRegion esb_mr;\n> > > +    MemoryRegion esb_iomem;\n> > >  };\n> > >  \n> > >  #endif /* PPC_SPAPR_XIVE_H */","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3y2nsP2dcKz9tXN\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 28 Sep 2017 18:28:57 +1000 (AEST)","from localhost ([::1]:57942 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dxUBj-0000DY-G3\n\tfor incoming@patchwork.ozlabs.org; Thu, 28 Sep 2017 04:28:55 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:48003)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <benh@kernel.crashing.org>) id 1dxUB3-00008O-MI\n\tfor qemu-devel@nongnu.org; Thu, 28 Sep 2017 04:28:19 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <benh@kernel.crashing.org>) id 1dxUAw-0004fc-Tw\n\tfor qemu-devel@nongnu.org; Thu, 28 Sep 2017 04:28:13 -0400","from gate.crashing.org ([63.228.1.57]:47979)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <benh@kernel.crashing.org>)\n\tid 1dxUAv-0004be-9L; Thu, 28 Sep 2017 04:28:06 -0400","from localhost (localhost.localdomain [127.0.0.1])\n\tby gate.crashing.org (8.14.1/8.13.8) with ESMTP id v8S8RN82029192;\n\tThu, 28 Sep 2017 03:27:26 -0500"],"Message-ID":"<1506587243.25626.24.camel@kernel.crashing.org>","From":"Benjamin Herrenschmidt <benh@kernel.crashing.org>","To":"=?iso-8859-1?q?C=E9dric?= Le Goater <clg@kaod.org>, David Gibson\n\t<david@gibson.dropbear.id.au>","Date":"Thu, 28 Sep 2017 10:27:23 +0200","In-Reply-To":"<131a8102-00e4-9621-ddca-39bc685b95cf@kaod.org>","References":"<20170911171235.29331-1-clg@kaod.org>\n\t<20170911171235.29331-8-clg@kaod.org> <20170919025747.GM27153@umbus>\n\t<131a8102-00e4-9621-ddca-39bc685b95cf@kaod.org>","Content-Type":"text/plain; charset=\"UTF-8\"","X-Mailer":"Evolution 3.24.5 (3.24.5-1.fc26) ","Mime-Version":"1.0","Content-Transfer-Encoding":"quoted-printable","X-MIME-Autoconverted":"from 8bit to quoted-printable by gate.crashing.org id\n\tv8S8RN82029192","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.6.x [fuzzy]","X-Received-From":"63.228.1.57","Subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-ppc@nongnu.org,\n\tqemu-devel@nongnu.org, Alexander Graf <agraf@suse.de>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1776858,"web_url":"http://patchwork.ozlabs.org/comment/1776858/","msgid":"<1506587342.25626.26.camel@kernel.crashing.org>","list_archive_url":null,"date":"2017-09-28T08:29:02","subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","submitter":{"id":38,"url":"http://patchwork.ozlabs.org/api/people/38/","name":"Benjamin Herrenschmidt","email":"benh@kernel.crashing.org"},"content":"On Wed, 2017-09-20 at 15:05 +0200, Cédric Le Goater wrote:\n> > > +/*\n> > > + * XIVE Interrupt Source MMIOs\n> > > + */\n> > > +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)\n> > > +{\n> > > +    sPAPRXive *xive = SPAPR_XIVE(opaque);\n> > > +    uint32_t offset = addr & 0xF00;\n> > > +    uint32_t srcno = addr >> xive->esb_shift;\n> > > +    XiveIVE *ive;\n> > > +    uint64_t ret = -1;\n> > > +\n> > > +    ive = spapr_xive_get_ive(xive, srcno);\n> > > +    if (!ive || !(ive->w & IVE_VALID))  {\n> > > +        qemu_log_mask(LOG_GUEST_ERROR, \"XIVE: invalid LISN %d\\n\", srcno);\n> > > +        goto out;\n> > \n> > Since there's a whole (4k) page for each source, I wonder if we should\n> > actually map each one as a separate MMIO region to allow us to tweak\n> > the mappings more flexibly\n> \n> yes we could have a subregion for each source. In that case, \n> we should also handle IVE_VALID properly. That will require \n> a specific XIVE allocator which was difficult to do while\n> keeping the compatibility with XICS for migration and CAS.\n\nThat will be a serious bloat with lots of interrupts. We also cannot\npossibly have a KVM mm region per interrupt or even a vma.\n\nI'm thinking of some kind of /dev/xive (or some other KVM or irqfd\norignated fd) that allows you to mmap a single big region whose content\nis demand-faulted and invalidated by the kernel to map the various\ninterrupts.\n\nSo that it looks like a single VMA (and KVM memory block).\n\nBen.\n\n> C.\n> \n>","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3y2nvD34nFz9rxl\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 28 Sep 2017 18:30:32 +1000 (AEST)","from localhost ([::1]:57950 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dxUDG-0001WF-Iy\n\tfor incoming@patchwork.ozlabs.org; Thu, 28 Sep 2017 04:30:30 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:49067)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <benh@kernel.crashing.org>) id 1dxUCO-00016B-IS\n\tfor qemu-devel@nongnu.org; Thu, 28 Sep 2017 04:29:37 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <benh@kernel.crashing.org>) id 1dxUCN-0005Ta-LY\n\tfor qemu-devel@nongnu.org; Thu, 28 Sep 2017 04:29:36 -0400","from gate.crashing.org ([63.228.1.57]:48003)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <benh@kernel.crashing.org>)\n\tid 1dxUCM-0005RN-Pr; Thu, 28 Sep 2017 04:29:35 -0400","from localhost (localhost.localdomain [127.0.0.1])\n\tby gate.crashing.org (8.14.1/8.13.8) with ESMTP id v8S8T2rQ029311;\n\tThu, 28 Sep 2017 03:29:05 -0500"],"Message-ID":"<1506587342.25626.26.camel@kernel.crashing.org>","From":"Benjamin Herrenschmidt <benh@kernel.crashing.org>","To":"=?iso-8859-1?q?C=E9dric?= Le Goater <clg@kaod.org>, David Gibson\n\t<david@gibson.dropbear.id.au>","Date":"Thu, 28 Sep 2017 10:29:02 +0200","In-Reply-To":"<771e897f-f304-3616-2e9c-e582f556b914@kaod.org>","References":"<20170911171235.29331-1-clg@kaod.org>\n\t<20170911171235.29331-8-clg@kaod.org> <20170919025747.GM27153@umbus>\n\t<771e897f-f304-3616-2e9c-e582f556b914@kaod.org>","Content-Type":"text/plain; charset=\"UTF-8\"","X-Mailer":"Evolution 3.24.5 (3.24.5-1.fc26) ","Mime-Version":"1.0","Content-Transfer-Encoding":"quoted-printable","X-MIME-Autoconverted":"from 8bit to quoted-printable by gate.crashing.org id\n\tv8S8T2rQ029311","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.6.x [fuzzy]","X-Received-From":"63.228.1.57","Subject":"Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers\n\tfor the XIVE interrupt sources","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-ppc@nongnu.org,\n\tqemu-devel@nongnu.org, Alexander Graf <agraf@suse.de>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}}]