
[v6,09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

Message ID 20240325190339.696686-10-nifan.cxl@gmail.com
State New
Series Enabling DCD emulation support in Qemu

Commit Message

fan March 25, 2024, 7:02 p.m. UTC
From: Fan Ni <fan.ni@samsung.com>

To simulate FM functionalities for initiating Dynamic Capacity Add
(Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
add/release dynamic capacity extents requests.

With the change, we only allow an extent to be released when its DPA range
is contained by a single accepted extent in the device. That is to say,
extent superset release is not supported yet.

1. Add dynamic capacity extents:

For example, the command to add two contiguous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like below:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "region-id": 0,
      "extents": [
      {
          "offset": 0,
          "len": 134217728
      },
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "region-id": 0,
      "extents": [
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  26 ++--
 hw/mem/cxl_type3.c          | 252 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3_stubs.c    |  14 ++
 include/hw/cxl/cxl_device.h |   8 ++
 include/hw/cxl/cxl_events.h |  18 +++
 qapi/cxl.json               |  61 ++++++++-
 6 files changed, 367 insertions(+), 12 deletions(-)

Comments

Gregory Price April 3, 2024, 6:16 p.m. UTC | #1
On Mon, Mar 25, 2024 at 12:02:27PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
... snip 
> +
> +/*
> + * The main function to process dynamic capacity event. Currently DC extents
> + * add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
> +                                             CXLDCEventType type, uint16_t hid,
> +                                             uint8_t rid,
> +                                             CXLDCExtentRecordList *records,
> +                                             Error **errp)
> +{
... snip 
> +    /* Sanity check and count the extents */
> +    list = records;
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = offset + dcd->dc.regions[rid].base;
> +
> +        if (len == 0) {
> +            error_setg(errp, "extent with 0 length is not allowed");
> +            return;
> +        }
> +
> +        if (offset % block_size || len % block_size) {
> +            error_setg(errp, "dpa or len is not aligned to region block size");
> +            return;
> +        }
> +
> +        if (offset + len > dcd->dc.regions[rid].len) {
> +            error_setg(errp, "extent range is beyond the region end");
> +            return;
> +        }
> +
> +        /* No duplicate or overlapped extents are allowed */
> +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> +                              len / block_size)) {
> +            error_setg(errp, "duplicate or overlapped extents are detected");
> +            return;
> +        }
> +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +        num_extents++;

I think num_extents always ends up equal to the length of the list;
otherwise this code returns with an error.

Nitpick:
This can be moved to the bottom w/ `list = list->next` to express that a
little more clearly.

> +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                               dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with pending DPA range");
> +                return;
> +            }
> +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> +                                                dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with non-existing DPA range");
> +                return;
> +            }
> +        }
> +        list = list->next;
> +    }
> +
> +    if (num_extents == 0) {

Since num_extents is always the length of the list, this is equivalent to
`if (!records)` prior to the while loop. Makes it a little more clear that:

1. There must be at least 1 extent
2. All extents must be valid for the command to be serviced.
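
Taken together, the two points above could be expressed as the following
skeleton for the quoted loop (just a sketch, reusing the variable names from
the patch and eliding the per-extent checks):

    if (!records) {
        error_setg(errp, "no valid extents to send to process");
        return;
    }

    /* Sanity check and count the extents */
    for (list = records; list; list = list->next) {
        /* ... per-extent validation as in the patch, returning on error ... */
        num_extents++;
    }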

> +        error_setg(errp, "no valid extents to send to process");
> +        return;
> +    }
> +

I'm looking at adding the MHD extensions around this point, e.g.:

/* If MHD cannot allocate requested extents, the cmd fails */
if (type == DC_EVENT_ADD_CAPACITY && dcd->mhd_dcd_extents_allocate &&
    num_extents != dcd->mhd_dcd_extents_allocate(...))
	return;

where mhd_dcd_extents_allocate checks the MHD block bitmap and tags
for correctness (shared // no double-allocations, etc). On success,
it guarantees proper ownership.

the release path would then be done in the release response path from
the host, as opposed to the release event injection.

Do you see any issues with that flow?

> +    /* Create extent list for event being passed to host */
> +    i = 0;
> +    list = records;
> +    extents = g_new0(CXLDCExtentRaw, num_extents);
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = dcd->dc.regions[rid].base + offset;
> +
> +        extents[i].start_dpa = dpa;
> +        extents[i].len = len;
> +        memset(extents[i].tag, 0, 0x10);
> +        extents[i].shared_seq = 0;
> +        list = list->next;
> +        i++;
> +    }
> +
> +    /*
> +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> +     *
> +     * All Dynamic Capacity event records shall set the Event Record Severity
> +     * field in the Common Event Record Format to Informational Event. All
> +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> +     * Event Log.
> +     */
> +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> +    dCap.type = type;
> +    /* FIXME: for now, validity flag is cleared */
> +    dCap.validity_flags = 0;
> +    stw_le_p(&dCap.host_id, hid);
> +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> +    dCap.updated_region_id = 0;
> +    /*
> +     * FIXME: for now, the "More" flag is cleared as there is only one
> +     * extent associating with each record and tag-based release is
> +     * not supported.
> +     */
> +    dCap.flags = 0;
> +    for (i = 0; i < num_extents; i++) {
> +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> +               sizeof(CXLDCExtentRaw));
> +
> +        if (type == DC_EVENT_ADD_CAPACITY) {
> +            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
> +                                             extents[i].start_dpa,
> +                                             extents[i].len,
> +                                             extents[i].tag,
> +                                             extents[i].shared_seq);
> +        }
> +
> +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> +                             (CXLEventRecordRaw *)&dCap)) {

Pardon if I missed a prior discussion about this, but what happens to
pending events in the scenario where cxl_event_insert fails?

~Gregory
Gregory Price April 5, 2024, 4:07 p.m. UTC | #2
On Fri, Apr 05, 2024 at 01:27:19PM +0100, Jonathan Cameron wrote:
> On Wed, 3 Apr 2024 14:16:25 -0400
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> A few follow up comments.
> 
> > 
> > > +        error_setg(errp, "no valid extents to send to process");
> > > +        return;
> > > +    }
> > > +  
> > 
> > I'm looking at adding the MHD extensions around this point, e.g.:
> > 
> > /* If MHD cannot allocate requested extents, the cmd fails */
> > if (type == DC_EVENT_ADD_CAPACITY && dcd->mhd_dcd_extents_allocate &&
> >     num_extents != dcd->mhd_dcd_extents_allocate(...))
> > 	return;
> > 
> > where mhd_dcd_extents_allocate checks the MHD block bitmap and tags
> > for correctness (shared // no double-allocations, etc). On success,
> > it guarantees proper ownership.
> > 
> > the release path would then be done in the release response path from
> > the host, as opposed to the release event injection.
> 
> I think it would be polite to check if the QMP command on release
> for whether it is asking something plausible - makes for an easier
> to user QMP interface.  I guess it's not strictly required though.
> What races are there on release?

The only real critical section, barring force-release being supported,
is when you clear the bits in the device allowing new requests to swipe
those blocks. The appropriate place appears to be after the host kernel
has responded to the release extent request.

Also need to handle the case of multiple add-requests contending for the
same region, but that's just an "oops failed to get all the bits, roll
back" scenario - easy to handle.

Could go coarse-grained to just lock access to the bitmap entirely while
operating on it, or be fancy and use atomics to go lockless. The latter
code already exists in the Niagara model for reference.
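
Roughly, the lockless option could look something like the standalone sketch
below: claim a run of blocks in a shared bitmap with C11 atomics and roll
everything back on contention. The names and the 64-bit word layout are just
assumptions for illustration, not code from the patch or the Niagara model.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_WORD 64

/*
 * Claim blocks [start, start + count) in a shared allocation bitmap.
 * If any bit is already set (another head won the race), clear the bits
 * set so far and fail, i.e. the "failed to get all the bits, roll back"
 * case above.
 */
static bool claim_blocks(_Atomic uint64_t *bitmap, uint64_t start,
                         uint64_t count)
{
    for (uint64_t i = 0; i < count; i++) {
        uint64_t bit = start + i;
        uint64_t mask = UINT64_C(1) << (bit % BITS_PER_WORD);
        uint64_t old = atomic_fetch_or(&bitmap[bit / BITS_PER_WORD], mask);

        if (old & mask) {
            /* Contended: undo the bits we claimed and bail out */
            while (i--) {
                bit = start + i;
                mask = UINT64_C(1) << (bit % BITS_PER_WORD);
                atomic_fetch_and(&bitmap[bit / BITS_PER_WORD], ~mask);
            }
            return false;
        }
    }
    return true;
}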

> We aren't supporting force release
> for now, and for anything else, it's host specific (unlike add where
> the extra rules kick in).   As such I 'think' a check at command
> time will be valid as long as the host hasn't done an async
> release of capacity between that and the event record.  That
> is a race we always have and the host should at most log it and
> not release capacity twice.
>

Borrowing from Ira's flow chart, here are the pieces I believe are
needed to implement MHD support for DCD.

Orchestrator      FM         Device       Host Kernel    Host User

    |             |           |            |              |
    |-- Add ----->|-- Add --->A--- Add --->|              |
    |  Capacity   |  Extent   |   Extent   |              |
    |             |           |            |              |
    |             |<--Accept--B<--Accept --|              |
    |             |   Extent  |   Extent   |              |
    |             |           |            |              |
    |             |     ... snip ...       |              |
    |             |           |            |              |
    |-- Remove -->|--Release->C---Release->|              |
    |  Capacity   |  Extent   |   Extent   |              |
    |             |           |            |              |
    |             |<-Release--D<--Release--|              |
    |             |  Extent   |   Extent   |              |
    |             |           |            |              |

1. (A) Upon Device Receiving Add Capacity Request
   a. the device sanity checks the request against local mappings
   b. the mhd hook is called to sanity check against global mappings
   c. the mhd bitmap is updated, marking the capacity owned by that head

   function: qmp_cxl_process_dynamic_capacity

2. (B) Upon Device Receiving Add Dynamic Capacity Response
   a. accepted extents are compared to the original request
   b. not accepted extents are cleared from the bitmap (local and MHD)
   (Note: My understanding is that for now each request = 1 extent)

   function: cmd_dcd_add_dyn_cap_rsp

3. (C) Upon Device receiving Release Dynamic Capacity Request
   a. check for a pending release request. If exists, error.
   b. check that the bits in the MHD bitmap are actually set

   function: qmp_cxl_process_dynamic_capacity

4. (D) Upon Device receiving Release Dynamic Capacity Response
   a. clear the bits in the mhd bitmap
   b. remove the pending request from the pending list

   function: cmd_dcd_release_dyn_cap

Something to note: The MHD bitmap is essentially the same as the
existing DCD extent bitmap - except that it is located in a shared
region of memory (mmap file, shm, whatever - pick one).

Maybe it's worth abstracting out the bitmap twiddling to make that
backable by a file mmap'd SHARED and use atomics to twiddle the bits?

That would be about 90% of the way to MH-DCD.

Maybe flock() could be used for coarse locking on a shared bitmap
in the short term?  This mitigates your concern of using shm.h as
the coordination piece, though I'm not sure how portable flock() is.
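
For what it's worth, a self-contained sketch of that direction: the bitmap
lives in a file mapped MAP_SHARED by every per-head QEMU instance, with
flock() as the coarse lock around updates. The path, layout and helper names
are made up for illustration, not anything in the patch.

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file-backed bitmap shared between the per-head QEMU instances. */
static uint64_t *map_shared_bitmap(const char *path, size_t nbits, int *fd_out)
{
    size_t bytes = ((nbits + 63) / 64) * sizeof(uint64_t);
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, bytes) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return p;
}

/* Coarse-grained: take an exclusive lock on the whole bitmap per update. */
static void set_block_bit(int fd, uint64_t *bitmap, uint64_t bit)
{
    flock(fd, LOCK_EX);
    bitmap[bit / 64] |= UINT64_C(1) << (bit % 64);
    flock(fd, LOCK_UN);
}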

~Gregory
Gregory Price April 5, 2024, 6:09 p.m. UTC | #3
On Fri, Apr 05, 2024 at 06:44:52PM +0100, Jonathan Cameron wrote:
> On Fri, 5 Apr 2024 12:07:45 -0400
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > 3. (C) Upon Device receiving Release Dynamic Capacity Request
> >    a. check for a pending release request. If exists, error.
> 
> Not sure that's necessary - can queue as long as the head
> can track if the bits are in a pending release state.
> 

Yeah probably it's fine to just queue the event and everything
downstream just handles it.

> >    b. check that the bits in the MHD bitmap are actually set
> Good.
> > 
> >    function: qmp_cxl_process_dynamic_capacity
> > 
> > 4. (D) Upon Device receiving Release Dynamic Capacity Response
> >    a. clear the bits in the mhd bitmap
> >    b. remove the pending request from the pending list
> > 
> >    function: cmd_dcd_release_dyn_cap
> > 
> > Something to note: The MHD bitmap is essentially the same as the
> > existing DCD extent bitmap - except that it is located in a shared
> > region of memory (mmap file, shm, whatever - pick one).
> 
> I think you will ideally also have a per head one to track head access
> to the things offered by the mhd.
> 

Generally I try not to duplicate state; it reduces consistency problems.

You do still need a shared memory state and a per-head state to capture
per-head data, but the allocation bitmap is really device-global state.

Either way you have a race condition when checking the bitmap during a
memory access in the process of adding/releasing capacity - but that's
more an indication of bad host behavior than it is of a bug in the
implementation of the emulated device. Probably we don't need to
read-lock the bitmap (for access validation), only write-lock.

My preference, for what it's worth, would be to have a single bitmap
and have it be anonymous memory for Single-head and file-backed for
Multi-head.  I'll have to work out the locking mechanism.

~Gregory
fan April 9, 2024, 9:26 p.m. UTC | #4
On Fri, Apr 05, 2024 at 01:18:56PM +0100, Jonathan Cameron wrote:
> On Mon, 25 Mar 2024 12:02:27 -0700
> nifan.cxl@gmail.com wrote:
> 
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > To simulate FM functionalities for initiating Dynamic Capacity Add
> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > add/release dynamic capacity extents requests.
> > 
> > With the change, we allow to release an extent only when its DPA range
> > is contained by a single accepted extent in the device. That is to say,
> > extent superset release is not supported yet.
> > 
> > 1. Add dynamic capacity extents:
> > 
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> > 
> > { "execute": "qmp_capabilities" }
> > 
> > { "execute": "cxl-add-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "region-id": 0,
> >       "extents": [
> >       {
> >           "offset": 0,
> >           "len": 134217728
> >       },
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> 
> Hi Fan,
> 
> I talk more on this inline, but to me this interface takes multiple extents
> so that we can treat them as a single 'offer' of capacity. That is they
> should be linked in the event log with the more flag and the host should
> have to handle them in one go (I know Ira and Navneet's code doesn't handle
> this yet, but that doesn't mean QEMU shouldn't).
> 
> Alternative for now would be to only support a single entry.  Keep the
> interface defined to take multiple entries but reject it at runtime.
> 
> I don't want to end up with a more complex interface in the end just
> because we allowed this form to not set the MORE flag today.
> We will need this to do tagged handling and ultimately sharing, so good
> to get it right from the start.
> 
> For tagged handling I think the right option is to have the tag alongside
> region-id not in the individual extents.  That way the interface is naturally
> used to generate the right description to the host.
> 
> >       ]
> >   }
> > }
Hi Jonathan,
Thanks for the detailed comments.

For the QMP interface, I have one question. 
Do we want the interface to follow exactly what is shown in
Table 7-70 and Table 7-71 in cxl r3.1?

Fan

> > 
> > 2. Release dynamic capacity extents:
> > 
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> > 
> > { "execute": "cxl-release-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "region-id": 0,
> >       "extents": [
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> 
> 
> 
> >          /* to-be-added range should not overlap with range already accepted */
> >          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> > @@ -1585,9 +1586,13 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >      CXLDCExtentList *extent_list = &ct3d->dc.extents;
> >      uint32_t i;
> >      uint64_t dpa, len;
> > +    CXLDCExtent *ent;
> >      CXLRetCode ret;
> >  
> >      if (in->num_entries_updated == 0) {
> > +        /* Always remove the first pending extent when response received. */
> > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> >          return CXL_MBOX_SUCCESS;
> >      }
> >  
> > @@ -1604,6 +1609,8 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >  
> >      ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
> >      if (ret != CXL_MBOX_SUCCESS) {
> > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> 
> Ah this deals with the todo I suggested you add to the earlier patch.
> I'd not mind so much if you hadn't been so thorough on other todo notes ;)
> Add one in the earlier patch and get rid of it here like you do below.
> 
> However as I note below I think we need to handle these as groups of extents
> not single extents. That way we keep an 'offered' set offered at the same time by
> a single command (and expose to host using the more flag) together and reject
> them en masse.
> 
> 
> >          return ret;
> >      }
> >  
> > @@ -1613,10 +1620,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >  
> >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> >          ct3d->dc.total_extent_count += 1;
> > -        /*
> > -         * TODO: we will add a pending extent list based on event log record
> > -         * and process the list according here.
> > -         */
> > +
> > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> >      }
> >  
> >      return CXL_MBOX_SUCCESS;
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 951bd79a82..74cb64e843 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> 
> >  
> >  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> > @@ -1449,7 +1454,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
> >          return CXL_EVENT_TYPE_FAIL;
> >      case CXL_EVENT_LOG_FATAL:
> >          return CXL_EVENT_TYPE_FATAL;
> > -/* DCD not yet supported */
> 
> Drop the comment but don't add the code.  We are handling DCD differently
> from other events, so this code should never deal with it.
> 
> > +    case CXL_EVENT_LOG_DYNCAP:
> > +        return CXL_EVENT_TYPE_DYNAMIC_CAP;
> >      default:
> >          return -EINVAL;
> >      }
> > @@ -1700,6 +1706,250 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
> >      }
> >  }
> 
> > +/*
> > + * Check whether the range [dpa, dpa + len -1] has overlaps with extents in
> 
> space after - (just looks odd otherwise)
> 
> > + * the list.
> > + * Return value: return true if has overlaps; otherwise, return false
> > + */
> > +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> > +                                           uint64_t dpa, uint64_t len)
> > +{
> > +    CXLDCExtent *ent;
> > +    Range range1, range2;
> > +
> > +    if (!list) {
> > +        return false;
> > +    }
> > +
> > +    range_init_nofail(&range1, dpa, len);
> > +    QTAILQ_FOREACH(ent, list, node) {
> > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > +        if (range_overlaps_range(&range1, &range2)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +/*
> > + * Check whether the range [dpa, dpa + len -1] is contained by extents in 
> 
> space after -
> 
> > + * the list.
> > + * Will check multiple extents containment once superset release is added.
> > + * Return value: return true if range is contained; otherwise, return false
> > + */
> > +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> > +                                    uint64_t dpa, uint64_t len)
> > +{
> > +    CXLDCExtent *ent;
> > +    Range range1, range2;
> > +
> > +    if (!list) {
> > +        return false;
> > +    }
> > +
> > +    range_init_nofail(&range1, dpa, len);
> > +    QTAILQ_FOREACH(ent, list, node) {
> > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > +        if (range_contains_range(&range2, &range1)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +/*
> > + * The main function to process dynamic capacity event. Currently DC extents
> > + * add/release requests are processed.
> > + */
> > +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
> 
> As below. Don't pass in a CxlEventLog.  Whilst some infrastructure is shared
> with other event logs, we don't want to accidentally enable other events
> being added to the DC event log.
> 
> > +                                             CXLDCEventType type, uint16_t hid,
> > +                                             uint8_t rid,
> > +                                             CXLDCExtentRecordList *records,
> > +                                             Error **errp)
> > +{
> > +    Object *obj;
> > +    CXLEventDynamicCapacity dCap = {};
> > +    CXLEventRecordHdr *hdr = &dCap.hdr;
> > +    CXLType3Dev *dcd;
> > +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> > +    uint32_t num_extents = 0;
> > +    CXLDCExtentRecordList *list;
> > +    g_autofree CXLDCExtentRaw *extents = NULL;
> > +    uint8_t enc_log;
> > +    uint64_t dpa, offset, len, block_size;
> > +    int i, rc;
> > +    g_autofree unsigned long *blk_bitmap = NULL;
> > +
> > +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> > +    if (!obj) {
> > +        error_setg(errp, "Unable to resolve CXL type 3 device");
> > +        return;
> > +    }
> > +
> > +    dcd = CXL_TYPE3(obj);
> > +    if (!dcd->dc.num_regions) {
> > +        error_setg(errp, "No dynamic capacity support from the device");
> > +        return;
> > +    }
> > +
> > +    rc = ct3d_qmp_cxl_event_log_enc(log);
> 
> enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP; always so don't look it up.
> 
> > +    if (rc < 0) {
> > +        error_setg(errp, "Unhandled error log type");
> > +        return;
> > +    }
> > +    enc_log = rc;
> > +
> > +    if (rid >= dcd->dc.num_regions) {
> > +        error_setg(errp, "region id is too large");
> > +        return;
> > +    }
> > +    block_size = dcd->dc.regions[rid].block_size;
> > +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> > +
> > +    /* Sanity check and count the extents */
> > +    list = records;
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = offset + dcd->dc.regions[rid].base;
> > +
> > +        if (len == 0) {
> > +            error_setg(errp, "extent with 0 length is not allowed");
> > +            return;
> > +        }
> > +
> > +        if (offset % block_size || len % block_size) {
> > +            error_setg(errp, "dpa or len is not aligned to region block size");
> > +            return;
> > +        }
> > +
> > +        if (offset + len > dcd->dc.regions[rid].len) {
> > +            error_setg(errp, "extent range is beyond the region end");
> > +            return;
> > +        }
> > +
> > +        /* No duplicate or overlapped extents are allowed */
> > +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> > +                              len / block_size)) {
> > +            error_setg(errp, "duplicate or overlapped extents are detected");
> > +            return;
> > +        }
> > +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > +
> > +        num_extents++;
> > +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> > +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
> > +                                               dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with pending DPA range");
> > +                return;
> > +            }
> > +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> > +                                                dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with non-existing DPA range");
> > +                return;
> > +            }
> > +        }
> > +        list = list->next;
> > +    }
> > +    if (num_extents == 0) {
> 
> We can just check if there is a first one.  That check can be done before
> counting them and is probably a little more elegant than leaving it down here.
> I'm not sure we can pass in an empty list but if we can (easy to poke interface
> and check) then I assume records == NULL. 
> 
> > +        error_setg(errp, "no valid extents to send to process");
> > +        return;
> > +    }
> > +
> > +    /* Create extent list for event being passed to host */
> > +    i = 0;
> > +    list = records;
> > +    extents = g_new0(CXLDCExtentRaw, num_extents);
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = dcd->dc.regions[rid].base + offset;
> > +
> > +        extents[i].start_dpa = dpa;
> > +        extents[i].len = len;
> > +        memset(extents[i].tag, 0, 0x10);
> > +        extents[i].shared_seq = 0;
> > +        list = list->next;
> > +        i++;
> > +    }
> > +
> > +    /*
> > +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> > +     *
> > +     * All Dynamic Capacity event records shall set the Event Record Severity
> > +     * field in the Common Event Record Format to Informational Event. All
> > +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> > +     * Event Log.
> > +     */
> > +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> > +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> > +
> > +    dCap.type = type;
> > +    /* FIXME: for now, validity flag is cleared */
> > +    dCap.validity_flags = 0;
> > +    stw_le_p(&dCap.host_id, hid);
> > +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> > +    dCap.updated_region_id = 0;
> > +    /*
> > +     * FIXME: for now, the "More" flag is cleared as there is only one
> > +     * extent associating with each record and tag-based release is
> > +     * not supported.
> 
> This is misleading by my understanding of the specification.
> More isn't directly related to tags (though it is necessary for some
> flows with tags, when sharing is enabled anyway).
> The reference to record also isn't that relevant. The idea is you set
> it for all but the last record pushed to the event log (from a given
> action from an FM).
> 
> The whole reason to have a multi extent injection interface is to set
> the more flag to indicate that the OS needs to treat a bunch of extents
> as one 'offer' of capacity.  So a rejection from the OS needs to take
> out 'all those records'.  The proposed linux code will currently reject
> all but the first extent (I moaned about that yesterday).
> 
> It is fine to not support this in the current code, but then I would check
> the number of extents and reject any multi extent commands until we
> do support it.
> 
> Ultimately I want a qmp command with more than one extent to mean
> they are one 'offer' of capacity and must be handled as such by
> the OS.  I.e. it can't reply with multiple unrelated acceptance
> or reject replies.
> 
> On the add side this is easy to support, the fiddly bit is if the
> OS rejects some or all of the capacity and you then need to
> take out all the extents offered that it hasn't accepted in its reply.
> 
> Pending list will need to maintain that association.
> Maybe the easiest way is to have pending list be a list of sublists?
> That way each sublist is handled in one go and any non accepted extents
> in that sub list are dropped.
> 
>  
> > +     */
> > +    dCap.flags = 0;
> > +    for (i = 0; i < num_extents; i++) {
> > +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> > +               sizeof(CXLDCExtentRaw));
> > +
> > +        if (type == DC_EVENT_ADD_CAPACITY) {
> > +            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
> > +                                             extents[i].start_dpa,
> > +                                             extents[i].len,
> > +                                             extents[i].tag,
> > +                                             extents[i].shared_seq);
> > +        }
> > +
> > +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> > +                             (CXLEventRecordRaw *)&dCap)) {
> > +            cxl_event_irq_assert(dcd);
> > +        }
> > +    }
> > +}
> > +
> > +void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
> > +                                  CXLDCExtentRecordList  *records,
> > +                                  Error **errp)
> > +{
> > +   qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
> 
> Drop passing in the log, it doesn't make sense given these events only occur
> on that log and we can hard code it in the function.
> 
> > +                                    DC_EVENT_ADD_CAPACITY, 0,
> > +                                    region_id, records, errp);
> > +}
> > +
> > +void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
> > +                                      CXLDCExtentRecordList  *records,
> > +                                      Error **errp)
> > +{
> > +    qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
> > +                                     DC_EVENT_RELEASE_CAPACITY, 0,
> > +                                     region_id, records, errp);
> > +}
> > +
> >  static void ct3_class_init(ObjectClass *oc, void *data)
> >  {
> 
> 
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index df3511e91b..b84063d9f4 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -494,6 +494,7 @@ struct CXLType3Dev {
> >           */
> >          uint64_t total_capacity; /* 256M aligned */
> >          CXLDCExtentList extents;
> > +        CXLDCExtentList extents_pending;
> >          uint32_t total_extent_count;
> >          uint32_t ext_list_gen_seq;
> 
> 
> 
> >  #endif /* CXL_EVENTS_H */
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 8cc4c72fa9..2645004666 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -19,13 +19,16 @@
> >  #
> >  # @fatal: Fatal Event Log
> >  #
> > +# @dyncap: Dynamic Capacity Event Log
> > +#
> >  # Since: 8.1
> >  ##
> >  { 'enum': 'CxlEventLog',
> >    'data': ['informational',
> >             'warning',
> >             'failure',
> > -           'fatal']
> > +           'fatal',
> > +           'dyncap']
> 
> Does this have the side effect of letting us inject error events
> onto the dynamic capacity log? 
> 
> >   }
> >  
> >  ##
> > @@ -361,3 +364,59 @@
> >  ##
> >  {'command': 'cxl-inject-correctable-error',
> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> ...
> 
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to start add dynamic capacity extents flow. The device will
> > +# have to acknowledged the acceptance of the extents before they are usable.
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +# @region-id: id of the region where the extent to add
> > +# @extents: Extents to add
> > +#
> > +# Since : 9.0
> 
> 9.1
> 
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'region-id': 'uint8',
> > +            'extents': [ 'CXLDCExtentRecord' ]
> > +           }
> > +}
> > +
> > +##
> > +# @cxl-release-dynamic-capacity:
> > +#
> > +# Command to start release dynamic capacity extents flow. The host will
> > +# need to respond to indicate that it has released the capacity before it
> > +# is made unavailable for read and write and can be re-added.
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +# @region-id: id of the region where the extent to release
> > +# @extents: Extents to release
> > +#
> > +# Since : 9.0
> 
> 9.1
> 
> > +##
> > +{ 'command': 'cxl-release-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'region-id': 'uint8',
> > +            'extents': [ 'CXLDCExtentRecord' ]
> > +           }
> > +}
>
fan April 15, 2024, 8:06 p.m. UTC | #5
From ce75be83e915fbc4dd6e489f976665b81174002b Mon Sep 17 00:00:00 2001
From: Fan Ni <fan.ni@samsung.com>
Date: Tue, 20 Feb 2024 09:48:31 -0800
Subject: [PATCH 09/13] hw/cxl/events: Add qmp interfaces to add/release
 dynamic capacity extents

To simulate FM functionalities for initiating Dynamic Capacity Add
(Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
add/release dynamic capacity extents requests.

With the change, we only allow an extent to be released when its DPA range
is contained by a single accepted extent in the device. That is to say,
extent superset release is not supported yet.

1. Add dynamic capacity extents:

For example, the command to add two contiguous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like below:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "hid": 0,
      "selection-policy": 2,
      "region-id": 0,
      "tag": "",
      "extents": [
      {
          "offset": 0,
          "len": 134217728
      },
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "hid": 0,
      "flags": 1,
      "region-id": 0,
      "tag": "",
      "extents": [
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  65 ++++++--
 hw/mem/cxl_type3.c          | 310 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3_stubs.c    |  20 +++
 include/hw/cxl/cxl_device.h |  22 +++
 include/hw/cxl/cxl_events.h |  18 +++
 qapi/cxl.json               |  69 ++++++++
 6 files changed, 491 insertions(+), 13 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index cd9092b6bf..839ae836a1 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
  * Check whether any bit between addr[nr, nr+size) is set,
  * return true if any bit is set, otherwise return false
  */
-static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
                               unsigned long size)
 {
     unsigned long res = find_next_bit(addr, size + nr, nr);
@@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
     return NULL;
 }
 
-static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
                                              uint64_t dpa,
                                              uint64_t len,
                                              uint8_t *tag,
@@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
     g_free(extent);
 }
 
+/*
+ * Add a new extent to the extent "group" if group exists;
+ * otherwise, create a new group
+ * Return value: return the group where the extent is inserted.
+ */
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq)
+{
+    if (!group) {
+        group = g_new0(CXLDCExtentGroup, 1);
+        QTAILQ_INIT(&group->list);
+    }
+    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
+                                     tag, shared_seq);
+    return group;
+}
+
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group)
+{
+    QTAILQ_INSERT_TAIL(list, group, node);
+}
+
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
+{
+    CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
+
+    QTAILQ_REMOVE(list, group, node);
+    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&group->list, ent);
+    }
+    g_free(group);
+}
+
 /*
  * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
  * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
@@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
 {
     uint32_t i;
     CXLDCExtent *ent;
+    CXLDCExtentGroup *ext_group;
     uint64_t dpa, len;
     Range range1, range2;
 
@@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
         range_init_nofail(&range1, dpa, len);
 
         /*
-         * TODO: once the pending extent list is added, check against
-         * the list will be added here.
+         * The host-accepted DPA range must be contained by the first extent
+         * group in the pending list
          */
+        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
+            return CXL_MBOX_INVALID_PA;
+        }
 
         /* to-be-added range should not overlap with range already accepted */
         QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
@@ -1588,26 +1631,26 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
     CXLRetCode ret;
 
     if (in->num_entries_updated == 0) {
-        /*
-         * TODO: once the pending list is introduced, extents in the beginning
-         * will get wiped out.
-         */
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return CXL_MBOX_SUCCESS;
     }
 
     /* Adding extents causes exceeding device's extent tracking ability. */
     if (in->num_entries_updated + ct3d->dc.total_extent_count >
         CXL_NUM_EXTENTS_SUPPORTED) {
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return CXL_MBOX_RESOURCES_EXHAUSTED;
     }
 
     ret = cxl_detect_malformed_extent_list(ct3d, in);
     if (ret != CXL_MBOX_SUCCESS) {
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return ret;
     }
 
     ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
     if (ret != CXL_MBOX_SUCCESS) {
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return ret;
     }
 
@@ -1617,11 +1660,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
-        /*
-         * TODO: we will add a pending extent list based on event log record
-         * and process the list accordingly here.
-         */
     }
+    /* Remove the first extent group in the pending list*/
+    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
 
     return CXL_MBOX_SUCCESS;
 }
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 2d4b6242f0..8d99b27b27 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -667,6 +667,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         ct3d->dc.total_capacity += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
+    QTAILQ_INIT(&ct3d->dc.extents_pending);
 
     return true;
 }
@@ -674,10 +675,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
 {
     CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group, *group_next;
 
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
+
+    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
+        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
+        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+            cxl_remove_extent_from_extent_list(&group->list, ent);
+        }
+        g_free(group);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -1442,7 +1452,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
         return CXL_EVENT_TYPE_FAIL;
     case CXL_EVENT_LOG_FATAL:
         return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
     default:
         return -EINVAL;
     }
@@ -1693,6 +1702,305 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+    DC_EVENT_ADD_CAPACITY = 0x0,
+    DC_EVENT_RELEASE_CAPACITY = 0x1,
+    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+    DC_EVENT_CAPACITY_RELEASED = 0x5,
+} CXLDCEventType;
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
+ * the list.
+ * Return value: return true if has overlaps; otherwise, return false
+ */
+static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
+                                           uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_overlaps_range(&range1, &range2)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] is contained by extents in
+ * the list.
+ * Will check multiple extents containment once superset release is added.
+ * Return value: return true if range is contained; otherwise, return false
+ */
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_contains_range(&range2, &range1)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
+                                                uint64_t dpa, uint64_t len)
+{
+    CXLDCExtentGroup *group;
+
+    if (!list) {
+        return false;
+    }
+
+    QTAILQ_FOREACH(group, list, node) {
+        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * The main function to process dynamic capacity event with extent list.
+ * Currently DC extents add/release requests are processed.
+ */
+static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
+        uint16_t hid, CXLDCEventType type, uint8_t rid,
+        CXLDCExtentRecordList *records, Error **errp)
+{
+    Object *obj;
+    CXLEventDynamicCapacity dCap = {};
+    CXLEventRecordHdr *hdr = &dCap.hdr;
+    CXLType3Dev *dcd;
+    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+    uint32_t num_extents = 0;
+    CXLDCExtentRecordList *list;
+    CXLDCExtentGroup *group = NULL;
+    g_autofree CXLDCExtentRaw *extents = NULL;
+    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
+    uint64_t dpa, offset, len, block_size;
+    g_autofree unsigned long *blk_bitmap = NULL;
+    int i;
+
+    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
+    if (!obj) {
+        error_setg(errp, "Unable to resolve CXL type 3 device");
+        return;
+    }
+
+    dcd = CXL_TYPE3(obj);
+    if (!dcd->dc.num_regions) {
+        error_setg(errp, "No dynamic capacity support from the device");
+        return;
+    }
+
+
+    if (rid >= dcd->dc.num_regions) {
+        error_setg(errp, "region id is too large");
+        return;
+    }
+    block_size = dcd->dc.regions[rid].block_size;
+    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+    /* Sanity check and count the extents */
+    list = records;
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = offset + dcd->dc.regions[rid].base;
+
+        if (len == 0) {
+            error_setg(errp, "extent with 0 length is not allowed");
+            return;
+        }
+
+        if (offset % block_size || len % block_size) {
+            error_setg(errp, "dpa or len is not aligned to region block size");
+            return;
+        }
+
+        if (offset + len > dcd->dc.regions[rid].len) {
+            error_setg(errp, "extent range is beyond the region end");
+            return;
+        }
+
+        /* No duplicate or overlapped extents are allowed */
+        if (test_any_bits_set(blk_bitmap, offset / block_size,
+                              len / block_size)) {
+            error_setg(errp, "duplicate or overlapped extents are detected");
+            return;
+        }
+        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+        if (type == DC_EVENT_RELEASE_CAPACITY) {
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with pending DPA range");
+                return;
+            }
+            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with non-existing DPA range");
+                return;
+            }
+        } else if (type == DC_EVENT_ADD_CAPACITY) {
+            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA already accessible  to the same LD");
+                return;
+            }
+        }
+        list = list->next;
+        num_extents++;
+    }
+
+    if (num_extents > 1) {
+        error_setg(errp,
+                   "TODO: remove the check once kernel support More flag");
+        return;
+    }
+
+    /* Create extent list for event being passed to host */
+    i = 0;
+    list = records;
+    extents = g_new0(CXLDCExtentRaw, num_extents);
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = dcd->dc.regions[rid].base + offset;
+
+        extents[i].start_dpa = dpa;
+        extents[i].len = len;
+        memset(extents[i].tag, 0, 0x10);
+        extents[i].shared_seq = 0;
+        if (type == DC_EVENT_ADD_CAPACITY) {
+            group = cxl_insert_extent_to_extent_group(group,
+                                                      extents[i].start_dpa,
+                                                      extents[i].len,
+                                                      extents[i].tag,
+                                                      extents[i].shared_seq);
+        }
+
+        list = list->next;
+        i++;
+    }
+    if (group) {
+        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
+    }
+
+    /*
+     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
+     *
+     * All Dynamic Capacity event records shall set the Event Record Severity
+     * field in the Common Event Record Format to Informational Event. All
+     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+     * Event Log.
+     */
+    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+                            cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+    dCap.type = type;
+    /* FIXME: for now, validity flag is cleared */
+    dCap.validity_flags = 0;
+    stw_le_p(&dCap.host_id, hid);
+    /* only valid for DC_REGION_CONFIG_UPDATED event */
+    dCap.updated_region_id = 0;
+    dCap.flags = 0;
+    for (i = 0; i < num_extents; i++) {
+        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+               sizeof(CXLDCExtentRaw));
+
+        if (i < num_extents - 1) {
+            /* Set "More" flag */
+            dCap.flags |= BIT(0);
+        }
+
+        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+                             (CXLEventRecordRaw *)&dCap)) {
+            cxl_event_irq_assert(dcd);
+        }
+    }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t hid,
+                                  uint8_t sel_policy, uint8_t region_id,
+                                  const char *tag,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+    enum {
+        CXL_SEL_POLICY_FREE,
+        CXL_SEL_POLICY_CONTIGUOUS,
+        CXL_SEL_POLICY_PRESCRIPTIVE,
+        CXL_SEL_POLICY_ENABLESHAREDACCESS,
+    };
+    switch (sel_policy) {
+    case CXL_SEL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid,
+                                                      DC_EVENT_ADD_CAPACITY,
+                                                      region_id, records, errp);
+        break;
+    default:
+        error_setg(errp, "Selection policy not supported");
+        break;
+    }
+}
+
+#define REMOVAL_POLICY_MASK 0xf
+#define FORCED_REMOVAL_BIT BIT(4)
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
+                                      uint8_t flags, uint8_t region_id,
+                                      const char *tag,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
+
+    if (flags & FORCED_REMOVAL_BIT) {
+        /* TODO: enable forced removal in the future */
+        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
+        error_setg(errp, "Forced removal not supported yet");
+        return;
+    }
+
+    switch (flags & REMOVAL_POLICY_MASK) {
+    case 1:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
+                                                      region_id, records, errp);
+        break;
+    default:
+        error_setg(errp, "Removal policy not supported");
+        break;
+    }
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..810685e0d5 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,23 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
+
+void qmp_cxl_add_dynamic_capacity(const char *path,
+                                  uint16_t hid,
+                                  uint8_t sel_policy,
+                                  uint8_t region_id,
+                                  const char *tag,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
+                                      uint8_t flags, uint8_t region_id,
+                                      const char *tag,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index df3511e91b..c69ff6b5de 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
 } CXLDCExtent;
 typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
 
+typedef struct CXLDCExtentGroup {
+    CXLDCExtentList list;
+    QTAILQ_ENTRY(CXLDCExtentGroup) node;
+} CXLDCExtentGroup;
+typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
+
 typedef struct CXLDCRegion {
     uint64_t base;       /* aligned to 256*MiB */
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -494,6 +500,7 @@ struct CXLType3Dev {
          */
         uint64_t total_capacity; /* 256M aligned */
         CXLDCExtentList extents;
+        CXLDCExtentGroupList extents_pending;
         uint32_t total_extent_count;
         uint32_t ext_list_gen_seq;
 
@@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
 
 void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
                                         CXLDCExtent *extent);
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
+                                      uint64_t len, uint8_t *tag,
+                                      uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                       unsigned long size);
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len);
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq);
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group);
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 5170b8dbf8..38cadaa0f3 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * CXL r3.1 Table 8-50: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+    CXLEventRecordHdr hdr;
+    uint8_t type;
+    uint8_t validity_flags;
+    uint16_t host_id;
+    uint8_t updated_region_id;
+    uint8_t flags;
+    uint8_t reserved2[2];
+    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+    uint8_t reserved[0x18];
+    uint32_t extents_avail;
+    uint32_t tags_avail;
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 8cc4c72fa9..8cdcfce708 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -361,3 +361,72 @@
 ##
 {'command': 'cxl-inject-correctable-error',
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDCExtentRecord:
+#
+# Record of a single extent to add/release
+#
+# @offset: offset of the extent relative to the start of the region
+# @len: length of the extent
+#
+# Since: 9.1
+##
+{ 'struct': 'CXLDCExtentRecord',
+  'data': {
+      'offset':'uint64',
+      'len': 'uint64'
+  }
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Command to initiate the add dynamic capacity extents flow. The device
+# will wait for the host to accept the extents before they are usable.
+#
+# @path: CXL DCD canonical QOM path
+# @hid: host id
+# @selection-policy: policy to use for selecting extents for adding capacity
+# @region-id: id of the region to which the extents are added
+# @tag: Context field
+# @extents: Extents to add
+#
+# Since: 9.1
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+  'data': { 'path': 'str',
+            'hid': 'uint16',
+            'selection-policy': 'uint8',
+            'region-id': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Command to initiate the release dynamic capacity extents flow. The host will
+# need to respond to indicate that it has released the capacity before it
+# is made unavailable for read and write and can be re-added.
+#
+# @path: CXL DCD canonical QOM path
+# @hid: host id
+# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
+#     sanitize on release, bit[7:6] reserved
+# @region-id: id of the region from which the extents are released
+# @tag: Context field
+# @extents: Extents to release
+#
+# Since: 9.1
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+  'data': { 'path': 'str',
+            'hid': 'uint16',
+            'flags': 'uint8',
+            'region-id': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}
fan April 16, 2024, 4:52 p.m. UTC | #6
On Tue, Apr 16, 2024 at 03:58:22PM +0100, Jonathan Cameron wrote:
> On Mon, 15 Apr 2024 13:06:04 -0700
> fan <nifan.cxl@gmail.com> wrote:
> 
> > From ce75be83e915fbc4dd6e489f976665b81174002b Mon Sep 17 00:00:00 2001
> > From: Fan Ni <fan.ni@samsung.com>
> > Date: Tue, 20 Feb 2024 09:48:31 -0800
> > Subject: [PATCH 09/13] hw/cxl/events: Add qmp interfaces to add/release
> >  dynamic capacity extents
> > 
> > To simulate FM functionalities for initiating Dynamic Capacity Add
> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > add/release dynamic capacity extents requests.
> > 
> > With the change, we allow to release an extent only when its DPA range
> > is contained by a single accepted extent in the device. That is to say,
> > extent superset release is not supported yet.
> > 
> > 1. Add dynamic capacity extents:
> > 
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> > 
> > { "execute": "qmp_capabilities" }
> > 
> > { "execute": "cxl-add-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "hid": 0,
> >       "selection-policy": 2,
> >       "region-id": 0,
> >       "tag": "",
> >       "extents": [
> >       {
> >           "offset": 0,
> >           "len": 134217728
> >       },
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> > 
> > 2. Release dynamic capacity extents:
> > 
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> > 
> > { "execute": "cxl-release-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "hid": 0,
> >       "flags": 1,
> >       "region-id": 0,
> >       "tag": "",
> >       "extents": [
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> 
> Nice!  A few small comments inline - particularly don't be nice to the
> kernel by blocking things it doesn't understand yet ;)
> 
> Jonathan
> 
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  |  65 ++++++--
> >  hw/mem/cxl_type3.c          | 310 +++++++++++++++++++++++++++++++++++-
> >  hw/mem/cxl_type3_stubs.c    |  20 +++
> >  include/hw/cxl/cxl_device.h |  22 +++
> >  include/hw/cxl/cxl_events.h |  18 +++
> >  qapi/cxl.json               |  69 ++++++++
> >  6 files changed, 491 insertions(+), 13 deletions(-)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index cd9092b6bf..839ae836a1 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> 
> >  /*
> >   * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
> >   * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> > @@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
> >  {
> >      uint32_t i;
> >      CXLDCExtent *ent;
> > +    CXLDCExtentGroup *ext_group;
> >      uint64_t dpa, len;
> >      Range range1, range2;
> >  
> > @@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
> >          range_init_nofail(&range1, dpa, len);
> >  
> >          /*
> > -         * TODO: once the pending extent list is added, check against
> > -         * the list will be added here.
> > +         * The host-accepted DPA range must be contained by the first extent
> > +         * group in the pending list
> >           */
> > +        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
> > +            return CXL_MBOX_INVALID_PA;
> > +        }
> >  
> >          /* to-be-added range should not overlap with range already accepted */
> >          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> > @@ -1588,26 +1631,26 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >      CXLRetCode ret;
> >  
> >      if (in->num_entries_updated == 0) {
> > -        /*
> > -         * TODO: once the pending list is introduced, extents in the beginning
> > -         * will get wiped out.
> > -         */
> > +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
> >          return CXL_MBOX_SUCCESS;
> >      }
> >  
> >      /* Adding extents causes exceeding device's extent tracking ability. */
> >      if (in->num_entries_updated + ct3d->dc.total_extent_count >
> >          CXL_NUM_EXTENTS_SUPPORTED) {
> > +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
> >          return CXL_MBOX_RESOURCES_EXHAUSTED;
> >      }
> >  
> >      ret = cxl_detect_malformed_extent_list(ct3d, in);
> >      if (ret != CXL_MBOX_SUCCESS) {
> > +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
> 
> If it's a bad message from the host, I don't think the device is supposed to
> do anything with pending extents.

It is not entirely clear to me here.

In the spec r3.1 8.2.9.9.9.3, Add Dynamic Capacity Response (Opcode 4802h),
there is text like "After this command is received, the device is free to
reclaim capacity that the host does not utilize.", which seems to imply that
as long as the response is received, we need to update the pending list so
the unused capacity can be reclaimed. But of course, we can argue that if
there is an error, we cannot tell whether the host accepted the extents or
not, so we should not update the pending list.
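
A minimal sketch of the behaviour Jonathan describes, reusing the helpers
from this patch, could look like the following; only the zero-entry case and
a fully validated response consume the front pending extent group, while a
bad message from the host leaves it untouched:

    if (in->num_entries_updated == 0) {
        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
        return CXL_MBOX_SUCCESS;
    }

    ret = cxl_detect_malformed_extent_list(ct3d, in);
    if (ret != CXL_MBOX_SUCCESS) {
        /* Bad message from the host: keep the pending group as is. */
        return ret;
    }

    ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
    if (ret != CXL_MBOX_SUCCESS) {
        /* Rejected response: likewise keep the pending group as is. */
        return ret;
    }

    /* Only a validated response consumes the front pending extent group. */
    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);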

> 
> >          return ret;
> >      }
> >  
> >      ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
> >      if (ret != CXL_MBOX_SUCCESS) {
> > +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
> >          return ret;
> >      }
> 
> 
> 
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 2d4b6242f0..8d99b27b27 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> 
> > +/*
> > + * The main function to process dynamic capacity event with extent list.
> > + * Currently DC extents add/release requests are processed.
> > + */
> > +static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
> > +        uint16_t hid, CXLDCEventType type, uint8_t rid,
> > +        CXLDCExtentRecordList *records, Error **errp)
> > +{
> > +    Object *obj;
> > +    CXLEventDynamicCapacity dCap = {};
> > +    CXLEventRecordHdr *hdr = &dCap.hdr;
> > +    CXLType3Dev *dcd;
> > +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> > +    uint32_t num_extents = 0;
> > +    CXLDCExtentRecordList *list;
> > +    CXLDCExtentGroup *group = NULL;
> > +    g_autofree CXLDCExtentRaw *extents = NULL;
> > +    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
> > +    uint64_t dpa, offset, len, block_size;
> > +    g_autofree unsigned long *blk_bitmap = NULL;
> > +    int i;
> > +
> > +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> > +    if (!obj) {
> > +        error_setg(errp, "Unable to resolve CXL type 3 device");
> > +        return;
> > +    }
> > +
> > +    dcd = CXL_TYPE3(obj);
> > +    if (!dcd->dc.num_regions) {
> > +        error_setg(errp, "No dynamic capacity support from the device");
> > +        return;
> > +    }
> > +
> > +
> > +    if (rid >= dcd->dc.num_regions) {
> > +        error_setg(errp, "region id is too large");
> > +        return;
> > +    }
> > +    block_size = dcd->dc.regions[rid].block_size;
> > +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> > +
> > +    /* Sanity check and count the extents */
> > +    list = records;
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = offset + dcd->dc.regions[rid].base;
> > +
> > +        if (len == 0) {
> > +            error_setg(errp, "extent with 0 length is not allowed");
> > +            return;
> > +        }
> > +
> > +        if (offset % block_size || len % block_size) {
> > +            error_setg(errp, "dpa or len is not aligned to region block size");
> > +            return;
> > +        }
> > +
> > +        if (offset + len > dcd->dc.regions[rid].len) {
> > +            error_setg(errp, "extent range is beyond the region end");
> > +            return;
> > +        }
> > +
> > +        /* No duplicate or overlapped extents are allowed */
> > +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> > +                              len / block_size)) {
> > +            error_setg(errp, "duplicate or overlapped extents are detected");
> > +            return;
> > +        }
> > +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > +
> > +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> > +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> > +                                                     dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with pending DPA range");
> > +                return;
> > +            }
> > +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with non-existing DPA range");
> > +                return;
> > +            }
> > +        } else if (type == DC_EVENT_ADD_CAPACITY) {
> > +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot add DPA already accessible  to the same LD");
> > +                return;
> > +            }
> > +        }
> > +        list = list->next;
> > +        num_extents++;
> > +    }
> > +
> > +    if (num_extents > 1) {
> > +        error_setg(errp,
> > +                   "TODO: remove the check once kernel support More flag");
> Not our problem :)  For now we can just test the kernel by passing in single
> extents via separate commands.
> 
> I don't want to carry unnecessary limitations in qemu.
> 

Will remove the check here.

> > +        return;
> > +    }
> > +
> 
> > +
> > +#define REMOVAL_POLICY_MASK 0xf
> > +#define FORCED_REMOVAL_BIT BIT(4)
> > +
> > +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> > +                                      uint8_t flags, uint8_t region_id,
> > +                                      const char *tag,
> > +                                      CXLDCExtentRecordList  *records,
> > +                                      Error **errp)
> > +{
> > +    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
> > +
> > +    if (flags & FORCED_REMOVAL_BIT) {
> > +        /* TODO: enable forced removal in the future */
> > +        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
> > +        error_setg(errp, "Forced removal not supported yet");
> > +        return;
> > +    }
> > +
> > +    switch (flags & REMOVAL_POLICY_MASK) {
> > +    case 1:
> Probably benefit from a suitable define.
> 
> > +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
> > +                                                      region_id, records, errp);
> > +        break;
> 
> I'd not noticed before but might as well return from these case blocks.

Sorry, I do not follow here. What do you mean by "return from these case
blocks"? Are you referring to the check above about the forced removal case?

Fan

> 
> > +    default:
> > +        error_setg(errp, "Removal policy not supported");
> > +        break;
> > +    }
> > +}
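
For reference, a minimal sketch of what Jonathan's two suggestions might look
like, with a named value for the prescriptive removal policy (the define name
below is only illustrative) and a return from each case instead of a break:

    /* Removal policy encodings live in Flags bits[3:0]; name is made up. */
    #define DC_REMOVAL_POLICY_PRESCRIPTIVE 1

    switch (flags & REMOVAL_POLICY_MASK) {
    case DC_REMOVAL_POLICY_PRESCRIPTIVE:
        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
                                                      region_id, records,
                                                      errp);
        return;
    default:
        error_setg(errp, "Removal policy not supported");
        return;
    }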
Gregory Price April 16, 2024, 5:14 p.m. UTC | #7
On Tue, Apr 16, 2024 at 03:58:22PM +0100, Jonathan Cameron wrote:
> On Mon, 15 Apr 2024 13:06:04 -0700
> fan <nifan.cxl@gmail.com> wrote:
> 
> > From ce75be83e915fbc4dd6e489f976665b81174002b Mon Sep 17 00:00:00 2001
> > From: Fan Ni <fan.ni@samsung.com>
> > Date: Tue, 20 Feb 2024 09:48:31 -0800
> > Subject: [PATCH 09/13] hw/cxl/events: Add qmp interfaces to add/release
> >  dynamic capacity extents
> > 
> > +
> > +    if (num_extents > 1) {
> > +        error_setg(errp,
> > +                   "TODO: remove the check once kernel support More flag");
> Not our problem :)  For now we can just test the kernel by passing in single
> extents via separate commands.
> 
> I don't want to carry unnecessary limitations in qemu.
> 

Probably worth popping in to say that some out-of-band discussions around the
`more bit` suggest it may be a while before it is supported.

Allowing QEMU to send more bit messages to the kernel would be extremely
helpful for validating that the kernel won't blow up if/when a real
device implements it.  So yes, please allow it!

~Gregory
diff mbox series

Patch

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index a9eca516c8..7094e007b9 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1407,7 +1407,7 @@  static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
  * Check whether any bit between addr[nr, nr+size) is set,
  * return true if any bit is set, otherwise return false
  */
-static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
                               unsigned long size)
 {
     unsigned long res = find_next_bit(addr, size + nr, nr);
@@ -1446,7 +1446,7 @@  CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
     return NULL;
 }
 
-static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
                                              uint64_t dpa,
                                              uint64_t len,
                                              uint8_t *tag,
@@ -1552,10 +1552,11 @@  static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
 
         range_init_nofail(&range1, dpa, len);
 
-        /*
-         * TODO: once the pending extent list is added, check against
-         * the list will be added here.
-         */
+        /* host-accepted DPA range must be contained by pending extent */
+        if (!cxl_extents_contains_dpa_range(&ct3d->dc.extents_pending,
+                                            dpa, len)) {
+            return CXL_MBOX_INVALID_PA;
+        }
 
         /* to-be-added range should not overlap with range already accepted */
         QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
@@ -1585,9 +1586,13 @@  static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
     CXLDCExtentList *extent_list = &ct3d->dc.extents;
     uint32_t i;
     uint64_t dpa, len;
+    CXLDCExtent *ent;
     CXLRetCode ret;
 
     if (in->num_entries_updated == 0) {
+        /* Always remove the first pending extent when response received. */
+        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
         return CXL_MBOX_SUCCESS;
     }
 
@@ -1604,6 +1609,8 @@  static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
     ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
     if (ret != CXL_MBOX_SUCCESS) {
+        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
         return ret;
     }
 
@@ -1613,10 +1620,9 @@  static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
-        /*
-         * TODO: we will add a pending extent list based on event log record
-         * and process the list according here.
-         */
+
+        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
     }
 
     return CXL_MBOX_SUCCESS;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 951bd79a82..74cb64e843 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -674,6 +674,7 @@  static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         ct3d->dc.total_capacity += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
+    QTAILQ_INIT(&ct3d->dc.extents_pending);
 
     return true;
 }
@@ -685,6 +686,10 @@  static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
+    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents_pending, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending,
+                                           ent);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -1449,7 +1454,8 @@  static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
         return CXL_EVENT_TYPE_FAIL;
     case CXL_EVENT_LOG_FATAL:
         return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
+    case CXL_EVENT_LOG_DYNCAP:
+        return CXL_EVENT_TYPE_DYNAMIC_CAP;
     default:
         return -EINVAL;
     }
@@ -1700,6 +1706,250 @@  void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+    DC_EVENT_ADD_CAPACITY = 0x0,
+    DC_EVENT_RELEASE_CAPACITY = 0x1,
+    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+    DC_EVENT_CAPACITY_RELEASED = 0x5,
+} CXLDCEventType;
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] overlaps with any extent in
+ * the list.
+ * Return value: true if there is an overlap; otherwise, false
+ */
+static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
+                                           uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_overlaps_range(&range1, &range2)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] is fully contained by a
+ * single extent in the list.
+ * Multi-extent containment will be checked once superset release is added.
+ * Return value: true if the range is contained; otherwise, false
+ */
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_contains_range(&range2, &range1)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * The main function to process dynamic capacity events. Currently DC extent
+ * add/release requests are processed.
+ */
+static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
+                                             CXLDCEventType type, uint16_t hid,
+                                             uint8_t rid,
+                                             CXLDCExtentRecordList *records,
+                                             Error **errp)
+{
+    Object *obj;
+    CXLEventDynamicCapacity dCap = {};
+    CXLEventRecordHdr *hdr = &dCap.hdr;
+    CXLType3Dev *dcd;
+    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+    uint32_t num_extents = 0;
+    CXLDCExtentRecordList *list;
+    g_autofree CXLDCExtentRaw *extents = NULL;
+    uint8_t enc_log;
+    uint64_t dpa, offset, len, block_size;
+    int i, rc;
+    g_autofree unsigned long *blk_bitmap = NULL;
+
+    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
+    if (!obj) {
+        error_setg(errp, "Unable to resolve CXL type 3 device");
+        return;
+    }
+
+    dcd = CXL_TYPE3(obj);
+    if (!dcd->dc.num_regions) {
+        error_setg(errp, "No dynamic capacity support from the device");
+        return;
+    }
+
+    rc = ct3d_qmp_cxl_event_log_enc(log);
+    if (rc < 0) {
+        error_setg(errp, "Unhandled error log type");
+        return;
+    }
+    enc_log = rc;
+
+    if (rid >= dcd->dc.num_regions) {
+        error_setg(errp, "region id is too large");
+        return;
+    }
+    block_size = dcd->dc.regions[rid].block_size;
+    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+    /* Sanity check and count the extents */
+    list = records;
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = offset + dcd->dc.regions[rid].base;
+
+        if (len == 0) {
+            error_setg(errp, "extent with 0 length is not allowed");
+            return;
+        }
+
+        if (offset % block_size || len % block_size) {
+            error_setg(errp, "dpa or len is not aligned to region block size");
+            return;
+        }
+
+        if (offset + len > dcd->dc.regions[rid].len) {
+            error_setg(errp, "extent range is beyond the region end");
+            return;
+        }
+
+        /* No duplicate or overlapped extents are allowed */
+        if (test_any_bits_set(blk_bitmap, offset / block_size,
+                              len / block_size)) {
+            error_setg(errp, "duplicate or overlapped extents are detected");
+            return;
+        }
+        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+        num_extents++;
+        if (type == DC_EVENT_RELEASE_CAPACITY) {
+            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                               dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with pending DPA range");
+                return;
+            }
+            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
+                                                dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with non-existing DPA range");
+                return;
+            }
+        }
+        list = list->next;
+    }
+    if (num_extents == 0) {
+        error_setg(errp, "no valid extents to send to process");
+        return;
+    }
+
+    /* Create extent list for event being passed to host */
+    i = 0;
+    list = records;
+    extents = g_new0(CXLDCExtentRaw, num_extents);
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = dcd->dc.regions[rid].base + offset;
+
+        extents[i].start_dpa = dpa;
+        extents[i].len = len;
+        memset(extents[i].tag, 0, 0x10);
+        extents[i].shared_seq = 0;
+        list = list->next;
+        i++;
+    }
+
+    /*
+     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
+     *
+     * All Dynamic Capacity event records shall set the Event Record Severity
+     * field in the Common Event Record Format to Informational Event. All
+     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+     * Event Log.
+     */
+    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+                            cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+    dCap.type = type;
+    /* FIXME: for now, validity flag is cleared */
+    dCap.validity_flags = 0;
+    stw_le_p(&dCap.host_id, hid);
+    /* only valid for DC_REGION_CONFIG_UPDATED event */
+    dCap.updated_region_id = 0;
+    /*
+     * FIXME: for now, the "More" flag is cleared as there is only one
+     * extent associated with each record and tag-based release is
+     * not supported.
+     */
+    dCap.flags = 0;
+    for (i = 0; i < num_extents; i++) {
+        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+               sizeof(CXLDCExtentRaw));
+
+        if (type == DC_EVENT_ADD_CAPACITY) {
+            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
+                                             extents[i].start_dpa,
+                                             extents[i].len,
+                                             extents[i].tag,
+                                             extents[i].shared_seq);
+        }
+
+        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+                             (CXLEventRecordRaw *)&dCap)) {
+            cxl_event_irq_assert(dcd);
+        }
+    }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+   qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
+                                    DC_EVENT_ADD_CAPACITY, 0,
+                                    region_id, records, errp);
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
+                                     DC_EVENT_RELEASE_CAPACITY, 0,
+                                     region_id, records, errp);
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..d913b11b4d 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,17 @@  void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index df3511e91b..b84063d9f4 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -494,6 +494,7 @@  struct CXLType3Dev {
          */
         uint64_t total_capacity; /* 256M aligned */
         CXLDCExtentList extents;
+        CXLDCExtentList extents_pending;
         uint32_t total_extent_count;
         uint32_t ext_list_gen_seq;
 
@@ -555,4 +556,11 @@  CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
 
 void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
                                         CXLDCExtent *extent);
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
+                                      uint64_t len, uint8_t *tag,
+                                      uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                       unsigned long size);
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len);
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 5170b8dbf8..38cadaa0f3 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,22 @@  typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * CXL r3.1 Table 8-50: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+    CXLEventRecordHdr hdr;
+    uint8_t type;
+    uint8_t validity_flags;
+    uint16_t host_id;
+    uint8_t updated_region_id;
+    uint8_t flags;
+    uint8_t reserved2[2];
+    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+    uint8_t reserved[0x18];
+    uint32_t extents_avail;
+    uint32_t tags_avail;
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 8cc4c72fa9..2645004666 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -19,13 +19,16 @@ 
 #
 # @fatal: Fatal Event Log
 #
+# @dyncap: Dynamic Capacity Event Log
+#
 # Since: 8.1
 ##
 { 'enum': 'CxlEventLog',
   'data': ['informational',
            'warning',
            'failure',
-           'fatal']
+           'fatal',
+           'dyncap']
  }
 
 ##
@@ -361,3 +364,59 @@ 
 ##
 {'command': 'cxl-inject-correctable-error',
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDCExtentRecord:
+#
+# Record of a single extent to add/release
+#
+# @offset: offset of the extent relative to the start of the region
+# @len: length of the extent
+#
+# Since: 9.0
+##
+{ 'struct': 'CXLDCExtentRecord',
+  'data': {
+      'offset':'uint64',
+      'len': 'uint64'
+  }
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Command to initiate the add dynamic capacity extents flow. The device
+# will wait for the host to accept the extents before they are usable.
+#
+# @path: CXL DCD canonical QOM path
+# @region-id: id of the region to which the extents are added
+# @extents: Extents to add
+#
+# Since: 9.0
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+  'data': { 'path': 'str',
+            'region-id': 'uint8',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Command to initiate the release dynamic capacity extents flow. The host will
+# need to respond to indicate that it has released the capacity before it
+# is made unavailable for read and write and can be re-added.
+#
+# @path: CXL DCD canonical QOM path
+# @region-id: id of the region from which the extents are released
+# @extents: Extents to release
+#
+# Since: 9.0
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+  'data': { 'path': 'str',
+            'region-id': 'uint8',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}