Patchwork [v2,6/8] block: request overlap detection

login
register
mail settings
Submitter Stefan Hajnoczi
Date Nov. 17, 2011, 1:40 p.m.
Message ID <1321537232-799-7-git-send-email-stefanha@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/126221/
State New
Headers show

Comments

Stefan Hajnoczi - Nov. 17, 2011, 1:40 p.m.
Detect overlapping requests and remember to align to cluster boundaries
if the image format uses them.  This assumes that allocating I/O is
performed in cluster granularity - which is true for qcow2, qed, etc.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block.c |   40 ++++++++++++++++++++++++++++++++++++++--
 1 files changed, 38 insertions(+), 2 deletions(-)
Kevin Wolf - Nov. 22, 2011, 3:06 p.m.
Am 17.11.2011 14:40, schrieb Stefan Hajnoczi:
> Detect overlapping requests and remember to align to cluster boundaries
> if the image format uses them.  This assumes that allocating I/O is
> performed in cluster granularity - which is true for qcow2, qed, etc.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

>  static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
>          int64_t sector_num, int nb_sectors)
>  {
>      BdrvTrackedRequest *req;
> +    int64_t cluster_sector_num;
> +    int cluster_nb_sectors;
>      bool retry;
>  
> +    /* If we touch the same cluster it counts as an overlap */
> +    round_to_clusters(bs, sector_num, nb_sectors,
> +                      &cluster_sector_num, &cluster_nb_sectors);

Is this really required? Image formats must be able to deal with two
concurrent write requests to the same cluster, and I don't think it
makes a difference whether it's a guest write request or a COR one.

Or does the queuing protect more than just that a COR never takes
precedence over a guest write?

Kevin
Stefan Hajnoczi - Nov. 22, 2011, 3:10 p.m.
On Tue, Nov 22, 2011 at 3:06 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 17.11.2011 14:40, schrieb Stefan Hajnoczi:
>> Detect overlapping requests and remember to align to cluster boundaries
>> if the image format uses them.  This assumes that allocating I/O is
>> performed in cluster granularity - which is true for qcow2, qed, etc.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>
>>  static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
>>          int64_t sector_num, int nb_sectors)
>>  {
>>      BdrvTrackedRequest *req;
>> +    int64_t cluster_sector_num;
>> +    int cluster_nb_sectors;
>>      bool retry;
>>
>> +    /* If we touch the same cluster it counts as an overlap */
>> +    round_to_clusters(bs, sector_num, nb_sectors,
>> +                      &cluster_sector_num, &cluster_nb_sectors);
>
> Is this really required? Image formats must be able to deal with two
> concurrent write requests to the same cluster, and I don't think it
> makes a difference whether it's a guest write request or a COR one.
>
> Or does the queuing protect more than just that a COR never takes
> precedence over a guest write?

It guarantees that a write request and a copy-on-read request racing
for the same cluster will be serialized.  Either:
1. CoR, then write.
2. Allocating write, then normal read.

If we don't do this we risk:
3. CoR (part 1: read), allocating write, CoR (part 2: write)

The result is that we dropped the write request!

Stefan
Kevin Wolf - Nov. 22, 2011, 4:36 p.m.
Am 22.11.2011 16:10, schrieb Stefan Hajnoczi:
> On Tue, Nov 22, 2011 at 3:06 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 17.11.2011 14:40, schrieb Stefan Hajnoczi:
>>> Detect overlapping requests and remember to align to cluster boundaries
>>> if the image format uses them.  This assumes that allocating I/O is
>>> performed in cluster granularity - which is true for qcow2, qed, etc.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>
>>>  static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
>>>          int64_t sector_num, int nb_sectors)
>>>  {
>>>      BdrvTrackedRequest *req;
>>> +    int64_t cluster_sector_num;
>>> +    int cluster_nb_sectors;
>>>      bool retry;
>>>
>>> +    /* If we touch the same cluster it counts as an overlap */
>>> +    round_to_clusters(bs, sector_num, nb_sectors,
>>> +                      &cluster_sector_num, &cluster_nb_sectors);
>>
>> Is this really required? Image formats must be able to deal with two
>> concurrent write requests to the same cluster, and I don't think it
>> makes a difference whether it's a guest write request or a COR one.
>>
>> Or does the queuing protect more than just that a COR never takes
>> precedence over a guest write?
> 
> It guarantees that a write request and a copy-on-read request racing
> for the same cluster will be serialized.  Either:
> 1. CoR, then write.
> 2. Allocating write, then normal read.
> 
> If we don't do this we risk:
> 3. CoR (part 1: read), allocating write, CoR (part 2: write)
> 
> The result is that we dropped the write request!

Right, this requirement comes in with the next patch that makes COR
round clusters for optimisation. I think this relationship deserves a
comment here.

Anyway, I merged the series into the block branch (Only to notice that
it would have been better to merge bdrv_co_is_allocated first... Will
review that tomorrow.) If you send another version for this patch to add
a comment, I'll replace it.

Kevin

Patch

diff --git a/block.c b/block.c
index da7aaa2..0eef122 100644
--- a/block.c
+++ b/block.c
@@ -1133,21 +1133,57 @@  static void tracked_request_begin(BdrvTrackedRequest *req,
     QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
 }
 
+/**
+ * Round a region to cluster boundaries
+ */
+static void round_to_clusters(BlockDriverState *bs,
+                              int64_t sector_num, int nb_sectors,
+                              int64_t *cluster_sector_num,
+                              int *cluster_nb_sectors)
+{
+    BlockDriverInfo bdi;
+
+    if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
+        *cluster_sector_num = sector_num;
+        *cluster_nb_sectors = nb_sectors;
+    } else {
+        int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
+        *cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
+        *cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
+                                            nb_sectors, c);
+    }
+}
+
 static bool tracked_request_overlaps(BdrvTrackedRequest *req,
                                      int64_t sector_num, int nb_sectors) {
-    return false; /* not yet implemented */
+    /*        aaaa   bbbb */
+    if (sector_num >= req->sector_num + req->nb_sectors) {
+        return false;
+    }
+    /* bbbb   aaaa        */
+    if (req->sector_num >= sector_num + nb_sectors) {
+        return false;
+    }
+    return true;
 }
 
 static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors)
 {
     BdrvTrackedRequest *req;
+    int64_t cluster_sector_num;
+    int cluster_nb_sectors;
     bool retry;
 
+    /* If we touch the same cluster it counts as an overlap */
+    round_to_clusters(bs, sector_num, nb_sectors,
+                      &cluster_sector_num, &cluster_nb_sectors);
+
     do {
         retry = false;
         QLIST_FOREACH(req, &bs->tracked_requests, list) {
-            if (tracked_request_overlaps(req, sector_num, nb_sectors)) {
+            if (tracked_request_overlaps(req, cluster_sector_num,
+                                         cluster_nb_sectors)) {
                 qemu_co_queue_wait(&req->wait_queue);
                 retry = true;
                 break;