Patchwork [for-1.4,12/12] QAPI QMP HMP: Fix and improve memchar-read/-write docs

login
register
mail settings
Submitter Markus Armbruster
Date Feb. 5, 2013, 4:22 p.m.
Message ID <1360081335-6594-13-git-send-email-armbru@redhat.com>
Download mbox | patch
Permalink /patch/218325/
State New
Headers show

Comments

Markus Armbruster - Feb. 5, 2013, 4:22 p.m.
In particular, document the impact of our crappy UTF-8 handling on
reading.

Now the QMP examples even work.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hmp-commands.hx  | 20 ++++++++------------
 qapi-schema.json | 37 +++++++++++++++++++++++--------------
 qemu-char.c      |  7 +++++++
 qmp-commands.hx  | 37 +++++++++++++++++++------------------
 4 files changed, 57 insertions(+), 44 deletions(-)
Eric Blake - Feb. 5, 2013, 10:49 p.m.
On 02/05/2013 09:22 AM, Markus Armbruster wrote:
> In particular, document the impact of our crappy UTF-8 handling on
> reading.
> 
> Now the QMP examples even work.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  hmp-commands.hx  | 20 ++++++++------------
>  qapi-schema.json | 37 +++++++++++++++++++++++--------------
>  qemu-char.c      |  7 +++++++
>  qmp-commands.hx  | 37 +++++++++++++++++++------------------
>  4 files changed, 57 insertions(+), 44 deletions(-)

> +++ b/hmp-commands.hx
> @@ -844,16 +844,15 @@ ETEXI
>          .name       = "memchar_write",
>          .args_type  = "device:s,data:s",
>          .params     = "device data",
> -        .help       = "Provide writing interface for CirMemCharDriver. Write"
> -                      "'data' to it.",
> +        .help       = "Write to a memory character device",
>          .mhandler.cmd = hmp_memchar_write,
>      },
>  
>  STEXI
>  @item memchar_write @var{device} @var{data}
>  @findex memchar_write
> -Provide writing interface for CirMemCharDriver. Write @var{data}
> -to char device 'memory'.
> +Write @var{data} to memory character device @var{device}.  @var{data}
> +must be an UTF-8 string.

s/an UTF-8/a UTF-8/

Blame stupid English rules for making that one tough.  The rule of thumb
for leading 'h' and 'u' is to use 'a' when pronounced, 'an' when
relaxed, as in: 'a unicorn' vs. 'an umbrella'; 'a half hour' vs. 'an
hour'.  In this case, the acronym is generally spoken you-tee-eff-eight,
so it gets 'a' since the 'u' was pronounced.

> +Read up to @var{size} bytes from memory character device @var{device}.
> +Bug: can screw up when the buffer contains invalid UTF-8 sequences,
> +NUL characters, after the ring buffer lost data, and when reading
> +stops because the size limit is reached.

Fair enough for 1.4 since we are in hard freeze; and hopefully 1.5 makes
life better.

Since the grammar fix is minor, feel free to keep this even if you respin:

Reviewed-by: Eric Blake <eblake@redhat.com>
Markus Armbruster - Feb. 6, 2013, 7:20 a.m.
Eric Blake <eblake@redhat.com> writes:

> On 02/05/2013 09:22 AM, Markus Armbruster wrote:
>> In particular, document the impact of our crappy UTF-8 handling on
>> reading.
>> 
>> Now the QMP examples even work.
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  hmp-commands.hx  | 20 ++++++++------------
>>  qapi-schema.json | 37 +++++++++++++++++++++++--------------
>>  qemu-char.c      |  7 +++++++
>>  qmp-commands.hx  | 37 +++++++++++++++++++------------------
>>  4 files changed, 57 insertions(+), 44 deletions(-)
>
>> +++ b/hmp-commands.hx
>> @@ -844,16 +844,15 @@ ETEXI
>>          .name       = "memchar_write",
>>          .args_type  = "device:s,data:s",
>>          .params     = "device data",
>> -        .help       = "Provide writing interface for CirMemCharDriver. Write"
>> -                      "'data' to it.",
>> +        .help       = "Write to a memory character device",
>>          .mhandler.cmd = hmp_memchar_write,
>>      },
>>  
>>  STEXI
>>  @item memchar_write @var{device} @var{data}
>>  @findex memchar_write
>> -Provide writing interface for CirMemCharDriver. Write @var{data}
>> -to char device 'memory'.
>> +Write @var{data} to memory character device @var{device}.  @var{data}
>> +must be an UTF-8 string.
>
> s/an UTF-8/a UTF-8/
>
> Blame stupid English rules for making that one tough.  The rule of thumb
> for leading 'h' and 'u' is to use 'a' when pronounced, 'an' when
> relaxed, as in: 'a unicorn' vs. 'an umbrella'; 'a half hour' vs. 'an
> hour'.  In this case, the acronym is generally spoken you-tee-eff-eight,
> so it gets 'a' since the 'u' was pronounced.

Thanks much for your attention to detail.  Will correct on respin.

>> +Read up to @var{size} bytes from memory character device @var{device}.
>> +Bug: can screw up when the buffer contains invalid UTF-8 sequences,
>> +NUL characters, after the ring buffer lost data, and when reading
>> +stops because the size limit is reached.
>
> Fair enough for 1.4 since we are in hard freeze; and hopefully 1.5 makes
> life better.
>
> Since the grammar fix is minor, feel free to keep this even if you respin:
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

I agree we should tackle our Unicode issues for 1.5.

Patch

diff --git a/hmp-commands.hx b/hmp-commands.hx
index bdd48f3..ab75b95 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -844,16 +844,15 @@  ETEXI
         .name       = "memchar_write",
         .args_type  = "device:s,data:s",
         .params     = "device data",
-        .help       = "Provide writing interface for CirMemCharDriver. Write"
-                      "'data' to it.",
+        .help       = "Write to a memory character device",
         .mhandler.cmd = hmp_memchar_write,
     },
 
 STEXI
 @item memchar_write @var{device} @var{data}
 @findex memchar_write
-Provide writing interface for CirMemCharDriver. Write @var{data}
-to char device 'memory'.
+Write @var{data} to memory character device @var{device}.  @var{data}
+must be an UTF-8 string.
 
 ETEXI
 
@@ -861,20 +860,17 @@  ETEXI
         .name       = "memchar_read",
         .args_type  = "device:s,size:i",
         .params     = "device size",
-        .help       = "Provide read interface for CirMemCharDriver. Read from"
-                      "it and return the data with size.",
+        .help       = "Read from a memory character device",
         .mhandler.cmd = hmp_memchar_read,
     },
 
 STEXI
 @item memchar_read @var{device}
 @findex memchar_read
-Provide read interface for CirMemCharDriver. Read from char device
-'memory' and return the data.
-
-@var{size} is the size of data want to read from. Refer to unencoded
-size of the raw data, would adjust to the init size of the memchar
-if the requested size is larger than it.
+Read up to @var{size} bytes from memory character device @var{device}.
+Bug: can screw up when the buffer contains invalid UTF-8 sequences,
+NUL characters, after the ring buffer lost data, and when reading
+stops because the size limit is reached.
 
 ETEXI
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 6f63791..f34f0a2 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -329,9 +329,9 @@ 
 #
 # An enumeration of data format.
 #
-# @utf8: The data format is 'utf8'.
+# @utf8: Data is UTF-8 encoded (RFC 3629)
 #
-# @base64: The data format is 'base64'.
+# @base64: Data is Base64 encoded (RFC 3548)
 #
 # Since: 1.4
 ##
@@ -341,15 +341,18 @@ 
 ##
 # @memchar-write:
 #
-# Provide writing interface for memchardev. Write data to char
-# device 'memory'.
+# Write to a memory character device
 #
-# @device: the name of the memory char device.
+# @device: the memory character device name.
 #
-# @data: the source data write to memchar.
+# @data: data to write
 #
-# @format: #optional the format of the data write to chardev 'memory',
-#          by default is 'utf8'.
+# @format: #optional data encoding (default 'utf8').
+#          - base64: data must be base64 encoded text.  It's binary
+#            decoding gets written.
+#          - utf8: data's UTF-8 encoding is written
+#          - data itself is always Unicode regardless of format, like
+#            any other string.
 #
 # Returns: Nothing on success
 #
@@ -362,15 +365,21 @@ 
 ##
 # @memchar-read:
 #
-# Provide read interface for memchardev. Read from the char
-# device 'memory' and return the data.
+# Read from a memory character device.
 #
-# @device: the name of the memory char device.
+# @device: the memory character device name.
 #
-# @size: the size to read in bytes.
+# @size: how many bytes to read at most
 #
-# @format: #optional the format of the data want to read from
-#          memchardev, by default is 'utf8'.
+# @format: #optional data encoding (default 'utf8').
+#          - base64: the data read is returned in base64 encoding.
+#          - utf8: the data read is interpreted as UTF-8.
+#            Bug: can screw up when the buffer contains invalid UTF-8
+#            sequences, NUL characters, after the ring buffer lost
+#            data, and when reading stops because the size limit is
+#            reached.
+#          - The return value is always Unicode regardless of format,
+#            like any other string.
 #
 # Returns: data read from the device
 #
diff --git a/qemu-char.c b/qemu-char.c
index 3123432..effde37 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2814,6 +2814,13 @@  char *qmp_memchar_read(const char *device, int64_t size,
         data = g_base64_encode(read_data, count);
         g_free(read_data);
     } else {
+        /*
+         * FIXME should read only complete, valid UTF-8 characters up
+         * to @size bytes.  Invalid sequences should be replaced by a
+         * suitable replacement character.  Except when (and only
+         * when) ring buffer lost characters since last read, initial
+         * continuation characters should be dropped.
+         */
         read_data[count] = 0;
         data = (char *)read_data;
     }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 51ce2e6..4a2b22f 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -475,21 +475,19 @@  SQMP
 memchar-write
 -------------
 
-Provide writing interface for CirMemCharDriver. Write data to memory
-char device.
+Write to a memory character device.
 
 Arguments:
 
-- "device": the name of the char device, must be unique (json-string)
-- "data": the source data write to memory (json-string)
-- "format": the data format write to memory, default is
-            utf8. (json-string, optional)
-          - Possible values: "utf8", "base64"
+- "device": memory character device name (json-string)
+- "data": data to write (json-string)
+- "format": data format (json-string, optional)
+          - Possible values: "utf8" (default), "base64"
 
 Example:
 
 -> { "execute": "memchar-write",
-                "arguments": { "device": foo,
+                "arguments": { "device": "foo",
                                "data": "abcdefgh",
                                "format": "utf8" } }
 <- { "return": {} }
@@ -506,23 +504,26 @@  SQMP
 memchar-read
 -------------
 
-Provide read interface for CirMemCharDriver. Read from the char
-device memory and return the data with size.
+Read from a memory character device.
 
 Arguments:
 
-- "device": the name of the char device, must be unique (json-string)
-- "size": the memory size wanted to read in bytes (refer to unencoded
-          size of the raw data), would adjust to the init size of the
-          memchar if the requested size is larger than it. (json-int)
-- "format": the data format write to memchardev, default is
-            utf8. (json-string, optional)
-          - Possible values: "utf8", "base64"
+- "device": memory character device name (json-string)
+- "size": how many bytes to read at most (json-int)
+          - Number of data bytes, not number of characters in encoded data
+- "format": data format (json-string, optional)
+          - Possible values: "utf8" (default), "base64"
+          - Naturally, format "utf8" works only when the ring buffer
+            contains valid UTF-8 text.  Invalid UTF-8 sequences get
+            replaced.  Bug: replacement doesn't work.  Bug: can screw
+            up on encountering NUL characters, after the ring buffer
+            lost data, and when reading stops because the size limit
+            is reached.
 
 Example:
 
 -> { "execute": "memchar-read",
-                "arguments": { "device": foo,
+                "arguments": { "device": "foo",
                                "size": 1000,
                                "format": "utf8" } }
 <- {"return": "abcdefgh"}