diff mbox series

[ovs-dev,v2] python: Fix decoding error when the received data is larger than 4096.

Message ID 20180218125116.5032-1-ligs@dtdream.com
State Superseded
Headers show
Series [ovs-dev,v2] python: Fix decoding error when the received data is larger than 4096. | expand

Commit Message

Guoshuai Li Feb. 18, 2018, 12:51 p.m. UTC
It can only receive 4096 bytes of data each time in jsonrpc,
when there are similar and Chinese characters occupy multiple bytes,
it may receive half a character, this time the decoding will be abnormal.
We need to receive the completed character to decode.

Signed-off-by: Guoshuai Li <ligs@dtdream.com>
---
 python/ovs/jsonrpc.py | 14 +++++++++-----
 tests/ovsdb-idl.at    | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 5 deletions(-)

Comments

Ben Pfaff Feb. 23, 2018, 10:35 p.m. UTC | #1
On Sun, Feb 18, 2018 at 08:51:16PM +0800, Guoshuai Li wrote:
> It can only receive 4096 bytes of data each time in jsonrpc,
> when there are similar and Chinese characters occupy multiple bytes,
> it may receive half a character, this time the decoding will be abnormal.
> We need to receive the completed character to decode.
> 
> Signed-off-by: Guoshuai Li <ligs@dtdream.com>

Thanks for finding and fixing the problem.

I am worried that, with this patch, any genuine encoding errors will be
ignored.  For example, bytes 0xc0 and 0xc1 never appear in a valid UTF-8
sequence, yet I believe that this code will silently ignore them, along
with whatever other data appears along with them.  Do you have an idea
for how to deal with that?

Thanks,

Ben.
Guoshuai Li Feb. 26, 2018, 6:02 a.m. UTC | #2
> On Sun, Feb 18, 2018 at 08:51:16PM +0800, Guoshuai Li wrote:
>> It can only receive 4096 bytes of data each time in jsonrpc,
>> when there are similar and Chinese characters occupy multiple bytes,
>> it may receive half a character, this time the decoding will be abnormal.
>> We need to receive the completed character to decode.
>>
>> Signed-off-by: Guoshuai Li <ligs@dtdream.com>
> Thanks for finding and fixing the problem.
>
> I am worried that, with this patch, any genuine encoding errors will be
> ignored.  For example, bytes 0xc0 and 0xc1 never appear in a valid UTF-8
> sequence, yet I believe that this code will silently ignore them, along
> with whatever other data appears along with them.  Do you have an idea
> for how to deal with that?
>
> Thanks,
>
> Ben.

Hello Ben

I sent a new version:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-February/344705.html
But I am not satisfied with it.
This patch can not solve the received data length is exactly a multiple 
of 4096, and with 0xc0 and 0xc1.
I do not know how to judge that I have received all the data.
I think I need more people's help.

Thanks~!
Ben Pfaff Feb. 26, 2018, 8:03 p.m. UTC | #3
On Mon, Feb 26, 2018 at 02:02:05PM +0800, Guoshuai Li wrote:
> >On Sun, Feb 18, 2018 at 08:51:16PM +0800, Guoshuai Li wrote:
> >>It can only receive 4096 bytes of data each time in jsonrpc,
> >>when there are similar and Chinese characters occupy multiple bytes,
> >>it may receive half a character, this time the decoding will be abnormal.
> >>We need to receive the completed character to decode.
> >>
> >>Signed-off-by: Guoshuai Li <ligs@dtdream.com>
> >Thanks for finding and fixing the problem.
> >
> >I am worried that, with this patch, any genuine encoding errors will be
> >ignored.  For example, bytes 0xc0 and 0xc1 never appear in a valid UTF-8
> >sequence, yet I believe that this code will silently ignore them, along
> >with whatever other data appears along with them.  Do you have an idea
> >for how to deal with that?
> >
> >Thanks,
> >
> >Ben.
> 
> Hello Ben
> 
> I sent a new version:
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-February/344705.html
> But I am not satisfied with it.
> This patch can not solve the received data length is exactly a multiple of
> 4096, and with 0xc0 and 0xc1.
> I do not know how to judge that I have received all the data.
> I think I need more people's help.

It looks like we should be using IncrementalDecoder:
https://docs.python.org/3/library/codecs.html?highlight=decode#codecs.IncrementalDecoder
Guoshuai Li March 1, 2018, 6:31 a.m. UTC | #4
Thanks Ben, this is really a good interface, it makes the code change is 
simple.

A new patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-March/344845.html

> It looks like we should be using IncrementalDecoder:
> https://docs.python.org/3/library/codecs.html?highlight=decode#codecs.IncrementalDecoder
Ben Pfaff March 1, 2018, 4:56 p.m. UTC | #5
Thanks!  I'll review it.

On Thu, Mar 01, 2018 at 02:31:37PM +0800, Guoshuai Li wrote:
> 
> Thanks Ben, this is really a good interface, it makes the code change is
> simple.
> 
> A new patch:
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-March/344845.html
> 
> >It looks like we should be using IncrementalDecoder:
> >https://docs.python.org/3/library/codecs.html?highlight=decode#codecs.IncrementalDecoder
>
diff mbox series

Patch

diff --git a/python/ovs/jsonrpc.py b/python/ovs/jsonrpc.py
index d284190e0..a46821093 100644
--- a/python/ovs/jsonrpc.py
+++ b/python/ovs/jsonrpc.py
@@ -262,17 +262,21 @@  class Connection(object):
         if self.status:
             return self.status, None
 
+        recv_buffer = bytearray()
         while True:
             if not self.input:
+                recv_data = None
                 error, data = self.stream.recv(4096)
+                recv_buffer.extend(bytes(data))
                 # Python 3 has separate types for strings and bytes.  We
                 # received bytes from a socket.  We expect it to be string
                 # data, so we convert it here as soon as possible.
                 if data and not error:
                     try:
-                        data = data.decode('utf-8')
+                        recv_data = recv_buffer.decode('utf-8')
+                        recv_buffer = bytearray()
                     except UnicodeError:
-                        error = errno.EILSEQ
+                        continue
                 if error:
                     if (sys.platform == "win32" and
                             error == errno.WSAEWOULDBLOCK):
@@ -287,12 +291,12 @@  class Connection(object):
                                   % (self.name, os.strerror(error)))
                         self.error(error)
                         return self.status, None
-                elif not data:
+                elif not recv_data:
                     self.error(EOF)
                     return EOF, None
                 else:
-                    self.input += data
-                    self.received_bytes += len(data)
+                    self.input += recv_data
+                    self.received_bytes += len(recv_data)
             else:
                 if self.parser is None:
                     self.parser = ovs.json.Parser()
diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index 59b2c1991..d63b7758f 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -328,6 +328,40 @@  OVSDB_CHECK_IDL([simple idl, writing via IDL with unicode],
 003: done
 ]])
 
+# same as OVSDB_CHECK_IDL but uses the Python IDL implementation.
+m4_define([OVSDB_CHECK_IDL_PYN_WITH_EXPOUT],
+  [AT_SETUP([$1])
+   AT_SKIP_IF([test $7 = no])
+   AT_KEYWORDS([ovsdb server idl positive Python $5])
+   AT_CHECK([ovsdb_start_idltest])
+   m4_if([$2], [], [],
+     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])])
+   AT_CHECK([$8 $srcdir/test-ovsdb.py  -t10 idl $srcdir/idltest.ovsschema unix:socket $3],
+            [0], [stdout], [ignore])
+   echo "$4" > expout
+   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
+            [0], [expout])
+   OVSDB_SERVER_SHUTDOWN
+   AT_CLEANUP])
+
+m4_define([OVSDB_CHECK_IDL_PY_WITH_EXPOUT],
+   [OVSDB_CHECK_IDL_PYN_WITH_EXPOUT([$1 - Python2], [$2], [$3], [$4], [$5], [$6],
+                                    [$HAVE_PYTHON], [$PYTHON])
+    OVSDB_CHECK_IDL_PYN_WITH_EXPOUT([$1 - Python3], [$2], [$3], [$4], [$5], [$6],
+                                    [$HAVE_PYTHON3], [$PYTHON3])])
+
+OVSDB_CHECK_IDL_PY_WITH_EXPOUT([simple idl, writing large data via IDL with unicode],
+  [['["idltest",
+      {"op": "insert",
+       "table": "simple",
+       "row": {"s": "'$(printf "测试超过四千零九十六个字节的中文字符串以使解码出现问题。%.0s" {1..50})'"}}]']],
+  [['set 0 b 1, insert 1, set 1 s '$(printf "测试超过四千零九十六个字节的中文字符串以使解码出现问题。%.0s" {1..100})'']],
+  [[000: i=0 r=0 b=false s=$(printf "测试超过四千零九十六个字节的中文字符串以使解码出现问题。%.0s" {1..50}) u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
+001: commit, status=success
+002: i=0 r=0 b=true s=$(printf "测试超过四千零九十六个字节的中文字符串以使解码出现问题。%.0s" {1..50}) u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
+002: i=1 r=0 b=false s=$(printf "测试超过四千零九十六个字节的中文字符串以使解码出现问题。%.0s" {1..100}) u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<2>
+003: done]])
+
 OVSDB_CHECK_IDL([simple idl, handling verification failure],
   [['["idltest",
       {"op": "insert",