diff mbox

[ovs-dev] python: Allow specifying the iterations in Idl.run

Message ID 1458444546-16599-1-git-send-email-jason@koelker.net
State Changes Requested
Headers show

Commit Message

Jason Kölker March 20, 2016, 3:29 a.m. UTC
Callers to Idl.run() should be able to specify the number of iterations
to run in one call. Instrumentation revealed that a call to Idl.run() with a
can block up to a few seconds, with most of the time being spent parsing
messages.

Since parsing is a CPU only task eventlet and friends are unable to
to provide concurrency. Specifying a configurable number of iterations
will allow calling code to better manage concurrency.

Signed-off-by: Jason Kölker <jason@koelker.net>
---
 python/ovs/db/idl.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Jason Kölker March 20, 2016, 3:32 a.m. UTC | #1
I would like to request that, if accepted, this also be applied to branch-2.5.

Happy Hacking!

7-11
Ben Pfaff March 23, 2016, 6:29 p.m. UTC | #2
On Sun, Mar 20, 2016 at 03:29:06AM +0000, Jason Kölker wrote:
> Callers to Idl.run() should be able to specify the number of iterations
> to run in one call. Instrumentation revealed that a call to Idl.run() with a
> can block up to a few seconds, with most of the time being spent parsing
> messages.
> 
> Since parsing is a CPU only task eventlet and friends are unable to
> to provide concurrency. Specifying a configurable number of iterations
> will allow calling code to better manage concurrency.
> 
> Signed-off-by: Jason Kölker <jason@koelker.net>

A couple of seconds is really extreme.  I'm actually shocked to hear
that parsing 4 kB * 200 == 800 kB of data can take multiple seconds.  If
it takes that long, then I'd suggest an approach different from this
patch.  First, reduce the iteration count from 50 to something smaller.
Second, figure out why JSON parsing is so slow, and fix it.

stackoverflow points to some JSON push parsers:
http://stackoverflow.com/questions/444380/is-there-a-streaming-api-for-json
Jason Kölker March 23, 2016, 7:31 p.m. UTC | #3
On Wed, Mar 23, 2016 at 6:29 PM, Ben Pfaff <blp@ovn.org> wrote:
> On Sun, Mar 20, 2016 at 03:29:06AM +0000, Jason Kölker wrote:
>> Callers to Idl.run() should be able to specify the number of iterations
>> to run in one call. Instrumentation revealed that a call to Idl.run() with a
>> can block up to a few seconds, with most of the time being spent parsing
>> messages.
>>
>> Since parsing is a CPU only task eventlet and friends are unable to
>> to provide concurrency. Specifying a configurable number of iterations
>> will allow calling code to better manage concurrency.
>>
>> Signed-off-by: Jason Kölker <jason@koelker.net>
>
> A couple of seconds is really extreme.  I'm actually shocked to hear
> that parsing 4 kB * 200 == 800 kB of data can take multiple seconds.

I'm still working on tracking it all down. I *think* it has something to do
with the socket stream buffering. But as we only see it from time to time
on our controller and running just the Idl under eventlet (no other services)
doesn't have the same behavior; its been slow getting a reproducible case.

>  If it takes that long, then I'd suggest an approach different from this
> patch.  First, reduce the iteration count from 50 to something smaller.
> Second, figure out why JSON parsing is so slow, and fix it.

Agreed. This was next on my list to start going through the JSON parser and
instrumenting it to track it down, once I can get a good test to
reproduce reliably.

> stackoverflow points to some JSON push parsers:
> http://stackoverflow.com/questions/444380/is-there-a-streaming-api-for-json

Thanks! I'll look through the them and compare and see if there are some speed
ups we can use. I briefly looked at what it would take to replace the
parser with
ijson. That might be the direction I try to go long term, but I'd
rather not have an
external dependency. I'll carry this patch internally for the time
being until I can
better understand whats really going on.

Happy Hacking!

7-11
Terry Wilson March 24, 2016, 6:05 p.m. UTC | #4
On Wed, Mar 23, 2016 at 2:31 PM, Jason Kölker <jason@koelker.net> wrote:
> On Wed, Mar 23, 2016 at 6:29 PM, Ben Pfaff <blp@ovn.org> wrote:
>> On Sun, Mar 20, 2016 at 03:29:06AM +0000, Jason Kölker wrote:
>>> Callers to Idl.run() should be able to specify the number of iterations
>>> to run in one call. Instrumentation revealed that a call to Idl.run() with a
>>> can block up to a few seconds, with most of the time being spent parsing
>>> messages.
>>>
>>> Since parsing is a CPU only task eventlet and friends are unable to
>>> to provide concurrency. Specifying a configurable number of iterations
>>> will allow calling code to better manage concurrency.
>>>
>>> Signed-off-by: Jason Kölker <jason@koelker.net>
>>
>> A couple of seconds is really extreme.  I'm actually shocked to hear
>> that parsing 4 kB * 200 == 800 kB of data can take multiple seconds.
>
> I'm still working on tracking it all down. I *think* it has something to do
> with the socket stream buffering. But as we only see it from time to time
> on our controller and running just the Idl under eventlet (no other services)
> doesn't have the same behavior; its been slow getting a reproducible case.
>
>>  If it takes that long, then I'd suggest an approach different from this
>> patch.  First, reduce the iteration count from 50 to something smaller.
>> Second, figure out why JSON parsing is so slow, and fix it.
>
> Agreed. This was next on my list to start going through the JSON parser and
> instrumenting it to track it down, once I can get a good test to
> reproduce reliably.
>
>> stackoverflow points to some JSON push parsers:
>> http://stackoverflow.com/questions/444380/is-there-a-streaming-api-for-json
>
> Thanks! I'll look through the them and compare and see if there are some speed
> ups we can use. I briefly looked at what it would take to replace the
> parser with
> ijson. That might be the direction I try to go long term, but I'd
> rather not have an
> external dependency. I'll carry this patch internally for the time
> being until I can
> better understand whats really going on.

I remember looking into the OVS python JSON library a while back and
it was over 60x slower than most other JSON parsers, including the
in-tree C-based version it is essentially a copy of. The built-in
python JSON library doesn't fit the streaming use case that we need,
but py-yajl was mostly a good fit architecturally. I made a pull
request (https://github.com/rtyler/py-yajl/pull/38) to fix the support
that we would need, but never got a response on it. The other solution
I thought of at the time was just to do bindings to the C version
in-tree and see how fast it was.
diff mbox

Patch

diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
index e69d35e..930d92f 100644
--- a/python/ovs/db/idl.py
+++ b/python/ovs/db/idl.py
@@ -140,7 +140,7 @@  class Idl(object):
         update."""
         self._session.close()
 
-    def run(self):
+    def run(self, iterations=50):
         """Processes a batch of messages from the database server.  Returns
         True if the database as seen through the IDL changed, False if it did
         not change.  The initial fetch of the entire contents of the remote
@@ -162,7 +162,7 @@  class Idl(object):
         initial_change_seqno = self.change_seqno
         self._session.run()
         i = 0
-        while i < 50:
+        while i < iterations:
             i += 1
             if not self._session.is_connected():
                 break