Message ID | 1458444546-16599-1-git-send-email-jason@koelker.net |
---|---|
State | Changes Requested |
Headers | show |
I would like to request that, if accepted, this also be applied to branch-2.5. Happy Hacking! 7-11
On Sun, Mar 20, 2016 at 03:29:06AM +0000, Jason Kölker wrote: > Callers to Idl.run() should be able to specify the number of iterations > to run in one call. Instrumentation revealed that a call to Idl.run() with a > can block up to a few seconds, with most of the time being spent parsing > messages. > > Since parsing is a CPU only task eventlet and friends are unable to > to provide concurrency. Specifying a configurable number of iterations > will allow calling code to better manage concurrency. > > Signed-off-by: Jason Kölker <jason@koelker.net> A couple of seconds is really extreme. I'm actually shocked to hear that parsing 4 kB * 200 == 800 kB of data can take multiple seconds. If it takes that long, then I'd suggest an approach different from this patch. First, reduce the iteration count from 50 to something smaller. Second, figure out why JSON parsing is so slow, and fix it. stackoverflow points to some JSON push parsers: http://stackoverflow.com/questions/444380/is-there-a-streaming-api-for-json
On Wed, Mar 23, 2016 at 6:29 PM, Ben Pfaff <blp@ovn.org> wrote: > On Sun, Mar 20, 2016 at 03:29:06AM +0000, Jason Kölker wrote: >> Callers to Idl.run() should be able to specify the number of iterations >> to run in one call. Instrumentation revealed that a call to Idl.run() with a >> can block up to a few seconds, with most of the time being spent parsing >> messages. >> >> Since parsing is a CPU only task eventlet and friends are unable to >> to provide concurrency. Specifying a configurable number of iterations >> will allow calling code to better manage concurrency. >> >> Signed-off-by: Jason Kölker <jason@koelker.net> > > A couple of seconds is really extreme. I'm actually shocked to hear > that parsing 4 kB * 200 == 800 kB of data can take multiple seconds. I'm still working on tracking it all down. I *think* it has something to do with the socket stream buffering. But as we only see it from time to time on our controller and running just the Idl under eventlet (no other services) doesn't have the same behavior; its been slow getting a reproducible case. > If it takes that long, then I'd suggest an approach different from this > patch. First, reduce the iteration count from 50 to something smaller. > Second, figure out why JSON parsing is so slow, and fix it. Agreed. This was next on my list to start going through the JSON parser and instrumenting it to track it down, once I can get a good test to reproduce reliably. > stackoverflow points to some JSON push parsers: > http://stackoverflow.com/questions/444380/is-there-a-streaming-api-for-json Thanks! I'll look through the them and compare and see if there are some speed ups we can use. I briefly looked at what it would take to replace the parser with ijson. That might be the direction I try to go long term, but I'd rather not have an external dependency. I'll carry this patch internally for the time being until I can better understand whats really going on. Happy Hacking! 7-11
On Wed, Mar 23, 2016 at 2:31 PM, Jason Kölker <jason@koelker.net> wrote: > On Wed, Mar 23, 2016 at 6:29 PM, Ben Pfaff <blp@ovn.org> wrote: >> On Sun, Mar 20, 2016 at 03:29:06AM +0000, Jason Kölker wrote: >>> Callers to Idl.run() should be able to specify the number of iterations >>> to run in one call. Instrumentation revealed that a call to Idl.run() with a >>> can block up to a few seconds, with most of the time being spent parsing >>> messages. >>> >>> Since parsing is a CPU only task eventlet and friends are unable to >>> to provide concurrency. Specifying a configurable number of iterations >>> will allow calling code to better manage concurrency. >>> >>> Signed-off-by: Jason Kölker <jason@koelker.net> >> >> A couple of seconds is really extreme. I'm actually shocked to hear >> that parsing 4 kB * 200 == 800 kB of data can take multiple seconds. > > I'm still working on tracking it all down. I *think* it has something to do > with the socket stream buffering. But as we only see it from time to time > on our controller and running just the Idl under eventlet (no other services) > doesn't have the same behavior; its been slow getting a reproducible case. > >> If it takes that long, then I'd suggest an approach different from this >> patch. First, reduce the iteration count from 50 to something smaller. >> Second, figure out why JSON parsing is so slow, and fix it. > > Agreed. This was next on my list to start going through the JSON parser and > instrumenting it to track it down, once I can get a good test to > reproduce reliably. > >> stackoverflow points to some JSON push parsers: >> http://stackoverflow.com/questions/444380/is-there-a-streaming-api-for-json > > Thanks! I'll look through the them and compare and see if there are some speed > ups we can use. I briefly looked at what it would take to replace the > parser with > ijson. That might be the direction I try to go long term, but I'd > rather not have an > external dependency. I'll carry this patch internally for the time > being until I can > better understand whats really going on. I remember looking into the OVS python JSON library a while back and it was over 60x slower than most other JSON parsers, including the in-tree C-based version it is essentially a copy of. The built-in python JSON library doesn't fit the streaming use case that we need, but py-yajl was mostly a good fit architecturally. I made a pull request (https://github.com/rtyler/py-yajl/pull/38) to fix the support that we would need, but never got a response on it. The other solution I thought of at the time was just to do bindings to the C version in-tree and see how fast it was.
diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py index e69d35e..930d92f 100644 --- a/python/ovs/db/idl.py +++ b/python/ovs/db/idl.py @@ -140,7 +140,7 @@ class Idl(object): update.""" self._session.close() - def run(self): + def run(self, iterations=50): """Processes a batch of messages from the database server. Returns True if the database as seen through the IDL changed, False if it did not change. The initial fetch of the entire contents of the remote @@ -162,7 +162,7 @@ class Idl(object): initial_change_seqno = self.change_seqno self._session.run() i = 0 - while i < 50: + while i < iterations: i += 1 if not self._session.is_connected(): break
Callers to Idl.run() should be able to specify the number of iterations to run in one call. Instrumentation revealed that a call to Idl.run() with a can block up to a few seconds, with most of the time being spent parsing messages. Since parsing is a CPU only task eventlet and friends are unable to to provide concurrency. Specifying a configurable number of iterations will allow calling code to better manage concurrency. Signed-off-by: Jason Kölker <jason@koelker.net> --- python/ovs/db/idl.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)