Patchwork [RFC] Some more io-thread optimizations

Submitter Jan Kiszka
Date Feb. 14, 2011, 9:50 a.m.
Message ID <4D58FAF2.6070106@siemens.com>
Permalink /patch/83053/
State New

Comments

Jan Kiszka - Feb. 14, 2011, 9:50 a.m.
Hi,

The patch below further reduces the io-thread overhead in TCG mode, so
that emulating SMP boxes in particular gets noticeably faster. Its
essence: poll the file descriptors until select() returns 0, keeping the
global mutex locked. This reduces ping-pong with the vcpu threads, most
noticeably in TCG mode where we run in lock-step.
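
As a condensed sketch of the behavior this gives main_loop()
(illustrative pseudo-structure only, not the literal change; see the
patch below for the real diff):

    /* As long as the previous pass serviced at least one fd, poll again
     * with a zero timeout and without dropping the global mutex; only
     * when select() finds nothing ready do we block, with the mutex
     * released for the duration of the sleep. */
    int last_io = 0;
    for (;;) {
        bool nonblocking = (last_io > 0);   /* keep polling while busy */
        last_io = main_loop_wait(nonblocking);
        /* ... run vcpus, handle exit/reset requests ... */
    }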

Split into two patches; I'm planning to route these changes via the kvm
queue (as they collide with other patches there).

Jan

--------8<---------
Marcelo Tosatti - Feb. 15, 2011, 10:03 p.m.
On Mon, Feb 14, 2011 at 10:50:42AM +0100, Jan Kiszka wrote:
> Hi,
> 
> The patch below further reduces the io-thread overhead in TCG mode, so
> that emulating SMP boxes in particular gets noticeably faster. Its
> essence: poll the file descriptors until select() returns 0, keeping
> the global mutex locked. This reduces ping-pong with the vcpu threads,
> most noticeably in TCG mode where we run in lock-step.
> 
> Split into two patches; I'm planning to route these changes via the kvm
> queue (as they collide with other patches there).
> 
> Jan

Not sure this makes sense for all cases. There could be scenarios where
a single pass is more efficient (think of the latency to acquire the
mutex from vcpu context in KVM mode, with intensive file IO in
progress).
Jan Kiszka - Feb. 16, 2011, 8:26 a.m.
On 2011-02-15 23:03, Marcelo Tosatti wrote:
> On Mon, Feb 14, 2011 at 10:50:42AM +0100, Jan Kiszka wrote:
>> Hi,
>>
>> The patch below further reduces the io-thread overhead in TCG mode, so
>> that emulating SMP boxes in particular gets noticeably faster. Its
>> essence: poll the file descriptors until select() returns 0, keeping
>> the global mutex locked. This reduces ping-pong with the vcpu threads,
>> most noticeably in TCG mode where we run in lock-step.
>>
>> Split into two patches; I'm planning to route these changes via the kvm
>> queue (as they collide with other patches there).
>>
>> Jan
> 
> Not sure this makes sense for all cases. There could be scenarios where
> a single pass is more efficient (think of the latency to acquire the
> mutex from vcpu context in KVM mode, with intensive file IO in
> progress).

Yeah, likely true. Only TCG has these insanely long lock-holding times
and requires signal-based mutex handover. I will exclude KVM from this.

Jan
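
A gate along those lines could look like the following (a hedged sketch
only; kvm_enabled() is QEMU's existing predicate, and the exact placement
inside main_loop() is illustrative):

    #ifdef CONFIG_IOTHREAD
            /* Only TCG suffers from the lock-step ping-pong; in KVM mode
             * a single select() pass keeps the mutex acquisition latency
             * low for vcpu threads while intensive file IO is in
             * progress. */
            if (!kvm_enabled() && last_io > 0) {
                nonblocking = true;
            }
    #endif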

Patch

diff --git a/sysemu.h b/sysemu.h
index 23ae17e..0a69464 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -73,7 +73,7 @@  void cpu_synchronize_all_post_init(void);
 
 void qemu_announce_self(void);
 
-void main_loop_wait(int nonblocking);
+int main_loop_wait(int nonblocking);
 
 bool qemu_savevm_state_blocked(Monitor *mon);
 int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
diff --git a/vl.c b/vl.c
index ed2cdfa..66b7c6f 100644
--- a/vl.c
+++ b/vl.c
@@ -1311,7 +1311,7 @@  void qemu_system_powerdown_request(void)
     qemu_notify_event();
 }
 
-void main_loop_wait(int nonblocking)
+int main_loop_wait(int nonblocking)
 {
     IOHandlerRecord *ioh;
     fd_set rfds, wfds, xfds;
@@ -1356,9 +1356,16 @@  void main_loop_wait(int nonblocking)
 
     slirp_select_fill(&nfds, &rfds, &wfds, &xfds);
 
-    qemu_mutex_unlock_iothread();
+    if (timeout > 0) {
+        qemu_mutex_unlock_iothread();
+    }
+
     ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
-    qemu_mutex_lock_iothread();
+
+    if (timeout > 0) {
+        qemu_mutex_lock_iothread();
+    }
+
     if (ret > 0) {
         IOHandlerRecord *pioh;
 
@@ -1386,6 +1393,7 @@  void main_loop_wait(int nonblocking)
        them.  */
     qemu_bh_poll();
 
+    return ret;
 }
 
 static int vm_can_run(void)
@@ -1405,6 +1413,7 @@  qemu_irq qemu_system_powerdown;
 
 static void main_loop(void)
 {
+    int last_io = 0;
     int r;
 
     qemu_main_loop_start();
@@ -1421,7 +1430,12 @@  static void main_loop(void)
 #ifdef CONFIG_PROFILER
             ti = profile_getclock();
 #endif
-            main_loop_wait(nonblocking);
+#ifdef CONFIG_IOTHREAD
+            if (last_io > 0) {
+                nonblocking = true;
+            }
+#endif
+            last_io = main_loop_wait(nonblocking);
 #ifdef CONFIG_PROFILER
             dev_time += profile_getclock() - ti;
 #endif