Message ID: cover.1638267778.git.huangy81@chinatelecom.cn
Series: support dirty restraint on vCPU
On Tue, Nov 30, 2021 at 06:28:10PM +0800, huangy81@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>
> The patch [2/3] has not been touched so far. Any corrections and
> suggestions are welcome.

I played with it today, but the vCPU didn't get throttled as expected.

What I did was start two workloads at 500 MB/s, each pinned to one vCPU
thread:

[root@fedora ~]# pgrep -fa mig_mon
595 ./mig_mon mm_dirty 1000 500 sequential
604 ./mig_mon mm_dirty 1000 500 sequential
[root@fedora ~]# taskset -pc 595
pid 595's current affinity list: 2
[root@fedora ~]# taskset -pc 604
pid 604's current affinity list: 3

Then I started throttling at 100 MB/s:

(QEMU) set-dirty-limit cpu-index=3 dirty-rate=100
{"return": {}}
(QEMU) set-dirty-limit cpu-index=2 dirty-rate=100
{"return": {}}

I can see the workload dropped a tiny bit (perhaps 500 MB/s -> 499 MB/s),
then it keeps going.

Further throttling doesn't work either:

(QEMU) set-dirty-limit cpu-index=2 dirty-rate=10
{"return": {}}

Funnily, the ssh client got slowed down instead... :(

Yong, how did you test it?
1. Start the VM with kernel + initrd.img, using the following QEMU command
line:

[root@Hyman_server1 fast_qemu]# cat vm.sh
#!/bin/bash
/usr/bin/qemu-system-x86_64 \
    -display none -vga none \
    -name guest=simple_vm,debug-threads=on \
    -monitor stdio \
    -machine pc-i440fx-2.12 \
    -accel kvm,dirty-ring-size=65536 -cpu host \
    -kernel /home/work/fast_qemu/vmlinuz-5.13.0-rc4+ \
    -initrd /home/work/fast_qemu/initrd-stress.img \
    -append "noapic edd=off printk.time=1 noreplace-smp cgroup_disable=memory pci=noearly console=ttyS0 debug ramsize=1500 ratio=1 sleep=1" \
    -chardev file,id=charserial0,path=/var/log/vm_console.log \
    -serial chardev:charserial0 \
    -qmp unix:/tmp/qmp-sock,server,nowait \
    -D /var/log/vm.log \
    --trace events=/home/work/fast_qemu/events \
    -m 4096 -smp 2 -device sga

2. Enable the dirtylimit trace events, which are written to /var/log/vm.log:

[root@Hyman_server1 fast_qemu]# cat /home/work/fast_qemu/events
dirtylimit_state_init
dirtylimit_vcpu
dirtylimit_impose
dirtyrate_do_calculate_vcpu

3. Connect to the QMP server with the low-level QMP client and issue
set-dirty-limit:

[root@Hyman_server1 my_qemu]# python3.6 ./scripts/qmp/qmp-shell -v -p /tmp/qmp-sock
Welcome to the QMP low-level shell!
Connected to QEMU 6.1.92

(QEMU) set-dirty-limit cpu-index=1 dirty-rate=400
{
    "arguments": {
        "cpu-index": 1,
        "dirty-rate": 400
    },
    "execute": "set-dirty-limit"
}

4. Observe the vCPU's current dirty rate and quota dirty rate:
[root@Hyman_server1 ~]# tail -f /var/log/vm.log
dirtylimit_state_init dirtylimit state init: max cpus 2
dirtylimit_vcpu CPU[1] set quota dirtylimit 400
dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 0, percentage 0
dirtyrate_do_calculate_vcpu vcpu[0]: 1075 MB/s
dirtyrate_do_calculate_vcpu vcpu[1]: 1061 MB/s
dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 1061, percentage 62
dirtyrate_do_calculate_vcpu vcpu[0]: 1133 MB/s
dirtyrate_do_calculate_vcpu vcpu[1]: 380 MB/s
dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 380, percentage 57
dirtyrate_do_calculate_vcpu vcpu[0]: 1227 MB/s
dirtyrate_do_calculate_vcpu vcpu[1]: 464 MB/s

We can observe that vcpu-1's dirty rate stays around 400 MB/s once the
dirty page limit is set, while vcpu-0 is not affected.

5. Observe the VM stress info:

[root@Hyman_server1 fast_qemu]# tail -f /var/log/vm_console.log
[    0.838051] Run /init as init process
[    0.839216]   with arguments:
[    0.840153]     /init
[    0.840882]   with environment:
[    0.841884]     HOME=/
[    0.842649]     TERM=linux
[    0.843478]     edd=off
[    0.844233]     ramsize=1500
[    0.845079]     ratio=1
[    0.845829]     sleep=1
/init (00001): INFO: RAM 1500 MiB across 2 CPUs, ratio 1, sleep 1 us
[    1.158011] random: init: uninitialized urandom read (4096 bytes read)
[    1.448205] random: init: uninitialized urandom read (4096 bytes read)
/init (00001): INFO: 1638282593684ms copied 1 GB in 00729ms
/init (00110): INFO: 1638282593964ms copied 1 GB in 00719ms
/init (00001): INFO: 1638282594405ms copied 1 GB in 00719ms
/init (00110): INFO: 1638282594677ms copied 1 GB in 00713ms
/init (00001): INFO: 1638282595093ms copied 1 GB in 00686ms
/init (00110): INFO: 1638282595339ms copied 1 GB in 00662ms
/init (00001): INFO: 1638282595764ms copied 1 GB in 00670m

PS: the kernel and initrd images come from:

kernel image: vmlinuz-5.13.0-rc4+, a normal CentOS vmlinuz copied from
the /boot directory

initrd.img: initrd-stress.img, which contains only a stress binary,
compiled from tests/migration/stress.c in the QEMU source and run as
init in the VM.

You can see the README.md of my project "met"
(https://github.com/newfriday/met) for how to build initrd-stress.img. :)

On 11/30/21 20:57, Peter Xu wrote:
> [...]
>
> Yong, how did you test it?
On 11/30/21 22:57, Hyman Huang wrote:
> [...]
>
> On 11/30/21 20:57, Peter Xu wrote:
>> [...]
>>
>> I can see the workload dropped a tiny little bit (perhaps 500mb ->
>> 499mb), then it keeps going..

The test steps I listed above assume that the dirty rate calculated by
dirtylimit_calc_func via the dirty ring is accurate, which differs from
your test policy. The macro DIRTYLIMIT_CALC_TIME_MS, used as the
calculation period in migration/dirtyrate.c, has a big effect on the
result, so "how we define the right dirty rate" is worth discussing.

Anyway, one of our targets is to improve memory performance during
migration, so I think memory read/write speed in the VM is a convincing
metric. I'll test the dirty rate the way you mentioned and analyze the
result.

>> Further throttle won't work too:
>>
>> (QEMU) set-dirty-limit cpu-index=2 dirty-rate=10
>> {"return": {}}
>>
>> Funnily, the ssh client got slowed down instead... :(
>>
>> Yong, how did you test it?
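[Editor's note] The point about the calculation period can be made concrete
with a small sketch: the same number of dirtied pages yields a very
different reported rate depending on the sampling window, which is why the
choice of DIRTYLIMIT_CALC_TIME_MS matters. The helper below is an
illustration only; its name and units are assumptions, not QEMU's actual
code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not QEMU's actual code): convert a count of pages
 * dirtied during one sampling period into a dirty rate in MB/s, the way a
 * fixed period such as DIRTYLIMIT_CALC_TIME_MS would be used.  A shorter
 * period tracks bursts closely; a longer one smooths them out, so the
 * period directly changes the rate the limiter reacts to. */
static uint64_t dirty_rate_mbps(uint64_t dirty_pages, uint64_t page_size,
                                uint64_t period_ms)
{
    uint64_t bytes = dirty_pages * page_size;
    return bytes * 1000 / period_ms / (1024 * 1024);
}
```

For example, 262144 dirtied 4 KiB pages (1 GiB) sampled over a 1000 ms
window reads as 1024 MB/s, but the same pages over a 500 ms window read as
2048 MB/s.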
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

The patch [2/3] has not been touched so far. Any corrections and
suggestions are welcome.

Please review, thanks!

v7:
- rebase on master
- polish the comments and error messages according to the advice given
  by Markus
- introduce a dirtylimit_enabled function to pre-check whether the dirty
  page limit is enabled before canceling it

v6:
- rebase on master
- fix the dirtylimit setup crash found by Markus
- polish the comments according to the advice given by Markus
- adjust the QMP command tag to 7.0

v5:
- rebase on master
- adjust the throttle algorithm by removing the tuning in the
  RESTRAINT_RATIO case so that the dirty page rate reaches the quota
  more quickly
- fix the percentage update in the throttle iteration

v4:
- rebase on master
- modify the following points according to the advice given by Markus:
  1. move the definition into migration.json
  2. polish the comments of set-dirty-limit
  3. do the syntax check and change "dirty rate" to "dirty page rate"

Thanks for the careful reviews made by Markus.

Please review, thanks!

v3:
- rebase on master
- modify the following points according to the advice given by Markus:
  1. remove DirtyRateQuotaVcpu and use its fields as options directly
  2. add comments to show details of what the dirtylimit setup does
  3. explain in the documentation how to use dirtylimit in combination
     with the existing QMP commands "calc-dirty-rate" and
     "query-dirty-rate"

Thanks for the careful reviews made by Markus.

Please review, thanks!

Hyman

v2:
- rebase on master
- modify the following points according to the advice given by Juan:
  1. rename dirtyrestraint to dirtylimit
  2. implement the full lifecycle of dirtylimit_calc, including
     dirtylimit_calc and dirtylimit_calc_quit
  3. introduce a 'quit' field in dirtylimit_calc_state to implement
     dirtylimit_calc_quit
  4. remove the ready_cond and ready_mtx since they may not be suitable
  5. put the 'record_dirtypage' function code at the beginning of the
     file
  6. remove the unnecessary return
- other modifications made after code review:
  1. introduce 'bmap' and 'nr' fields in dirtylimit_state to record the
     number of running threads forked by dirtylimit
  2. stop the dirty rate calculation thread if all the dirtylimit
     threads are stopped
  3. do some renaming:
     dirty rate calculation thread     -> dirtylimit-calc
     dirtylimit thread                 -> dirtylimit-{cpu_index}
     function do_dirtyrestraint        -> dirtylimit_check
     qmp command dirty-restraint       -> set-dirty-limit
     qmp command dirty-restraint-cancel -> cancel-dirty-limit
     header file dirtyrestraint.h      -> dirtylimit.h

Please review, thanks!

Thanks for the accurate and timely advice given by Juan. We would really
appreciate any corrections and suggestions about this patchset.

Best Regards!

Hyman

v1:
This patchset introduces a mechanism to impose a dirty restraint on
vCPUs, aiming to keep each vCPU running at a certain dirty rate given by
the user. A dirty restraint on vCPUs may be an alternative way to
implement convergence logic for live migration, which could, in theory,
improve guest memory performance during migration compared with the
traditional method.

In the current live migration implementation, the convergence logic
throttles all vCPUs of the VM, which has some side effects:
- "read processes" on a vCPU are unnecessarily penalized
- the throttle percentage increases step by step, which seems to
  struggle to find the optimal percentage when the dirty rate is high
- it is hard to predict the remaining migration time if the throttle
  percentage reaches 99%

To a certain extent, the dirty restraint mechanism can fix these effects
by throttling at vCPU granularity during migration.
The implementation is rather straightforward: we calculate the vCPU
dirty rate periodically via the dirty ring mechanism, as commit
0e21bf246 ("implement dirty-ring dirtyrate calculation") does. A vCPU
that has a dirty restraint imposed on it is throttled periodically, as
auto-converge does; after each round of throttling, we compare the quota
dirty rate with the current dirty rate, and if the current dirty rate is
not under the quota, we increase the throttle percentage until it is.

This patchset is the basis for implementing a new auto-converge method
for live migration. We introduce two QMP commands to impose/cancel the
dirty restraint on a specified vCPU, so it can also serve as an
independent API for upper applications such as libvirt, which can use it
to implement convergence logic during live migration, supplemented with
the QMP 'calc-dirty-rate' command or whatever.

We post this patchset as an RFC, and any corrections and suggestions
about the implementation, API, throttling algorithm, or anything else
are very much appreciated!

Please review, thanks!

Best Regards!

Hyman Huang (3):
  migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  cpu-throttle: implement vCPU throttle
  cpus-common: implement dirty page limit on vCPU

 cpus-common.c                 |  48 +++++++
 include/exec/memory.h         |   5 +-
 include/hw/core/cpu.h         |   9 ++
 include/sysemu/cpu-throttle.h |  30 ++++
 include/sysemu/dirtylimit.h   |  44 ++++++
 migration/dirtyrate.c         | 139 +++++++++++++++++--
 migration/dirtyrate.h         |   2 +
 qapi/migration.json           |  43 ++++++
 softmmu/cpu-throttle.c        | 316 ++++++++++++++++++++++++++++++++++++++++++
 softmmu/trace-events          |   5 +
 softmmu/vl.c                  |   1 +
 11 files changed, 631 insertions(+), 11 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h
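[Editor's note] The compare-and-increase loop the cover letter describes
(measure the dirty rate each period, compare it against the quota, raise
the throttle percentage until the rate drops under the quota) can be
sketched as below. This is an illustration of the idea only, not the
patch's code: the proportional step heuristic and the CPU_THROTTLE_PCT_MAX
cap are assumptions of the sketch.

```c
#include <assert.h>

/* Assumed cap for the sketch; auto-converge-style throttling tops out
 * short of 100% so the vCPU keeps making some progress. */
enum { CPU_THROTTLE_PCT_MAX = 99 };

/* Called once per dirty-rate sampling period for a throttled vCPU:
 * returns the new throttle percentage given the quota and the rate
 * measured over the last period (both in MB/s). */
static int adjust_throttle_pct(int pct, unsigned quota_mbps,
                               unsigned current_mbps)
{
    if (current_mbps > quota_mbps) {
        /* Over quota: step up proportionally to the overshoot. */
        int step = (int)((current_mbps - quota_mbps) * 100 / current_mbps);
        pct += step ? step : 1;
    } else if (pct > 0) {
        /* Under quota: back off gently to avoid over-throttling. */
        pct--;
    }
    return pct > CPU_THROTTLE_PCT_MAX ? CPU_THROTTLE_PCT_MAX : pct;
}
```

With the numbers from the trace log earlier in the thread (quota 400 MB/s,
measured 1061 MB/s, starting from 0%), this heuristic happens to land on a
62% throttle for the first period; the real patch may step differently.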