Patchwork [00/35] Migration thread (20121211)

login
register
mail settings
Submitter Juan Quintela
Date Dec. 11, 2012, 12:46 p.m.
Message ID <1355230031-28233-1-git-send-email-quintela@redhat.com>
Download mbox
Permalink /patch/205200/
State New
Headers show

Pull-request

ssh://repo.or.cz/srv/git/qemu/quintela.git thread-20121211

Comments

Juan Quintela - Dec. 11, 2012, 12:46 p.m.
Hi

Changes:
- we are back to full patch series
- we reuse save_iterate() code in complete phase
- I didn't try to change the block-migration code, it should continue
   working, but has none of the optimizations.
- we split the complete phase into:
  * we write some ram into the buffer
  * we flush the buffer
  instead of doing it in a big push
- we change the iterate phase to tell the subsystem how much free space
  is in the buffer.  We should be able now to use smaller buffers.

- We "could" change easily to move back to the iterate phase if the
  complete phase is taking too long, is just the case of having to put
  a timing test into the new loop.  I didn't put it, because my
  testing showed that now we are much, much better and being inside
  the downtime target.

- Last patch is just to printing where the timing is spent in the
  complete phase, that would be removed when I ask for a pull.

- this branch includes the patches that I sent for request sooner today.

The following changes since commit 77db8657048f233edf21e1a9ebdc30a367fbdc36:

  migration: Fix madvise breakage if host and guest have different page sizes (2012-12-11 12:45:56 +0100)

are available in the git repository at:

  ssh://repo.or.cz/srv/git/qemu/quintela.git thread-20121211

for you to fetch changes up to 8c109644beeb1e9d228e0b3b2fb65adad80c3b96:


Please test & comment & review, Juan.



[20121029 version]
After discussing with Anthony and Paolo, this is the minimal migration
thread support that we can do for 1.3.

- fixed all the comments (thanks eric, paolo and orit).
- buffered_file.c remains until 1.4.
- testing for vinod showed very nice results, that is why the bitmap
  handling optimizations remain.

Note: Writes has become blocking, and I have to change the "remove"
the feature now in qemu-sockets.c.  Checked that migration was the
only user of that feature.  If new users appear, they just need to add
the socket_set_nonblock() by hand.

Please, review.

Thanks, Juan.

Juan Quintela (32):
  buffered_file: Move from using a timer to use a thread
  migration: make qemu_fopen_ops_buffered() return void
  migration: stop all cpus correctly
  migration: make writes blocking
  migration: remove unfreeze logic
  migration: take finer locking
  buffered_file: Unfold the trick to restart generating migration data
  buffered_file: don't flush on put buffer
  buffered_file: unfold buffered_append in buffered_put_buffer
  savevm: New save live migration method: pending
  migration: include qemu-file.h
  migration-fd: remove duplicate include
  migration: move buffered_file.c code into migration.c
  migration: move migration_fd_put_ready()
  migration: Inline qemu_fopen_ops_buffered into migrate_fd_connect
  migration: move migration notifier
  migration: move begining stage to the migration thread
  migration: Add buffered_flush error handling
  migration: move exit condition to migration thread
  migration: unfold rest of migrate_fd_put_ready() into thread
  ram: rename last_block to last_seen_block
  ram: Add last_sent_block
  memory: introduce memory_region_test_and_clear_dirty
  ram: Use memory_region_test_and_clear_dirty
  migration: Only go to the iterate stage if there is anything to send
  ram: optimize migration bitmap walking
  ram: account the amount of transferred ram better
  ram: refactor ram_save_block() return value
  ram: add free_space parameter to save_live functions
  ram: remove xbrle last_stage optimization
  ram: reuse ram_save_iterate() for the complete stage
  migration: print times for end phase

Paolo Bonzini (1):
  split MRU ram list

Umesh Deshpande (2):
  add a version number to ram_list
  protect the ramlist with a separate mutex

 Makefile.objs     |   2 +-
 arch_init.c       | 241 +++++++++++++++----------------
 block-migration.c |  51 ++-----
 block.c           |   6 +
 buffered_file.c   | 269 -----------------------------------
 buffered_file.h   |  22 ---
 cpu-all.h         |  13 +-
 cpus.c            |  17 +++
 exec.c            |  44 +++++-
 memory.c          |  16 +++
 memory.h          |  16 +++
 migration-exec.c  |   3 +-
 migration-fd.c    |   4 +-
 migration-tcp.c   |   3 +-
 migration-unix.c  |   3 +-
 migration.c       | 419 +++++++++++++++++++++++++++++++++++++++++-------------
 migration.h       |   4 +-
 qemu-file.h       |   5 -
 savevm.c          |  47 ++++--
 sysemu.h          |   3 +-
 vmstate.h         |   3 +-
 21 files changed, 608 insertions(+), 583 deletions(-)
 delete mode 100644 buffered_file.c
 delete mode 100644 buffered_file.h
Paolo Bonzini - Dec. 11, 2012, 2:22 p.m.
Il 11/12/2012 13:46, Juan Quintela ha scritto:
> Please test & comment & review, Juan.

Please, not this way.

This hardly takes into account all of the comments I sent to this series.

For example on 21 September I wrote about patch 4 that "usleep is broken
under IIRC Solaris (it uses signals and conflicts with qemu-timer.c).
Please use g_usleep instead."  It still doesn't do.

At the beginning of November I went through another review of this code.
 I found another small problem in patch 4 (it calls migrate_fd_close
twice).  I commented that patch 6 has a stale commit message, and
incorrectly uses a global.  It still does all of these things.

At the time, I also found that ratelimiting is currently broken
(including in the 1.3 release), and you haven't reviewed my patches to
fix that, nor included them in this series.

You are also ignoring other people's comments too.  Peter mentioned on
21 October "Maybe worth adding a comment explaining why we have the two
lists, to save people having to ferret around in the revision history
later?".  Nope.

But this is all fine.  The worst part is that you *knew* I had worked on
the migration thread code, fixing all the above and more, and redoing
the second half of your series in a way that actually removed the memory
copies and serialized memory contents outside the BQL (no doubt there
are other cleanups that may be in these series and not in mine; I grant
that).  Vinod had tested the branch successfully and Orit is already
using it as a base for further improvements.

I was going to ask you today about how to move on with the migration
thread work.  Instead I found out that we duplicated effort on fixing
the conflicts, and the review comments have fallen on the floor once more.

So, I'm not going to review this series.  I pushed my version of the
patches to github on the migration-thread-20121211 branch.  Please
consider at least reviewing it.

Paolo