Message ID | 1321114074-3681-1-git-send-email-aliguori@us.ibm.com |
---|---|
State | New |
Anthony Liguori <aliguori@us.ibm.com> wrote:
> I think this is an accurate reflection of the state of migration today.  This
> is the second release in a row where we're scrambling to fix a critical issue
> in migration.

We need to make up our minds about it.

A patch to do the reopen was posted long, long ago.  The code has existed in
RHEL5 for 3 years.  The answer was that:
- we need to do it another way
- we need to change it inside qcow2

No suggestions were made for the former, and the internals of qcow2 are quite
difficult to grasp (at least for me).

Then it is my fault for not pushing harder for the patches.

But then it happened again with migration of huge-memory machines.
The series was rejected because it "only" completely fixed the stalls on
the iothread, and did not completely fix the problem on the vcpus.
And here we still are: we need to finish the migration thread to get
things included.

And then we have problems with the format (which is not
comprehensible).  It took almost 2 years to convince you that we need a
"size", a checksum, and start/end markers.

And we got:
- on one hand, we need to have perfect solutions to get them integrated
  (huge memory patches)
- on the other hand, we can think about including patches that only fix
  one of the more minor points that we have (visitors).

So, the question is:

What do we expect from migration?
- Backward compatibility: a must for corporate users -> a burden for
  everybody else
- Testing: What is that?
- Format: it is a mess, but as Avi likes to point out, we would have to
  "maintain" the current one temporarily (a.k.a. forever).

To make things more interesting, lots of changes to migration touch a lot
of code (i.e. not only migration*/savevm.c), and getting those patches
accepted takes forever.

> The first step in fixing this problem is being up front about what the current
> state of the subsystem is.  If we're going to treat migration as a release
> blocking feature in the future, then we need to promote the migration subsystem
> above 'Odd Fixes' status.

Later, Juan.

PS. And yes, I agree that migration is in a very sad state today.
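
To make the request for a "size", a checksum, and start/end markers concrete, here is a minimal Python sketch of a framed section. It is only an assumption about what such framing could look like, not QEMU's actual migration format: the marker values, the field widths, the choice of CRC-32, and the names `frame_section` and `parse_section` are all hypothetical.

```python
import struct
import zlib

# Hypothetical marker values and layout, chosen only for this illustration.
START_MARKER = b"SECT"
END_MARKER = b"ENDS"
HEADER_FMT = ">4sHI"   # start marker, name length, payload size (big-endian)
TRAILER_FMT = ">I4s"   # CRC-32 of name + payload, end marker


def frame_section(name: str, payload: bytes) -> bytes:
    """Wrap one section's payload with markers, an explicit size, and a checksum."""
    name_bytes = name.encode("utf-8")
    header = struct.pack(HEADER_FMT, START_MARKER, len(name_bytes), len(payload))
    checksum = zlib.crc32(name_bytes + payload)
    trailer = struct.pack(TRAILER_FMT, checksum, END_MARKER)
    return header + name_bytes + payload + trailer


def parse_section(buf: bytes, offset: int = 0):
    """Parse one framed section; return (name, payload, offset of next section)."""
    header_size = struct.calcsize(HEADER_FMT)
    trailer_size = struct.calcsize(TRAILER_FMT)
    if len(buf) - offset < header_size:
        raise ValueError("truncated header")
    start, name_len, size = struct.unpack_from(HEADER_FMT, buf, offset)
    if start != START_MARKER:
        raise ValueError("bad start marker")

    name_off = offset + header_size
    payload_off = name_off + name_len
    trailer_off = payload_off + size
    if trailer_off + trailer_size > len(buf):
        raise ValueError("truncated section")

    name_bytes = buf[name_off:payload_off]
    payload = buf[payload_off:trailer_off]
    checksum, end = struct.unpack_from(TRAILER_FMT, buf, trailer_off)
    if end != END_MARKER:
        raise ValueError("bad end marker")
    if checksum != zlib.crc32(name_bytes + payload):
        raise ValueError("checksum mismatch")
    return name_bytes.decode("utf-8"), payload, trailer_off + trailer_size


if __name__ == "__main__":
    # Two sections back to back; the reader walks them using the stored sizes.
    stream = frame_section("ram", b"\x01\x02\x03") + frame_section("timer", b"")
    pos = 0
    while pos < len(stream):
        name, payload, pos = parse_section(stream, pos)
        print(name, len(payload))
```

With an explicit size, a reader can step over a section it does not understand, and the checksum and end marker let it detect truncation or corruption instead of silently misparsing the rest of the stream.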
On 11/14/2011 11:40 AM, Juan Quintela wrote:
> Anthony Liguori <aliguori@us.ibm.com> wrote:
>> I think this is an accurate reflection of the state of migration today.  This
>> is the second release in a row where we're scrambling to fix a critical issue
>> in migration.
>
> We need to make up our minds about it.

Ultimately, we need to make migration a priority.  That's what I'm trying to do
here.

The first step is to be open about the state of migration today.  I personally
don't have the bandwidth to invest a lot of effort in migration, but I can
invest time in trying to find more people to work on migration, and help put
together a proper roadmap.

We need to outline and document what we support and what we don't support.  We
need to invest in a test infrastructure.  We need a roadmap that we can
reasonably execute on.  In short, we need to turn migration into a first class
subsystem.

It's not about any single person or any single patch series.  It's about
deciding that migration is an important feature and deserves more focus and
attention.

That's the conversation I'm trying to start with this patch.

Regards,

Anthony Liguori

>
> A patch to do the reopen was posted long, long ago.  The code has existed in
> RHEL5 for 3 years.  The answer was that:
> - we need to do it another way
> - we need to change it inside qcow2
>
> No suggestions were made for the former, and the internals of qcow2 are quite
> difficult to grasp (at least for me).
>
> Then it is my fault for not pushing harder for the patches.
>
> But then it happened again with migration of huge-memory machines.
> The series was rejected because it "only" completely fixed the stalls on
> the iothread, and did not completely fix the problem on the vcpus.
> And here we still are: we need to finish the migration thread to get
> things included.
>
> And then we have problems with the format (which is not
> comprehensible).  It took almost 2 years to convince you that we need a
> "size", a checksum, and start/end markers.
>
> And we got:
> - on one hand, we need to have perfect solutions to get them integrated
>   (huge memory patches)
> - on the other hand, we can think about including patches that only fix
>   one of the more minor points that we have (visitors).
>
> So, the question is:
>
> What do we expect from migration?
> - Backward compatibility: a must for corporate users -> a burden for
>   everybody else
> - Testing: What is that?
> - Format: it is a mess, but as Avi likes to point out, we would have to
>   "maintain" the current one temporarily (a.k.a. forever).
>
> To make things more interesting, lots of changes to migration touch a lot
> of code (i.e. not only migration*/savevm.c), and getting those patches
> accepted takes forever.
>
>> The first step in fixing this problem is being up front about what the current
>> state of the subsystem is.  If we're going to treat migration as a release
>> blocking feature in the future, then we need to promote the migration subsystem
>> above 'Odd Fixes' status.
>
> Later, Juan.
>
> PS. And yes, I agree that migration is in a very sad state today.
>
On Mon, Nov 14, 2011 at 03:08:25PM -0600, Anthony Liguori wrote:
> On 11/14/2011 11:40 AM, Juan Quintela wrote:
>> Anthony Liguori <aliguori@us.ibm.com> wrote:
>>> I think this is an accurate reflection of the state of migration today.  This
>>> is the second release in a row where we're scrambling to fix a critical issue
>>> in migration.
>>
>> We need to make up our minds about it.
>
> Ultimately, we need to make migration a priority.  That's what I'm
> trying to do here.
>
> The first step is to be open about the state of migration today.  I
> personally don't have the bandwidth to invest a lot of effort in
> migration, but I can invest time in trying to find more people to
> work on migration, and help put together a proper roadmap.

It would help to have a migration wiki page or document that explains
the implications of migration on QEMU code - what to look out for in
device emulation code.

Although regular QEMU contributors may know the background on
migration/save/load, it would be not only helpful for new contributors
but also a good refresher for those of us who have picked up the
assumptions around migration piecewise.

I think a good document would raise migration awareness and help us
review new patches with an eye towards correct migration behavior.

The rules need to be laid down by someone who understands migration
quite well.

Stefan
On 14.11.2011 22:08, Anthony Liguori wrote:
> On 11/14/2011 11:40 AM, Juan Quintela wrote:
>> Anthony Liguori <aliguori@us.ibm.com> wrote:
>>> I think this is an accurate reflection of the state of migration today.  This
>>> is the second release in a row where we're scrambling to fix a critical issue
>>> in migration.
>>
>> We need to make up our minds about it.
>
> Ultimately, we need to make migration a priority.  That's what I'm trying to do
> here.

When you make everything a priority, being a priority doesn't have much
of a meaning any more. Our current priorities are changing the entire
device model, the monitor, migration, turning the block layer upside
down - what's left? Okay, maybe vvfat and slirp.

> The first step is to be open about the state of migration today.  I personally
> don't have the bandwidth to invest a lot of effort in migration, but I can
> invest time in trying to find more people to work on migration, and help put
> together a proper roadmap.
>
> We need to outline and document what we support and what we don't support.  We
> need to invest in a test infrastructure.  We need a roadmap that we can
> reasonably execute on.  In short, we need to turn migration into a first class
> subsystem.
>
> It's not about any single person or any single patch series.  It's about
> deciding that migration is an important feature and deserves more focus and
> attention.

I don't doubt that everyone will agree with this. The harder question is
who should concentrate less on which other feature to have time to spend
on migration.

Kevin
On 11/15/2011 10:32 AM, Stefan Hajnoczi wrote:
> It would help to have a migration wiki page or document that explains
> the implications of migration on QEMU code - what to look out for in
> device emulation code.
>
> Although regular QEMU contributors may know the background on
> migration/save/load, it would be not only helpful for new contributors
> but also a good refresher for those of us who have picked up the
> assumptions around migration piecewise.
>
> I think a good document would raise migration awareness and help us
> review new patches with an eye towards correct migration behavior.
>
> The rules need to be laid down by someone who understands migration
> quite well.
>

Good idea.  There needs to be a good explanation of what the migration
state is; I think that's the biggest obstacle.
On 11/15/2011 02:32 AM, Stefan Hajnoczi wrote:
> On Mon, Nov 14, 2011 at 03:08:25PM -0600, Anthony Liguori wrote:
>> On 11/14/2011 11:40 AM, Juan Quintela wrote:
>>> Anthony Liguori <aliguori@us.ibm.com> wrote:
>>>> I think this is an accurate reflection of the state of migration today.  This
>>>> is the second release in a row where we're scrambling to fix a critical issue
>>>> in migration.
>>>
>>> We need to make up our minds about it.
>>
>> Ultimately, we need to make migration a priority.  That's what I'm
>> trying to do here.
>>
>> The first step is to be open about the state of migration today.  I
>> personally don't have the bandwidth to invest a lot of effort in
>> migration, but I can invest time in trying to find more people to
>> work on migration, and help put together a proper roadmap.
>
> It would help to have a migration wiki page or document that explains
> the implications of migration on QEMU code - what to look out for in
> device emulation code.
>
> Although regular QEMU contributors may know the background on
> migration/save/load, it would be not only helpful for new contributors
> but also a good refresher for those of us who have picked up the
> assumptions around migration piecewise.
>
> I think a good document would raise migration awareness and help us
> review new patches with an eye towards correct migration behavior.
>
> The rules need to be laid down by someone who understands migration
> quite well.

100% agreed.  I'll volunteer to start by taking the storage requirements wiki
page, converting it to markdown, and adding it to docs/migration.

Regards,

Anthony Liguori

>
> Stefan
>
On 11/15/2011 03:36 AM, Kevin Wolf wrote:
> On 14.11.2011 22:08, Anthony Liguori wrote:
>> On 11/14/2011 11:40 AM, Juan Quintela wrote:
>>> Anthony Liguori <aliguori@us.ibm.com> wrote:
>>>> I think this is an accurate reflection of the state of migration today.  This
>>>> is the second release in a row where we're scrambling to fix a critical issue
>>>> in migration.
>>>
>>> We need to make up our minds about it.
>>
>> Ultimately, we need to make migration a priority.  That's what I'm trying to do
>> here.
>
> When you make everything a priority, being a priority doesn't have much
> of a meaning any more. Our current priorities are changing the entire
> device model, the monitor, migration, turning the block layer upside
> down - what's left? Okay, maybe vvfat and slirp.

Well, think of it as employment insurance :-)

>
>> The first step is to be open about the state of migration today.  I personally
>> don't have the bandwidth to invest a lot of effort in migration, but I can
>> invest time in trying to find more people to work on migration, and help put
>> together a proper roadmap.
>>
>> We need to outline and document what we support and what we don't support.  We
>> need to invest in a test infrastructure.  We need a roadmap that we can
>> reasonably execute on.  In short, we need to turn migration into a first class
>> subsystem.
>>
>> It's not about any single person or any single patch series.  It's about
>> deciding that migration is an important feature and deserves more focus and
>> attention.
>
> I don't doubt that everyone will agree with this. The harder question is
> who should concentrate less on which other feature to have time to spend
> on migration.

I don't think it's a question of trading patches in one subsystem for patches in
another subsystem.  I think it's more about having a planned, concerted effort
that systematically tackles the problems we're facing in migration.

Spending more time planning makes it much easier for people to contribute.
There's a lot of interest in migration.  If we made it easier to participate in
improving it, I'm sure we would attract at least a few more people to work on
it.

Regards,

Anthony Liguori

>
> Kevin
>
diff --git a/MAINTAINERS b/MAINTAINERS
index 7ee301e..45b345f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -461,6 +461,11 @@ M: Anthony Liguori <aliguori@us.ibm.com>
 S: Supported
 F: vl.c
 
+Migration
+M: Anthony Liguori <aliguori@us.ibm.com>
+S: Odd Fixes
+F: migration*.c savevm.c
+
 Monitor (QMP/HMP)
 M: Luiz Capitulino <lcapitulino@redhat.com>
 M: Markus Armbruster <armbru@redhat.com>
I think this is an accurate reflection of the state of migration today.  This
is the second release in a row where we're scrambling to fix a critical issue
in migration.

The first step in fixing this problem is being up front about what the current
state of the subsystem is.  If we're going to treat migration as a release
blocking feature in the future, then we need to promote the migration subsystem
above 'Odd Fixes' status.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
 MAINTAINERS |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)