migration: add a MAINTAINERS entry for migration

Message ID 1321114074-3681-1-git-send-email-aliguori@us.ibm.com
State New

Commit Message

Anthony Liguori Nov. 12, 2011, 4:07 p.m. UTC
I think this is an accurate reflection of the state of migration today.  This
is the second release in a row where we're scrambling to fix a critical issue
in migration.

The first step in fixing this problem is being up front about what the current
state of the subsystem is.  If we're going to treat migration as a release
blocking feature in the future, then we need to promote the migration subsystem
above 'Odd Fixes' status.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
 MAINTAINERS |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

Comments

Juan Quintela Nov. 14, 2011, 5:40 p.m. UTC | #1
Anthony Liguori <aliguori@us.ibm.com> wrote:
> I think this is an accurate reflection of the state of migration today.  This
> is the second release in a row where we're scrambling to fix a critical issue
> in migration.

We need to make up our minds about it.

A patch to do the reopen was posted long, long ago.  The code has existed in
RHEL5 for 3 years.  The answer was that:
- we need to do it another way
- we need to change it inside qcow2

There were no suggestions for the former, and the internals of qcow2 are
quite difficult to grasp (at least for me).

So it is my fault for not pushing harder for the patches.

But then it happened again with migration of Huge Memory machines.
The series was rejected because it "only" completely fixed the stalls on
the iothread, and did not completely fix the problem on the vcpus.
And here we still are: we need to finish the migration thread to get
things included.

And then, we have problems with the format (which is not
comprehensible).  It took almost 2 years to convince you that we need a
"size", a checksum, and start/end markers.  And we got:
- on one hand, we need to have perfect solutions to get them integrated
  (huge memory patches)
- on the other hand, we can think about including patches that only fix
  one of the more minor points that we have (visitors).
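
To make the "size", checksum, start/end markers point concrete, here is a
minimal sketch of a self-describing section framing.  It is not the QEMU
migration format: the marker values, the field widths, and the names
encode_section/decode_section are all assumptions made up for illustration.

```python
import struct
import zlib

# Hypothetical framing constants; nothing here is taken from QEMU.
START_MARKER = b"SECT"
END_MARKER = b"ENDS"
HEADER_FMT = ">4sHI"   # start marker, name length, payload size
TRAILER_FMT = ">I4s"   # CRC32 checksum, end marker


def encode_section(name: str, payload: bytes) -> bytes:
    """Frame one section with markers, an explicit size and a checksum."""
    name_bytes = name.encode("utf-8")
    header = struct.pack(HEADER_FMT, START_MARKER, len(name_bytes), len(payload))
    checksum = zlib.crc32(name_bytes + payload) & 0xFFFFFFFF
    trailer = struct.pack(TRAILER_FMT, checksum, END_MARKER)
    return header + name_bytes + payload + trailer


def decode_section(buf: bytes, offset: int = 0):
    """Return (name, payload, next_offset); raise ValueError on a bad frame."""
    header_size = struct.calcsize(HEADER_FMT)
    trailer_size = struct.calcsize(TRAILER_FMT)
    if len(buf) < offset + header_size:
        raise ValueError("truncated header")
    start, name_len, payload_len = struct.unpack_from(HEADER_FMT, buf, offset)
    if start != START_MARKER:
        raise ValueError("missing start marker")
    body_start = offset + header_size
    trailer_start = body_start + name_len + payload_len
    if len(buf) < trailer_start + trailer_size:
        raise ValueError("truncated section")
    name_bytes = buf[body_start:body_start + name_len]
    payload = buf[body_start + name_len:trailer_start]
    checksum, end = struct.unpack_from(TRAILER_FMT, buf, trailer_start)
    if end != END_MARKER:
        raise ValueError("missing end marker")
    if checksum != (zlib.crc32(name_bytes + payload) & 0xFFFFFFFF):
        raise ValueError("checksum mismatch")
    return name_bytes.decode("utf-8"), payload, trailer_start + trailer_size
```

With an explicit size, a reader can step over a section it does not
understand, and the markers plus checksum let it notice a damaged or
misaligned stream instead of silently loading garbage.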

So, the question is:

What do we expect from migration?
- Backward compatibility: A must for corporate users -> a burden for
  everybody else
- Testing: What is that?
- Format: it is a mess, but as Avi likes to point out, we would have to
  "maintain" the current one temporarily (a.k.a. forever).

To make things more interesting, lots of migration changes touch a lot
of code (i.e. not only migration*/savevm.c), and getting those patches
accepted takes forever.

> The first step in fixing this problem is being up front about what the current
> state of the subsystem is.  If we're going to treat migration as a release
> blocking feature in the future, then we need to promote the migration subsystem
> above 'Odd Fixes' status.

Later, Juan.

PS.  And yes, I agree that migration is in a very sad state today.
Anthony Liguori Nov. 14, 2011, 9:08 p.m. UTC | #2
On 11/14/2011 11:40 AM, Juan Quintela wrote:
> Anthony Liguori<aliguori@us.ibm.com>  wrote:
>> I think this is an accurate reflection of the state of migration today.  This
>> is the second release in a row where we're scrambling to fix a critical issue
>> in migration.
>
> We need to make up our minds about it.

Ultimately, we need to make migration a priority.  That's what I'm trying to do 
here.

The first step is to be open about the state of migration today.  I personally 
don't have the bandwidth to invest a lot of effort in migration, but I can 
invest time in trying to find more people to work on migration, and help put 
together a proper roadmap.

We need to outline and document what we support and what we don't support.  We 
need to invest in a test infrastructure.  We need a roadmap that we can 
reasonably execute on.  In short, we need to turn migration into a first class 
subsystem.

It's not about any single person or any single patch series.  It's about 
deciding that migration is an important feature and deserves more focus and 
attention.

That's the conversation I'm trying to start with this patch.

Regards,

Anthony Liguori

>
> A patch to do the reopen was posted long, long ago.  The code has existed in
> RHEL5 for 3 years.  The answer was that:
> - we need to do it another way
> - we need to change it inside qcow2
>
> There were no suggestions for the former, and the internals of qcow2 are
> quite difficult to grasp (at least for me).
>
> So it is my fault for not pushing harder for the patches.
>
> But then it happened again with migration of Huge Memory machines.
> The series was rejected because it "only" completely fixed the stalls on
> the iothread, and did not completely fix the problem on the vcpus.
> And here we still are: we need to finish the migration thread to get
> things included.
>
> And then, we have problems with the format (which is not
> comprehensible).  It took almost 2 years to convince you that we need a
> "size", a checksum, and start/end markers.  And we got:
> - on one hand, we need to have perfect solutions to get them integrated
>    (huge memory patches)
> - on the other hand, we can think about including patches that only fix
>    one of the more minor points that we have (visitors).
>
> So, the question is:
>
> What do we expect from migration?
> - Backward compatibility: A must for corporate users ->  a burden for
>    everybody else
> - Testing: What is that?
> - Format: it is a mess, but as Avi likes to point out, we would have to
>    "maintain" the current one temporarily (a.k.a. forever).
>
> To make things more interesting, lots of migration changes touch a lot
> of code (i.e. not only migration*/savevm.c), and getting those patches
> accepted takes forever.
>
>> The first step in fixing this problem is being up front about what the current
>> state of the subsystem is.  If we're going to treat migration as a release
>> blocking feature in the future, then we need to promote the migration subsystem
>> above 'Odd Fixes' status.
>
> Later, Juan.
>
> PS.  And yes, I agree that migration is in a very sad state today.
>
Stefan Hajnoczi Nov. 15, 2011, 8:32 a.m. UTC | #3
On Mon, Nov 14, 2011 at 03:08:25PM -0600, Anthony Liguori wrote:
> On 11/14/2011 11:40 AM, Juan Quintela wrote:
> >Anthony Liguori<aliguori@us.ibm.com>  wrote:
> >>I think this is an accurate reflection of the state of migration today.  This
> >>is the second release in a row where we're scrambling to fix a critical issue
> >>in migration.
> >
> >We need to make up our minds about it.
> 
> Ultimately, we need to make migration a priority.  That's what I'm
> trying to do here.
> 
> The first step is to be open about the state of migration today.  I
> personally don't have the bandwidth to invest a lot of effort in
> migration, but I can invest time in trying to find more people to
> work on migration, and help put together a proper roadmap.

It would help to have a migration wiki page or document that explains
the implications of migration on QEMU code - what to look out for in
device emulation code.

Although regular QEMU contributors may know the background on
migration/save/load, it would be not only helpful for new contributors
but also a good refresher for those of us who have picked up the
assumptions around migration piecewise.

I think a good document would raise migration awareness and help us
review new patches with an eye towards correct migration behavior.

The rules need to be laid down by someone who understands migration
quite well.

Stefan
Kevin Wolf Nov. 15, 2011, 9:36 a.m. UTC | #4
Am 14.11.2011 22:08, schrieb Anthony Liguori:
> On 11/14/2011 11:40 AM, Juan Quintela wrote:
>> Anthony Liguori<aliguori@us.ibm.com>  wrote:
>>> I think this is an accurate reflection of the state of migration today.  This
>>> is the second release in a row where we're scrambling to fix a critical issue
>>> in migration.
>>
>> We need to make up our minds about it.
> 
> Ultimately, we need to make migration a priority.  That's what I'm trying to do 
> here.

When you make everything a priority, being a priority doesn't have much
of a meaning any more. Our current priorities are changing the entire
device model, the monitor, migration, turning the block layer upside
down - what's left? Okay, maybe vvfat and slirp.

> The first step is to be open about the state of migration today.  I personally 
> don't have the bandwidth to invest a lot of effort in migration, but I can 
> invest time in trying to find more people to work on migration, and help put 
> together a proper roadmap.
> 
> We need to outline and document what we support and what we don't support.  We 
> need to invest in a test infrastructure.  We need a roadmap that we can 
> reasonably execute on.  In short, we need to turn migration into a first class 
> subsystem.
> 
> It's not about any single person or any single patch series.  It's about 
> deciding that migration is an important feature and deserves more focus and 
> attention.

I don't doubt that everyone will agree with this. The harder question is
who should concentrate less on which other feature to have time to spend
on migration.

Kevin
Avi Kivity Nov. 15, 2011, 1:04 p.m. UTC | #5
On 11/15/2011 10:32 AM, Stefan Hajnoczi wrote:
> It would help to have a migration wiki page or document that explains
> the implications of migration on QEMU code - what to look out for in
> device emulation code.
>
> Although regular QEMU contributors may know the background on
> migration/save/load, it would be not only helpful for new contributors
> but also a good refresher for those of us who have picked up the
> assumptions around migration piecewise.
>
> I think a good document would raise migration awareness and help us
> review new patches with an eye towards correct migration behavior.
>
> The rules need to be laid down by someone who understands migration
> quite well.
>

Good idea.  There needs to be a good explanation of what the migration
state is; I think that's the biggest obstacle.
Anthony Liguori Nov. 15, 2011, 1:45 p.m. UTC | #6
On 11/15/2011 02:32 AM, Stefan Hajnoczi wrote:
> On Mon, Nov 14, 2011 at 03:08:25PM -0600, Anthony Liguori wrote:
>> On 11/14/2011 11:40 AM, Juan Quintela wrote:
>>> Anthony Liguori<aliguori@us.ibm.com>   wrote:
>>>> I think this is an accurate reflection of the state of migration today.  This
>>>> is the second release in a row where we're scrambling to fix a critical issue
>>>> in migration.
>>>
>>> We need to make up our minds about it.
>>
>> Ultimately, we need to make migration a priority.  That's what I'm
>> trying to do here.
>>
>> The first step is to be open about the state of migration today.  I
>> personally don't have the bandwidth to invest a lot of effort in
>> migration, but I can invest time in trying to find more people to
>> work on migration, and help put together a proper roadmap.
>
> It would help to have a migration wiki page or document that explains
> the implications of migration on QEMU code - what to look out for in
> device emulation code.
>
> Although regular QEMU contributors may know the background on
> migration/save/load, it would be not only helpful for new contributors
> but also a good refresher for those of us who have picked up the
> assumptions around migration piecewise.
>
> I think a good document would raise migration awareness and help us
> review new patches with an eye towards correct migration behavior.
>
> The rules need to be laid down by someone who understands migration
> quite well.

100% agreed.

I'll volunteer to start by taking the storage requirements wiki page, converting 
it to markdown, and adding it to docs/migration.

Regards,

Anthony Liguori

>
> Stefan
>
Anthony Liguori Nov. 15, 2011, 1:50 p.m. UTC | #7
On 11/15/2011 03:36 AM, Kevin Wolf wrote:
> Am 14.11.2011 22:08, schrieb Anthony Liguori:
>> On 11/14/2011 11:40 AM, Juan Quintela wrote:
>>> Anthony Liguori<aliguori@us.ibm.com>   wrote:
>>>> I think this is an accurate reflection of the state of migration today.  This
>>>> is the second release in a row where we're scrambling to fix a critical issue
>>>> in migration.
>>>
>>> We need to make up our minds about it.
>>
>> Ultimately, we need to make migration a priority.  That's what I'm trying to do
>> here.
>
> When you make everything a priority, being a priority doesn't have much
> of a meaning any more. Our current priorities are changing the entire
> device model, the monitor, migration, turning the block layer upside
> down - what's left? Okay, maybe vvfat and slirp.

Well, think of it as employment insurance :-)

>
>> The first step is to be open about the state of migration today.  I personally
>> don't have the bandwidth to invest a lot of effort in migration, but I can
>> invest time in trying to find more people to work on migration, and help put
>> together a proper roadmap.
>>
>> We need to outline and document what we support and what we don't support.  We
>> need to invest in a test infrastructure.  We need a roadmap that we can
>> reasonably execute on.  In short, we need to turn migration into a first class
>> subsystem.
>>
>> It's not about any single person or any single patch series.  It's about
>> deciding that migration is an important feature and deserves more focus and
>> attention.
>
> I don't doubt that everyone will agree with this. The harder question is
> who should concentrate less on which other feature to have time to spend
> on migration.

I don't think it's a question of trading patches in one subsystem for patches in 
another subsystem.

I think it's more about having a planned, concerted effort that systematically 
tackles the problems we're facing in migration.

Spending more time planning makes it much easier for people to contribute.  
There's a lot of interest in migration.  If we made it easier to participate 
in improving it, I'm sure we would attract at least a few more people to 
working on it.

Regards,

Anthony Liguori

>
> Kevin
>
Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 7ee301e..45b345f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -461,6 +461,11 @@  M: Anthony Liguori <aliguori@us.ibm.com>
 S: Supported
 F: vl.c
 
+Migration
+M: Anthony Liguori <aliguori@us.ibm.com>
+S: Odd Fixes
+F: migration*.c savevm.c
+
 Monitor (QMP/HMP)
 M: Luiz Capitulino <lcapitulino@redhat.com>
 M: Markus Armbruster <armbru@redhat.com>
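
The F: line in the new entry lists file patterns.  As a rough sketch of
what "migration*.c savevm.c" would and would not cover, the snippet below
assumes the patterns are plain shell-style globs matched against file
names.  The helper covered_by_entry and the list of candidate files are
illustrative assumptions, not a listing of the actual tree.

```python
from fnmatch import fnmatchcase

# Patterns copied from the F: line of the proposed Migration entry.
MIGRATION_PATTERNS = "migration*.c savevm.c".split()


def covered_by_entry(path, patterns=MIGRATION_PATTERNS):
    """Return True if the file name matches any of the entry's glob patterns."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Illustrative file names only.
candidate_files = [
    "migration.c",
    "migration-tcp.c",
    "savevm.c",
    "block-migration.c",
    "arch_init.c",
]

for path in candidate_files:
    status = "covered" if covered_by_entry(path) else "not covered"
    print(f"{path}: {status}")
```

Under that assumption, a file whose name merely contains "migration"
would fall outside the entry, which fits the remark in the thread that
migration changes touch more than migration*/savevm.c.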