Message ID: 20180425112723.1111-1-quintela@redhat.com
Series: Multifd
Juan Quintela <quintela@redhat.com> wrote:
> Hi
>
> [v12]
>
> Big news: it is not an RFC anymore, it works reliably for me.
>
> Changes:
> - Locking changed completely (several times).
> - We now send all pages through the channels.  In a 2GB guest with one
>   disk and a network card, the amount of data sent for RAM was 80KB.
> - This is not optimized yet, but it shows clear improvements over
>   precopy.  Testing over localhost networking I can get:
>   - 2 vCPU guest
>   - 2GB RAM
>   - run stress --vm 4 --vm-bytes 500M (i.e. dirtying 2GB of RAM each
>     second)
>   - Total time: precopy ~50 seconds, multifd around 11 seconds
>   - Bandwidth usage is around 273MB/s vs 71MB/s on the same hardware
>
> This is very preliminary testing; I will send more numbers when I get
> them.  But it looks promising.
>
> Things that will be improved later:
> - Initial synchronization is too slow (around 1s).
> - We synchronize all threads after each RAM section; we can move to
>   only synchronizing them after we have done a bitmap synchronization.
> - We can improve bitmap walking (but that is independent of multifd).

I forgot to mention that for the last 4 patches, I have not been able
to split them in a way that:
- is logical for review
- works for the multifd tests in all versions

So I ended up trying to get the "logical" view, and it works after the
last patch.  Why is that?
- Before I am able to transmit data, I need to be able to
  end/synchronize the different channels.
- To finish channels in case of error, I just close the channels.
But I can't open them yet.

I have to think about whether I can come up with a simpler way to split
it, but you can also consider the last 3-4 patches as a single one.

Later, Juan.
On Wed, Apr 25, 2018 at 01:27:02PM +0200, Juan Quintela wrote:
> Hi
>
> [v12]
>
> Big news: it is not an RFC anymore, it works reliably for me.
>
> Changes:
> - Locking changed completely (several times).
> - We now send all pages through the channels.  In a 2GB guest with one
>   disk and a network card, the amount of data sent for RAM was 80KB.
> - This is not optimized yet, but it shows clear improvements over
>   precopy.  Testing over localhost networking I can get:
>   - 2 vCPU guest
>   - 2GB RAM
>   - run stress --vm 4 --vm-bytes 500M (i.e. dirtying 2GB of RAM each
>     second)
>   - Total time: precopy ~50 seconds, multifd around 11 seconds
>   - Bandwidth usage is around 273MB/s vs 71MB/s on the same hardware
>
> This is very preliminary testing; I will send more numbers when I get
> them.  But it looks promising.
>
> Things that will be improved later:
> - Initial synchronization is too slow (around 1s).
> - We synchronize all threads after each RAM section; we can move to
>   only synchronizing them after we have done a bitmap synchronization.
> - We can improve bitmap walking (but that is independent of multifd).

Hi, Juan,

I have some high-level review comments and notes:

- This series may need a rebase after Guangrong's cleanup series.

- It looks like we now allow multifd and compression to be enabled
  together.  Should we restrict that?

- Is multifd only for TCP?  If so, do we check for that?  E.g., should
  we fail unix/fd/exec migrations when multifd is enabled?

- Why is the initial sync slow (around 1s)?  Is there any clue about
  that problem?

- Currently the synchronization between the threads still looks very
  complicated to me... we have these on the sender side (I didn't dig
  into the recv side):
  - two global semaphores in multifd_send_state,
  - one mutex and two semaphores in each of the send threads.
  So in total we'll have 2+3*N such locks/sems.  I'm wondering whether
  we can further simplify the sync logic a bit...

Thanks,
* Juan Quintela (quintela@redhat.com) wrote:
> Juan Quintela <quintela@redhat.com> wrote:
> > Hi
> >
> > [v12]
> >
> > Big news: it is not an RFC anymore, it works reliably for me.
> >
> > [...]
>
> I forgot to mention that for the last 4 patches, I have not been able
> to split them in a way that:
> - is logical for review
> - works for the multifd tests in all versions
>
> So I ended up trying to get the "logical" view, and it works after the
> last patch.  Why is that?
> - Before I am able to transmit data, I need to be able to
>   end/synchronize the different channels.
> - To finish channels in case of error, I just close the channels.
> But I can't open them yet.
>
> I have to think about whether I can come up with a simpler way to
> split it, but you can also consider the last 3-4 patches as a single
> one.

I think most of the last few can be flattened into earlier patches; I'd
prefer that to having patches that add stuff which then gets
reworked/removed later.

I don't think it matters that the ordering of the last few doesn't work
until the end; since it didn't work at the beginning either, it doesn't
matter until the end of the series.

Dave

> Later, Juan.

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK