[30/51] ram: Move src_page_req* to RAMState

Message ID 20170323204544.12015-31-quintela@redhat.com
State New

Commit Message

Juan Quintela March 23, 2017, 8:45 p.m. UTC
This are the last postcopy fields still at MigrationState.  Once there
Move MigrationSrcPageRequest to ram.c and remove MigrationState
parameters where appropiate.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/migration/migration.h | 17 +-----------
 migration/migration.c         |  5 +---
 migration/ram.c               | 62 ++++++++++++++++++++++++++-----------------
 3 files changed, 40 insertions(+), 44 deletions(-)

Comments

Peter Xu March 30, 2017, 6:56 a.m. UTC | #1
On Thu, Mar 23, 2017 at 09:45:23PM +0100, Juan Quintela wrote:
> This are the last postcopy fields still at MigrationState.  Once there

s/This/These/

> Move MigrationSrcPageRequest to ram.c and remove MigrationState
> parameters where appropiate.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

One question below though...

[...]

> @@ -1191,19 +1204,18 @@ static bool get_queued_page(RAMState *rs, MigrationState *ms,
>   *
>   * It should be empty at the end anyway, but in error cases there may
>   * xbe some left.
> - *
> - * @ms: current migration state
>   */
> -void flush_page_queue(MigrationState *ms)
> +void flush_page_queue(void)
>  {
> -    struct MigrationSrcPageRequest *mspr, *next_mspr;
> +    struct RAMSrcPageRequest *mspr, *next_mspr;
> +    RAMState *rs = &ram_state;
>      /* This queue generally should be empty - but in the case of a failed
>       * migration might have some droppings in.
>       */
>      rcu_read_lock();

Could I ask why we are taking the RCU read lock rather than the mutex
here?

> -    QSIMPLEQ_FOREACH_SAFE(mspr, &ms->src_page_requests, next_req, next_mspr) {
> +    QSIMPLEQ_FOREACH_SAFE(mspr, &rs->src_page_requests, next_req, next_mspr) {
>          memory_region_unref(mspr->rb->mr);
> -        QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> +        QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
>          g_free(mspr);
>      }
>      rcu_read_unlock();

Thanks,

-- peterx
Juan Quintela March 30, 2017, 4:09 p.m. UTC | #2
Peter Xu <peterx@redhat.com> wrote:
> On Thu, Mar 23, 2017 at 09:45:23PM +0100, Juan Quintela wrote:
>> This are the last postcopy fields still at MigrationState.  Once there
>
> s/This/These/
>
>> Move MigrationSrcPageRequest to ram.c and remove MigrationState
>> parameters where appropiate.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> One question below though...
>
> [...]
>
>> @@ -1191,19 +1204,18 @@ static bool get_queued_page(RAMState *rs, MigrationState *ms,
>>   *
>>   * It should be empty at the end anyway, but in error cases there may
>>   * xbe some left.
>> - *
>> - * @ms: current migration state
>>   */
>> -void flush_page_queue(MigrationState *ms)
>> +void flush_page_queue(void)
>>  {
>> -    struct MigrationSrcPageRequest *mspr, *next_mspr;
>> +    struct RAMSrcPageRequest *mspr, *next_mspr;
>> +    RAMState *rs = &ram_state;
>>      /* This queue generally should be empty - but in the case of a failed
>>       * migration might have some droppings in.
>>       */
>>      rcu_read_lock();
>
> Could I ask why we are taking the RCU read lock rather than the mutex
> here?

I will leave this one for Dave.


>
>> -    QSIMPLEQ_FOREACH_SAFE(mspr, &ms->src_page_requests, next_req, next_mspr) {
>> +    QSIMPLEQ_FOREACH_SAFE(mspr, &rs->src_page_requests, next_req, next_mspr) {
>>          memory_region_unref(mspr->rb->mr);
>> -        QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
>> +        QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
>>          g_free(mspr);
>>      }
>>      rcu_read_unlock();
>
> Thanks,
>
> -- peterx
Dr. David Alan Gilbert March 31, 2017, 3:25 p.m. UTC | #3
* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Mar 23, 2017 at 09:45:23PM +0100, Juan Quintela wrote:
> > This are the last postcopy fields still at MigrationState.  Once there
> 
> s/This/These/
> 
> > Move MigrationSrcPageRequest to ram.c and remove MigrationState
> > parameters where appropiate.
> > 
> > Signed-off-by: Juan Quintela <quintela@redhat.com>
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> One question below though...
> 
> [...]
> 
> > @@ -1191,19 +1204,18 @@ static bool get_queued_page(RAMState *rs, MigrationState *ms,
> >   *
> >   * It should be empty at the end anyway, but in error cases there may
> >   * xbe some left.
> > - *
> > - * @ms: current migration state
> >   */
> > -void flush_page_queue(MigrationState *ms)
> > +void flush_page_queue(void)
> >  {
> > -    struct MigrationSrcPageRequest *mspr, *next_mspr;
> > +    struct RAMSrcPageRequest *mspr, *next_mspr;
> > +    RAMState *rs = &ram_state;
> >      /* This queue generally should be empty - but in the case of a failed
> >       * migration might have some droppings in.
> >       */
> >      rcu_read_lock();
> 
> Could I ask why we are taking the RCU read lock rather than the mutex
> here?

It's a good question whether we need anything at all.
flush_page_queue is called only from migrate_fd_cleanup.
migrate_fd_cleanup is called either from a bottom half, which I think holds the BQL,
or from a failure path in migrate_fd_connect.
migrate_fd_connect is called from migration_channel_connect and rdma_start_outgoing_migration,
which I think both end up at monitor commands, so also under the BQL.

So I think we can probably just lose the rcu_read_lock/unlock.
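Untested, but if we do drop it, I think the whole function just reduces to the
loop, leaning on the BQL as per the call chains above:

void flush_page_queue(void)
{
    struct RAMSrcPageRequest *mspr, *next_mspr;
    RAMState *rs = &ram_state;

    /* Only called from migrate_fd_cleanup(), so we hold the BQL here;
     * no locking of our own needed for the (normally empty) queue.
     */
    QSIMPLEQ_FOREACH_SAFE(mspr, &rs->src_page_requests, next_req, next_mspr) {
        memory_region_unref(mspr->rb->mr);
        QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
        g_free(mspr);
    }
}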

Dave

> 
> > -    QSIMPLEQ_FOREACH_SAFE(mspr, &ms->src_page_requests, next_req, next_mspr) {
> > +    QSIMPLEQ_FOREACH_SAFE(mspr, &rs->src_page_requests, next_req, next_mspr) {
> >          memory_region_unref(mspr->rb->mr);
> > -        QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> > +        QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
> >          g_free(mspr);
> >      }
> >      rcu_read_unlock();
> 
> Thanks,
> 
> -- peterx
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Dr. David Alan Gilbert March 31, 2017, 4:52 p.m. UTC | #4
* Juan Quintela (quintela@redhat.com) wrote:
> This are the last postcopy fields still at MigrationState.  Once there
> Move MigrationSrcPageRequest to ram.c and remove MigrationState
> parameters where appropiate.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  include/migration/migration.h | 17 +-----------
>  migration/migration.c         |  5 +---
>  migration/ram.c               | 62 ++++++++++++++++++++++++++-----------------
>  3 files changed, 40 insertions(+), 44 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index e032fb0..8a6caa3 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -128,18 +128,6 @@ struct MigrationIncomingState {
>  MigrationIncomingState *migration_incoming_get_current(void);
>  void migration_incoming_state_destroy(void);
>  
> -/*
> - * An outstanding page request, on the source, having been received
> - * and queued
> - */
> -struct MigrationSrcPageRequest {
> -    RAMBlock *rb;
> -    hwaddr    offset;
> -    hwaddr    len;
> -
> -    QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
> -};
> -
>  struct MigrationState
>  {
>      size_t bytes_xfer;
> @@ -186,9 +174,6 @@ struct MigrationState
>      /* Flag set once the migration thread called bdrv_inactivate_all */
>      bool block_inactive;
>  
> -    /* Queue of outstanding page requests from the destination */
> -    QemuMutex src_page_req_mutex;
> -    QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
>      /* The semaphore is used to notify COLO thread that failover is finished */
>      QemuSemaphore colo_exit_sem;
>  
> @@ -371,7 +356,7 @@ void savevm_skip_configuration(void);
>  int global_state_store(void);
>  void global_state_store_running(void);
>  
> -void flush_page_queue(MigrationState *ms);
> +void flush_page_queue(void);
>  int ram_save_queue_pages(MigrationState *ms, const char *rbname,
>                           ram_addr_t start, ram_addr_t len);
>  uint64_t ram_pagesize_summary(void);
> diff --git a/migration/migration.c b/migration/migration.c
> index b220941..58c1587 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -109,7 +109,6 @@ MigrationState *migrate_get_current(void)
>      };
>  
>      if (!once) {
> -        qemu_mutex_init(&current_migration.src_page_req_mutex);
>          current_migration.parameters.tls_creds = g_strdup("");
>          current_migration.parameters.tls_hostname = g_strdup("");
>          once = true;
> @@ -949,7 +948,7 @@ static void migrate_fd_cleanup(void *opaque)
>      qemu_bh_delete(s->cleanup_bh);
>      s->cleanup_bh = NULL;
>  
> -    flush_page_queue(s);
> +    flush_page_queue();
>  
>      if (s->to_dst_file) {
>          trace_migrate_fd_cleanup();
> @@ -1123,8 +1122,6 @@ MigrationState *migrate_init(const MigrationParams *params)
>  
>      migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
>  
> -    QSIMPLEQ_INIT(&s->src_page_requests);
> -
>      s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      return s;
>  }
> diff --git a/migration/ram.c b/migration/ram.c
> index 325a0f3..601370c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -151,6 +151,18 @@ struct RAMBitmap {
>  };
>  typedef struct RAMBitmap RAMBitmap;
>  
> +/*
> + * An outstanding page request, on the source, having been received
> + * and queued
> + */
> +struct RAMSrcPageRequest {
> +    RAMBlock *rb;
> +    hwaddr    offset;
> +    hwaddr    len;
> +
> +    QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
> +};
> +
>  /* State of RAM for migration */
>  struct RAMState {
>      /* Last block that we have visited searching for dirty pages */
> @@ -205,6 +217,9 @@ struct RAMState {
>      RAMBitmap *ram_bitmap;
>      /* The RAMBlock used in the last src_page_request */
>      RAMBlock *last_req_rb;
> +    /* Queue of outstanding page requests from the destination */
> +    QemuMutex src_page_req_mutex;
> +    QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
>  };
>  typedef struct RAMState RAMState;
>  
> @@ -1084,20 +1099,20 @@ static bool find_dirty_block(RAMState *rs, QEMUFile *f, PageSearchStatus *pss,
>   *
>   * Returns the block of the page (or NULL if none available)
>   *
> - * @ms: current migration state
> + * @rs: current RAM state
>   * @offset: used to return the offset within the RAMBlock
>   * @ram_addr_abs: pointer into which to store the address of the dirty page
>   *                within the global ram_addr space
>   */
> -static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
> +static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset,
>                                ram_addr_t *ram_addr_abs)
>  {
>      RAMBlock *block = NULL;
>  
> -    qemu_mutex_lock(&ms->src_page_req_mutex);
> -    if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
> -        struct MigrationSrcPageRequest *entry =
> -                                QSIMPLEQ_FIRST(&ms->src_page_requests);
> +    qemu_mutex_lock(&rs->src_page_req_mutex);
> +    if (!QSIMPLEQ_EMPTY(&rs->src_page_requests)) {
> +        struct RAMSrcPageRequest *entry =
> +                                QSIMPLEQ_FIRST(&rs->src_page_requests);
>          block = entry->rb;
>          *offset = entry->offset;
>          *ram_addr_abs = (entry->offset + entry->rb->offset) &
> @@ -1108,11 +1123,11 @@ static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
>              entry->offset += TARGET_PAGE_SIZE;
>          } else {
>              memory_region_unref(block->mr);
> -            QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> +            QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
>              g_free(entry);
>          }
>      }
> -    qemu_mutex_unlock(&ms->src_page_req_mutex);
> +    qemu_mutex_unlock(&rs->src_page_req_mutex);
>  
>      return block;
>  }
> @@ -1125,13 +1140,11 @@ static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
>   * Returns if a queued page is found
>   *
>   * @rs: current RAM state
> - * @ms: current migration state
>   * @pss: data about the state of the current dirty page scan
>   * @ram_addr_abs: pointer into which to store the address of the dirty page
>   *                within the global ram_addr space
>   */
> -static bool get_queued_page(RAMState *rs, MigrationState *ms,
> -                            PageSearchStatus *pss,
> +static bool get_queued_page(RAMState *rs, PageSearchStatus *pss,
>                              ram_addr_t *ram_addr_abs)
>  {
>      RAMBlock  *block;
> @@ -1139,7 +1152,7 @@ static bool get_queued_page(RAMState *rs, MigrationState *ms,
>      bool dirty;
>  
>      do {
> -        block = unqueue_page(ms, &offset, ram_addr_abs);
> +        block = unqueue_page(rs, &offset, ram_addr_abs);
>          /*
>           * We're sending this page, and since it's postcopy nothing else
>           * will dirty it, and we must make sure it doesn't get sent again
> @@ -1191,19 +1204,18 @@ static bool get_queued_page(RAMState *rs, MigrationState *ms,
>   *
>   * It should be empty at the end anyway, but in error cases there may
>   * xbe some left.
> - *
> - * @ms: current migration state
>   */
> -void flush_page_queue(MigrationState *ms)
> +void flush_page_queue(void)

I'm not sure this is safe; it's called from migrate_fd_cleanup right at
the end.  If you do any finalisation/cleanup of the RAMState in ram_save_complete,
then when is it safe to run this?

>  {
> -    struct MigrationSrcPageRequest *mspr, *next_mspr;
> +    struct RAMSrcPageRequest *mspr, *next_mspr;
> +    RAMState *rs = &ram_state;
>      /* This queue generally should be empty - but in the case of a failed
>       * migration might have some droppings in.
>       */
>      rcu_read_lock();
> -    QSIMPLEQ_FOREACH_SAFE(mspr, &ms->src_page_requests, next_req, next_mspr) {
> +    QSIMPLEQ_FOREACH_SAFE(mspr, &rs->src_page_requests, next_req, next_mspr) {
>          memory_region_unref(mspr->rb->mr);
> -        QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> +        QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
>          g_free(mspr);
>      }
>      rcu_read_unlock();
> @@ -1260,16 +1272,16 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
>          goto err;
>      }
>  
> -    struct MigrationSrcPageRequest *new_entry =
> -        g_malloc0(sizeof(struct MigrationSrcPageRequest));
> +    struct RAMSrcPageRequest *new_entry =
> +        g_malloc0(sizeof(struct RAMSrcPageRequest));
>      new_entry->rb = ramblock;
>      new_entry->offset = start;
>      new_entry->len = len;
>  
>      memory_region_ref(ramblock->mr);
> -    qemu_mutex_lock(&ms->src_page_req_mutex);
> -    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
> -    qemu_mutex_unlock(&ms->src_page_req_mutex);
> +    qemu_mutex_lock(&rs->src_page_req_mutex);
> +    QSIMPLEQ_INSERT_TAIL(&rs->src_page_requests, new_entry, next_req);
> +    qemu_mutex_unlock(&rs->src_page_req_mutex);

Hmm, OK, where did it get its rs from?
Anyway, the thing I needed to convince myself of was whether there was any
guarantee that RAMState would exist by the time the first request came in,
something that we now need to be careful of.
I think we're mostly OK; we call qemu_savevm_state_begin() at the top
of migration_thread, so ram_save_setup should have run and allocated
the RAMState before we get into the main loop, and thus before we ever
look at the 'start_postcopy' flag and thus before we ever ask the destination
to send us stuff.

>      rcu_read_unlock();
>  
>      return 0;
> @@ -1408,7 +1420,7 @@ static int ram_find_and_save_block(RAMState *rs, QEMUFile *f, bool last_stage)
>  
>      do {
>          again = true;
> -        found = get_queued_page(rs, ms, &pss, &dirty_ram_abs);
> +        found = get_queued_page(rs, &pss, &dirty_ram_abs);
>  
>          if (!found) {
>              /* priority queue empty, so just search for something dirty */
> @@ -1968,6 +1980,8 @@ static int ram_state_init(RAMState *rs)
>  
>      memset(rs, 0, sizeof(*rs));
>      qemu_mutex_init(&rs->bitmap_mutex);
> +    qemu_mutex_init(&rs->src_page_req_mutex);
> +    QSIMPLEQ_INIT(&rs->src_page_requests);

Similar question to above; that mutex is going to get reinit'd
on a new migration, and it shouldn't be reinitialised without being destroyed first.
Maybe make it a once.

Dave

>  
>      if (migrate_use_xbzrle()) {
>          XBZRLE_cache_lock();
> -- 
> 2.9.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Xu April 1, 2017, 7:15 a.m. UTC | #5
On Fri, Mar 31, 2017 at 04:25:56PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Thu, Mar 23, 2017 at 09:45:23PM +0100, Juan Quintela wrote:
> > > This are the last postcopy fields still at MigrationState.  Once there
> > 
> > s/This/These/
> > 
> > > Move MigrationSrcPageRequest to ram.c and remove MigrationState
> > > parameters where appropiate.
> > > 
> > > Signed-off-by: Juan Quintela <quintela@redhat.com>
> > 
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > 
> > One question below though...
> > 
> > [...]
> > 
> > > @@ -1191,19 +1204,18 @@ static bool get_queued_page(RAMState *rs, MigrationState *ms,
> > >   *
> > >   * It should be empty at the end anyway, but in error cases there may
> > >   * xbe some left.
> > > - *
> > > - * @ms: current migration state
> > >   */
> > > -void flush_page_queue(MigrationState *ms)
> > > +void flush_page_queue(void)
> > >  {
> > > -    struct MigrationSrcPageRequest *mspr, *next_mspr;
> > > +    struct RAMSrcPageRequest *mspr, *next_mspr;
> > > +    RAMState *rs = &ram_state;
> > >      /* This queue generally should be empty - but in the case of a failed
> > >       * migration might have some droppings in.
> > >       */
> > >      rcu_read_lock();
> > 
> > Could I ask why we are taking the RCU read lock rather than the mutex
> > here?
> 
> It's a good question whether we need anything at all.
> flush_page_queue is called only from migrate_fd_cleanup.
> migrate_fd_cleanup is called either from a backhalf, which I think has the bql,
> or from a failure path in migrate_fd_connect.
> migrate_fd_connect is called from migration_channel_connect and rdma_start_outgoing_migration
> which I think both end up at monitor commands so also in the bql.
> 
> So I think we can probably just lose the rcu_read_lock/unlock.

Thanks for the confirmation.

(PS: even if we are not holding the BQL, we should not need this
 rcu_read_lock, right? My understanding is: if we want to protect
 src_page_requests, we would need the mutex, not the RCU lock; and for
 the memory_region_unref(), since we already hold the reference, it looks
 like we don't need any kind of locking either.)
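(i.e. if any locking were needed at all there, I would have expected the loop
 to take the queue mutex, something like this untested sketch:

    qemu_mutex_lock(&rs->src_page_req_mutex);
    QSIMPLEQ_FOREACH_SAFE(mspr, &rs->src_page_requests, next_req, next_mspr) {
        memory_region_unref(mspr->rb->mr);
        QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
        g_free(mspr);
    }
    qemu_mutex_unlock(&rs->src_page_req_mutex);

 rather than the rcu_read_lock/unlock pair.)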

> 
> Dave
> 
> > 
> > > -    QSIMPLEQ_FOREACH_SAFE(mspr, &ms->src_page_requests, next_req, next_mspr) {
> > > +    QSIMPLEQ_FOREACH_SAFE(mspr, &rs->src_page_requests, next_req, next_mspr) {
> > >          memory_region_unref(mspr->rb->mr);
> > > -        QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> > > +        QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
> > >          g_free(mspr);
> > >      }
> > >      rcu_read_unlock();
> > 
> > Thanks,
> > 
> > -- peterx
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

-- peterx
Juan Quintela April 4, 2017, 5:42 p.m. UTC | #6
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:

Hi

>> @@ -1191,19 +1204,18 @@ static bool get_queued_page(RAMState *rs, MigrationState *ms,
>>   *
>>   * It should be empty at the end anyway, but in error cases there may
>>   * xbe some left.
>> - *
>> - * @ms: current migration state
>>   */
>> -void flush_page_queue(MigrationState *ms)
>> +void flush_page_queue(void)
>
> I'm not sure this is safe;  it's called from migrate_fd_cleanup right at
> the end, if you do any finalisation/cleanup of the RAMState in
> ram_save_complete
> then when is it safe to run this?

But, looking into it, I think that we should be able to move this into
ram_save_cleanup(), no?

We don't need it after that?
As an added bonus, we can remove the export of it.
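Something like this untested sketch (the name is just for illustration):

static void ram_flush_page_queue(RAMState *rs)
{
    struct RAMSrcPageRequest *mspr, *next_mspr;

    /* Should be empty except after a failed migration, as today */
    QSIMPLEQ_FOREACH_SAFE(mspr, &rs->src_page_requests, next_req, next_mspr) {
        memory_region_unref(mspr->rb->mr);
        QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
        g_free(mspr);
    }
}

with ram_save_cleanup() calling ram_flush_page_queue(&ram_state), and nothing
exported from ram.c for this at all.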

>> @@ -1260,16 +1272,16 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
>>          goto err;
>>      }
>>  
>> -    struct MigrationSrcPageRequest *new_entry =
>> -        g_malloc0(sizeof(struct MigrationSrcPageRequest));
>> +    struct RAMSrcPageRequest *new_entry =
>> +        g_malloc0(sizeof(struct RAMSrcPageRequest));
>>      new_entry->rb = ramblock;
>>      new_entry->offset = start;
>>      new_entry->len = len;
>>  
>>      memory_region_ref(ramblock->mr);
>> -    qemu_mutex_lock(&ms->src_page_req_mutex);
>> -    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
>> -    qemu_mutex_unlock(&ms->src_page_req_mutex);
>> +    qemu_mutex_lock(&rs->src_page_req_mutex);
>> +    QSIMPLEQ_INSERT_TAIL(&rs->src_page_requests, new_entry, next_req);
>> +    qemu_mutex_unlock(&rs->src_page_req_mutex);
>
> Hmm ok where did it get it's rs from?
> Anyway, the thing I needed to convince myself of was that there was
> any guarantee that
> RAMState would exist by the time the first request came in, something
> that we now need
> to be careful of.
> I think we're mostly OK; we call qemu_savevm_state_begin() at the top
> of migration_thread so the ram_save_setup should be done and allocate
> the RAMState before we get into the main loop and thus before we ever
> look at the 'start_postcopy' flag and thus before we ever ask the destination
> to send us stuff.
>
>>      rcu_read_unlock();
>>  
>>      return 0;
>> @@ -1408,7 +1420,7 @@ static int ram_find_and_save_block(RAMState *rs, QEMUFile *f, bool last_stage)
>>  
>>      do {
>>          again = true;
>> -        found = get_queued_page(rs, ms, &pss, &dirty_ram_abs);
>> +        found = get_queued_page(rs, &pss, &dirty_ram_abs);
>>  
>>          if (!found) {
>>              /* priority queue empty, so just search for something dirty */
>> @@ -1968,6 +1980,8 @@ static int ram_state_init(RAMState *rs)
>>  
>>      memset(rs, 0, sizeof(*rs));
>>      qemu_mutex_init(&rs->bitmap_mutex);
>> +    qemu_mutex_init(&rs->src_page_req_mutex);
>> +    QSIMPLEQ_INIT(&rs->src_page_requests);
>
> Similar question to above; that mutex is going to get reinit'd
> on a new migration and it shouldn't be without it being destroyed.
> Maybe make it a once.

Good catch.  I think the easiest way is to just move it into RAMState
proper, init it here, and destroy it at cleanup?
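i.e. keep the qemu_mutex_init() in ram_state_init() as in this patch, and pair
it with something along these lines (untested; ram_state_cleanup() is just a
placeholder name for wherever the teardown ends up):

static void ram_state_cleanup(RAMState *rs)
{
    /* Pairs with ram_state_init(); run once nothing can queue page
     * requests any more.  (bitmap_mutex presumably wants the same
     * treatment, since it gets reinit'd in the same way.)
     */
    qemu_mutex_destroy(&rs->src_page_req_mutex);
    qemu_mutex_destroy(&rs->bitmap_mutex);
}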

Later, Juan.
Dr. David Alan Gilbert April 5, 2017, 10:27 a.m. UTC | #7
* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Mar 31, 2017 at 04:25:56PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Thu, Mar 23, 2017 at 09:45:23PM +0100, Juan Quintela wrote:
> > > > This are the last postcopy fields still at MigrationState.  Once there
> > > 
> > > s/This/These/
> > > 
> > > > Move MigrationSrcPageRequest to ram.c and remove MigrationState
> > > > parameters where appropiate.
> > > > 
> > > > Signed-off-by: Juan Quintela <quintela@redhat.com>
> > > 
> > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > 
> > > One question below though...
> > > 
> > > [...]
> > > 
> > > > @@ -1191,19 +1204,18 @@ static bool get_queued_page(RAMState *rs, MigrationState *ms,
> > > >   *
> > > >   * It should be empty at the end anyway, but in error cases there may
> > > >   * xbe some left.
> > > > - *
> > > > - * @ms: current migration state
> > > >   */
> > > > -void flush_page_queue(MigrationState *ms)
> > > > +void flush_page_queue(void)
> > > >  {
> > > > -    struct MigrationSrcPageRequest *mspr, *next_mspr;
> > > > +    struct RAMSrcPageRequest *mspr, *next_mspr;
> > > > +    RAMState *rs = &ram_state;
> > > >      /* This queue generally should be empty - but in the case of a failed
> > > >       * migration might have some droppings in.
> > > >       */
> > > >      rcu_read_lock();
> > > 
> > > Could I ask why we are taking the RCU read lock rather than the mutex
> > > here?
> > 
> > It's a good question whether we need anything at all.
> > flush_page_queue is called only from migrate_fd_cleanup.
> > migrate_fd_cleanup is called either from a backhalf, which I think has the bql,
> > or from a failure path in migrate_fd_connect.
> > migrate_fd_connect is called from migration_channel_connect and rdma_start_outgoing_migration
> > which I think both end up at monitor commands so also in the bql.
> > 
> > So I think we can probably just lose the rcu_read_lock/unlock.
> 
> Thanks for the confirmation.
> 
> (ps: even if we are not with bql, we should not need this
>  rcu_read_lock, right? My understanding is: if we want to protect
>  src_page_requests, we should need the mutex, not rcu lock; while for
>  the memory_region_unref() since we have had the reference, looks like
>  we don't need any kind of locking either)

Right; I guess the memory_region_unref might cause the memory region
to be cleaned up in that loop without the RCU locks, but I don't think
it's a problem even if they are cleaned up.

Dave

> > 
> > Dave
> > 
> > > 
> > > > -    QSIMPLEQ_FOREACH_SAFE(mspr, &ms->src_page_requests, next_req, next_mspr) {
> > > > +    QSIMPLEQ_FOREACH_SAFE(mspr, &rs->src_page_requests, next_req, next_mspr) {
> > > >          memory_region_unref(mspr->rb->mr);
> > > > -        QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> > > > +        QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
> > > >          g_free(mspr);
> > > >      }
> > > >      rcu_read_unlock();
> > > 
> > > Thanks,
> > > 
> > > -- peterx
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> -- peterx
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Dr. David Alan Gilbert April 5, 2017, 10:34 a.m. UTC | #8
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > * Juan Quintela (quintela@redhat.com) wrote:
> 
> Hi
> 
> >> @@ -1191,19 +1204,18 @@ static bool get_queued_page(RAMState *rs, MigrationState *ms,
> >>   *
> >>   * It should be empty at the end anyway, but in error cases there may
> >>   * xbe some left.
> >> - *
> >> - * @ms: current migration state
> >>   */
> >> -void flush_page_queue(MigrationState *ms)
> >> +void flush_page_queue(void)
> >
> > I'm not sure this is safe;  it's called from migrate_fd_cleanup right at
> > the end, if you do any finalisation/cleanup of the RAMState in
> > ram_save_complete
> > then when is it safe to run this?
> 
> But, looking into it, I think that we should be able to move this into
> ram_save_cleanup() no?
> 
> We don't need it after that?
> As an added bonus, we can remove the export of it.

As discussed on IRC, the thing I'm cautious about is getting the order
of cleanup right.
If you look at migration_completion you see we call
qemu_savevm_state_complete_postcopy() (which calls ram_save_complete)
before we call await_return_path_close_on_source, which ensures that the
thread that's handling requests from the destination and queuing them
has finished.

It seems right to make sure that thread has finished (and thus nothing
is trying to add anything to that queue) before trying to clean it up.
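Roughly (from memory) the relevant ordering in migration_completion() for the
postcopy case is:

    qemu_savevm_state_complete_postcopy(s->to_dst_file);  /* -> ram_save_complete() */
    /* ... */
    await_return_path_close_on_source(s);  /* return-path thread done queuing requests */
    /* only from here on can nothing new land on src_page_requests, so only
     * from here on is it really safe to flush/destroy the queue
     */

so any queue teardown wants to sit after that second call, not before.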

Dave

> >> @@ -1260,16 +1272,16 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> >>          goto err;
> >>      }
> >>  
> >> -    struct MigrationSrcPageRequest *new_entry =
> >> -        g_malloc0(sizeof(struct MigrationSrcPageRequest));
> >> +    struct RAMSrcPageRequest *new_entry =
> >> +        g_malloc0(sizeof(struct RAMSrcPageRequest));
> >>      new_entry->rb = ramblock;
> >>      new_entry->offset = start;
> >>      new_entry->len = len;
> >>  
> >>      memory_region_ref(ramblock->mr);
> >> -    qemu_mutex_lock(&ms->src_page_req_mutex);
> >> -    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
> >> -    qemu_mutex_unlock(&ms->src_page_req_mutex);
> >> +    qemu_mutex_lock(&rs->src_page_req_mutex);
> >> +    QSIMPLEQ_INSERT_TAIL(&rs->src_page_requests, new_entry, next_req);
> >> +    qemu_mutex_unlock(&rs->src_page_req_mutex);
> >
> > Hmm ok where did it get it's rs from?
> > Anyway, the thing I needed to convince myself of was that there was
> > any guarantee that
> > RAMState would exist by the time the first request came in, something
> > that we now need
> > to be careful of.
> > I think we're mostly OK; we call qemu_savevm_state_begin() at the top
> > of migration_thread so the ram_save_setup should be done and allocate
> > the RAMState before we get into the main loop and thus before we ever
> > look at the 'start_postcopy' flag and thus before we ever ask the destination
> > to send us stuff.
> >
> >>      rcu_read_unlock();
> >>  
> >>      return 0;
> >> @@ -1408,7 +1420,7 @@ static int ram_find_and_save_block(RAMState *rs, QEMUFile *f, bool last_stage)
> >>  
> >>      do {
> >>          again = true;
> >> -        found = get_queued_page(rs, ms, &pss, &dirty_ram_abs);
> >> +        found = get_queued_page(rs, &pss, &dirty_ram_abs);
> >>  
> >>          if (!found) {
> >>              /* priority queue empty, so just search for something dirty */
> >> @@ -1968,6 +1980,8 @@ static int ram_state_init(RAMState *rs)
> >>  
> >>      memset(rs, 0, sizeof(*rs));
> >>      qemu_mutex_init(&rs->bitmap_mutex);
> >> +    qemu_mutex_init(&rs->src_page_req_mutex);
> >> +    QSIMPLEQ_INIT(&rs->src_page_requests);
> >
> > Similar question to above; that mutex is going to get reinit'd
> > on a new migration and it shouldn't be without it being destroyed.
> > Maybe make it a once.
> 
> good catch.  I think that the easiest way is to just move it into proper
> RAMState, init it here, and destroy it at cleanup?
> 
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Patch

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e032fb0..8a6caa3 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -128,18 +128,6 @@  struct MigrationIncomingState {
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
 
-/*
- * An outstanding page request, on the source, having been received
- * and queued
- */
-struct MigrationSrcPageRequest {
-    RAMBlock *rb;
-    hwaddr    offset;
-    hwaddr    len;
-
-    QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
-};
-
 struct MigrationState
 {
     size_t bytes_xfer;
@@ -186,9 +174,6 @@  struct MigrationState
     /* Flag set once the migration thread called bdrv_inactivate_all */
     bool block_inactive;
 
-    /* Queue of outstanding page requests from the destination */
-    QemuMutex src_page_req_mutex;
-    QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
     /* The semaphore is used to notify COLO thread that failover is finished */
     QemuSemaphore colo_exit_sem;
 
@@ -371,7 +356,7 @@  void savevm_skip_configuration(void);
 int global_state_store(void);
 void global_state_store_running(void);
 
-void flush_page_queue(MigrationState *ms);
+void flush_page_queue(void);
 int ram_save_queue_pages(MigrationState *ms, const char *rbname,
                          ram_addr_t start, ram_addr_t len);
 uint64_t ram_pagesize_summary(void);
diff --git a/migration/migration.c b/migration/migration.c
index b220941..58c1587 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -109,7 +109,6 @@  MigrationState *migrate_get_current(void)
     };
 
     if (!once) {
-        qemu_mutex_init(&current_migration.src_page_req_mutex);
         current_migration.parameters.tls_creds = g_strdup("");
         current_migration.parameters.tls_hostname = g_strdup("");
         once = true;
@@ -949,7 +948,7 @@  static void migrate_fd_cleanup(void *opaque)
     qemu_bh_delete(s->cleanup_bh);
     s->cleanup_bh = NULL;
 
-    flush_page_queue(s);
+    flush_page_queue();
 
     if (s->to_dst_file) {
         trace_migrate_fd_cleanup();
@@ -1123,8 +1122,6 @@  MigrationState *migrate_init(const MigrationParams *params)
 
     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
-    QSIMPLEQ_INIT(&s->src_page_requests);
-
     s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     return s;
 }
diff --git a/migration/ram.c b/migration/ram.c
index 325a0f3..601370c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -151,6 +151,18 @@  struct RAMBitmap {
 };
 typedef struct RAMBitmap RAMBitmap;
 
+/*
+ * An outstanding page request, on the source, having been received
+ * and queued
+ */
+struct RAMSrcPageRequest {
+    RAMBlock *rb;
+    hwaddr    offset;
+    hwaddr    len;
+
+    QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
+};
+
 /* State of RAM for migration */
 struct RAMState {
     /* Last block that we have visited searching for dirty pages */
@@ -205,6 +217,9 @@  struct RAMState {
     RAMBitmap *ram_bitmap;
     /* The RAMBlock used in the last src_page_request */
     RAMBlock *last_req_rb;
+    /* Queue of outstanding page requests from the destination */
+    QemuMutex src_page_req_mutex;
+    QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
 };
 typedef struct RAMState RAMState;
 
@@ -1084,20 +1099,20 @@  static bool find_dirty_block(RAMState *rs, QEMUFile *f, PageSearchStatus *pss,
  *
  * Returns the block of the page (or NULL if none available)
  *
- * @ms: current migration state
+ * @rs: current RAM state
  * @offset: used to return the offset within the RAMBlock
  * @ram_addr_abs: pointer into which to store the address of the dirty page
  *                within the global ram_addr space
  */
-static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
+static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset,
                               ram_addr_t *ram_addr_abs)
 {
     RAMBlock *block = NULL;
 
-    qemu_mutex_lock(&ms->src_page_req_mutex);
-    if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
-        struct MigrationSrcPageRequest *entry =
-                                QSIMPLEQ_FIRST(&ms->src_page_requests);
+    qemu_mutex_lock(&rs->src_page_req_mutex);
+    if (!QSIMPLEQ_EMPTY(&rs->src_page_requests)) {
+        struct RAMSrcPageRequest *entry =
+                                QSIMPLEQ_FIRST(&rs->src_page_requests);
         block = entry->rb;
         *offset = entry->offset;
         *ram_addr_abs = (entry->offset + entry->rb->offset) &
@@ -1108,11 +1123,11 @@  static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
             entry->offset += TARGET_PAGE_SIZE;
         } else {
             memory_region_unref(block->mr);
-            QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
+            QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
             g_free(entry);
         }
     }
-    qemu_mutex_unlock(&ms->src_page_req_mutex);
+    qemu_mutex_unlock(&rs->src_page_req_mutex);
 
     return block;
 }
@@ -1125,13 +1140,11 @@  static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
  * Returns if a queued page is found
  *
  * @rs: current RAM state
- * @ms: current migration state
  * @pss: data about the state of the current dirty page scan
  * @ram_addr_abs: pointer into which to store the address of the dirty page
  *                within the global ram_addr space
  */
-static bool get_queued_page(RAMState *rs, MigrationState *ms,
-                            PageSearchStatus *pss,
+static bool get_queued_page(RAMState *rs, PageSearchStatus *pss,
                             ram_addr_t *ram_addr_abs)
 {
     RAMBlock  *block;
@@ -1139,7 +1152,7 @@  static bool get_queued_page(RAMState *rs, MigrationState *ms,
     bool dirty;
 
     do {
-        block = unqueue_page(ms, &offset, ram_addr_abs);
+        block = unqueue_page(rs, &offset, ram_addr_abs);
         /*
          * We're sending this page, and since it's postcopy nothing else
          * will dirty it, and we must make sure it doesn't get sent again
@@ -1191,19 +1204,18 @@  static bool get_queued_page(RAMState *rs, MigrationState *ms,
  *
  * It should be empty at the end anyway, but in error cases there may
  * xbe some left.
- *
- * @ms: current migration state
  */
-void flush_page_queue(MigrationState *ms)
+void flush_page_queue(void)
 {
-    struct MigrationSrcPageRequest *mspr, *next_mspr;
+    struct RAMSrcPageRequest *mspr, *next_mspr;
+    RAMState *rs = &ram_state;
     /* This queue generally should be empty - but in the case of a failed
      * migration might have some droppings in.
      */
     rcu_read_lock();
-    QSIMPLEQ_FOREACH_SAFE(mspr, &ms->src_page_requests, next_req, next_mspr) {
+    QSIMPLEQ_FOREACH_SAFE(mspr, &rs->src_page_requests, next_req, next_mspr) {
         memory_region_unref(mspr->rb->mr);
-        QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
+        QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
         g_free(mspr);
     }
     rcu_read_unlock();
@@ -1260,16 +1272,16 @@  int ram_save_queue_pages(MigrationState *ms, const char *rbname,
         goto err;
     }
 
-    struct MigrationSrcPageRequest *new_entry =
-        g_malloc0(sizeof(struct MigrationSrcPageRequest));
+    struct RAMSrcPageRequest *new_entry =
+        g_malloc0(sizeof(struct RAMSrcPageRequest));
     new_entry->rb = ramblock;
     new_entry->offset = start;
     new_entry->len = len;
 
     memory_region_ref(ramblock->mr);
-    qemu_mutex_lock(&ms->src_page_req_mutex);
-    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
-    qemu_mutex_unlock(&ms->src_page_req_mutex);
+    qemu_mutex_lock(&rs->src_page_req_mutex);
+    QSIMPLEQ_INSERT_TAIL(&rs->src_page_requests, new_entry, next_req);
+    qemu_mutex_unlock(&rs->src_page_req_mutex);
     rcu_read_unlock();
 
     return 0;
@@ -1408,7 +1420,7 @@  static int ram_find_and_save_block(RAMState *rs, QEMUFile *f, bool last_stage)
 
     do {
         again = true;
-        found = get_queued_page(rs, ms, &pss, &dirty_ram_abs);
+        found = get_queued_page(rs, &pss, &dirty_ram_abs);
 
         if (!found) {
             /* priority queue empty, so just search for something dirty */
@@ -1968,6 +1980,8 @@  static int ram_state_init(RAMState *rs)
 
     memset(rs, 0, sizeof(*rs));
     qemu_mutex_init(&rs->bitmap_mutex);
+    qemu_mutex_init(&rs->src_page_req_mutex);
+    QSIMPLEQ_INIT(&rs->src_page_requests);
 
     if (migrate_use_xbzrle()) {
         XBZRLE_cache_lock();