[for-5.0,v5,0/3] Fix some AIO context locking in jobs

Message ID 20200407115651.69472-1-s.reiter@proxmox.com

Message

Stefan Reiter April 7, 2020, 11:56 a.m. UTC
Contains three separate but related patches cleaning up and fixing some
issues regarding aio_context_acquire/aio_context_release for jobs. Mostly
affects blockjobs running for devices that have IO threads enabled AFAICT.


Changes from v4:
* Do job_ref/job_unref in job_txn_apply and job_exit since we need the job to
  survive the callback to access the potentially changed lock afterwards
* Reduce patch 2/3 to an assert, the context should already be acquired since
  it's a bdrv handler
* Collect R-by for 3/3
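
The first item above (take a reference so the job survives the callback, then
read the possibly changed lock afterwards) can be illustrated with a toy model.
This is only a sketch under stated assumptions: ToyJob, ToyContext and every
toy_* function are hypothetical stand-ins invented for illustration, not QEMU's
Job/AioContext API, and the "lock" is just a counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lockable context; the lock is a plain counter. */
typedef struct ToyContext {
    int lock_depth;
} ToyContext;

/* Hypothetical stand-in for a reference-counted job bound to a context. */
typedef struct ToyJob {
    int refcnt;
    ToyContext *ctx;            /* a callback may move the job elsewhere */
} ToyJob;

typedef void ToyCallback(ToyJob *job, void *opaque);

static int toy_jobs_freed;      /* counts frees so the demo can check them */

static void toy_ctx_acquire(ToyContext *c) { c->lock_depth++; }

static void toy_ctx_release(ToyContext *c)
{
    assert(c->lock_depth > 0);  /* releasing an unheld lock is a bug */
    c->lock_depth--;
}

static ToyJob *toy_job_new(ToyContext *ctx)
{
    ToyJob *job = calloc(1, sizeof(*job));
    assert(job != NULL);
    job->refcnt = 1;
    job->ctx = ctx;
    return job;
}

static void toy_job_ref(ToyJob *job) { job->refcnt++; }

static void toy_job_unref(ToyJob *job)
{
    if (--job->refcnt == 0) {
        toy_jobs_freed++;
        free(job);
    }
}

/* Callback that moves the job to another context (swapping the held lock)
 * and then drops the creator's reference. */
static void toy_move_and_drop(ToyJob *job, void *opaque)
{
    ToyContext *target = opaque;

    toy_ctx_release(job->ctx);
    toy_ctx_acquire(target);
    job->ctx = target;
    toy_job_unref(job);
}

/* Run fn with the job's context held. The extra reference keeps the job
 * alive so job->ctx can be read again after fn returns. */
static void toy_run_locked(ToyJob *job, ToyCallback *fn, void *opaque)
{
    ToyContext *ctx;

    toy_job_ref(job);
    toy_ctx_acquire(job->ctx);

    fn(job, opaque);

    ctx = job->ctx;             /* do not use a value cached before fn */
    toy_job_unref(job);         /* may free the job; ctx was saved first */
    toy_ctx_release(ctx);
}

/* Returns 0 if both locks end up balanced and the job is freed once. */
int toy_demo(void)
{
    ToyContext a = { 0 }, b = { 0 };
    ToyJob *job = toy_job_new(&a);

    toy_jobs_freed = 0;
    toy_run_locked(job, toy_move_and_drop, &b);

    return (a.lock_depth == 0 && b.lock_depth == 0 &&
            toy_jobs_freed == 1) ? 0 : 1;
}
```

Without the toy_job_ref() call in toy_run_locked(), the callback's unref would
free the job and the later read of job->ctx would be a use-after-free.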

I've marked it 'for-5.0' this time; I think it would make sense for it to be
picked up together with Kevin's "block: Fix blk->in_flight during
blk_wait_while_drained()" series. With that series and these three patches
applied I can no longer reproduce any of the reported related crashes/hangs.


Changes from v3:
* commit_job appears to be unset in certain cases when replication_close is
  called, only access when necessary to avoid SIGSEGV

Missed this when shuffling around patches, sorry for noise with still-broken v3.

Changes from v2:
* reordered patch 1 to the end to not introduce temporary breakages
* added more fixes to job txn patch (should now pass the tests)

Changes from v1:
* fixed commit message for patch 1
* added patches 2 and 3


qemu: Stefan Reiter (3):
  job: take each job's lock individually in job_txn_apply
  replication: assert we own context before job_cancel_sync
  backup: don't acquire aio_context in backup_clean

 block/backup.c        |  4 ----
 block/replication.c   |  5 ++++-
 blockdev.c            |  9 ++++++++
 job-qmp.c             |  9 ++++++++
 job.c                 | 50 ++++++++++++++++++++++++++++++++++---------
 tests/test-blockjob.c |  2 ++
 6 files changed, 64 insertions(+), 15 deletions(-)

Comments

Kevin Wolf April 7, 2020, 2:22 p.m. UTC | #1
On 07.04.2020 at 13:56, Stefan Reiter wrote:
> Contains three separate but related patches cleaning up and fixing some
> issues regarding aio_context_acquire/aio_context_release for jobs. Mostly
> affects blockjobs running for devices that have IO threads enabled AFAICT.
> 
> 
> Changes from v4:
> * Do job_ref/job_unref in job_txn_apply and job_exit since we need the job to
>   survive the callback to access the potentially changed lock afterwards
> * Reduce patch 2/3 to an assert, the context should already be acquired since
>   it's a bdrv handler
> * Collect R-by for 3/3
> 
> I've marked it 'for-5.0' this time; I think it would make sense for it to be
> picked up together with Kevin's "block: Fix blk->in_flight during
> blk_wait_while_drained()" series. With that series and these three patches
> applied I can no longer reproduce any of the reported related crashes/hangs.

Thanks, applied to the block branch.

Kevin