Patchwork kvm segfaulting

Submitter Paolo Bonzini
Date Feb. 13, 2013, 1:39 p.m.
Message ID <511B977C.6040302@redhat.com>
Permalink /patch/220136/
State New

Comments

Paolo Bonzini - Feb. 13, 2013, 1:39 p.m.
Il 13/02/2013 13:55, Stefan Priebe - Profihost AG ha scritto:
> Hi,
> Am 13.02.2013 12:36, schrieb Paolo Bonzini:
>> Il 13/02/2013 10:07, Stefan Priebe - Profihost AG ha scritto:
>>>>>>>
>>>>>>> commit 47a150a4bbb06e45ef439a8222e9f46a7c4cca3f
>>> ...
>>>>> You can certainly try reverting it, but this patch is fixing a real bug.
>>> Will try that. Yes, but even if it fixes one bug, it raises another
>>> (the kvm segfault), which is the worse one. It should be fixed.
>>
>> The KVM segfault is exposing a potential consistency problem.  What is
>> worse is not obvious.  Also, it is happening at reset time if this is
>> the culprit.  Reset usually happens at places where no data loss is caused.
>>
>> Can you find out what the VM was doing when it segfaulted?  (Or even,
>> can you place the corefile and kvm executable somewhere where I can
>> download it?)
> 
> Yes, it was doing an "fstrim -v /", which resulted in:
> 
> [45648.453698] end_request: I/O error, dev sda, sector 9066952

Ok, very helpful.  One remaining thing is to find out why this failed.
This can come later though.

First of all, please run "cat /sys/block/*/device/scsi_disk/*/provisioning_mode"
in a VM with a similar configuration as the one that crashed last.
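Something like this prints each disk next to its mode in one pass (just a sketch; the values come from the guest kernel's sd driver, e.g. "unmap", "writesame_16" or "disabled", and the optional base-directory argument is only there so the loop can be exercised outside a guest):

```shell
#!/bin/sh
# Sketch: list every SCSI disk together with its discard provisioning
# mode ("unmap", "writesame_16", "disabled", ...).  The optional first
# argument overrides /sys for testing outside a guest.
base=${1:-/sys}
for f in "$base"/block/*/device/scsi_disk/*/provisioning_mode; do
    [ -e "$f" ] || continue          # glob matched nothing
    disk=${f#"$base"/block/}         # strip the leading path ...
    disk=${disk%%/*}                 # ... keeping only the device name
    echo "$disk: $(cat "$f")"
done
```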

Second, I attach another patch.

Third, if possible please compile QEMU with --enable-trace-backend=simple,
and run it with

  -trace events='bdrv_aio_discard
scsi_req_cancel
',file=qemu.$$.trace

This can give some clues.  The files should remain quite small,
so you can enable it on all VMs safely.
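If the embedded newlines are awkward to pass through whatever wrapper starts the kvm process, the event names can also go into a file, since "-trace events=" accepts a file of patterns.  A sketch (the file path is just an example):

```shell
# Equivalent setup with the event names in a file, avoiding the need to
# quote newlines on the QEMU command line.
cat > /tmp/qemu-trace-events <<'EOF'
bdrv_aio_discard
scsi_req_cancel
EOF
# Then start QEMU with, e.g.:
#   ... -trace events=/tmp/qemu-trace-events,file=/tmp/qemu.$$.trace
```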

> Sadly not, as I don't have a core dump. The kvm processes are started
> through various daemons and there seems to be no way to activate core
> dumps for an already running process, and I don't know which VM will
> crash next.

Probably the next that invokes fstrim. :)
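By the way, the core-file limit of an already running process can usually be raised from the outside, for example with util-linux's prlimit; gdb's "gcore <pid>" can instead snapshot a core immediately.  A sketch, assuming prlimit is installed (a throwaway "sleep" stands in for the kvm process here):

```shell
# Sketch: raise the core-size limit of a running process so that a
# future segfault produces a dump.  Assumes util-linux prlimit.
sleep 300 &
pid=$!
prlimit --pid "$pid" --core=unlimited   # raise the soft limit
prlimit --pid "$pid" --core             # show the CORE limit line
# gdb's "gcore $pid" would instead write a core snapshot right away.
kill "$pid"
```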

>> If not, do your VMs reset themselves
>> often for example?
> No

Ok, good to know.

>> Can you reproduce it on non-rbd storage?
> I don't have another storage type. ;-(

No problem.

Paolo
Stefan Priebe - Profihost AG - Feb. 13, 2013, 1:46 p.m.
Hi Paolo,

thanks for your work. Should I still apply your "old" patch to scsi-disk,
or should I remove it?

Stefan
Stefan Priebe - Profihost AG - Feb. 13, 2013, 2:06 p.m.
Output of cat:
[: ~]# cat /sys/block/*/device/scsi_disk/*/provisioning_mode
writesame_16

Stefan

Stefan Priebe - Profihost AG - Feb. 13, 2013, 2:30 p.m.
Hi,

I added this:
-trace events=/tmp/events,file=/root/qemu.123.trace

and put the events in the events file, as I couldn't handle \n in the app
that starts the kvm process. But even when doing an fstrim, the trace file
stays at 24 bytes - is this correct?

Stefan

Patch

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index a97f1cd..01e1dec 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -1508,6 +1508,10 @@  void scsi_req_unref(SCSIRequest *req)
    will start the next chunk or complete the command.  */
 void scsi_req_continue(SCSIRequest *req)
 {
+    if (req->io_canceled) {
+        trace_scsi_req_continue_canceled(req->dev->id, req->lun, req->tag);
+        return;
+    }
     trace_scsi_req_continue(req->dev->id, req->lun, req->tag);
     if (req->cmd.mode == SCSI_XFER_TO_DEV) {
         req->ops->write_data(req);
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index d411586..4a0673c 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -178,6 +178,9 @@  static void scsi_aio_complete(void *opaque, int ret)
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
     bdrv_acct_done(s->qdev.conf.bs, &r->acct);
+    if (r->req.io_canceled) {
+        goto done;
+    }
 
     if (ret < 0) {
         if (scsi_handle_rw_error(r, -ret)) {
@@ -223,6 +226,10 @@  static void scsi_write_do_fua(SCSIDiskReq *r)
 {
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
+    if (r->req.io_canceled) {
+        goto done;
+    }
+
     if (scsi_is_cmd_fua(&r->req.cmd)) {
         bdrv_acct_start(s->qdev.conf.bs, &r->acct, 0, BDRV_ACCT_FLUSH);
         r->req.aiocb = bdrv_aio_flush(s->qdev.conf.bs, scsi_aio_complete, r);
@@ -230,6 +237,8 @@  static void scsi_write_do_fua(SCSIDiskReq *r)
     }
 
     scsi_req_complete(&r->req, GOOD);
+
+done:
     if (!r->req.io_canceled) {
         scsi_req_unref(&r->req);
     }
@@ -243,6 +252,9 @@  static void scsi_dma_complete(void *opaque, int ret)
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
     bdrv_acct_done(s->qdev.conf.bs, &r->acct);
+    if (r->req.io_canceled) {
+        goto done;
+    }
 
     if (ret < 0) {
         if (scsi_handle_rw_error(r, -ret)) {
@@ -274,6 +286,9 @@  static void scsi_read_complete(void * opaque, int ret)
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
     bdrv_acct_done(s->qdev.conf.bs, &r->acct);
+    if (r->req.io_canceled) {
+        goto done;
+    }
 
     if (ret < 0) {
         if (scsi_handle_rw_error(r, -ret)) {
@@ -305,6 +320,9 @@  static void scsi_do_read(void *opaque, int ret)
         r->req.aiocb = NULL;
         bdrv_acct_done(s->qdev.conf.bs, &r->acct);
     }
+    if (r->req.io_canceled) {
+        goto done;
+    }
 
     if (ret < 0) {
         if (scsi_handle_rw_error(r, -ret)) {
@@ -312,10 +330,6 @@  static void scsi_do_read(void *opaque, int ret)
         }
     }
 
-    if (r->req.io_canceled) {
-        return;
-    }
-
     /* The request is used as the AIO opaque value, so add a ref.  */
     scsi_req_ref(&r->req);
 
@@ -423,6 +437,9 @@  static void scsi_write_complete(void * opaque, int ret)
         r->req.aiocb = NULL;
         bdrv_acct_done(s->qdev.conf.bs, &r->acct);
     }
+    if (r->req.io_canceled) {
+        goto done;
+    }
 
     if (ret < 0) {
         if (scsi_handle_rw_error(r, -ret)) {
@@ -1478,13 +1495,17 @@  static void scsi_unmap_complete(void *opaque, int ret)
     uint32_t nb_sectors;
 
     r->req.aiocb = NULL;
+    if (r->req.io_canceled) {
+        goto done;
+    }
+
     if (ret < 0) {
         if (scsi_handle_rw_error(r, -ret)) {
             goto done;
         }
     }
 
-    if (data->count > 0 && !r->req.io_canceled) {
+    if (data->count > 0) {
         sector_num = ldq_be_p(&data->inbuf[0]);
         nb_sectors = ldl_be_p(&data->inbuf[8]) & 0xffffffffULL;
         if (!check_lba_range(s, sector_num, nb_sectors)) {
@@ -1501,10 +1522,9 @@  static void scsi_unmap_complete(void *opaque, int ret)
         return;
     }
 
+    scsi_req_complete(&r->req, GOOD);
+
 done:
-    if (data->count == 0) {
-        scsi_req_complete(&r->req, GOOD);
-    }
     if (!r->req.io_canceled) {
         scsi_req_unref(&r->req);
     }
diff --git a/roms/seabios b/roms/seabios
index 4bd8aeb..e8a76b0 160000
--- a/roms/seabios
+++ b/roms/seabios
@@ -1 +1 @@ 
-Subproject commit 4bd8aebf3534e10d9aa21e820903f2cf9120708c
+Subproject commit e8a76b0f225bba5ba9d63ab227e0a37b3beb1059
diff --git a/trace-events b/trace-events
index f2c256c..a42497c 100644
--- a/trace-events
+++ b/trace-events
@@ -461,6 +461,7 @@  scsi_req_data(int target, int lun, int tag, int len) "target %d lun %d tag %d le
 scsi_req_data_canceled(int target, int lun, int tag, int len) "target %d lun %d tag %d len %d"
 scsi_req_dequeue(int target, int lun, int tag) "target %d lun %d tag %d"
 scsi_req_continue(int target, int lun, int tag) "target %d lun %d tag %d"
+scsi_req_continue_canceled(int target, int lun, int tag) "target %d lun %d tag %d"
 scsi_req_parsed(int target, int lun, int tag, int cmd, int mode, int xfer) "target %d lun %d tag %d command %d dir %d length %d"
 scsi_req_parsed_lba(int target, int lun, int tag, int cmd, uint64_t lba) "target %d lun %d tag %d command %d lba %"PRIu64
 scsi_req_parse_bad(int target, int lun, int tag, int cmd) "target %d lun %d tag %d command %d"