
[1/1] virtio: fail device if set_event_notifier fails

Message ID: 20170302185942.76255-1-pasic@linux.vnet.ibm.com
State: New

Commit Message

Halil Pasic March 2, 2017, 6:59 p.m. UTC
The function virtio_notify_irqfd used to ignore the return code of
event_notifier_set. Let's fail the device should this occur.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>

---

This patch is most likely flawed because virtio_notify_irqfd
is probably supposed to be thread-safe and neither strerror
nor virtio_error is. Anyway, let's get the discussion started with this.

There was a suggestion by Michael to make event_notifier_set void
and assert inside. After some thinking, I settled on the opinion
that neither doing nothing nor crashing the guest is a good idea
should this failure happen in production. So I came up with this.

Looking forward to your feedback.
---
 hw/virtio/virtio.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Cornelia Huck March 3, 2017, 12:21 p.m. UTC | #1
On Thu,  2 Mar 2017 19:59:42 +0100
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> The function virtio_notify_irqfd used to ignore the return code of
> event_notifier_set. Let's fail the device should this occur.

I'm wondering if there are reasons for event_notifier_set() to fail
beyond "we've hit an internal race and should make an effort to fix
that one, or else we have completely messed up in qemu". Marking the
device broken tells the guest that there's something wrong with the
device, but I think we want qemu bug reports when there's something
broken with the irqfd.
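
For reference, on POSIX hosts event_notifier_set() boils down to roughly
the following (paraphrased from util/event_notifier-posix.c, so treat the
details as approximate): a non-zero return essentially means the write to
the eventfd itself failed, e.g. because the fd was closed or invalid.

    /* rough paraphrase of event_notifier_set(), not the verbatim QEMU code */
    int event_notifier_set(EventNotifier *e)
    {
        static const uint64_t value = 1;
        ssize_t ret;

        do {
            /* poke the eventfd; KVM or an in-QEMU reader consumes it */
            ret = write(e->wfd, &value, sizeof(value));
        } while (ret < 0 && errno == EINTR);

        /* EAGAIN is fine: the counter is already non-zero */
        if (ret < 0 && errno != EAGAIN) {
            return -errno;
        }
        return 0;
    }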

> 
> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> 
> ---
> 
> This patch is most likely flawed because virtio_notify_irqfd
> is probably supposed to be thread-safe and neither strerror
> nor virtio_error are that. Anyway lets get the discussion started with this.
> 
> There was a suggestion by Michael to make event_notifier_set void
> and assert inside. After some thinking, I settled at the opinion,
> that neither doing nothing nor crashing the guest is a good idea
> should this failure happen in production. So I came up with this.
> 
> Looking forward to your feedback.
> ---
>  hw/virtio/virtio.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 23483c7..e05f3e5 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1555,6 +1555,8 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
>  void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>  {
>      bool should_notify;
> +    int rc;
> +
>      rcu_read_lock();
>      should_notify = virtio_should_notify(vdev, vq);
>      rcu_read_unlock();
> @@ -1581,7 +1583,11 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>       * to an atomic operation.
>       */
>      virtio_set_isr(vq->vdev, 0x1);
> -    event_notifier_set(&vq->guest_notifier);
> +    rc = event_notifier_set(&vq->guest_notifier);
> +    if (unlikely(rc)) {
> +        virtio_error(vdev, "guest notifier broken for vdev at %p (%s)", vdev,
> +                     strerror(-rc));
> +    }
>  }
> 
>  static void virtio_irq(VirtQueue *vq)
Halil Pasic March 3, 2017, 12:43 p.m. UTC | #2
On 03/03/2017 01:21 PM, Cornelia Huck wrote:
> On Thu,  2 Mar 2017 19:59:42 +0100
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
>> The function virtio_notify_irqfd used to ignore the return code of
>> event_notifier_set. Let's fail the device should this occur.
> 
> I'm wondering if there are reasons for event_notifier_set() to fail
> beyond "we've hit an internal race and should make an effort to fix
> that one, or else we have completely messed up in qemu". Marking the
> device broken tells the guest that there's something wrong with the
> device, but I think we want qemu bug reports when there's something
> broken with the irqfd.
>

That's why the error is logged. I understand virtio_error as something
suitable for indicating bugs.

What do you suggest? Forcing a dump? I would rather leave it to the
user to figure out how important the state sitting in the machine
and the device is, and how much effort they want to put into
recovering from the failure.

Halil
Cornelia Huck March 3, 2017, 12:50 p.m. UTC | #3
On Fri, 3 Mar 2017 13:43:32 +0100
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
> > On Thu,  2 Mar 2017 19:59:42 +0100
> > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > 
> >> The function virtio_notify_irqfd used to ignore the return code of
> >> event_notifier_set. Let's fail the device should this occur.
> > 
> > I'm wondering if there are reasons for event_notifier_set() to fail
> > beyond "we've hit an internal race and should make an effort to fix
> > that one, or else we have completely messed up in qemu". Marking the
> > device broken tells the guest that there's something wrong with the
> > device, but I think we want qemu bug reports when there's something
> > broken with the irqfd.
> >
> 
> That's why the error is logged. I understand virtio_error like something
> suitable for indicating bugs.
> 
> What do you suggest? Forcing a dump? I would rather leave it to the
> user to figure out how important is the state sitting in the machine
> and the device, and how much effort does (s)he want to put into recovering
> from the failure. 

How likely is it that those logged messages are brought to the attention
of the admin? Does any management software flag machines with such error
messages? (That's more of a general question.)

I'd like to have some kind of trigger that rings an alarm bell so that
the admin might consider reporting this, but I don't have a good idea
on how to do that either...
Halil Pasic March 3, 2017, 1:08 p.m. UTC | #4
On 03/03/2017 01:50 PM, Cornelia Huck wrote:
> On Fri, 3 Mar 2017 13:43:32 +0100
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
>> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
>>> On Thu,  2 Mar 2017 19:59:42 +0100
>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>>
>>>> The function virtio_notify_irqfd used to ignore the return code of
>>>> event_notifier_set. Let's fail the device should this occur.
>>>
>>> I'm wondering if there are reasons for event_notifier_set() to fail
>>> beyond "we've hit an internal race and should make an effort to fix
>>> that one, or else we have completely messed up in qemu". Marking the
>>> device broken tells the guest that there's something wrong with the
>>> device, but I think we want qemu bug reports when there's something
>>> broken with the irqfd.
>>>
>>
>> That's why the error is logged. I understand virtio_error like something
>> suitable for indicating bugs.
>>
>> What do you suggest? Forcing a dump? I would rather leave it to the
>> user to figure out how important is the state sitting in the machine
>> and the device, and how much effort does (s)he want to put into recovering
>> from the failure. 
> 
> How likely are those logged messages being brought to attention of the
> admin? Does any management software flag machines with such error
> messages? (that's more of a general question)
> 

I admit, I did not investigate this thoroughly, also because the patch
is flawed regarding multi-threading anyway. After a quick investigation
it seems the Linux guest won't auto-reset the device, so the guest should
end up with a non-working device. I think it's pretty likely that the
admin will check the logs if the device was important.

I agree fully that it's a very general question, and I do not feel
competent to answer it.

> I'd like to have some kind of trigger that rings an alarm bell so that
> the admin might consider reporting this, but I don't have a good idea
> on how to do that either...
> 

There are tools for aggregating and processing logs, and for triggering
alarm bells too (for example the ELK stack: Elasticsearch + Logstash +
Kibana). AFAIK logs are the most common way to deal with such stuff. But
I'm far from being an expert. Of course logs are only as good as the
messages landing in them...

Halil
Cornelia Huck March 6, 2017, 2:56 p.m. UTC | #5
On Fri, 3 Mar 2017 14:08:37 +0100
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> On 03/03/2017 01:50 PM, Cornelia Huck wrote:
> > On Fri, 3 Mar 2017 13:43:32 +0100
> > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > 
> >> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
> >>> On Thu,  2 Mar 2017 19:59:42 +0100
> >>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> >>>
> >>>> The function virtio_notify_irqfd used to ignore the return code of
> >>>> event_notifier_set. Let's fail the device should this occur.
> >>>
> >>> I'm wondering if there are reasons for event_notifier_set() to fail
> >>> beyond "we've hit an internal race and should make an effort to fix
> >>> that one, or else we have completely messed up in qemu". Marking the
> >>> device broken tells the guest that there's something wrong with the
> >>> device, but I think we want qemu bug reports when there's something
> >>> broken with the irqfd.
> >>>
> >>
> >> That's why the error is logged. I understand virtio_error like something
> >> suitable for indicating bugs.
> >>
> >> What do you suggest? Forcing a dump? I would rather leave it to the
> >> user to figure out how important is the state sitting in the machine
> >> and the device, and how much effort does (s)he want to put into recovering
> >> from the failure. 
> > 
> > How likely are those logged messages being brought to attention of the
> > admin? Does any management software flag machines with such error
> > messages? (that's more of a general question)
> > 
> 
> I admit, I did not investigate this thoroughly, also because the patch
> is flawed regarding multi-thread anyway. After a quick investigation
> it seems the linux guest won't auto-reset the device so the guest should
> end up with a not working device. I think it's pretty likely that the
> admin will check the logs if the device was important.

Thinking a bit more about this, it seems setting the device broken is
not the right solution for exactly that reason. Setting the virtio
device broken is a way to signal to the guest 'you did something
broken; please reset the device and start anew' (and that's how current
callers use it). In our case, this is not the guest's fault.

Maybe go back to the assert 'solution'? But I'm not sure that's enough
if production builds disable asserts...

> 
> I agree fully that it's a very general question, and I do not feel
> competent for answering it.
> 
> > I'd like to have some kind of trigger that rings an alarm bell so that
> > the admin might consider reporting this, but I don't have a good idea
> > on how to do that either...
> > 
> 
> There are tools for aggregating and processing logs, and triggering
> alarm bells too (for example ELK= logstash + Kibana + Elasticsearch).
> AFAIK logs are the most common way to deal with such stuff. But I'm far
> form being an expert. Of course logs are only as good as the messages
> landing in them...

Let's hope this works properly, then.
Halil Pasic March 6, 2017, 3:21 p.m. UTC | #6
On 03/06/2017 03:56 PM, Cornelia Huck wrote:
> On Fri, 3 Mar 2017 14:08:37 +0100
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
>> On 03/03/2017 01:50 PM, Cornelia Huck wrote:
>>> On Fri, 3 Mar 2017 13:43:32 +0100
>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>>
>>>> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
>>>>> On Thu,  2 Mar 2017 19:59:42 +0100
>>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
[...]
>> I admit, I did not investigate this thoroughly, also because the patch
>> is flawed regarding multi-thread anyway. After a quick investigation
>> it seems the linux guest won't auto-reset the device so the guest should
>> end up with a not working device. I think it's pretty likely that the
>> admin will check the logs if the device was important.
> 
> Thinking a bit more about this, it seems setting the device broken is
> not the right solution for exactly that reason. Setting the virtio
> device broken is a way to signal the guest to 'you did something
> broken; please reset the device and start anew' (and that's how current
> callers use it). In our case, this is not the guest's fault.

Do we have something to just say the device is broken without blaming the
guest? And a device reset might not be that stupid at all in the given
situation, if we want to save what can be saved from the guest's
perspective. (After a reset things should work again until we hit the
race again -- and since turning ioeventfd on/off should not happen that
often during normal operation, it could help limit the damage suffered --
e.g. allow a controlled shutdown.)

> 
> Maybe go back to the assert 'solution'? But I'm not sure that's enough
> if production builds disable asserts...
> 

I will wait a bit; maybe other virtio folks are going to have an
opinion too.

My concern about the assert solution is that for production it is
either too rigorous (killing QEMU, hopefully with a dump) or not
enough (as you have mentioned, with NDEBUG defined, assert does nothing).

I think there are setups where the loss of a device does not have to be
fatal, and I would not like to be the one who makes it fatal (for the
guest).

Regards,
Halil
Cornelia Huck March 6, 2017, 4:04 p.m. UTC | #7
On Mon, 6 Mar 2017 16:21:13 +0100
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> On 03/06/2017 03:56 PM, Cornelia Huck wrote:
> > On Fri, 3 Mar 2017 14:08:37 +0100
> > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > 
> >> On 03/03/2017 01:50 PM, Cornelia Huck wrote:
> >>> On Fri, 3 Mar 2017 13:43:32 +0100
> >>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> >>>
> >>>> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
> >>>>> On Thu,  2 Mar 2017 19:59:42 +0100
> >>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> [...]
> >> I admit, I did not investigate this thoroughly, also because the patch
> >> is flawed regarding multi-thread anyway. After a quick investigation
> >> it seems the linux guest won't auto-reset the device so the guest should
> >> end up with a not working device. I think it's pretty likely that the
> >> admin will check the logs if the device was important.
> > 
> > Thinking a bit more about this, it seems setting the device broken is
> > not the right solution for exactly that reason. Setting the virtio
> > device broken is a way to signal the guest to 'you did something
> > broken; please reset the device and start anew' (and that's how current
> > callers use it). In our case, this is not the guest's fault.
> 
> Do we have something to just say stuff broken without blaming the guest?
> And device reset might not be that stupid at all in the given situation,
> if we want to save what can be saved from the perspective of the guest.
> (After reset stuff should work again until we hit the race again -- and
> since turning ioeventfd on/off should not happen that often during normal
> operation it could help limit damage suffered -- e.g. controlled shutdown).

Checking again, the spec says

DEVICE_NEEDS_RESET (64) Indicates that the device has experienced an
error from which it can’t recover.

Nothing about 'guest error'.

The only problem is that legacy devices don't have that state, which
means they'll have a broken device through no fault of their own.
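
For context, virtio_error() amounts to roughly the following (paraphrased
from hw/virtio/virtio.c, so the exact code may differ): the error is
reported on the host side, NEEDS_RESET is only set for VERSION_1 devices,
and the device is marked broken either way.

    /* rough paraphrase of virtio_error(), not the verbatim QEMU code */
    void virtio_error(VirtIODevice *vdev, const char *fmt, ...)
    {
        va_list ap;

        va_start(ap, fmt);
        error_vreport(fmt, ap);  /* host-side report, i.e. the log discussed above */
        va_end(ap);

        if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
            /* modern devices get DEVICE_NEEDS_RESET plus a config interrupt */
            vdev->status |= VIRTIO_CONFIG_S_NEEDS_RESET;
            virtio_notify_config(vdev);
        }

        /* legacy devices only get this internal flag, hence the concern above */
        vdev->broken = true;
    }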

> 
> > 
> > Maybe go back to the assert 'solution'? But I'm not sure that's enough
> > if production builds disable asserts...
> > 
> 
> I will wait a bit, maybe other virtio folks are going to have an 
> opinion too.
> 
> My concern about the assert solution is that for production it is
> either too rigorous (kill off, hopefully with a dump) or not
> enough (as you have mentioned, if NDEBUG assert does nothing).
> 
> 
> I think there are setups where a loss of device does not have to be
> fatal, and I would not like to be the one who makes it fatal (for the
> guest).

Basically, it's a host bug (and not a bug specific to a certain
device). Moving the impacted device to a broken state may be
a useful mitigation.

But yes, let's hear some other opinions.
Michael S. Tsirkin March 23, 2017, 5:09 p.m. UTC | #8
On Mon, Mar 06, 2017 at 05:04:41PM +0100, Cornelia Huck wrote:
> On Mon, 6 Mar 2017 16:21:13 +0100
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
> > On 03/06/2017 03:56 PM, Cornelia Huck wrote:
> > > On Fri, 3 Mar 2017 14:08:37 +0100
> > > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > > 
> > >> On 03/03/2017 01:50 PM, Cornelia Huck wrote:
> > >>> On Fri, 3 Mar 2017 13:43:32 +0100
> > >>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > >>>
> > >>>> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
> > >>>>> On Thu,  2 Mar 2017 19:59:42 +0100
> > >>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > [...]
> > >> I admit, I did not investigate this thoroughly, also because the patch
> > >> is flawed regarding multi-thread anyway. After a quick investigation
> > >> it seems the linux guest won't auto-reset the device so the guest should
> > >> end up with a not working device. I think it's pretty likely that the
> > >> admin will check the logs if the device was important.
> > > 
> > > Thinking a bit more about this, it seems setting the device broken is
> > > not the right solution for exactly that reason. Setting the virtio
> > > device broken is a way to signal the guest to 'you did something
> > > broken; please reset the device and start anew' (and that's how current
> > > callers use it). In our case, this is not the guest's fault.
> > 
> > Do we have something to just say stuff broken without blaming the guest?
> > And device reset might not be that stupid at all in the given situation,
> > if we want to save what can be saved from the perspective of the guest.
> > (After reset stuff should work again until we hit the race again -- and
> > since turning ioeventfd on/off should not happen that often during normal
> > operation it could help limit damage suffered -- e.g. controlled shutdown).
> 
> Checking again, the spec says
> 
> DEVICE_NEEDS_RESET (64) Indicates that the device has experienced an
> error from which it can’t recover.
> 
> Nothing about 'guest error'.
> 
> The only problem is that legacy devices don't have that state, which
> means they'll have a broken device through no fault of their own.
> 
> > 
> > > 
> > > Maybe go back to the assert 'solution'? But I'm not sure that's enough
> > > if production builds disable asserts...
> > > 
> > 
> > I will wait a bit, maybe other virtio folks are going to have an 
> > opinion too.
> > 
> > My concern about the assert solution is that for production it is
> > either too rigorous (kill off, hopefully with a dump) or not
> > enough (as you have mentioned, if NDEBUG assert does nothing).
> > 
> > 
> > I think there are setups where a loss of device does not have to be
> > fatal, and I would not like to be the one who makes it fatal (for the
> > guest).
> 
> Basically, it's a host bug (and not a bug specific to a certain
> device). Moving the device which was impacted to a broken state may be
> a useful mitigation.
> 
> But yes, let's hear some other opinions.

We don't really support NDEBUG, so I think an assert is fine for now.
Handling unexpected errors more gracefully is laudable, but I think we
want a more systematic approach than just open-coding it in
this specific place.
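
Concretely, the assert variant would amount to something like this at the
call site (a sketch only, assuming the rc variable from the patch above is
kept):

    virtio_set_isr(vq->vdev, 0x1);
    /* assert variant: treat a failed notify as a fatal QEMU bug */
    rc = event_notifier_set(&vq->guest_notifier);
    assert(rc == 0);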
Halil Pasic March 29, 2017, 11:12 a.m. UTC | #9
On 03/23/2017 06:09 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 06, 2017 at 05:04:41PM +0100, Cornelia Huck wrote:
>> On Mon, 6 Mar 2017 16:21:13 +0100
>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>
>>> On 03/06/2017 03:56 PM, Cornelia Huck wrote:
>>>> On Fri, 3 Mar 2017 14:08:37 +0100
>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>>>
>>>>> On 03/03/2017 01:50 PM, Cornelia Huck wrote:
>>>>>> On Fri, 3 Mar 2017 13:43:32 +0100
>>>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>>>>>
>>>>>>> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
>>>>>>>> On Thu,  2 Mar 2017 19:59:42 +0100
>>>>>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>> [...]
>>>>> I admit, I did not investigate this thoroughly, also because the patch
>>>>> is flawed regarding multi-thread anyway. After a quick investigation
>>>>> it seems the linux guest won't auto-reset the device so the guest should
>>>>> end up with a not working device. I think it's pretty likely that the
>>>>> admin will check the logs if the device was important.
>>>>
>>>> Thinking a bit more about this, it seems setting the device broken is
>>>> not the right solution for exactly that reason. Setting the virtio
>>>> device broken is a way to signal the guest to 'you did something
>>>> broken; please reset the device and start anew' (and that's how current
>>>> callers use it). In our case, this is not the guest's fault.
>>>
>>> Do we have something to just say stuff broken without blaming the guest?
>>> And device reset might not be that stupid at all in the given situation,
>>> if we want to save what can be saved from the perspective of the guest.
>>> (After reset stuff should work again until we hit the race again -- and
>>> since turning ioeventfd on/off should not happen that often during normal
>>> operation it could help limit damage suffered -- e.g. controlled shutdown).
>>
>> Checking again, the spec says
>>
>> DEVICE_NEEDS_RESET (64) Indicates that the device has experienced an
>> error from which it can’t recover.
>>
>> Nothing about 'guest error'.
>>
>> The only problem is that legacy devices don't have that state, which
>> means they'll have a broken device through no fault of their own.
>>
>>>
>>>>
>>>> Maybe go back to the assert 'solution'? But I'm not sure that's enough
>>>> if production builds disable asserts...
>>>>
>>>
>>> I will wait a bit, maybe other virtio folks are going to have an 
>>> opinion too.
>>>
>>> My concern about the assert solution is that for production it is
>>> either too rigorous (kill off, hopefully with a dump) or not
>>> enough (as you have mentioned, if NDEBUG assert does nothing).
>>>
>>>
>>> I think there are setups where a loss of device does not have to be
>>> fatal, and I would not like to be the one who makes it fatal (for the
>>> guest).
>>
>> Basically, it's a host bug (and not a bug specific to a certain
>> device). Moving the device which was impacted to a broken state may be
>> a useful mitigation.
>>
>> But yes, let's hear some other opinions.
> 
> We don't support NDEBUG really so I think an assert is fine for now.

Thanks for your reply! You are right, virtio.c makes sure we do not have
NDEBUG defined, but at the very same place a TODO comment suggests that
not supporting NDEBUG is a temporary measure (and so-called technical
debt).

If my understanding is correct, I should use assert instead of
virtio_error for the next version. Is that right?

> Handling unexpected errors more gracefully is laudable but I think we
> want a more systematic approach than just open-coding it in
> this specific place.
> 

Sorry, I mistook virtio_error for the systematic approach to handling
(expected and unexpected) errors in virtio. Should I create a patch that
documents what is considered (in)appropriate usage of virtio_error?

AFAIU (based on Connie's comments) virtio_error should not be used for
reporting QEMU errors (or should only be used to report guest errors).

By the way, I have read the exchange on '[PATCH 1/5] virtio: Error object
based virtio_error()'. The things you suggest there do make a lot of
sense to me. Unfortunately I can't tell if I'm going to have the time to
come up with an appropriate patch(set), so I won't volunteer for now.

Regards,

Halil

Patch

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 23483c7..e05f3e5 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1555,6 +1555,8 @@  static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
 {
     bool should_notify;
+    int rc;
+
     rcu_read_lock();
     should_notify = virtio_should_notify(vdev, vq);
     rcu_read_unlock();
@@ -1581,7 +1583,11 @@  void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
      * to an atomic operation.
      */
     virtio_set_isr(vq->vdev, 0x1);
-    event_notifier_set(&vq->guest_notifier);
+    rc = event_notifier_set(&vq->guest_notifier);
+    if (unlikely(rc)) {
+        virtio_error(vdev, "guest notifier broken for vdev at %p (%s)", vdev,
+                     strerror(-rc));
+    }
 }
 
 static void virtio_irq(VirtQueue *vq)