
[v3,2/2] vhost user: Add RARP injection for legacy guest

Message ID CABUUfwNNrvdcr1cp88mnqnHKH9L7f=yj6oRWOYAsAoB_T9w=CQ@mail.gmail.com
State New

Commit Message

Thibaut Collet June 25, 2015, 2:22 p.m. UTC
On Thu, Jun 25, 2015 at 2:53 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Jun 25, 2015 at 01:01:29PM +0200, Thibaut Collet wrote:
>> On Thu, Jun 25, 2015 at 11:59 AM, Jason Wang <jasowang@redhat.com> wrote:
>> >
>> >
>> >
>> > On 06/24/2015 07:05 PM, Michael S. Tsirkin wrote:
>> > > On Wed, Jun 24, 2015 at 04:31:15PM +0800, Jason Wang wrote:
>> > >>
>> > >> On 06/23/2015 01:49 PM, Michael S. Tsirkin wrote:
>> > >>> On Tue, Jun 23, 2015 at 10:12:17AM +0800, Jason Wang wrote:
>> > >>>>>
>> > >>>>> On 06/18/2015 11:16 PM, Thibaut Collet wrote:
>> > >>>>>>> On Tue, Jun 16, 2015 at 10:05 AM, Jason Wang <jasowang@redhat.com> wrote:
>> > >>>>>>>>> On 06/16/2015 03:24 PM, Thibaut Collet wrote:
>> > >>>>>>>>>>> If my understanding is correct, on a resume operation we have the
>> > >>>>>>>>>>> following callback trace:
>> > >>>>>>>>>>> 1. the virtio_pci_restore function, which calls the restore callbacks of
>> > >>>>>>>>>>> all virtio devices
>> > >>>>>>>>>>> 2. virtnet_restore, which calls the try_fill_recv function for each virtual queue
>> > >>>>>>>>>>> 3. try_fill_recv, which kicks the virtual queue (through the
>> > >>>>>>>>>>> virtqueue_kick function)
>> > >>>>>>>>> Yes, but this happens only after a PM resume, not migration. Migration is
>> > >>>>>>>>> totally transparent to the guest.
>> > >>>>>>>>>
>> > >>>>>>> Hi Jason,
>> > >>>>>>>
>> > >>>>>>> After a deeper look at the migration code of QEMU, a resume event is
>> > >>>>>>> always sent when the live migration is finished.
>> > >>>>>>> On a live migration we have the following callback trace:
>> > >>>>>>> 1. The VM on the new host is set to the RUN_STATE_INMIGRATE state and the
>> > >>>>>>> autostart boolean to 1, and the qemu_start_incoming_migration
>> > >>>>>>> function is called (see the main function of vl.c)
>> > >>>>>>> .....
>> > >>>>>>> 2. call of the process_incoming_migration function in the
>> > >>>>>>> migration/migration.c file, whatever the way the live migration is done
>> > >>>>>>> (tcp:, fd:, unix:, exec: ...)
>> > >>>>>>> 3. call of the process_incoming_migration_co function in migration/migration.c
>> > >>>>>>> 4. call of the vm_start function in vl.c (otherwise the migrated VM stays
>> > >>>>>>> in the paused state; the autostart boolean is set to 1 by the main
>> > >>>>>>> function in vl.c)
>> > >>>>>>> 5. vm_start sets the VM to the RUN_STATE_RUNNING state.
>> > >>>>>>> 6. call of the qapi_event_send_resume function, which sends a resume event to the VM
>> > >>>>> AFAIK, this function sends the resume event to the QEMU monitor, not to the VM.
>> > >>>>>
>> > >>>>>>> So when a live migration has ended:
>> > >>>>>>> 1. a resume event is sent to the guest
>> > >>>>>>> 2. on reception of this resume event, the virtual queues are kicked
>> > >>>>>>> by the guest
>> > >>>>>>> 3. the vhost-user backend catches this kick and can emit a RARP to a guest
>> > >>>>>>> that does not support GUEST_ANNOUNCE
>> > >>>>>>>
>> > >>>>>>> This solution, like the solution based on detection of the DRIVER_OK status
>> > >>>>>>> suggested by Michael, allows the backend to send the RARP to a legacy guest
>> > >>>>>>> without involving QEMU or adding an ioctl to vhost-user.
>> > >>>>> A question here: does the vhost-user code pass the status to the backend? If
>> > >>>>> not, how can a userspace backend detect DRIVER_OK?
>> > >>> Sorry, I must have been unclear.
>> > >>> vhost core calls VHOST_NET_SET_BACKEND on DRIVER_OK.
>> > >>> Unfortunately vhost-user currently translates it to VHOST_USER_NONE.
>> > >> Looks like VHOST_NET_SET_BACKEND was only used for the tap backend.
>> > >>
>> > >>> As a workaround, I think kicking the ioeventfds once you get
>> > >>> VHOST_NET_SET_BACKEND will work.
>> > >> Maybe just an eventfd_set() in vhost_net_start(). But is this
>> > >> "workaround" elegant enough to be documented? Is it better to do this
>> > >> explicitly with a new feature?
>> > > If you are going to do this anyway, there are a couple of other changes
>> > > we should make; in particular, decide what we want to do with the control vq.
>> > >
>> >
>> > If I understand correctly, you mean VIRTIO_NET_CTRL_MQ and
>> > VIRTIO_NET_CTRL_GUEST_OFFLOADS? Looks like both of these were broken.
>> > Need more thought, maybe new kinds of requests.
>> >
>> >
>>
>> Are there any objections to adding VHOST_NET_SET_BACKEND support to vhost-user
>> with a patch like this:
>>
>>
>>  hw/net/vhost_net.c     |    8 ++++++++
>>  hw/virtio/vhost-user.c |   10 +++++++++-
>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>> index 907e002..7a008c0 100644
>> --- a/hw/net/vhost_net.c
>> +++ b/hw/net/vhost_net.c
>> @@ -234,6 +234,14 @@ static int vhost_net_start_one(struct vhost_net *net,
>>                  goto fail;
>>              }
>>          }
>> +    } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
>> +        file.fd = 0;
>> +        for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
>> +            const VhostOps *vhost_ops = net->dev.vhost_ops;
>> +            int r = vhost_ops->vhost_call(&net->dev, VHOST_NET_SET_BACKEND,
>> +                                          &file);
>> +            assert(r >= 0);
>> +        }
>>      }
>>      return 0;
>>  fail:
>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>> index d6f2163..32c6bd9 100644
>> --- a/hw/virtio/vhost-user.c
>> +++ b/hw/virtio/vhost-user.c
>> @@ -41,6 +41,7 @@ typedef enum VhostUserRequest {
>>      VHOST_USER_SET_VRING_KICK = 12,
>>      VHOST_USER_SET_VRING_CALL = 13,
>>      VHOST_USER_SET_VRING_ERR = 14,
>> +    VHOST_USER_NET_SET_BACKEND = 15,
>>      VHOST_USER_MAX
>>  } VhostUserRequest;
>>
>> @@ -104,7 +105,8 @@ static unsigned long int ioctl_to_vhost_user_request[VHOST_USER_MAX] = {
>>      VHOST_GET_VRING_BASE,   /* VHOST_USER_GET_VRING_BASE */
>>      VHOST_SET_VRING_KICK,   /* VHOST_USER_SET_VRING_KICK */
>>      VHOST_SET_VRING_CALL,   /* VHOST_USER_SET_VRING_CALL */
>> -    VHOST_SET_VRING_ERR     /* VHOST_USER_SET_VRING_ERR */
>> +    VHOST_SET_VRING_ERR,    /* VHOST_USER_SET_VRING_ERR */
>> +    VHOST_NET_SET_BACKEND   /* VHOST_USER_NET_SET_BACKEND */
>>  };
>>
>>  static VhostUserRequest vhost_user_request_translate(unsigned long int request)
>> @@ -287,6 +289,12 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>>              msg.u64 |= VHOST_USER_VRING_NOFD_MASK;
>>          }
>>          break;
>> +
>> +    case VHOST_NET_SET_BACKEND:
>> +        memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
>> +        msg.size = sizeof(m.state);
>> +        break;
>> +
>>      default:
>>          error_report("vhost-user trying to send unhandled ioctl");
>>          return -1;
>>
>>
>> This message will be sent when the guest is ready and can be used by the
>> vhost-user backend to send a RARP to a legacy guest.
>>
>> This solution avoids adding a new message and has no impact on the control vq.
>
>
> I think that you can't add messages to the protocol unconditionally.
> For example, snabbswitch simply crashes if it gets an unknown
> message.
>
> Either this needs a new feature bit, or we implement
> [PATCH RFC] vhost-user: protocol extensions,
> making it safe to add new messages.
>
> --
> MST

I understand.
One last idea before doing an RFC:
Normally the guest notifies vhost of new buffers on a virtqueue by
kicking the eventfd. This eventfd has been provided to vhost by QEMU.
So when DRIVER_OK is received by QEMU, QEMU can kick the eventfd itself.

A possible patch to do that is:
 hw/net/vhost_net.c |    7 +++++++
 1 file changed, 7 insertions(+)


Kicking this eventfd has no impact on QEMU or the guest (they do not
poll it); it simply wakes up vhost and allows it to send the RARP for a
legacy guest.
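
For reference, the RARP announcement itself is just a 42-byte Ethernet
frame built from the guest MAC. A minimal sketch of how a backend could
construct it (hypothetical helper, not part of any patch here):

#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Build a gratuitous RARP (EtherType 0x8035, opcode 3) announcing "mac".
 * Returns the frame length; pad to 60 bytes if the link requires it. */
static size_t make_rarp(uint8_t *buf, const uint8_t mac[ETH_ALEN])
{
    uint8_t *p = buf;

    memset(p, 0xff, ETH_ALEN); p += ETH_ALEN;   /* dst: broadcast */
    memcpy(p, mac, ETH_ALEN);  p += ETH_ALEN;   /* src: guest MAC */
    *p++ = 0x80; *p++ = 0x35;                   /* EtherType: RARP */

    *p++ = 0x00; *p++ = 0x01;                   /* htype: Ethernet */
    *p++ = 0x08; *p++ = 0x00;                   /* ptype: IPv4 */
    *p++ = ETH_ALEN;                            /* hlen */
    *p++ = 4;                                   /* plen */
    *p++ = 0x00; *p++ = 0x03;                   /* op: reverse request */
    memcpy(p, mac, ETH_ALEN); p += ETH_ALEN;    /* sender HW address */
    memset(p, 0, 4); p += 4;                    /* sender IP: unknown */
    memcpy(p, mac, ETH_ALEN); p += ETH_ALEN;    /* target HW address */
    memset(p, 0, 4); p += 4;                    /* target IP: unknown */

    return (size_t)(p - buf);                   /* 42 bytes */
}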

Regards.

Thibaut.

Comments

Jason Wang June 26, 2015, 4:06 a.m. UTC | #1
On 06/25/2015 10:22 PM, Thibaut Collet wrote:
> [...]
> I understand.
> One last idea before doing an RFC:
> Normally the guest notifies vhost of new buffers on a virtqueue by
> kicking the eventfd. This eventfd has been provided to vhost by QEMU.
> So when DRIVER_OK is received by QEMU, QEMU can kick the eventfd itself.
>
> A possible patch to do that is:
>  hw/net/vhost_net.c |    7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 907e002..fbc55e0 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -234,6 +234,13 @@ static int vhost_net_start_one(struct vhost_net *net,
>                  goto fail;
>              }
>          }
> +    } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
> +        int idx;
> +
> +        for (idx = 0; idx < net->dev.nvqs; ++idx) {
> +            struct VirtQueue *vq = virtio_get_queue(dev, idx);
> +            event_notifier_set(virtio_queue_get_host_notifier(vq));
> +        }
>      }
>      return 0;
>  fail:
>
> Kicking this eventfd has no impact on QEMU or the guest (they do not
> poll it); it simply wakes up vhost and allows it to send the RARP for a
> legacy guest.
>
> Regards.
>
> Thibaut.

This may work, but there are issues:

- I believe we should document this in the spec. But it looks more like
a workaround, and using an implicit method to notify a control message
often causes ugly code on both sides. This is not elegant enough to be
documented in the spec.
- Consider that you may have 100 queues: the kick will then happen 100
times, and the backend needs to handle that case.
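
For illustration, a backend can collapse those per-queue kicks into a
single announcement. A minimal sketch, with hypothetical field and
helper names:

#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

struct device {
    bool announce_pending;   /* set when the device (re)starts */
};

static void send_rarp(struct device *dev)
{
    /* Placeholder: enqueue the RARP frame on the guest's receive ring. */
    (void)dev;
}

/* Called whenever any queue's kick eventfd becomes readable. */
static void on_queue_kick(struct device *dev, int kickfd)
{
    uint64_t cnt;

    /* Drain the eventfd; the kick injected by QEMU carries no buffers. */
    (void)read(kickfd, &cnt, sizeof(cnt));

    if (dev->announce_pending) {
        dev->announce_pending = false;   /* first of the N kicks wins */
        send_rarp(dev);
    }
    /* ... continue with normal ring processing ... */
}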

Patch

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 907e002..fbc55e0 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -234,6 +234,13 @@ static int vhost_net_start_one(struct vhost_net *net,
                 goto fail;
             }
         }
+    } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
+        int idx;
+
+        for (idx = 0; idx < net->dev.nvqs; ++idx) {
+            struct VirtQueue *vq = virtio_get_queue(dev, idx);
+            event_notifier_set(virtio_queue_get_host_notifier(vq));
+        }
     }
     return 0;
 fail:
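
For completeness: event_notifier_set() on the host notifier amounts to
writing 1 to the underlying eventfd, which is exactly what makes the
backend's kick fd readable. A standalone sketch of that mechanism
(plain Linux eventfd, no QEMU code involved):

#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
    int fd = eventfd(0, 0);          /* stand-in for the vring kick fd */
    uint64_t one = 1, val = 0;

    /* What event_notifier_set() boils down to: write 1 to the fd. */
    if (write(fd, &one, sizeof(one)) != sizeof(one)) {
        return 1;
    }
    /* The backend's poll loop sees the fd readable and wakes up here. */
    if (read(fd, &val, sizeof(val)) != sizeof(val)) {
        return 1;
    }
    printf("backend woke up, counter=%llu\n", (unsigned long long)val);
    close(fd);
    return 0;
}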