
vmxnet3, vnet_hdr, and minimum length padding

Message ID 55898005.2040600@moose.net
State New

Commit Message

Brian Kress June 23, 2015, 3:49 p.m. UTC
When running ESXi under qemu there is an issue with the ESXi guest 
discarding packets that are too short.  The guest discards any packet 
shorter than the normal minimum length for an Ethernet frame (60 
bytes).  This results in odd behaviour where other hosts, or VMs on 
other hosts, can communicate with the ESXi guest just fine (since 
there's a physical NIC somewhere doing the padding), but VMs on the 
same host, and the host itself, cannot, because their ARP request 
packets are too small for the ESXi guest to accept.
     Someone in the past thought this was worth fixing, and added code 
to the vmxnet3 qemu emulation to pad any received packet smaller than 
60 bytes out to 60.  Unfortunately this code is wrong (or at least in 
the wrong place): it pads BEFORE taking into account the vnet_hdr that 
the tap device prepends to the packet.  As a result it may add padding, 
but never enough; specifically, it adds 10 bytes (the length of the 
vnet_hdr) less than it needs to.
     The following (hopefully "obviously correct") patch simply swaps 
the order of processing the vnet header and the padding.  With this 
patch an ESXi guest is able to communicate with the host and with other 
local VMs.

Comments

Stefan Hajnoczi June 25, 2015, 1:27 p.m. UTC | #1
On Tue, Jun 23, 2015 at 11:49:25AM -0400, Brian Kress wrote:

Thanks for sending a patch!

I have CCed the vmxnet3 maintainer and Jason Wang, who looks at net
subsystem patches:

  $ scripts/get_maintainer.pl -f hw/net/vmxnet3.c
  Dmitry Fleytman <dmitry@daynix.com> (maintainer:Vmware)


Please add your Signed-off-by.  Details about Signed-off-by are on the
http://qemu-project.org/Contribute/SubmitAPatch page.

Dmitry Fleytman June 28, 2015, 2:56 p.m. UTC | #2
> On Jun 23, 2015, at 18:49, Brian Kress <kressb@moose.net> wrote:

Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
The code is fine, thanks!

Please fix the patch according to Paolo's comments.

Regards,
Dmitry.

Stefan Hajnoczi June 29, 2015, 3:06 p.m. UTC | #3
On Tue, Jun 23, 2015 at 11:49:25AM -0400, Brian Kress wrote:

Thanks, applied to my net tree:
https://github.com/stefanha/qemu/commits/net

Stefan

Patch

--- a/qemu-2.3.0/hw/net/vmxnet3.c       2015-04-27 10:08:24.000000000 -0400
+++ b/qemu-2.3.0/hw/net/vmxnet3.c       2015-06-23 11:38:48.865728713 -0400
@@ -1879,6 +1879,12 @@ 
          return -1;
      }

+    if (s->peer_has_vhdr) {
+        vmxnet_rx_pkt_set_vhdr(s->rx_pkt, (struct virtio_net_hdr *)buf);
+        buf += sizeof(struct virtio_net_hdr);
+        size -= sizeof(struct virtio_net_hdr);
+    }
+
      /* Pad to minimum Ethernet frame length */
      if (size < sizeof(min_buf)) {
          memcpy(min_buf, buf, size);
@@ -1887,12 +1893,6 @@ 
          size = sizeof(min_buf);
      }

-    if (s->peer_has_vhdr) {
-        vmxnet_rx_pkt_set_vhdr(s->rx_pkt, (struct virtio_net_hdr *)buf);
-        buf += sizeof(struct virtio_net_hdr);
-        size -= sizeof(struct virtio_net_hdr);
-    }
-
      vmxnet_rx_pkt_set_packet_type(s->rx_pkt,
          get_eth_packet_type(PKT_GET_ETH_HDR(buf)));