Patchwork e1000: Pad short frames to minimum size (60 bytes)

login
register
mail settings
Submitter Stefan Hajnoczi
Date Sept. 18, 2010, 8:43 p.m.
Message ID <1284842625-13920-1-git-send-email-stefanha@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/65137/
State New
Headers show

Comments

Stefan Hajnoczi - Sept. 18, 2010, 8:43 p.m.
The OpenIndiana (Solaris) e1000g driver drops frames that are too long
or too short.  It expects to receive frames of at least the Ethernet
minimum size.  ARP requests in particular are small and will be dropped
if they are not padded appropriately, preventing a Solaris VM from
becoming visible on the network.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 hw/e1000.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)
Hervé Poussineau - Sept. 18, 2010, 8:57 p.m.
Hi,

Stefan Hajnoczi a écrit :
> The OpenIndiana (Solaris) e1000g driver drops frames that are too long
> or too short.  It expects to receive frames of at least the Ethernet
> minimum size.  ARP requests in particular are small and will be dropped
> if they are not padded appropriately, preventing a Solaris VM from
> becoming visible on the network.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  hw/e1000.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
>
>   
[...]

Another patch creating ARP replies at least 64 bytes long has been 
committed:
http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=dbf3c4b4baceb91eb64d09f787cbe92d65188813

Does it fix your issue?

Hervé
Stefan Hajnoczi - Sept. 18, 2010, 9:12 p.m.
On Sat, Sep 18, 2010 at 9:57 PM, Hervé Poussineau <hpoussin@reactos.org> wrote:
> Another patch creating ARP replies at least 64 bytes long has been
> committed:
> http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=dbf3c4b4baceb91eb64d09f787cbe92d65188813
>
> Does it fix your issue?

No I don't think so.  This is an e1000 issue, it will happen if you
use tap networking too.  The commit you linked to only affects slirp
and pads its ARP code.

I think there are two places where the minimum frame length can be enforced:
1. The NIC emulation code.  This is currently how rtl8139, pcnet, and
ne2000 do it.  My patch adds the same for e1000.
2. The net layer.  If we're emulating Ethernet then it would be
possible to pad to minimum frame length in common networking code
(net.c).

Stefan
Edgar Iglesias - Sept. 18, 2010, 9:27 p.m.
On Sat, Sep 18, 2010 at 09:43:45PM +0100, Stefan Hajnoczi wrote:
> The OpenIndiana (Solaris) e1000g driver drops frames that are too long
> or too short.  It expects to receive frames of at least the Ethernet
> minimum size.  ARP requests in particular are small and will be dropped
> if they are not padded appropriately, preventing a Solaris VM from
> becoming visible on the network.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  hw/e1000.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/e1000.c b/hw/e1000.c
> index 7d7d140..bc983f9 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -55,6 +55,7 @@ static int debugflags = DBGBIT(TXERR) | DBGBIT(GENERAL);
>  
>  #define IOPORT_SIZE       0x40
>  #define PNPMMIO_SIZE      0x20000
> +#define MIN_BUF_SIZE      60
>  
>  /*
>   * HW models:
> @@ -635,10 +636,19 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
>      uint32_t rdh_start;
>      uint16_t vlan_special = 0;
>      uint8_t vlan_status = 0, vlan_offset = 0;
> +    uint8_t min_buf[MIN_BUF_SIZE];
>  
>      if (!(s->mac_reg[RCTL] & E1000_RCTL_EN))
>          return -1;
>  
> +    /* Pad to minimum Ethernet frame length */
> +    if (size < sizeof(min_buf)) {
> +        memcpy(min_buf, buf, size);
> +        memset(&min_buf[size], 0, sizeof(min_buf) - size);
> +        buf = min_buf;
> +        size = sizeof(min_buf);
> +    }
> +

Hi,

This doesn't look right. AFAIK, MAC's dont pad on receive.

IMO this kind of padding should somehow be done by the bridge that forwards
packets into the qemu vlan (e.g slirp or the generic tap bridge).

Cheers
Stefan Hajnoczi - Sept. 19, 2010, 6:36 a.m.
On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
<edgar.iglesias@gmail.com> wrote:
> This doesn't look right. AFAIK, MAC's dont pad on receive.

I agree.  NICs that do padding will do it on transmit, not receive.
Anything coming in on the wire should already have the minimum length.

In QEMU that isn't true today and that's why rtl8139, pcnet, and
ne2000 already do this same padding.  This patch is the smallest
change to cover e1000.

> IMO this kind of padding should somehow be done by the bridge that forwards
> packets into the qemu vlan (e.g slirp or the generic tap bridge).

That should work and we can then drop the padding code from existing
NICs.  I'll take a look.

Stefan
Michael S. Tsirkin - Sept. 19, 2010, 11:18 a.m.
On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
> On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> <edgar.iglesias@gmail.com> wrote:
> > This doesn't look right. AFAIK, MAC's dont pad on receive.
> 
> I agree.  NICs that do padding will do it on transmit, not receive.
> Anything coming in on the wire should already have the minimum length.
> 
> In QEMU that isn't true today and that's why rtl8139, pcnet, and
> ne2000 already do this same padding.  This patch is the smallest
> change to cover e1000.
> 
> > IMO this kind of padding should somehow be done by the bridge that forwards
> > packets into the qemu vlan (e.g slirp or the generic tap bridge).
> 
> That should work and we can then drop the padding code from existing
> NICs.  I'll take a look.
> 
> Stefan

Not all nic devices have to be emulate ethernet, so not all devices want
the padding, e.g. virtio does not.  It's also easy to imagine an
ethernet device that strips the padding: would be silly to add it
just to have it stripped.

If we really want to do this generically, we could implement a function dealing
with the padding, and call it from relevant devices.

But I think the patch looks fine as is.
Michael S. Tsirkin - Sept. 19, 2010, 12:03 p.m.
On Sun, Sep 19, 2010 at 02:04:55PM +0200, Edgar E. Iglesias wrote:
> On Sun, Sep 19, 2010 at 01:18:01PM +0200, Michael S. Tsirkin wrote:
> > On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
> > > On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> > > <edgar.iglesias@gmail.com> wrote:
> > > > This doesn't look right. AFAIK, MAC's dont pad on receive.
> > > 
> > > I agree.  NICs that do padding will do it on transmit, not receive.
> > > Anything coming in on the wire should already have the minimum length.
> > > 
> > > In QEMU that isn't true today and that's why rtl8139, pcnet, and
> > > ne2000 already do this same padding.  This patch is the smallest
> > > change to cover e1000.
> > > 
> > > > IMO this kind of padding should somehow be done by the bridge that forwards
> > > > packets into the qemu vlan (e.g slirp or the generic tap bridge).
> > > 
> > > That should work and we can then drop the padding code from existing
> > > NICs.  I'll take a look.
> > > 
> > > Stefan
> > 
> > Not all nic devices have to be emulate ethernet, so not all devices want
> > the padding, e.g. virtio does not.
> 
> Right, ethernet behaviour should obviously not be applied unconditionally for
> all net devices.
> 
> 
> > It's also easy to imagine an
> > ethernet device that strips the padding: would be silly to add it
> > just to have it stripped.
> 
> I dont beleive that is possible. The FCS comes last, so an ethernet MAC
> would have to do really silly things to differentiate between padding and
> real payload.
> 
> > If we really want to do this generically, we could implement a function dealing
> > with the padding, and call it from relevant devices.
> 
> Another way is to have network devices register their link types so that the
> generic bridge can apply whatever link specific fixups that may be needed.

Well, we want to move away from using the generic bridge and to
-netdev pairing of front/backend, anyway.
Adding code there will just complicate that.

> I would prefer to have the padding of bridged frames decoupled from the
> device models, but I cant say I feel very strongly about this.
> 
> Cheers
Edgar Iglesias - Sept. 19, 2010, 12:04 p.m.
On Sun, Sep 19, 2010 at 01:18:01PM +0200, Michael S. Tsirkin wrote:
> On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
> > On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> > <edgar.iglesias@gmail.com> wrote:
> > > This doesn't look right. AFAIK, MAC's dont pad on receive.
> > 
> > I agree.  NICs that do padding will do it on transmit, not receive.
> > Anything coming in on the wire should already have the minimum length.
> > 
> > In QEMU that isn't true today and that's why rtl8139, pcnet, and
> > ne2000 already do this same padding.  This patch is the smallest
> > change to cover e1000.
> > 
> > > IMO this kind of padding should somehow be done by the bridge that forwards
> > > packets into the qemu vlan (e.g slirp or the generic tap bridge).
> > 
> > That should work and we can then drop the padding code from existing
> > NICs.  I'll take a look.
> > 
> > Stefan
> 
> Not all nic devices have to be emulate ethernet, so not all devices want
> the padding, e.g. virtio does not.

Right, ethernet behaviour should obviously not be applied unconditionally for
all net devices.


> It's also easy to imagine an
> ethernet device that strips the padding: would be silly to add it
> just to have it stripped.

I dont beleive that is possible. The FCS comes last, so an ethernet MAC
would have to do really silly things to differentiate between padding and
real payload.


> If we really want to do this generically, we could implement a function dealing
> with the padding, and call it from relevant devices.

Another way is to have network devices register their link types so that the
generic bridge can apply whatever link specific fixups that may be needed.

I would prefer to have the padding of bridged frames decoupled from the
device models, but I cant say I feel very strongly about this.

Cheers
Kevin Wolf - Sept. 20, 2010, 8:42 a.m.
Am 18.09.2010 23:12, schrieb Stefan Hajnoczi:
> On Sat, Sep 18, 2010 at 9:57 PM, Hervé Poussineau <hpoussin@reactos.org> wrote:
>> Another patch creating ARP replies at least 64 bytes long has been
>> committed:
>> http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=dbf3c4b4baceb91eb64d09f787cbe92d65188813
>>
>> Does it fix your issue?
> 
> No I don't think so.  This is an e1000 issue, it will happen if you
> use tap networking too.  The commit you linked to only affects slirp
> and pads its ARP code.
> 
> I think there are two places where the minimum frame length can be enforced:
> 1. The NIC emulation code.  This is currently how rtl8139, pcnet, and
> ne2000 do it.  My patch adds the same for e1000.
> 2. The net layer.  If we're emulating Ethernet then it would be
> possible to pad to minimum frame length in common networking code
> (net.c).

3. The sender. I think it should be the sender's decision which packet
he sends and there's no reason to manipulate it on its way to the guest.
If the sender sends too short packets, this is where the bug is.

Actually, instead of padding the packet we should already drop it in the
device model if RCTL.SBP = 0. Does a real Solaris work when it receives
the same packet?

On the other hand, it seems that we're missing the padding where it
actually belongs: when sending packets with TCTL.PSP = 1. Did you send
the ARP packet from another qemu instance? If so, this might be the real
reason.

Kevin
Kevin Wolf - Sept. 20, 2010, 8:50 a.m.
Am 19.09.2010 08:36, schrieb Stefan Hajnoczi:
> On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> <edgar.iglesias@gmail.com> wrote:
>> This doesn't look right. AFAIK, MAC's dont pad on receive.
> 
> I agree.  NICs that do padding will do it on transmit, not receive.
> Anything coming in on the wire should already have the minimum length.
> 
> In QEMU that isn't true today and that's why rtl8139, pcnet, and
> ne2000 already do this same padding.  This patch is the smallest
> change to cover e1000.

What's the reason that it isn't true in QEMU today? Shouldn't we fix
these problems rather than making device emulations incorrect to
compensate for it?

Kevin
Edgar Iglesias - Sept. 20, 2010, 8:51 a.m.
On Mon, Sep 20, 2010 at 10:42:31AM +0200, Kevin Wolf wrote:
> Am 18.09.2010 23:12, schrieb Stefan Hajnoczi:
> > On Sat, Sep 18, 2010 at 9:57 PM, Hervé Poussineau <hpoussin@reactos.org> wrote:
> >> Another patch creating ARP replies at least 64 bytes long has been
> >> committed:
> >> http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=dbf3c4b4baceb91eb64d09f787cbe92d65188813
> >>
> >> Does it fix your issue?
> > 
> > No I don't think so.  This is an e1000 issue, it will happen if you
> > use tap networking too.  The commit you linked to only affects slirp
> > and pads its ARP code.
> > 
> > I think there are two places where the minimum frame length can be enforced:
> > 1. The NIC emulation code.  This is currently how rtl8139, pcnet, and
> > ne2000 do it.  My patch adds the same for e1000.
> > 2. The net layer.  If we're emulating Ethernet then it would be
> > possible to pad to minimum frame length in common networking code
> > (net.c).
> 
> 3. The sender. I think it should be the sender's decision which packet
> he sends and there's no reason to manipulate it on its way to the guest.
> If the sender sends too short packets, this is where the bug is.

Yes, but when using tap, the ethernet sender is QEMU itself. Tap doesn't
have the same requirements as ethernet so the original sender has no
reason to pad.

Internally in QEMU, there is code that picks up tap packets and
forwards them to the emulated ethernet links, this is were padding
should be done IMO. Not in the device models receive path.

The bridge that forwards frames from tap into emulated links must
also handle different kinds of link types, as all emulated network
devices are not necessarily ethernet.

Cheers
Edgar Iglesias - Sept. 20, 2010, 9:03 a.m.
On Mon, Sep 20, 2010 at 10:50:40AM +0200, Kevin Wolf wrote:
> Am 19.09.2010 08:36, schrieb Stefan Hajnoczi:
> > On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> > <edgar.iglesias@gmail.com> wrote:
> >> This doesn't look right. AFAIK, MAC's dont pad on receive.
> > 
> > I agree.  NICs that do padding will do it on transmit, not receive.
> > Anything coming in on the wire should already have the minimum length.
> > 
> > In QEMU that isn't true today and that's why rtl8139, pcnet, and
> > ne2000 already do this same padding.  This patch is the smallest
> > change to cover e1000.
> 
> What's the reason that it isn't true in QEMU today? Shouldn't we fix
> these problems rather than making device emulations incorrect to
> compensate for it?

Yes we should, I agree.

Cheers
Stefan Hajnoczi - Sept. 20, 2010, 9:13 a.m.
On Mon, Sep 20, 2010 at 10:42:31AM +0200, Kevin Wolf wrote:
> Am 18.09.2010 23:12, schrieb Stefan Hajnoczi:
> > On Sat, Sep 18, 2010 at 9:57 PM, Hervé Poussineau <hpoussin@reactos.org> wrote:
> >> Another patch creating ARP replies at least 64 bytes long has been
> >> committed:
> >> http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=dbf3c4b4baceb91eb64d09f787cbe92d65188813
> >>
> >> Does it fix your issue?
> > 
> > No I don't think so.  This is an e1000 issue, it will happen if you
> > use tap networking too.  The commit you linked to only affects slirp
> > and pads its ARP code.
> > 
> > I think there are two places where the minimum frame length can be enforced:
> > 1. The NIC emulation code.  This is currently how rtl8139, pcnet, and
> > ne2000 do it.  My patch adds the same for e1000.
> > 2. The net layer.  If we're emulating Ethernet then it would be
> > possible to pad to minimum frame length in common networking code
> > (net.c).
> 
> 3. The sender. I think it should be the sender's decision which packet
> he sends and there's no reason to manipulate it on its way to the guest.
> If the sender sends too short packets, this is where the bug is.
> 
> Actually, instead of padding the packet we should already drop it in the
> device model if RCTL.SBP = 0. Does a real Solaris work when it receives
> the same packet?
> 
> On the other hand, it seems that we're missing the padding where it
> actually belongs: when sending packets with TCTL.PSP = 1. Did you send
> the ARP packet from another qemu instance? If so, this might be the real
> reason.

No, I brought up the tap interface on the host and tried to ping the
guest directly.  This caused the host to send an ARP request over the
tap interface.  It was not padded.

Perhaps Linux expects the NIC to deal with padding transmit frames but
tap does not do this.  That's just my theory though from the experience
that real NICs often do pad to minimum size.  Does anyone know the
definitive answer?

Stefan
Stefan Hajnoczi - Sept. 20, 2010, 9:34 a.m.
On Mon, Sep 20, 2010 at 11:03:37AM +0200, Edgar E. Iglesias wrote:
> On Mon, Sep 20, 2010 at 10:50:40AM +0200, Kevin Wolf wrote:
> > Am 19.09.2010 08:36, schrieb Stefan Hajnoczi:
> > > On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> > > <edgar.iglesias@gmail.com> wrote:
> > >> This doesn't look right. AFAIK, MAC's dont pad on receive.
> > > 
> > > I agree.  NICs that do padding will do it on transmit, not receive.
> > > Anything coming in on the wire should already have the minimum length.
> > > 
> > > In QEMU that isn't true today and that's why rtl8139, pcnet, and
> > > ne2000 already do this same padding.  This patch is the smallest
> > > change to cover e1000.
> > 
> > What's the reason that it isn't true in QEMU today? Shouldn't we fix
> > these problems rather than making device emulations incorrect to
> > compensate for it?
> 
> Yes we should, I agree.

Does someone with more knowledge for QEMU networking than me want to
take a stab at it?

Stefan
Michael S. Tsirkin - Sept. 20, 2010, 10:42 a.m.
On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
> On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> <edgar.iglesias@gmail.com> wrote:
> > This doesn't look right. AFAIK, MAC's dont pad on receive.
> 
> I agree.  NICs that do padding will do it on transmit, not receive.
> Anything coming in on the wire should already have the minimum length.

QEMU never gets access to the wire.
Our APIs do not really pass complete ethernet packets:
we forward packets without checksum and padding.

I think it makes complete sense to keep this and
handle padding in devices because we
have devices that pass the frame to guest without padding and checksum.
It should be easy to replace padding code in devices that
need it with some kind of macro.

> In QEMU that isn't true today and that's why rtl8139, pcnet, and
> ne2000 already do this same padding.  This patch is the smallest
> change to cover e1000.
> 
> > IMO this kind of padding should somehow be done by the bridge that forwards
> > packets into the qemu vlan (e.g slirp or the generic tap bridge).
> 
> That should work and we can then drop the padding code from existing
> NICs.  I'll take a look.
> 
> Stefan
Michael S. Tsirkin - Sept. 20, 2010, 5:52 p.m.
On Sat, Sep 18, 2010 at 09:43:45PM +0100, Stefan Hajnoczi wrote:
> The OpenIndiana (Solaris) e1000g driver drops frames that are too long
> or too short.  It expects to receive frames of at least the Ethernet
> minimum size.  ARP requests in particular are small and will be dropped
> if they are not padded appropriately, preventing a Solaris VM from
> becoming visible on the network.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

I'll put this patch on my tree: let's be consistent and fix e1000,
it is also a good approach for 0.13 I think.
Anthony - could you pick this up for 0.13 please?
If someone wants to then strip it off from all devices and tweak, it can
be done.

> ---
>  hw/e1000.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/e1000.c b/hw/e1000.c
> index 7d7d140..bc983f9 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -55,6 +55,7 @@ static int debugflags = DBGBIT(TXERR) | DBGBIT(GENERAL);
>  
>  #define IOPORT_SIZE       0x40
>  #define PNPMMIO_SIZE      0x20000
> +#define MIN_BUF_SIZE      60
>  
>  /*
>   * HW models:
> @@ -635,10 +636,19 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
>      uint32_t rdh_start;
>      uint16_t vlan_special = 0;
>      uint8_t vlan_status = 0, vlan_offset = 0;
> +    uint8_t min_buf[MIN_BUF_SIZE];
>  
>      if (!(s->mac_reg[RCTL] & E1000_RCTL_EN))
>          return -1;
>  
> +    /* Pad to minimum Ethernet frame length */
> +    if (size < sizeof(min_buf)) {
> +        memcpy(min_buf, buf, size);
> +        memset(&min_buf[size], 0, sizeof(min_buf) - size);
> +        buf = min_buf;
> +        size = sizeof(min_buf);
> +    }
> +
>      if (size > s->rxbuf_size) {
>          DBGOUT(RX, "packet too large for buffers (%lu > %d)\n",
>                 (unsigned long)size, s->rxbuf_size);
> -- 
> 1.7.1
>

Patch

diff --git a/hw/e1000.c b/hw/e1000.c
index 7d7d140..bc983f9 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -55,6 +55,7 @@  static int debugflags = DBGBIT(TXERR) | DBGBIT(GENERAL);
 
 #define IOPORT_SIZE       0x40
 #define PNPMMIO_SIZE      0x20000
+#define MIN_BUF_SIZE      60
 
 /*
  * HW models:
@@ -635,10 +636,19 @@  e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
     uint32_t rdh_start;
     uint16_t vlan_special = 0;
     uint8_t vlan_status = 0, vlan_offset = 0;
+    uint8_t min_buf[MIN_BUF_SIZE];
 
     if (!(s->mac_reg[RCTL] & E1000_RCTL_EN))
         return -1;
 
+    /* Pad to minimum Ethernet frame length */
+    if (size < sizeof(min_buf)) {
+        memcpy(min_buf, buf, size);
+        memset(&min_buf[size], 0, sizeof(min_buf) - size);
+        buf = min_buf;
+        size = sizeof(min_buf);
+    }
+
     if (size > s->rxbuf_size) {
         DBGOUT(RX, "packet too large for buffers (%lu > %d)\n",
                (unsigned long)size, s->rxbuf_size);