Message ID | 20171114103251.5495-1-dja@axtens.net |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Series | macvlan: verify MTU before lowerdev xmit | expand |
On 11/14/2017 2:32 AM, Daniel Axtens wrote: > If a macvlan device which is not in bridge mode receives a packet, > it is sent straight to the lowerdev without checking against the > device's MTU. This also happens for multicast traffic. > > Add an is_skb_forwardable() check against the lowerdev before > sending the packet out through it. I think this is the simplest > and best way to do it, and is consistent with the use of > dev_forward_skb() in the bridge path. > > This is easy to replicate: > - create a VM with a macvtap connection in private mode > - set the lowerdev MTU to something low in the host (e.g. 1480) > - do not set the MTU lower in the guest (e.g. keep at 1500) > - netperf to a different host with the same high MTU > - observe that currently, the driver will forward too-big packets > - observe that with this patch the packets are dropped > > Cc: Shannon Nelson <shannon.nelson@oracle.com> > Signed-off-by: Daniel Axtens <dja@axtens.net> > > --- > > After hearing Shannon's lightning talk on macvlan at netdev I > figured I'd strike while the iron is hot and get this out of my > patch queue where it has been languishing. > --- > drivers/net/macvlan.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c > index a178c5efd33e..8adcad6798c5 100644 > --- a/drivers/net/macvlan.c > +++ b/drivers/net/macvlan.c > @@ -534,6 +534,10 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev) > } > > xmit_world: > + /* verify MTU */ > + if (!is_skb_forwardable(vlan->lowerdev, skb)) > + return NET_XMIT_DROP; > + Good catch. I remember your comment on this last week, but hadn't had a chance to look it up yet. As for the other paths, I see that the call to dev_forward_skb() already has this protection in it, but does the call to dev_queue_xmit_accel() in macvlan_start_xmit() need similar protection? sln > skb->dev = vlan->lowerdev; > return dev_queue_xmit(skb); > } >
On 11/14/2017 9:03 AM, Shannon Nelson wrote: > On 11/14/2017 2:32 AM, Daniel Axtens wrote: >> If a macvlan device which is not in bridge mode receives a packet, >> it is sent straight to the lowerdev without checking against the >> device's MTU. This also happens for multicast traffic. >> >> Add an is_skb_forwardable() check against the lowerdev before >> sending the packet out through it. I think this is the simplest >> and best way to do it, and is consistent with the use of >> dev_forward_skb() in the bridge path. >> >> This is easy to replicate: >> - create a VM with a macvtap connection in private mode >> - set the lowerdev MTU to something low in the host (e.g. 1480) >> - do not set the MTU lower in the guest (e.g. keep at 1500) >> - netperf to a different host with the same high MTU >> - observe that currently, the driver will forward too-big packets >> - observe that with this patch the packets are dropped >> >> Cc: Shannon Nelson <shannon.nelson@oracle.com> >> Signed-off-by: Daniel Axtens <dja@axtens.net> >> >> --- >> >> After hearing Shannon's lightning talk on macvlan at netdev I >> figured I'd strike while the iron is hot and get this out of my >> patch queue where it has been languishing. >> --- >> drivers/net/macvlan.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c >> index a178c5efd33e..8adcad6798c5 100644 >> --- a/drivers/net/macvlan.c >> +++ b/drivers/net/macvlan.c >> @@ -534,6 +534,10 @@ static int macvlan_queue_xmit(struct sk_buff >> *skb, struct net_device *dev) >> } >> xmit_world: >> + /* verify MTU */ >> + if (!is_skb_forwardable(vlan->lowerdev, skb)) >> + return NET_XMIT_DROP; >> + > > Good catch. I remember your comment on this last week, but hadn't had a > chance to look it up yet. > > As for the other paths, I see that the call to dev_forward_skb() already > has this protection in it, but does the call to dev_queue_xmit_accel() > in macvlan_start_xmit() need similar protection? > Now that I think about this a little more, why is this not already getting handled by the NETDEV_CHANGEMTU notifier? In what case are you running into this, and why is it not triggering the notifier? sln > > >> skb->dev = vlan->lowerdev; >> return dev_queue_xmit(skb); >> } >>
Hi Shannon, > Now that I think about this a little more, why is this not already > getting handled by the NETDEV_CHANGEMTU notifier? In what case are you > running into this, and why is it not triggering the notifier? My commit message was probably not super clear here - apologies for any mangling of terminology/concepts. The changed MTU is getting propagated from the lowerdev to the macvtap interface just fine, it's just not getting enforced. Hopefully the following diagrams make it clearer. This is the current behaviour: a VM sends a packet of 1514 bytes and it makes it through the macvtap/macvlan and lowerdev, to the destination, even though the intermediate MTUs are lower. *-------------------* | VM | | | | mtu 1500 | *-------------------* | V 1514 byte packet | *-------------------* | macvtap | | | | mtu 1480 | *-------------------* | V 1514 byte packet | *-------------------* | lowerdev | | | | mtu 1480 | *-------------------* | V 1514 byte packet | *-------------------* | dest | | | | mtu 1500 | *-------------------* My thought here is that the lowerdev should not be asked to transmit a packet that is larger than its MTU. The patch causes the following behaviour: *-------------------* | VM | | | | mtu 1500 | *-------------------* | V 1514 byte packet | *-------------------* | macvtap | | | | mtu 1480 | *-------------------* | | packet dropped X 1500 > 1480 I think this makes macvlan consistent with bridges. >> As for the other paths, I see that the call to dev_forward_skb() already >> has this protection in it, but does the call to dev_queue_xmit_accel() >> in macvlan_start_xmit() need similar protection? I think so. I will have a look and do a v2 if the core idea of the patch is OK. Regards, Daniel >> > > > sln > > >> >> >>> skb->dev = vlan->lowerdev; >>> return dev_queue_xmit(skb); >>> } >>>
From: Daniel Axtens <dja@axtens.net> Date: Tue, 14 Nov 2017 21:32:51 +1100 > If a macvlan device which is not in bridge mode receives a packet, > it is sent straight to the lowerdev without checking against the > device's MTU. This also happens for multicast traffic. > > Add an is_skb_forwardable() check against the lowerdev before > sending the packet out through it. I think this is the simplest > and best way to do it, and is consistent with the use of > dev_forward_skb() in the bridge path. > > This is easy to replicate: > - create a VM with a macvtap connection in private mode > - set the lowerdev MTU to something low in the host (e.g. 1480) > - do not set the MTU lower in the guest (e.g. keep at 1500) > - netperf to a different host with the same high MTU > - observe that currently, the driver will forward too-big packets > - observe that with this patch the packets are dropped > > Cc: Shannon Nelson <shannon.nelson@oracle.com> > Signed-off-by: Daniel Axtens <dja@axtens.net> This is an area where we really haven't set down some clear rules for behavior. If an interface has a particular MTU, it must be able to successfully send MTU sized packets on that link be it virtual or physical. Only a "next hop" can have a different MTU and thus drop packets. This requirement is absolutely necessary in order for proper signalling (path MTU messages) to make their way back to the sending host. In this VM-->macvlan case it's more like a point to point connection and there lacks a "next hop" to serve and the provider of proper signalling. This whole situation seems to be handled quite poorly in virtualized setups. Allowing one end of the virtual networking "link" into the guest have a different MTU from the other end is a HUGE mistake. There needs to be control path signalling between the guest and the provider of the virtual link so that they can synchronize their MTU settings. Yes this is hard, but what is happening now doesn't fly in the long term.
Hi Dave, > This is an area where we really haven't set down some clear rules > for behavior. > > If an interface has a particular MTU, it must be able to successfully > send MTU sized packets on that link be it virtual or physical. > > Only a "next hop" can have a different MTU and thus drop packets. > This requirement is absolutely necessary in order for proper > signalling (path MTU messages) to make their way back to the sending > host. > > In this VM-->macvlan case it's more like a point to point connection > and there lacks a "next hop" to serve and the provider of proper > signalling. > > This whole situation seems to be handled quite poorly in virtualized > setups. Allowing one end of the virtual networking "link" into the > guest have a different MTU from the other end is a HUGE mistake. I agree, but users do it, so I'm just trying to figure out the best way to handle it. Currently the bridge code and openvswitch will detect when a large packet arrives and drop the packet* - the bridge code with is_skb_forwardable() and openvswitch with it's own approach in vport.c. This patch tries to bring macvlan in line with those. *except for GSO packets - they are assumed to be ok, which isn't always true. I am working on some patches for this - but my approach may need to be changed in light of what you're saying. > There needs to be control path signalling between the guest and the > provider of the virtual link so that they can synchronize their MTU > settings. We have these sorts of problems on probably every type of virtual interface, some of which are easier to deal with than others. I'm particularly worried about interfaces like ibmveth where the 'other end' is usually an AIX partition on a big powerpc system. AIX tends to bring up these interfaces with MTUs of around 64k as well. (This is what originially got me down the rabbit hole of caring about mismatched-MTU handling!) > Yes this is hard, but what is happening now doesn't fly in the long > term. I'm at a bit of a loss about how we would do a proper fix. The only thing that comes to mind would be for the receive routines of the virtual network interfaces to check the size of incoming packets, but - if I understand correctly - we would need to know what the maximum size for a recieved packet should be. Regards, Daniel
From: Daniel Axtens <dja@axtens.net> Date: Fri, 17 Nov 2017 19:34:27 +1100 > Hi Dave, > >> This is an area where we really haven't set down some clear rules >> for behavior. >> >> If an interface has a particular MTU, it must be able to successfully >> send MTU sized packets on that link be it virtual or physical. >> >> Only a "next hop" can have a different MTU and thus drop packets. >> This requirement is absolutely necessary in order for proper >> signalling (path MTU messages) to make their way back to the sending >> host. >> >> In this VM-->macvlan case it's more like a point to point connection >> and there lacks a "next hop" to serve and the provider of proper >> signalling. >> >> This whole situation seems to be handled quite poorly in virtualized >> setups. Allowing one end of the virtual networking "link" into the >> guest have a different MTU from the other end is a HUGE mistake. > > I agree, but users do it, so I'm just trying to figure out the best way > to handle it. Currently the bridge code and openvswitch will detect when > a large packet arrives and drop the packet* - the bridge code with > is_skb_forwardable() and openvswitch with it's own approach in > vport.c. This patch tries to bring macvlan in line with those. > > *except for GSO packets - they are assumed to be ok, which isn't always > true. I am working on some patches for this - but my approach may need > to be changed in light of what you're saying. So how exactly do the oversized packets get to the macvlan device from the VM in this scenerio? That detail seems to be missing from the diagrams you provided earlier. The VM and the macvlan boxes are just connected with a line. Thank you.
Hi Dave, > So how exactly do the oversized packets get to the macvlan device from > the VM in this scenerio? That detail seems to be missing from the > diagrams you provided earlier. The VM and the macvlan boxes are just > connected with a line. Inside the VM I'm using netperf talking on an interface which the guest believes to have a MTU of 1500. I'm setting up the VM using libvirt - my understanding is that libvirt creates a macvtap device in private mode, and qemu opens that tap device and writes data from the emulated network card (I see the same behaviour with a emulated rtl8139, e1000, and with virtio). I think I could replicate this with any userspace program - qemu is just the easiest for me at the moment. Hopefully that's what you had in mind? Let me know if you wanted different info. Regards, Daniel [The gory details, in case it matters: The VM has a network adaptor with the following XML: <interface type='direct'> <mac address='52:54:00:e7:a2:ac'/> <source dev='enx0050b6655ff2' mode='private'/> <model type='rtl8139'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> </interface> ] > > Thank you.
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index a178c5efd33e..8adcad6798c5 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -534,6 +534,10 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev) } xmit_world: + /* verify MTU */ + if (!is_skb_forwardable(vlan->lowerdev, skb)) + return NET_XMIT_DROP; + skb->dev = vlan->lowerdev; return dev_queue_xmit(skb); }
If a macvlan device which is not in bridge mode receives a packet, it is sent straight to the lowerdev without checking against the device's MTU. This also happens for multicast traffic. Add an is_skb_forwardable() check against the lowerdev before sending the packet out through it. I think this is the simplest and best way to do it, and is consistent with the use of dev_forward_skb() in the bridge path. This is easy to replicate: - create a VM with a macvtap connection in private mode - set the lowerdev MTU to something low in the host (e.g. 1480) - do not set the MTU lower in the guest (e.g. keep at 1500) - netperf to a different host with the same high MTU - observe that currently, the driver will forward too-big packets - observe that with this patch the packets are dropped Cc: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Daniel Axtens <dja@axtens.net> --- After hearing Shannon's lightning talk on macvlan at netdev I figured I'd strike while the iron is hot and get this out of my patch queue where it has been languishing. --- drivers/net/macvlan.c | 4 ++++ 1 file changed, 4 insertions(+)