macvlan: verify MTU before lowerdev xmit

Message ID 20171114103251.5495-1-dja@axtens.net
State Changes Requested
Delegated to: David Miller
Headers show
Series
  • macvlan: verify MTU before lowerdev xmit
Related show

Commit Message

Daniel Axtens Nov. 14, 2017, 10:32 a.m.
If a macvlan device which is not in bridge mode receives a packet,
it is sent straight to the lowerdev without checking against the
device's MTU. This also happens for multicast traffic.

Add an is_skb_forwardable() check against the lowerdev before
sending the packet out through it. I think this is the simplest
and best way to do it, and is consistent with the use of
dev_forward_skb() in the bridge path.

This is easy to replicate:
 - create a VM with a macvtap connection in private mode
 - set the lowerdev MTU to something low in the host (e.g. 1480)
 - do not set the MTU lower in the guest (e.g. keep at 1500)
 - netperf to a different host with the same high MTU
 - observe that currently, the driver will forward too-big packets
 - observe that with this patch the packets are dropped

Cc: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>

---

After hearing Shannon's lightning talk on macvlan at netdev I
figured I'd strike while the iron is hot and get this out of my
patch queue where it has been languishing.
---
 drivers/net/macvlan.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Shannon Nelson Nov. 14, 2017, 5:03 p.m. | #1
On 11/14/2017 2:32 AM, Daniel Axtens wrote:
> If a macvlan device which is not in bridge mode receives a packet,
> it is sent straight to the lowerdev without checking against the
> device's MTU. This also happens for multicast traffic.
> 
> Add an is_skb_forwardable() check against the lowerdev before
> sending the packet out through it. I think this is the simplest
> and best way to do it, and is consistent with the use of
> dev_forward_skb() in the bridge path.
> 
> This is easy to replicate:
>   - create a VM with a macvtap connection in private mode
>   - set the lowerdev MTU to something low in the host (e.g. 1480)
>   - do not set the MTU lower in the guest (e.g. keep at 1500)
>   - netperf to a different host with the same high MTU
>   - observe that currently, the driver will forward too-big packets
>   - observe that with this patch the packets are dropped
> 
> Cc: Shannon Nelson <shannon.nelson@oracle.com>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> 
> ---
> 
> After hearing Shannon's lightning talk on macvlan at netdev I
> figured I'd strike while the iron is hot and get this out of my
> patch queue where it has been languishing.
> ---
>   drivers/net/macvlan.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> index a178c5efd33e..8adcad6798c5 100644
> --- a/drivers/net/macvlan.c
> +++ b/drivers/net/macvlan.c
> @@ -534,6 +534,10 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
>   	}
>   
>   xmit_world:
> +	/* verify MTU */
> +	if (!is_skb_forwardable(vlan->lowerdev, skb))
> +		return NET_XMIT_DROP;
> +

Good catch.  I remember your comment on this last week, but hadn't had a 
chance to look it up yet.

As for the other paths, I see that the call to dev_forward_skb() already 
has this protection in it, but does the call to dev_queue_xmit_accel() 
in macvlan_start_xmit() need similar protection?

sln


>   	skb->dev = vlan->lowerdev;
>   	return dev_queue_xmit(skb);
>   }
>
Shannon Nelson Nov. 14, 2017, 7:06 p.m. | #2
On 11/14/2017 9:03 AM, Shannon Nelson wrote:
> On 11/14/2017 2:32 AM, Daniel Axtens wrote:
>> If a macvlan device which is not in bridge mode receives a packet,
>> it is sent straight to the lowerdev without checking against the
>> device's MTU. This also happens for multicast traffic.
>>
>> Add an is_skb_forwardable() check against the lowerdev before
>> sending the packet out through it. I think this is the simplest
>> and best way to do it, and is consistent with the use of
>> dev_forward_skb() in the bridge path.
>>
>> This is easy to replicate:
>>   - create a VM with a macvtap connection in private mode
>>   - set the lowerdev MTU to something low in the host (e.g. 1480)
>>   - do not set the MTU lower in the guest (e.g. keep at 1500)
>>   - netperf to a different host with the same high MTU
>>   - observe that currently, the driver will forward too-big packets
>>   - observe that with this patch the packets are dropped
>>
>> Cc: Shannon Nelson <shannon.nelson@oracle.com>
>> Signed-off-by: Daniel Axtens <dja@axtens.net>
>>
>> ---
>>
>> After hearing Shannon's lightning talk on macvlan at netdev I
>> figured I'd strike while the iron is hot and get this out of my
>> patch queue where it has been languishing.
>> ---
>>   drivers/net/macvlan.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
>> index a178c5efd33e..8adcad6798c5 100644
>> --- a/drivers/net/macvlan.c
>> +++ b/drivers/net/macvlan.c
>> @@ -534,6 +534,10 @@ static int macvlan_queue_xmit(struct sk_buff 
>> *skb, struct net_device *dev)
>>       }
>>   xmit_world:
>> +    /* verify MTU */
>> +    if (!is_skb_forwardable(vlan->lowerdev, skb))
>> +        return NET_XMIT_DROP;
>> +
> 
> Good catch.  I remember your comment on this last week, but hadn't had a 
> chance to look it up yet.
> 
> As for the other paths, I see that the call to dev_forward_skb() already 
> has this protection in it, but does the call to dev_queue_xmit_accel() 
> in macvlan_start_xmit() need similar protection?
> 

Now that I think about this a little more, why is this not already 
getting handled by the NETDEV_CHANGEMTU notifier?  In what case are you 
running into this, and why is it not triggering the notifier?

sln


> 
> 
>>       skb->dev = vlan->lowerdev;
>>       return dev_queue_xmit(skb);
>>   }
>>
Daniel Axtens Nov. 15, 2017, 3:27 a.m. | #3
Hi Shannon,

> Now that I think about this a little more, why is this not already 
> getting handled by the NETDEV_CHANGEMTU notifier?  In what case are you 
> running into this, and why is it not triggering the notifier?

My commit message was probably not super clear here - apologies for any
mangling of terminology/concepts. The changed MTU is getting propagated
from the lowerdev to the macvtap interface just fine, it's just not
getting enforced.

Hopefully the following diagrams make it clearer.

This is the current behaviour: a VM sends a packet of 1514 bytes and it
makes it through the macvtap/macvlan and lowerdev, to the destination,
even though the intermediate MTUs are lower.

*-------------------*
|        VM         |
|                   |
|     mtu 1500      |
*-------------------*
          |
	  V 1514 byte packet
	  |
*-------------------*
|      macvtap      |
|                   |
|     mtu 1480      |
*-------------------*
          |
	  V 1514 byte packet
	  |
*-------------------*
|     lowerdev      |
|                   |
|     mtu 1480      |
*-------------------*
          |
	  V 1514 byte packet
	  |
*-------------------*
|       dest        |
|                   |
|     mtu 1500      |
*-------------------*

My thought here is that the lowerdev should not be asked to transmit a
packet that is larger than its MTU. The patch causes the following
behaviour:


*-------------------*
|        VM         |
|                   |
|     mtu 1500      |
*-------------------*
          |
	  V 1514 byte packet
	  |
*-------------------*
|      macvtap      |
|                   |
|     mtu 1480      |
*-------------------*
          |
	  | packet dropped
	  X  1500 > 1480

I think this makes macvlan consistent with bridges.

>> As for the other paths, I see that the call to dev_forward_skb() already 
>> has this protection in it, but does the call to dev_queue_xmit_accel() 
>> in macvlan_start_xmit() need similar protection?

I think so. I will have a look and do a v2 if the core idea of the patch
is OK.

Regards,
Daniel

>> 
>
>
> sln
>
>
>> 
>> 
>>>       skb->dev = vlan->lowerdev;
>>>       return dev_queue_xmit(skb);
>>>   }
>>>
David Miller Nov. 17, 2017, 5:54 a.m. | #4
From: Daniel Axtens <dja@axtens.net>
Date: Tue, 14 Nov 2017 21:32:51 +1100

> If a macvlan device which is not in bridge mode receives a packet,
> it is sent straight to the lowerdev without checking against the
> device's MTU. This also happens for multicast traffic.
> 
> Add an is_skb_forwardable() check against the lowerdev before
> sending the packet out through it. I think this is the simplest
> and best way to do it, and is consistent with the use of
> dev_forward_skb() in the bridge path.
> 
> This is easy to replicate:
>  - create a VM with a macvtap connection in private mode
>  - set the lowerdev MTU to something low in the host (e.g. 1480)
>  - do not set the MTU lower in the guest (e.g. keep at 1500)
>  - netperf to a different host with the same high MTU
>  - observe that currently, the driver will forward too-big packets
>  - observe that with this patch the packets are dropped
> 
> Cc: Shannon Nelson <shannon.nelson@oracle.com>
> Signed-off-by: Daniel Axtens <dja@axtens.net>

This is an area where we really haven't set down some clear rules
for behavior.

If an interface has a particular MTU, it must be able to successfully
send MTU sized packets on that link be it virtual or physical.

Only a "next hop" can have a different MTU and thus drop packets.
This requirement is absolutely necessary in order for proper
signalling (path MTU messages) to make their way back to the sending
host.

In this VM-->macvlan case it's more like a point to point connection
and there lacks a "next hop" to serve and the provider of proper
signalling.

This whole situation seems to be handled quite poorly in virtualized
setups.  Allowing one end of the virtual networking "link" into the
guest have a different MTU from the other end is a HUGE mistake.

There needs to be control path signalling between the guest and the
provider of the virtual link so that they can synchronize their MTU
settings.

Yes this is hard, but what is happening now doesn't fly in the long
term.
Daniel Axtens Nov. 17, 2017, 8:34 a.m. | #5
Hi Dave,

> This is an area where we really haven't set down some clear rules
> for behavior.
>
> If an interface has a particular MTU, it must be able to successfully
> send MTU sized packets on that link be it virtual or physical.
>
> Only a "next hop" can have a different MTU and thus drop packets.
> This requirement is absolutely necessary in order for proper
> signalling (path MTU messages) to make their way back to the sending
> host.
>
> In this VM-->macvlan case it's more like a point to point connection
> and there lacks a "next hop" to serve and the provider of proper
> signalling.
>
> This whole situation seems to be handled quite poorly in virtualized
> setups.  Allowing one end of the virtual networking "link" into the
> guest have a different MTU from the other end is a HUGE mistake.

I agree, but users do it, so I'm just trying to figure out the best way
to handle it. Currently the bridge code and openvswitch will detect when
a large packet arrives and drop the packet* - the bridge code with
is_skb_forwardable() and openvswitch with it's own approach in
vport.c. This patch tries to bring macvlan in line with those.

*except for GSO packets - they are assumed to be ok, which isn't always
 true. I am working on some patches for this - but my approach may need
 to be changed in light of what you're saying.

> There needs to be control path signalling between the guest and the
> provider of the virtual link so that they can synchronize their MTU
> settings.

We have these sorts of problems on probably every type of virtual
interface, some of which are easier to deal with than others. I'm
particularly worried about interfaces like ibmveth where the 'other end'
is usually an AIX partition on a big powerpc system. AIX tends to bring
up these interfaces with MTUs of around 64k as well. (This is what
originially got me down the rabbit hole of caring about mismatched-MTU
handling!)

> Yes this is hard, but what is happening now doesn't fly in the long
> term.

I'm at a bit of a loss about how we would do a proper fix. The only
thing that comes to mind would be for the receive routines of the
virtual network interfaces to check the size of incoming packets, but -
if I understand correctly - we would need to know what the maximum size
for a recieved packet should be.

Regards,
Daniel
David Miller Nov. 17, 2017, 8:41 a.m. | #6
From: Daniel Axtens <dja@axtens.net>
Date: Fri, 17 Nov 2017 19:34:27 +1100

> Hi Dave,
> 
>> This is an area where we really haven't set down some clear rules
>> for behavior.
>>
>> If an interface has a particular MTU, it must be able to successfully
>> send MTU sized packets on that link be it virtual or physical.
>>
>> Only a "next hop" can have a different MTU and thus drop packets.
>> This requirement is absolutely necessary in order for proper
>> signalling (path MTU messages) to make their way back to the sending
>> host.
>>
>> In this VM-->macvlan case it's more like a point to point connection
>> and there lacks a "next hop" to serve and the provider of proper
>> signalling.
>>
>> This whole situation seems to be handled quite poorly in virtualized
>> setups.  Allowing one end of the virtual networking "link" into the
>> guest have a different MTU from the other end is a HUGE mistake.
> 
> I agree, but users do it, so I'm just trying to figure out the best way
> to handle it. Currently the bridge code and openvswitch will detect when
> a large packet arrives and drop the packet* - the bridge code with
> is_skb_forwardable() and openvswitch with it's own approach in
> vport.c. This patch tries to bring macvlan in line with those.
> 
> *except for GSO packets - they are assumed to be ok, which isn't always
>  true. I am working on some patches for this - but my approach may need
>  to be changed in light of what you're saying.

So how exactly do the oversized packets get to the macvlan device from
the VM in this scenerio?  That detail seems to be missing from the
diagrams you provided earlier.  The VM and the macvlan boxes are just
connected with a line.

Thank you.
Daniel Axtens Nov. 17, 2017, 12:18 p.m. | #7
Hi Dave,

> So how exactly do the oversized packets get to the macvlan device from
> the VM in this scenerio?  That detail seems to be missing from the
> diagrams you provided earlier.  The VM and the macvlan boxes are just
> connected with a line.

Inside the VM I'm using netperf talking on an interface which the guest
believes to have a MTU of 1500.

I'm setting up the VM using libvirt - my understanding is that libvirt
creates a macvtap device in private mode, and qemu opens that tap device
and writes data from the emulated network card (I see the same behaviour
with a emulated rtl8139, e1000, and with virtio).

I think I could replicate this with any userspace program - qemu is just
the easiest for me at the moment.

Hopefully that's what you had in mind? Let me know if you wanted
different info.

Regards,
Daniel

[The gory details, in case it matters: The VM has a network adaptor with
the following XML:

    <interface type='direct'>
      <mac address='52:54:00:e7:a2:ac'/>
      <source dev='enx0050b6655ff2' mode='private'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </interface>
]

>
> Thank you.

Patch

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index a178c5efd33e..8adcad6798c5 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -534,6 +534,10 @@  static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 xmit_world:
+	/* verify MTU */
+	if (!is_skb_forwardable(vlan->lowerdev, skb))
+		return NET_XMIT_DROP;
+
 	skb->dev = vlan->lowerdev;
 	return dev_queue_xmit(skb);
 }