diff mbox

[next,v2,2/3] ipvlan: Process fragmented multicast frames correctly

Message ID 1430331558-20286-1-git-send-email-maheshb@google.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Multicast processing in IPvlan was faulty as is. Eric Dumazet
pointed out that fragmented packets won't be processed correctly
unless defrag step is introduced.

This patch adds the defrag step before driver attempts to process
multicast frame(s).

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/ipvlan/ipvlan_core.c | 8 ++++++++
 include/net/ip.h                 | 1 +
 2 files changed, 9 insertions(+)

Comments

David Miller May 2, 2015, 1:12 a.m. UTC | #1
From: Mahesh Bandewar <maheshb@google.com>
Date: Wed, 29 Apr 2015 11:19:18 -0700

> Multicast processing in IPvlan was faulty as is. Eric Dumazet
> pointed out that fragmented packets won't be processed correctly
> unless defrag step is introduced.
> 
> This patch adds the defrag step before driver attempts to process
> multicast frame(s).
> 
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>

And now you are potentially modifying the geometry of packets that
traverse ipvlan devices.

I am sorry but I am going to put my foot down and not allow another
instance of this to happen again.  We already have this being done by
netfilter, and I consider it unacceptable as well as inefficient.

If a fragmented frame is going to be forwarded by us, defragmenting
and refragmenting may create a different outgoing packet stream than
what arrived.  If the outgoing interface's MTU can accomodate the
incoming frames, this kind of modification is absolutely forbidden.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On Fri, May 1, 2015 at 6:12 PM, David Miller <davem@davemloft.net> wrote:
> From: Mahesh Bandewar <maheshb@google.com>
> Date: Wed, 29 Apr 2015 11:19:18 -0700
>
>> Multicast processing in IPvlan was faulty as is. Eric Dumazet
>> pointed out that fragmented packets won't be processed correctly
>> unless defrag step is introduced.
>>
>> This patch adds the defrag step before driver attempts to process
>> multicast frame(s).
>>
>> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>
> And now you are potentially modifying the geometry of packets that
> traverse ipvlan devices.
>
Yes, that is true but to optimize the fragments processing.

> I am sorry but I am going to put my foot down and not allow another
> instance of this to happen again.  We already have this being done by
> netfilter, and I consider it unacceptable as well as inefficient.
>
Actually this will make the fragments processing more efficient. e.g.
if there are 50 slave devices and we receive 10 fragments, the
duplication will happen for 50*10 = 500 skbs but if the packet is
defragmented, then it's only once per device making it more efficient.

> If a fragmented frame is going to be forwarded by us, defragmenting
> and refragmenting may create a different outgoing packet stream than
> what arrived.  If the outgoing interface's MTU can accomodate the
> incoming frames, this kind of modification is absolutely forbidden.

I understand the concern (after going though Florian Westphal's patch
series) and will take this patch off of this series until matter
settles. Also trying to optimize slow path (fragments) is not very
high on the priority list in any case.

Thanks,
--mahesh..
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller May 4, 2015, 11:22 p.m. UTC | #3
From: Mahesh Bandewar <maheshb@google.com>
Date: Mon, 4 May 2015 15:58:59 -0700

> On Fri, May 1, 2015 at 6:12 PM, David Miller <davem@davemloft.net> wrote:
>> I am sorry but I am going to put my foot down and not allow another
>> instance of this to happen again.  We already have this being done by
>> netfilter, and I consider it unacceptable as well as inefficient.
>>
> Actually this will make the fragments processing more efficient. e.g.
> if there are 50 slave devices and we receive 10 fragments, the
> duplication will happen for 50*10 = 500 skbs but if the packet is
> defragmented, then it's only once per device making it more efficient.

You can linearize it 'zero' times, which is more efficient than
anything no matter how many interfaces.

And you can do this by maintaining a chain of the incoming SKBs and
then resegmenting them, which allows to maintain exactly the precise
geometry you started with in a guaranteed way.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index c5297d1f1e1c..3b86a4eee7ab 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -521,6 +521,10 @@  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
 		return dev_forward_skb(ipvlan->phy_dev, skb);
 
 	} else if (is_multicast_ether_addr(eth->h_dest)) {
+		skb = ip_check_defrag(skb, IP_DEFRAG_IPVLAN);
+		if (!skb)
+			return RX_HANDLER_CONSUMED;
+
 		ipvlan_multicast_enqueue(ipvlan->port, skb);
 		return NET_XMIT_SUCCESS;
 	}
@@ -606,6 +610,10 @@  static rx_handler_result_t ipvlan_handle_mode_l2(struct sk_buff **pskb,
 	int addr_type;
 
 	if (is_multicast_ether_addr(eth->h_dest)) {
+		skb = ip_check_defrag(skb, IP_DEFRAG_IPVLAN);
+		if (!skb)
+			return RX_HANDLER_CONSUMED;
+
 		if (ipvlan_external_frame(skb, port)) {
 			struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
 
diff --git a/include/net/ip.h b/include/net/ip.h
index d14af7edd197..1fb432b346a8 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -476,6 +476,7 @@  enum ip_defrag_users {
 	IP_DEFRAG_VS_FWD,
 	IP_DEFRAG_AF_PACKET,
 	IP_DEFRAG_MACVLAN,
+	IP_DEFRAG_IPVLAN,
 };
 
 int ip_defrag(struct sk_buff *skb, u32 user);