[v2,1/2] net: Added mtu parameter to dev_forward_skb calls

Message ID 20170511134629.139528-2-fredrik.markstrom@gmail.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Fredrik Markstrom May 11, 2017, 1:46 p.m. UTC
From: Fredrik Markström <fredrik.markstrom@gmail.com>

is_skb_forwardable() currently checks whether the packet size is <= the MTU
of the receiving interface. This is inconsistent with most hardware
Ethernet drivers, which happily receive packets larger than the MTU.

This patch adds a parameter to dev_forward_skb and is_skb_forwardable so
that the caller can override this packet size limit.

Signed-off-by: Fredrik Markstrom <fredrik.markstrom@gmail.com>
---
 drivers/net/ipvlan/ipvlan_core.c |  7 ++++---
 drivers/net/macvlan.c            |  4 ++--
 drivers/net/veth.c               |  2 +-
 include/linux/netdevice.h        | 10 +++++-----
 net/bridge/br_forward.c          |  4 ++--
 net/core/dev.c                   | 17 +++++++++++------
 net/core/filter.c                |  4 ++--
 net/l2tp/l2tp_eth.c              |  2 +-
 8 files changed, 28 insertions(+), 22 deletions(-)
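
For readers who want to see the check in isolation, the following is a minimal userspace sketch of the length test as this patch changes it. The `model_dev` struct and `model_forwardable()` function are hypothetical stand-ins and not kernel API. Any logic in the part of is_skb_forwardable() that the diff below does not show is left out, and the sketch uses an unsigned mtu where the patch uses int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ETH_HLEN  14  /* Ethernet header length */
#define MODEL_VLAN_HLEN 4   /* one 802.1Q tag, as VLAN_HLEN in the kernel */

/* Hypothetical stand-in for the few net_device fields the check reads. */
struct model_dev {
	bool up;                       /* models dev->flags & IFF_UP */
	unsigned int mtu;              /* models dev->mtu */
	unsigned int hard_header_len;  /* models dev->hard_header_len */
};

/*
 * Models the length test from the patch: an mtu argument of 0 means
 * "use the device MTU", any other value overrides it.
 */
static bool model_forwardable(const struct model_dev *dev,
			      unsigned int skb_len, unsigned int mtu)
{
	assert(dev != NULL);

	if (!dev->up)
		return false;

	if (mtu == 0)
		mtu = dev->mtu;

	return skb_len <= mtu + dev->hard_header_len + MODEL_VLAN_HLEN;
}
```

With a device MTU of 300 and a 14-byte header, the sketch accepts frames up to 318 bytes when mtu is 0, and accepts a 442-byte frame only if the caller passes a larger override such as 1000.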

Comments

Stephen Hemminger May 11, 2017, 4:01 p.m. UTC | #1
On Thu, 11 May 2017 15:46:27 +0200
Fredrik Markstrom <fredrik.markstrom@gmail.com> wrote:

> From: Fredrik Markström <fredrik.markstrom@gmail.com>
> 
> is_skb_forwardable() currently checks whether the packet size is <= the MTU
> of the receiving interface. This is inconsistent with most hardware
> Ethernet drivers, which happily receive packets larger than the MTU.

Wrong.

Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
The actual limit is a function of the hardware. Some hardware can only limit by
power of 2; some can only limit frames larger than 1500; some have no limiting at all.
Any application should:
  * not expect packets larger than MTU to be received
  * not send packets larger than MTU
  * check the actual receive size; IP protocols will truncate padded packets
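
As a rough illustration of how the receive limit can depend on the hardware, here is a small C sketch of the three behaviors listed above plus an exact limit. The policy names and `model_hw_accepts()` are hypothetical and do not correspond to any real driver; reading "can only limit frames larger than 1500" as "cannot enforce a limit below a 1500-byte MTU" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical receive-limit policies, one per behavior described above. */
enum model_rx_policy {
	MODEL_RX_EXACT,       /* limit is exactly MTU + header + VLAN tag */
	MODEL_RX_POW2,        /* limit is rounded up to a power of two */
	MODEL_RX_FLOOR_1500,  /* limit cannot go below a 1500-byte MTU */
	MODEL_RX_NONE         /* no receive length limit at all */
};

/* Smallest power of two that is >= v. */
static unsigned int model_round_up_pow2(unsigned int v)
{
	unsigned int p = 1;

	assert(v <= (1u << 31));  /* avoid overflow in the loop below */
	while (p < v)
		p <<= 1;
	return p;
}

static bool model_hw_accepts(enum model_rx_policy policy,
			     unsigned int mtu, unsigned int frame_len)
{
	const unsigned int overhead = 14 + 4;  /* Ethernet header + VLAN tag */

	switch (policy) {
	case MODEL_RX_EXACT:
		return frame_len <= mtu + overhead;
	case MODEL_RX_POW2:
		return frame_len <= model_round_up_pow2(mtu + overhead);
	case MODEL_RX_FLOOR_1500:
		return frame_len <= (mtu > 1500 ? mtu : 1500) + overhead;
	case MODEL_RX_NONE:
		return true;
	}
	return false;
}
```

Under these assumptions, a 442-byte frame arriving at an interface with MTU 300 is dropped only by the exact policy; the other three accept it.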
Fredrik Markstrom May 11, 2017, 7:10 p.m. UTC | #2
On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Thu, 11 May 2017 15:46:27 +0200
> Fredrik Markstrom <fredrik.markstrom@gmail.com> wrote:
>
>> From: Fredrik Markström <fredrik.markstrom@gmail.com>
>>
>> is_skb_forwardable() currently checks whether the packet size is <= the MTU
>> of the receiving interface. This is inconsistent with most hardware
>> Ethernet drivers, which happily receive packets larger than the MTU.
>
> Wrong.

What is "Wrong"? I was initially skeptical about implementing this patch,
since it feels odd to have different MTUs set on the two sides of a
link. After consulting some IP people and the RFCs I kind of changed
my mind and thought I'd give it a shot. In the RFCs I couldn't find
anything that defines when a received packet should or should not be
dropped.

>
> Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
> The actual limit is a function of the hardware. Some hardware can only limit by
> power of 2; some can only limit frames larger than 1500; some have no limiting at all.

Agreed. The purpose of these patches is to be able to configure a
veth interface to mimic these different behaviors. None of the Ethernet
interfaces I have access to drop packets for being larger
than the configured MTU, as veth does.

Being able to mimic real Ethernet hardware is useful when
consolidating hardware using containers/namespaces.

In a reply to a comment from David Miller in my previous version of
the patch I attached the example below to demonstrate the case in
detail.

This works with all ethernet hardware setups I have access to:

---- 8< ------
# Host A eth2 and Host B eth0 are on the same network.

# On HOST A
% ip address add 1.2.3.4/24 dev eth2
% ip link set eth2 mtu 300 up

% # HOST B
% ip address add 1.2.3.5/24 dev eth0
% ip link set eth0 mtu 1000 up
% ping -c 1 -W 1 -s 400 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
408 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=1.57 ms

--- 1.2.3.4 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.573/1.573/1.573/0.000 ms
---- 8< ------


But it doesn't work with veth:

---- 8< ------
# veth0 and veth1 are a veth pair and veth1 has been moved to a separate
# network namespace.
% # NS A
% ip address add 1.2.3.4/24 dev veth0
% ip link set veth0 mtu 300 up

% # NS B
% ip address add 1.2.3.5/24 dev veth1
% ip link set veth1 mtu 1000 up
% ping -c 1 -W 1 -s 400 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.

--- 1.2.3.4 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
---- 8< ------
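
The sizes in the ping output can be checked with a little arithmetic. The sketch below assumes an IPv4 header without options (20 bytes), an 8-byte ICMP header and a 14-byte Ethernet header; all function names are hypothetical. It shows why a 400-byte ping payload fails the length test against a receiver MTU of 300.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	IPV4_HDR = 20,  /* IPv4 header without options (assumption) */
	ICMP_HDR = 8,   /* ICMP echo header */
	ETH_HDR  = 14,  /* Ethernet header */
	VLAN_TAG = 4    /* allowance for one VLAN tag */
};

/* IP packet length for "ping -s payload". */
static unsigned int ping_ip_len(unsigned int payload)
{
	assert(payload <= 65507);  /* 65535 - IPV4_HDR - ICMP_HDR */
	return payload + ICMP_HDR + IPV4_HDR;
}

/* Frame length including the Ethernet header. */
static unsigned int ping_frame_len(unsigned int payload)
{
	return ping_ip_len(payload) + ETH_HDR;
}

/* Length test against the receiving device's MTU. */
static bool fits_receiver(unsigned int frame_len, unsigned int rx_mtu)
{
	return frame_len <= rx_mtu + ETH_HDR + VLAN_TAG;
}
```

For `-s 400` this gives an IP length of 428, matching the "400(428)" in the ping banner, and a frame length of 442. The limit for an MTU of 300 is 318 bytes, so the frame is rejected; against an MTU of 1000 it would pass.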
Stephen Hemminger May 11, 2017, 7:44 p.m. UTC | #3
On Thu, 11 May 2017 21:10:11 +0200
Fredrik Markström <fredrik.markstrom@gmail.com> wrote:

> On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > On Thu, 11 May 2017 15:46:27 +0200
> > Fredrik Markstrom <fredrik.markstrom@gmail.com> wrote:
> >  
> >> From: Fredrik Markström <fredrik.markstrom@gmail.com>
> >>
> >> is_skb_forwardable() currently checks whether the packet size is <= the MTU
> >> of the receiving interface. This is inconsistent with most hardware
> >> Ethernet drivers, which happily receive packets larger than the MTU.
> >
> > Wrong.  
> 
> What is "Wrong"? I was initially skeptical about implementing this patch,
> since it feels odd to have different MTUs set on the two sides of a
> link. After consulting some IP people and the RFCs I kind of changed
> my mind and thought I'd give it a shot. In the RFCs I couldn't find
> anything that defines when a received packet should or should not be
> dropped.
> 
> >
> > Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
> > The actual limit is a function of the hardware. Some hardware can only limit by
> > power of 2; some can only limit frames larger than 1500; some have no limiting at all.  
> 
> Agreed. The purpose of these patches is to be able to configure a
> veth interface to mimic these different behaviors. None of the Ethernet
> interfaces I have access to drop packets for being larger
> than the configured MTU, as veth does.
> 
> Being able to mimic real Ethernet hardware is useful when
> consolidating hardware using containers/namespaces.
> 
> In a reply to a comment from David Miller in my previous version of
> the patch I attached the example below to demonstrate the case in
> detail.
> 
> This works with all ethernet hardware setups I have access to:
> 

Why not just use an iptables rule to enforce whatever semantics you
want?
Teco Boot May 12, 2017, 8:05 a.m. UTC | #4
IP MTU and L2 MTU are different animals.

IMHO the IP MTU is for fragmentation at the sender side of a link. There is no need to drop IP packets at the receiver when their size exceeds the configured IP MTU. IP packets larger than the receiver's L2 MTU will be dropped at the sub-IP layer.

For this patch: if veth has some notion of an L2 MTU (e.g. buffer size limits), there have to be checks for it. I don't see how a configurable MRU helps: more config, more mistakes. If there is no need to drop the packet, don't.

Teco
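
To make the sender-side role of the IP MTU concrete, here is a small C sketch that counts how many IPv4 fragments a sender would emit for a given IP MTU. It assumes a 20-byte IPv4 header without options and uses the rule that every fragment except the last carries a multiple of 8 payload bytes; `ipv4_fragment_count()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/*
 * Number of IPv4 fragments needed to send l4_len bytes of payload
 * (everything after the IP header) over a link with the given IP MTU.
 */
static unsigned int ipv4_fragment_count(unsigned int l4_len,
					unsigned int ip_mtu)
{
	const unsigned int hdr = 20;  /* IPv4 header without options */
	unsigned int per_frag;

	assert(ip_mtu >= hdr + 8);    /* room for at least one 8-byte unit */

	if (l4_len + hdr <= ip_mtu)
		return 1;             /* fits unfragmented */

	/* Payload per fragment, rounded down to a multiple of 8. */
	per_frag = (ip_mtu - hdr) & ~7u;
	return (l4_len + per_frag - 1) / per_frag;
}
```

Applied to the ping example in this thread, a reply carrying 408 bytes after the IP header (400 bytes of data plus the ICMP header) would leave a sender with IP MTU 300 as two fragments, and a sender with IP MTU 1000 as a single packet.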


Fredrik Markstrom May 12, 2017, 12:48 p.m. UTC | #5
On Fri, May 12, 2017 at 10:05 AM, Teco Boot <teco@inf-net.nl> wrote:
> IP MTU and L2 MTU are different animals.
>
> IMHO the IP MTU is for fragmentation at the sender side of a link. There is no need to drop IP packets at the receiver when their size exceeds the configured IP MTU. IP packets larger than the receiver's L2 MTU will be dropped at the sub-IP layer.
>
First, thanks for putting the different MTUs (L2 vs. IP MTU) into words.

I agree. I don't understand why we drop packets based on the
receiver's IP MTU at all, and I would not mind removing that test
altogether, at least for veth.

/Fredrik


> For this patch: if veth has some notion of an L2 MTU (e.g. buffer size limits), there have to be checks for it. I don't see how a configurable MRU helps: more config, more mistakes. If there is no need to drop the packet, don't.
>
> Teco
Fredrik Markstrom May 12, 2017, 2:35 p.m. UTC | #6
On Thu, May 11, 2017 at 9:44 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Thu, 11 May 2017 21:10:11 +0200
> Fredrik Markström <fredrik.markstrom@gmail.com> wrote:
>
>> On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>> > On Thu, 11 May 2017 15:46:27 +0200
>> > Fredrik Markstrom <fredrik.markstrom@gmail.com> wrote:
>> >
>> >> From: Fredrik Markström <fredrik.markstrom@gmail.com>
>> >>
>> >> is_skb_forwardable() currently checks whether the packet size is <= the MTU
>> >> of the receiving interface. This is inconsistent with most hardware
>> >> Ethernet drivers, which happily receive packets larger than the MTU.
>> >
>> > Wrong.
>>
>> What is "Wrong"? I was initially skeptical about implementing this patch,
>> since it feels odd to have different MTUs set on the two sides of a
>> link. After consulting some IP people and the RFCs I kind of changed
>> my mind and thought I'd give it a shot. In the RFCs I couldn't find
>> anything that defines when a received packet should or should not be
>> dropped.
>>
>> >
>> > Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
>> > The actual limit is a function of the hardware. Some hardware can only limit by
>> > power of 2; some can only limit frames larger than 1500; some have no limiting at all.
>>
>> Agreed. The purpose of these patches is to be able to configure a
>> veth interface to mimic these different behaviors. None of the Ethernet
>> interfaces I have access to drop packets for being larger
>> than the configured MTU, as veth does.
>>
>> Being able to mimic real Ethernet hardware is useful when
>> consolidating hardware using containers/namespaces.
>>
>> In a reply to a comment from David Miller in my previous version of
>> the patch I attached the example below to demonstrate the case in
>> detail.
>>
>> This works with all ethernet hardware setups I have access to:
>>
>
> Why not just use an iptables rule to enforce whatever semantics you
> want?
>

I think that would be OK, but the only thing I can find is TCPMSS, and
that only applies to TCP.

/Fredrik

Patch

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 1f3295e274d0..dbbe48ade204 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -234,7 +234,8 @@  void ipvlan_process_multicast(struct work_struct *work)
 				nskb->pkt_type = pkt_type;
 				nskb->dev = ipvlan->dev;
 				if (tx_pkt)
-					ret = dev_forward_skb(ipvlan->dev, nskb);
+					ret = dev_forward_skb(ipvlan->dev,
+							      nskb, 0);
 				else
 					ret = netif_rx(nskb);
 			}
@@ -301,7 +302,7 @@  static int ipvlan_rcv_frame(struct ipvl_addr *addr, struct sk_buff **pskb,
 
 	if (local) {
 		skb->pkt_type = PACKET_HOST;
-		if (dev_forward_skb(ipvlan->dev, skb) == NET_RX_SUCCESS)
+		if (dev_forward_skb(ipvlan->dev, skb, 0) == NET_RX_SUCCESS)
 			success = true;
 	} else {
 		ret = RX_HANDLER_ANOTHER;
@@ -547,7 +548,7 @@  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
 		 * the skb for the main-dev. At the RX side we just return
 		 * RX_PASS for it to be processed further on the stack.
 		 */
-		return dev_forward_skb(ipvlan->phy_dev, skb);
+		return dev_forward_skb(ipvlan->phy_dev, skb, 0);
 
 	} else if (is_multicast_ether_addr(eth->h_dest)) {
 		ipvlan_skb_crossing_ns(skb, NULL);
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 9261722960a7..4db2876c1e44 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -202,7 +202,7 @@  static int macvlan_broadcast_one(struct sk_buff *skb,
 	struct net_device *dev = vlan->dev;
 
 	if (local)
-		return __dev_forward_skb(dev, skb);
+		return __dev_forward_skb(dev, skb, 0);
 
 	skb->dev = dev;
 	if (ether_addr_equal_64bits(eth->h_dest, dev->broadcast))
@@ -495,7 +495,7 @@  static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
 		dest = macvlan_hash_lookup(port, eth->h_dest);
 		if (dest && dest->mode == MACVLAN_MODE_BRIDGE) {
 			/* send to lowerdev first for its network taps */
-			dev_forward_skb(vlan->lowerdev, skb);
+			dev_forward_skb(vlan->lowerdev, skb, 0);
 
 			return NET_XMIT_SUCCESS;
 		}
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8c39d6d690e5..561da3a63b8a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -116,7 +116,7 @@  static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 	}
 
-	if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
+	if (likely(dev_forward_skb(rcv, skb, 0) == NET_RX_SUCCESS)) {
 		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
 
 		u64_stats_update_begin(&stats->syncp);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 97456b2539e4..f207b083ffec 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3282,16 +3282,16 @@  int dev_change_xdp_fd(struct net_device *dev, int fd, u32 flags);
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
-int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
-int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
+int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb, int mtu);
+int dev_forward_skb(struct net_device *dev, struct sk_buff *skb, int mtu);
 bool is_skb_forwardable(const struct net_device *dev,
-			const struct sk_buff *skb);
+			const struct sk_buff *skb, int mtu);
 
 static __always_inline int ____dev_forward_skb(struct net_device *dev,
-					       struct sk_buff *skb)
+					       struct sk_buff *skb, int mtu)
 {
 	if (skb_orphan_frags(skb, GFP_ATOMIC) ||
-	    unlikely(!is_skb_forwardable(dev, skb))) {
+	    unlikely(!is_skb_forwardable(dev, skb, mtu))) {
 		atomic_long_inc(&dev->rx_dropped);
 		kfree_skb(skb);
 		return NET_RX_DROP;
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 902af6ba481c..15ab57da5ef1 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -35,7 +35,7 @@  static inline int should_deliver(const struct net_bridge_port *p,
 
 int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-	if (!is_skb_forwardable(skb->dev, skb))
+	if (!is_skb_forwardable(skb->dev, skb, 0))
 		goto drop;
 
 	skb_push(skb, ETH_HLEN);
@@ -96,7 +96,7 @@  static void __br_forward(const struct net_bridge_port *to,
 		net = dev_net(indev);
 	} else {
 		if (unlikely(netpoll_tx_running(to->br->dev))) {
-			if (!is_skb_forwardable(skb->dev, skb)) {
+			if (!is_skb_forwardable(skb->dev, skb, 0)) {
 				kfree_skb(skb);
 			} else {
 				skb_push(skb, ETH_HLEN);
diff --git a/net/core/dev.c b/net/core/dev.c
index 533a6d6f6092..f7c53d7c8e26 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1767,14 +1767,18 @@  static inline void net_timestamp_set(struct sk_buff *skb)
 			__net_timestamp(SKB);		\
 	}						\
 
-bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
+bool is_skb_forwardable(const struct net_device *dev,
+			const struct sk_buff *skb, int mtu)
 {
 	unsigned int len;
 
 	if (!(dev->flags & IFF_UP))
 		return false;
 
-	len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
+	if (mtu == 0)
+		mtu = dev->mtu;
+
+	len = mtu + dev->hard_header_len + VLAN_HLEN;
 	if (skb->len <= len)
 		return true;
 
@@ -1788,9 +1792,9 @@  bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(is_skb_forwardable);
 
-int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
+int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb, int mtu)
 {
-	int ret = ____dev_forward_skb(dev, skb);
+	int ret = ____dev_forward_skb(dev, skb, mtu);
 
 	if (likely(!ret)) {
 		skb->protocol = eth_type_trans(skb, dev);
@@ -1806,6 +1810,7 @@  EXPORT_SYMBOL_GPL(__dev_forward_skb);
  *
  * @dev: destination network device
  * @skb: buffer to forward
+ * @mtu: Maximum size to forward. If 0 dev->mtu is used.
  *
  * return values:
  *	NET_RX_SUCCESS	(no congestion)
@@ -1819,9 +1824,9 @@  EXPORT_SYMBOL_GPL(__dev_forward_skb);
  * we have to clear all information in the skb that could
  * impact namespace isolation.
  */
-int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
+int dev_forward_skb(struct net_device *dev, struct sk_buff *skb, int mtu)
 {
-	return __dev_forward_skb(dev, skb) ?: netif_rx_internal(skb);
+	return __dev_forward_skb(dev, skb, mtu) ?: netif_rx_internal(skb);
 }
 EXPORT_SYMBOL_GPL(dev_forward_skb);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index ebaeaf2e46e8..3f3eb26e7ea1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1632,13 +1632,13 @@  static const struct bpf_func_proto bpf_csum_update_proto = {
 
 static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
 {
-	return dev_forward_skb(dev, skb);
+	return dev_forward_skb(dev, skb, 0);
 }
 
 static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
 				      struct sk_buff *skb)
 {
-	int ret = ____dev_forward_skb(dev, skb);
+	int ret = ____dev_forward_skb(dev, skb, 0);
 
 	if (likely(!ret)) {
 		skb->dev = dev;
diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 6fd41d7afe1e..1258555b6578 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -164,7 +164,7 @@  static void l2tp_eth_dev_recv(struct l2tp_session *session, struct sk_buff *skb,
 	skb_dst_drop(skb);
 	nf_reset(skb);
 
-	if (dev_forward_skb(dev, skb) == NET_RX_SUCCESS) {
+	if (dev_forward_skb(dev, skb, 0) == NET_RX_SUCCESS) {
 		atomic_long_inc(&priv->rx_packets);
 		atomic_long_add(data_len, &priv->rx_bytes);
 	} else {