[v3,net-next] net: ipv6: Add sysctl entry to disable MTU updates from RA

Message ID 1421773565-5181-1-git-send-email-harouth@codeaurora.org
State Accepted, archived
Delegated to: David Miller

Commit Message

Harout Hedeshian Jan. 20, 2015, 5:06 p.m. UTC
The kernel forcefully applies MTU values received in router
advertisements provided the new MTU is less than the current. This
behavior is undesirable when the user space is managing the MTU. Instead
a sysctl flag 'accept_ra_mtu' is introduced such that the user space
can control whether or not RA provided MTU updates should be applied. The
default behavior is unchanged; user space must explicitly set this flag
to 0 for RA MTUs to be ignored.

Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
---
 Documentation/networking/ip-sysctl.txt |  7 +++++++
 include/linux/ipv6.h                   |  1 +
 include/uapi/linux/ipv6.h              |  1 +
 net/ipv6/addrconf.c                    | 10 ++++++++++
 net/ipv6/ndisc.c                       |  2 +-
 5 files changed, 20 insertions(+), 1 deletion(-)
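
A minimal user-space sketch of how the new flag might be toggled, assuming the per-interface sysctl appears under /proc/sys/net/ipv6/conf/<ifname>/accept_ra_mtu (the usual location for addrconf sysctl entries). The helper names and the interface name "rmnet0" are hypothetical and not part of the patch; writing the file requires root.

```c
#include <stdio.h>
#include <string.h>

/* Build "/proc/sys/net/ipv6/conf/<ifname>/accept_ra_mtu" into buf.
 * Returns 0 on success, -1 if the name is empty, contains '/',
 * or the result does not fit in buf. (Hypothetical helper.) */
int accept_ra_mtu_path(const char *ifname, char *buf, size_t len)
{
	int n;

	if (ifname == NULL || ifname[0] == '\0' || strchr(ifname, '/') != NULL)
		return -1;
	n = snprintf(buf, len, "/proc/sys/net/ipv6/conf/%s/accept_ra_mtu", ifname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write 0 (ignore RA MTUs) or 1 (apply them, the default) to the sysctl.
 * Returns 0 on success, -1 on any error. (Hypothetical helper.) */
int set_accept_ra_mtu(const char *ifname, int enable)
{
	char path[128];
	FILE *f;
	int rc;

	if (accept_ra_mtu_path(ifname, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	rc = (fprintf(f, "%d\n", enable ? 1 : 0) < 0) ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

A daemon that manages the MTU itself would call set_accept_ra_mtu("rmnet0", 0) once at interface bring-up; interfaces left untouched keep the default of 1.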

Comments

David Miller Jan. 25, 2015, 7:14 a.m. UTC | #1
From: Harout Hedeshian <harouth@codeaurora.org>
Date: Tue, 20 Jan 2015 10:06:05 -0700

> The kernel forcefully applies MTU values received in router
> advertisements provided the new MTU is less than the current. This
> behavior is undesirable when the user space is managing the MTU. Instead
> a sysctl flag 'accept_ra_mtu' is introduced such that the user space
> can control whether or not RA provided MTU updates should be applied. The
> default behavior is unchanged; user space must explicitly set this flag
> to 0 for RA MTUs to be ignored.
> 
> Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>

Under what circumstances would userland ignore a router advertized
MTU, and are the RFCs ok with this?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vadym Kochan Jan. 25, 2015, 7:21 a.m. UTC | #2
On Sat, Jan 24, 2015 at 11:14:32PM -0800, David Miller wrote:
> From: Harout Hedeshian <harouth@codeaurora.org>
> Date: Tue, 20 Jan 2015 10:06:05 -0700
> 
> > The kernel forcefully applies MTU values received in router
> > advertisements provided the new MTU is less than the current. This
> > behavior is undesirable when the user space is managing the MTU. Instead
> > a sysctl flag 'accept_ra_mtu' is introduced such that the user space
> > can control whether or not RA provided MTU updates should be applied. The
> > default behavior is unchanged; user space must explicitly set this flag
> > to 0 for RA MTUs to be ignored.
> > 
> > Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
> 
> Under what circumstances would userland ignore a router advertized
> MTU, and are the RFCs ok with this?

Hi,

I don't know if it makes sense, but I had the same use case when I was
working on IPv6 infrastructure support for a home gateway.
One of the providers required the ability to force the IPv6 MTU
value via TR parameters and to disable updating it via RA.

Regards,
Harout Hedeshian Jan. 25, 2015, 4:28 p.m. UTC | #3
On 01/25/2015 12:21 AM, Vadim Kochan wrote:
> On Sat, Jan 24, 2015 at 11:14:32PM -0800, David Miller wrote:
>> From: Harout Hedeshian <harouth@codeaurora.org>
>> Date: Tue, 20 Jan 2015 10:06:05 -0700
>>
>>> The kernel forcefully applies MTU values received in router
>>> advertisements provided the new MTU is less than the current. This
>>> behavior is undesirable when the user space is managing the MTU.
> Instead
>>> a sysctl flag 'accept_ra_mtu' is introduced such that the user space
>>> can control whether or not RA provided MTU updates should be applied.
> The
>>> default behavior is unchanged; user space must explicitly set this
> flag
>>> to 0 for RA MTUs to be ignored.
>>>
>>> Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
>> Under what circumstances would userland ignore a router advertized
>> MTU, and are the RFCs ok with this?
> Hi,
>
> I don't know if it make sense but I had the same use case when was
> working on supporting IPv6 infrastructure for home gateway.
> One of the provider had requirements to have ability set force IPv6 MTU
> value via TR parameters and disable update it via RA.
Hi David,

We are optionally allowing the kernel to shift this responsibility to
userland. The idea is that the kernel would ignore the RA MTU, not so
much userland. Just like Vadim, we may not want to use the MTU value
that comes from the network. Instead, we get an MTU value from the
cellular modem via a configuration message, and that is the MTU we use.

In any case, none of the RFCs state that the kernel must update the MTU 
and that the userland cannot. In fact, there is no mention of 
kernel/user space at all in the RFC for this particular RA message. What 
if someone wants to listen to these RA messages from userland and update 
the MTU? Surely, that won't violate the RFC. In such a case, the kernel 
is unnecessarily forcing policy on the user space.

RFC4861 section 4.6.4 defines the MTU update option (RA option 5) for RA 
messages. I don't see any language where the receiver "MUST" apply this 
option. It merely states that the MTU value in the RA is "The 
recommended MTU for the link." The description goes on to point out why 
this option can be used by the router, but does not specifically enforce 
it. The only receive action specifically enforced by the RFC is that 
"This option MUST be silently ignored for other Neighbor Discovery 
messages."
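
For reference, a minimal sketch of the option layout that section describes: an 8-byte option with Type 5, Length 1 (in units of 8 octets), 16 reserved bits, and a 32-bit MTU in network byte order. The function and macro names are hypothetical; this is a user-space illustration, not the kernel's parser.

```c
#include <stddef.h>
#include <stdint.h>

#define ND_OPT_TYPE_MTU 5	/* option type from RFC 4861, section 4.6.4 */

/* Parse a single MTU option starting at opt.
 * Returns 0 and stores the advertised MTU in *mtu on success,
 * -1 if the buffer is too short or the type/length fields do not match. */
int parse_ra_mtu_option(const uint8_t *opt, size_t len, uint32_t *mtu)
{
	if (opt == NULL || mtu == NULL || len < 8)
		return -1;
	if (opt[0] != ND_OPT_TYPE_MTU || opt[1] != 1)
		return -1;
	/* Bytes 2-3 are reserved; bytes 4-7 carry the MTU, big-endian. */
	*mtu = ((uint32_t)opt[4] << 24) | ((uint32_t)opt[5] << 16) |
	       ((uint32_t)opt[6] << 8) | (uint32_t)opt[7];
	return 0;
}
```

Parsing the value is separate from acting on it; whether the receiver applies the result is the policy question this patch exposes.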

The risk of not applying the MTU updates is that packets may get dropped
if path MTU discovery is disabled or broken on the network. HOWEVER,
anyone explicitly setting accept_ra_mtu to 0 is already taking
responsibility for enforcing the correct MTU. Since this patch does not
change the default kernel behavior, I don't see it breaking anything for
users who are unaware of this option.


Thanks,
Harout

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


David Miller Jan. 25, 2015, 10:55 p.m. UTC | #4
From: Harout Hedeshian <harouth@codeaurora.org>
Date: Tue, 20 Jan 2015 10:06:05 -0700

> The kernel forcefully applies MTU values received in router
> advertisements provided the new MTU is less than the current. This
> behavior is undesirable when the user space is managing the MTU. Instead
> a sysctl flag 'accept_ra_mtu' is introduced such that the user space
> can control whether or not RA provided MTU updates should be applied. The
> default behavior is unchanged; user space must explicitly set this flag
> to 0 for RA MTUs to be ignored.
> 
> Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>

Applied.
Dan Williams Jan. 26, 2015, 3:03 p.m. UTC | #5
On Sun, 2015-01-25 at 09:28 -0700, Harout Hedeshian wrote:
> On 01/25/2015 12:21 AM, Vadim Kochan wrote:
> > On Sat, Jan 24, 2015 at 11:14:32PM -0800, David Miller wrote:
> >> From: Harout Hedeshian <harouth@codeaurora.org>
> >> Date: Tue, 20 Jan 2015 10:06:05 -0700
> >>
> >>> The kernel forcefully applies MTU values received in router
> >>> advertisements provided the new MTU is less than the current. This
> >>> behavior is undesirable when the user space is managing the MTU.
> > Instead
> >>> a sysctl flag 'accept_ra_mtu' is introduced such that the user space
> >>> can control whether or not RA provided MTU updates should be applied.
> > The
> >>> default behavior is unchanged; user space must explicitly set this
> > flag
> >>> to 0 for RA MTUs to be ignored.
> >>>
> >>> Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
> >> Under what circumstances would userland ignore a router advertized
> >> MTU, and are the RFCs ok with this?
> > Hi,
> >
> > I don't know if it make sense but I had the same use case when was
> > working on supporting IPv6 infrastructure for home gateway.
> > One of the provider had requirements to have ability set force IPv6 MTU
> > value via TR parameters and disable update it via RA.
> Hi David,
> 
> We are optionally allowing the kernel shift this responsibility to the 
> userland. The idea would be that the kernel would ignore it, not so much 
> the userland. Just like Vadim, we may not want to use the MTU value 
> which comes from the network. Instead, we get an MTU value from the 
> cellular modem via configuration message, and that is the MTU we use.

Are you talking about an ethernet interface exposed by the modem, or a
separate network interface connected to a normal LAN?  In the modem
case, why would the network-provided RA's MTU be incorrect, but the
modem's MTU be correct?  If the normal LAN case, why would the modem's
MTU be correct for a different network that is broadcasting its own RAs?
Just curious...

Dan

> In any case, none of the RFCs state that the kernel must update the MTU 
> and that the userland cannot. In fact, there is no mention of 
> kernel/user space at all in the RFC for this particular RA message. What 
> if someone wants to listen to these RA messages from userland and update 
> the MTU? Surely, that won't violate the RFC. In such a case, the kernel 
> is unnecessarily forcing policy on the user space.
> 
> RFC4861 section 4.6.4 defines the MTU update option (RA option 5) for RA 
> messages. I don't see any language where the receiver "MUST" apply this 
> option. It merely states that the MTU value in the RA is "The 
> recommended MTU for the link." The description goes on to point out why 
> this option can be used by the router, but does not specifically enforce 
> it. The only receive action specifically enforced by the RFC is that 
> "This option MUST be silently ignored for other Neighbor Discovery 
> messages."
> 
> The risk of not applying the MTU updates is that packet may get dropped 
> if path MTU discovery is disabled or broken on the network. HOWEVER, 
> anyone explicitly setting accept_ra_mtu to 0 is already taking 
> responsibility for enforcing the correct MTU. Since this patch by 
> default does not change the kernel behavior, I don't see it breaking for 
> users who are unaware of this option.
> 
> 
> Thanks,
> Harout
> 
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
> 
> 


Harout Hedeshian Jan. 26, 2015, 4:16 p.m. UTC | #6
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Dan Williams
> Sent: Monday, January 26, 2015 8:04 AM
> To: Harout Hedeshian
> Cc: David Miller; netdev@vger.kernel.org; Vadim Kochan
> Subject: Re: [PATCH v3 net-next] net: ipv6: Add sysctl entry to disable
> MTU updates from RA
> 
> On Sun, 2015-01-25 at 09:28 -0700, Harout Hedeshian wrote:
> > On 01/25/2015 12:21 AM, Vadim Kochan wrote:
> > > On Sat, Jan 24, 2015 at 11:14:32PM -0800, David Miller wrote:
> > >> From: Harout Hedeshian <harouth@codeaurora.org>
> > >> Date: Tue, 20 Jan 2015 10:06:05 -0700
> > >>
> > >>> The kernel forcefully applies MTU values received in router
> > >>> advertisements provided the new MTU is less than the current. This
> > >>> behavior is undesirable when the user space is managing the MTU.
> > > Instead
> > >>> a sysctl flag 'accept_ra_mtu' is introduced such that the user
> > >>> space can control whether or not RA provided MTU updates should be
> applied.
> > > The
> > >>> default behavior is unchanged; user space must explicitly set this
> > > flag
> > >>> to 0 for RA MTUs to be ignored.
> > >>>
> > >>> Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
> > >> Under what circumstances would userland ignore a router advertized
> > >> MTU, and are the RFCs ok with this?
> > > Hi,
> > >
> > > I don't know if it make sense but I had the same use case when was
> > > working on supporting IPv6 infrastructure for home gateway.
> > > One of the provider had requirements to have ability set force IPv6
> > > MTU value via TR parameters and disable update it via RA.
> > Hi David,
> >
> > We are optionally allowing the kernel shift this responsibility to the
> > userland. The idea would be that the kernel would ignore it, not so
> > much the userland. Just like Vadim, we may not want to use the MTU
> > value which comes from the network. Instead, we get an MTU value from
> > the cellular modem via configuration message, and that is the MTU we
> use.
> 
> Are you talking about an ethernet interface exposed by the modem, or a
> separate network interface connected to a normal LAN?  In the modem
> case, why would the network-provided RA's MTU be incorrect, but the
> modem's MTU be correct?  If the normal LAN case, why would the modem's
> MTU be correct for a different network that is broadcasting its own RAs?
> Just curious...
> 
> Dan

Hi Dan,

This is a really good question. In the case of a normal LAN, we will let the kernel handle the MTU values as it does today (basically, keep accept_ra_mtu=1). The issue is not really about the correctness of the RA MTU value (we assume this value is correct, otherwise we are in serious trouble). The issue is on the modem interfaces. To the modem, each protocol family is its own interface. This analogy breaks down for us in Linux because v4 and v6 are fundamentally the same net_device interface. From what I can tell, there is no /proc/sys/net/ipv4/conf/<dev>/mtu, which means that IPv4 will take the MTU value from dev->mtu (see ipv4_mtu()). In contrast, IPv6 maintains a separate MTU and will apply RA MTUs as long as they do not exceed the device's MTU (dev->mtu).

For consistency, we have been asked to always pick the minimum of the IPv4 and IPv6 MTUs, and that becomes the overall interface MTU. If the kernel goes and changes the v6 MTU without us knowing, the userland daemon which maintains the MTU parity will be out of sync. We *could* theoretically let the kernel apply RA MTU updates and listen for netlink events, but that is unnecessarily complicated as we are already listening in multiple places for these MTU updates.

Additionally, we have a problem where the default dev->mtu is 1500 bytes. If we have an IPv6-only network, then it is possible that the network will want to use an MTU > 1500 (esp. multimedia-optimized carrier networks). Currently, ndisc.c compares the new MTU value with dev->mtu; if it is bigger, the RA MTU is ignored. I don't see a good alternative to this because there is no way for ndisc.c to know what the device's maximum physical capabilities are (or that we even want to use such a large MTU). Because of that, we have to have an out-of-band mechanism to adjust the interface MTU since we know that the hardware is capable of transmitting packets greater than 1500 bytes.

Thus, instead of letting the kernel handle RA MTUs in some special circumstances, it is safer and cleaner to disable RA MTUs on the modem interface altogether and let userland pick the correct MTU.
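
The "minimum of the IPv4 and IPv6 MTU" policy described above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and treating 0 as "no MTU known for this family" is an assumption, not something stated in this thread.

```c
#include <stdint.h>

/* Pick the overall interface MTU from the per-family values.
 * Assumption: 0 means the modem reported no MTU for that family,
 * in which case the other family's value is used as-is. */
uint32_t pick_interface_mtu(uint32_t v4_mtu, uint32_t v6_mtu)
{
	if (v4_mtu == 0)
		return v6_mtu;
	if (v6_mtu == 0)
		return v4_mtu;
	return v4_mtu < v6_mtu ? v4_mtu : v6_mtu;
}
```

With accept_ra_mtu set to 0 on the modem interface, a daemon applying this result is the only writer of the MTU, so its view cannot drift from the kernel's.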

One way to clean up this mess would be to make some changes in the way the kernel handles MTUs. 
1. Make dev->mtu actually be the MTU the device is capable of. For example, jumbo frame capable devices would set this to 9000 upon enumeration instead of 1500. This value would not be editable from userland. There would no longer be a need for drivers to implement the MTU adjustment ndo.
2. *ALL* protocols must maintain their own MTU values. This would mean a new per-device proc entry for IPv4 at a minimum. The defaults of these values can remain 1500.

If we did this, then the kernel could apply RA MTUs > 1500, and we would get it for free (no changes in IPv6 code). IPv4 would be at parity with IPv6 in terms of decoupling MTU from dev->mtu. This means userland would not need to care about the IPv6 MTU at all, and we could push back on the MTU consistency requirement. Of course, this is a pretty drastic change in the interpretation of dev->mtu and would break a lot of userland utilities. Or maybe we leave #1 editable in userland so the utilities and IOCTLs still work; however, userland would then have to additionally adjust the IPv4 MTU...

If the kernel community likes this approach, I would be happy to upload some patches which create a new definition for IPv4 MTU. I think #1 will need more discussion.

If v4 and v6 were truly decoupled, then we could get rid of this minimum selection mess and special case handling for large IPv6 MTUs and this patch could go away.


Thanks,
Harout



Hannes Frederic Sowa Jan. 26, 2015, 4:19 p.m. UTC | #7
On Sat, 2015-01-24 at 23:14 -0800, David Miller wrote:
> From: Harout Hedeshian <harouth@codeaurora.org>
> Date: Tue, 20 Jan 2015 10:06:05 -0700
> 
> > The kernel forcefully applies MTU values received in router
> > advertisements provided the new MTU is less than the current. This
> > behavior is undesirable when the user space is managing the MTU. Instead
> > a sysctl flag 'accept_ra_mtu' is introduced such that the user space
> > can control whether or not RA provided MTU updates should be applied. The
> > default behavior is unchanged; user space must explicitly set this flag
> > to 0 for RA MTUs to be ignored.
> > 
> > Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
> 
> Under what circumstances would userland ignore a router advertized
> MTU, and are the RFCs ok with this?

Sometimes those advertised MTUs are used to perform DoS attacks /
performance-degrading attacks (force generation of fragments e.g.) on
hosts. In larger, maybe non-controlled networks, it would be desirable
if one could disable acceptance of the MTU option.

Of course, there are similar issues with other options in RAs, too, but
mostly they result in more catastrophic connection failures. ;)

Bye,
Hannes


Bjørn Mork Jan. 26, 2015, 5:27 p.m. UTC | #8
David Miller <davem@davemloft.net> writes:

> Under what circumstances would userland ignore a router advertized
> MTU, and are the RFCs ok with this?

RFC 4861 ( https://tools.ietf.org/html/rfc4861#page-54 ) says:

  "If the MTU option is present, hosts SHOULD copy the option's value
   into LinkMTU so long as the value is greater than or equal to the
   minimum link MTU [IPv6] and does not exceed the maximum LinkMTU value
   specified in the link-type-specific document (e.g., [IPv6-ETHER])."

So the RFC acknowledges that there may be valid reasons in particular
circumstances to ignore the MTU option.  As others have stated: This
might be necessary for DoS prevention, working around bugs in other
equipment, etc.
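
A simplified user-space model of the resulting decision, assuming the bounds discussed in this thread: the value must be at least the IPv6 minimum link MTU of 1280 bytes, must not exceed the device MTU, and the new accept_ra_mtu flag must be enabled. The function and macro names are hypothetical; this is a sketch of the policy, not the code in ndisc.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPV6_MIN_LINK_MTU 1280u	/* minimum link MTU required by IPv6 */

/* Decide whether an MTU received in an RA should be applied.
 * accept_ra_mtu models the per-interface sysctl added by this patch;
 * dev_mtu stands in for the upper bound (dev->mtu in the kernel). */
bool ra_mtu_should_apply(bool accept_ra_mtu, uint32_t ra_mtu, uint32_t dev_mtu)
{
	if (!accept_ra_mtu)
		return false;	/* user space opted out of RA MTUs */
	if (ra_mtu < IPV6_MIN_LINK_MTU || ra_mtu > dev_mtu)
		return false;	/* outside the range the RFC text allows */
	return true;
}
```

Because the flag is checked first, setting it to 0 skips the option entirely, while the default of 1 leaves the existing range checks as the only filter.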


Bjørn
Dan Williams Jan. 26, 2015, 9:32 p.m. UTC | #9
On Mon, 2015-01-26 at 09:16 -0700, Harout Hedeshian wrote:
> 
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> > On Behalf Of Dan Williams
> > Sent: Monday, January 26, 2015 8:04 AM
> > To: Harout Hedeshian
> > Cc: David Miller; netdev@vger.kernel.org; Vadim Kochan
> > Subject: Re: [PATCH v3 net-next] net: ipv6: Add sysctl entry to disable
> > MTU updates from RA
> > 
> > On Sun, 2015-01-25 at 09:28 -0700, Harout Hedeshian wrote:
> > > On 01/25/2015 12:21 AM, Vadim Kochan wrote:
> > > > On Sat, Jan 24, 2015 at 11:14:32PM -0800, David Miller wrote:
> > > >> From: Harout Hedeshian <harouth@codeaurora.org>
> > > >> Date: Tue, 20 Jan 2015 10:06:05 -0700
> > > >>
> > > >>> The kernel forcefully applies MTU values received in router
> > > >>> advertisements provided the new MTU is less than the current. This
> > > >>> behavior is undesirable when the user space is managing the MTU.
> > > > Instead
> > > >>> a sysctl flag 'accept_ra_mtu' is introduced such that the user
> > > >>> space can control whether or not RA provided MTU updates should be
> > applied.
> > > > The
> > > >>> default behavior is unchanged; user space must explicitly set this
> > > > flag
> > > >>> to 0 for RA MTUs to be ignored.
> > > >>>
> > > >>> Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
> > > >> Under what circumstances would userland ignore a router advertized
> > > >> MTU, and are the RFCs ok with this?
> > > > Hi,
> > > >
> > > > I don't know if it make sense but I had the same use case when was
> > > > working on supporting IPv6 infrastructure for home gateway.
> > > > One of the provider had requirements to have ability set force IPv6
> > > > MTU value via TR parameters and disable update it via RA.
> > > Hi David,
> > >
> > > We are optionally allowing the kernel shift this responsibility to the
> > > userland. The idea would be that the kernel would ignore it, not so
> > > much the userland. Just like Vadim, we may not want to use the MTU
> > > value which comes from the network. Instead, we get an MTU value from
> > > the cellular modem via configuration message, and that is the MTU we
> > use.
> > 
> > Are you talking about an ethernet interface exposed by the modem, or a
> > separate network interface connected to a normal LAN?  In the modem
> > case, why would the network-provided RA's MTU be incorrect, but the
> > modem's MTU be correct?  If the normal LAN case, why would the modem's
> > MTU be correct for a different network that is broadcasting its own RAs?
> > Just curious...
> > 
> > Dan
> 
> Hi Dan,
> 
> This is a really good question. In the case of a normal LAN, we will allow the kernel to handle the MTU values as they have been today (basically, keep the accept_ra_mtu=1). The issue is not really about the correctness of the RA MTU value (we assume this value is correct, otherwise we are in serious trouble). The issue is on the modem interfaces. To the modem, each protocol family is its own interface. This analogy breaks down for us in Linux because v4 and v6 are fundamentally the same net_device interface. From what I can tell, there is no /proc/sys/net/ipv4/conf/<dev>/mtu which means that IPv4 will take the MTU value from dev->mtu (see ipv4_mtu() ). In contrast, IPv6 maintains a separate MTU and will apply the RA MTUs such that they are less than the device's MTU (dev->mtu). For consistency, we have been asked to always pick the minimum value of the IPv4 and IPv6 MTU, and that will become the overall interface MTU. If the kernel goes and changes the V6 MTU without us knowing, the userland daemon which maintains the MTU parity will be out of sync. We *could* theoretically let the kernel apply RA MTU updates and we listen for netlink events, but that is unnecessarily complicated as we are already listening in multiple places for these MTU updates. Additionally, we have a problem where the default dev->mtu is 1500 bytes. If we have an IPv6-only network, then it is possible that the network will want to use an MTU > 1500 (esp. multimedia optimized carrier networks). Currently, ndisc.c compares the new MTU value with dev->mtu, if bigger, the RA is ignored. I don't see a good alternative to this because there is no way for ndisc.c to know what the device's maximum physical capabilities are (or that we even want to use such a large MTU). Because of that, we have to have an out-of-band mechanism to adjust the interface MTU since we know that the hardware is capable of transmitting packets greater than 1500 bytes. Thus, instead of letting the kernel handle RA MTUs in some special circumstances, it is safer and cleaner to disable RA MTUs on the modem interface altogether and let userland pick the correct MTU.

I believe the IPv4 MTU is taken from the device MTU, eg 'ip link set dev
wwp0s26f7u2i8 mtu XXXX'.  I believe that's also where the IPv6 MTU is
taken from, unless it's set via /proc.

But thanks for the explanation, it makes sense.

Dan

> 
> One way to clean up this mess would be to make some changes in the way the kernel handles MTUs. 
> 1. Make dev->mtu actually be the MTU the device is capable. For example, jumbo frame capable devices would set this to 9000 upon enumeration instead of 1500. This value would not be editable from userland. There would no longer be a need for driver to implement the MTU adjustment ndo.
> 2. *ALL* protocols must maintain their own MTU values. This would mean a new per-device proc entry for IPv4 at a minimum. The defaults of these values can remain 1500.
> 
> If we did this, then the kernel can apply RA MTUs > 1500, and we would get it for free (no changes in IPv6 code). IPv4 would be parity with IPv6 in terms of decoupling MTU from dev->mtu. This means userland can completely not care about the IPv6 MTU, and we can push back on the MTU consistency requirement. Of course, this is a pretty drastic change in interpretation of dev->mtu and would break a lot of userland utilities. Or maybe we leave #1 editable in userland so the utilities and IOCTLs still work, however, userland will now have to additionally adjust IPv4 MTU...
> 
> If the kernel community likes this approach, I would be happy to upload some patches which creates a new definition for IPv4 MTU. I think #1 will need more discussion.
> 
> If v4 and v6 were truly decoupled, then we could get rid of this minimum selection mess and special case handling for large IPv6 MTUs and this patch could go away.
> 
> 
> Thanks,
> Harout
> 
> 
> 


Patch

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 85b0221..a5e4c81 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1287,6 +1287,13 @@  accept_ra_rtr_pref - BOOLEAN
 	Functional default: enabled if accept_ra is enabled.
 			    disabled if accept_ra is disabled.
 
+accept_ra_mtu - BOOLEAN
+	Apply the MTU value specified in RA option 5 (RFC4861). If
+	disabled, the MTU specified in the RA will be ignored.
+
+	Functional default: enabled if accept_ra is enabled.
+			    disabled if accept_ra is disabled.
+
 accept_redirects - BOOLEAN
 	Accept Redirects.
 
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index c694e7b..2805062 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -52,6 +52,7 @@  struct ipv6_devconf {
 	__s32		force_tllao;
 	__s32           ndisc_notify;
 	__s32		suppress_frag_ndisc;
+	__s32		accept_ra_mtu;
 	void		*sysctl;
 };
 
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 73cb02d..437a6a4 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -169,6 +169,7 @@  enum {
 	DEVCONF_SUPPRESS_FRAG_NDISC,
 	DEVCONF_ACCEPT_RA_FROM_LOCAL,
 	DEVCONF_USE_OPTIMISTIC,
+	DEVCONF_ACCEPT_RA_MTU,
 	DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d6b4f5d..7dcc065e 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -201,6 +201,7 @@  static struct ipv6_devconf ipv6_devconf __read_mostly = {
 	.disable_ipv6		= 0,
 	.accept_dad		= 1,
 	.suppress_frag_ndisc	= 1,
+	.accept_ra_mtu		= 1,
 };
 
 static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -238,6 +239,7 @@  static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	.disable_ipv6		= 0,
 	.accept_dad		= 1,
 	.suppress_frag_ndisc	= 1,
+	.accept_ra_mtu		= 1,
 };
 
 /* Check if a valid qdisc is available */
@@ -4380,6 +4382,7 @@  static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 	array[DEVCONF_NDISC_NOTIFY] = cnf->ndisc_notify;
 	array[DEVCONF_SUPPRESS_FRAG_NDISC] = cnf->suppress_frag_ndisc;
 	array[DEVCONF_ACCEPT_RA_FROM_LOCAL] = cnf->accept_ra_from_local;
+	array[DEVCONF_ACCEPT_RA_MTU] = cnf->accept_ra_mtu;
 }
 
 static inline size_t inet6_ifla6_size(void)
@@ -5259,6 +5262,13 @@  static struct addrconf_sysctl_table
 			.proc_handler	= proc_dointvec,
 		},
 		{
+			.procname	= "accept_ra_mtu",
+			.data		= &ipv6_devconf.accept_ra_mtu,
+			.maxlen		= sizeof(int),
+			.mode		= 0644,
+			.proc_handler	= proc_dointvec,
+		},
+		{
 			/* sentinel */
 		}
 	},
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 6828667..8a9d7c1 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1348,7 +1348,7 @@  skip_routeinfo:
 		}
 	}
 
-	if (ndopts.nd_opts_mtu) {
+	if (ndopts.nd_opts_mtu && in6_dev->cnf.accept_ra_mtu) {
 		__be32 n;
 		u32 mtu;