Message ID | 1427718657-21674-1-git-send-email-klamm@yandex-team.ru |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
From: Roman Gushchin <klamm@yandex-team.ru> Date: Mon, 30 Mar 2015 15:30:57 +0300 > This patch introduces new ipv6 sysctl: ra_default_route_mtu. > If it's set (> 0), it defines per-route MTU for any new default route > received by RA. > > This sysctl will help in the following configuration: we want to use > jumbo-frames for internal networks and default ethernet frames for > default route. Per-route MTU can only lower per-link MTU, so link MTU > should be set to ~9000 (statically or via RA). > > Due to dynamic nature of RA, setting MTU for default route will require > userspace agent, that will monitor changes of default route > and (re)configure it. Not simple. The suggested sysctl solves this > problem. > > Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> > Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> I don't like this change at all. The way I see things you already have the mechanisms necessary to do this. You obviously control the entity providing the default routes and these RA messages, therefore you absolutely can configure it to provide an appropriate MTU value in those RA messages. Problem solved, and no kernel changes necessary. I am warning you ahead of time that I will have a very low tolerance for replies to this email containing stories explaining why this is "difficult" to do. The fact is that the mechanism is there and if you have designed things at your site in a way such that the mechanism designed for this has become less useful, that isn't my problem. I'm not adding facilities that duplicated existing methods that already exist to accomplish this task. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 31, 2015, at 22:05, David Miller wrote: > From: Roman Gushchin <klamm@yandex-team.ru> > Date: Mon, 30 Mar 2015 15:30:57 +0300 > > > This patch introduces new ipv6 sysctl: ra_default_route_mtu. > > If it's set (> 0), it defines per-route MTU for any new default route > > received by RA. > > > > This sysctl will help in the following configuration: we want to use > > jumbo-frames for internal networks and default ethernet frames for > > default route. Per-route MTU can only lower per-link MTU, so link MTU > > should be set to ~9000 (statically or via RA). > > > > Due to dynamic nature of RA, setting MTU for default route will require > > userspace agent, that will monitor changes of default route > > and (re)configure it. Not simple. The suggested sysctl solves this > > problem. > > > > Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> > > Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> > > I don't like this change at all. The way I see things you already > have the mechanisms necessary to do this. This is totally understandable and the change seems not to fit because it alters incoming information, but I try to quickly explain my reasoning for the Ack: Neighbour Discovery does not fit the way how linux handles MTUs. It is only possible to send out one MTU option on the Router Advertisement and we pick it up as the ipv6 MTU value for the interface. A RA can provide further routing information but no MTU option is possible to be specified on those route options, thus they will adapt the link MTU. There is no differentiation between interface MTU and per-route MTU. One common setup is to have local jumbo frames to speed up e.g. NFS traffic and use default routes with MTU 1500 to reach the outside world. As I had no other idea how to solve this with in-kernel autoconf mechanism I thought this change would be reasonable. Obviously one can disable autoconf and set up routes by hand with correct MTU values which should solve the problem - or use custom DHCPv6 options to do so. > You obviously control the entity providing the default routes and > these RA messages, therefore you absolutely can configure it to > provide an appropriate MTU value in those RA messages. > > Problem solved, and no kernel changes necessary. > > I am warning you ahead of time that I will have a very low tolerance > for replies to this email containing stories explaining why this is > "difficult" to do. The fact is that the mechanism is there and if you > have designed things at your site in a way such that the mechanism > designed for this has become less useful, that isn't my problem. > > I'm not adding facilities that duplicated existing methods that > already exist to accomplish this task. Could you quickly comment on what you had in mind? I guess it is about handling RA in user space on the end hosts and overwriting MTU during insertion of the routes? Thanks, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Hannes Frederic Sowa <hannes@stressinduktion.org> Date: Tue, 31 Mar 2015 22:35:48 +0200 > Could you quickly comment on what you had in mind? I guess it is about > handling RA in user space on the end hosts and overwriting MTU during > insertion of the routes? Even after reading your email I have no idea why you can't just have RA provide a 1500 byte MTU, everything else uses the device's 9000 MTU, problem solved? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
31.03.2015, 23:49, "David Miller" <davem@davemloft.net>: > From: Hannes Frederic Sowa <hannes@stressinduktion.org> > Date: Tue, 31 Mar 2015 22:35:48 +0200 >> Could you quickly comment on what you had in mind? I guess it is about >> handling RA in user space on the end hosts and overwriting MTU during >> insertion of the routes? > > Even after reading your email I have no idea why you can't just have > RA provide a 1500 byte MTU, everything else uses the device's 9000 > MTU, problem solved? Because the MTU (provided by RA) is assigned to the device. Thanks, Roman -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Roman Gushchin <klamm@yandex-team.ru> Date: Wed, 01 Apr 2015 12:58:50 +0300 > 31.03.2015, 23:49, "David Miller" <davem@davemloft.net>: >> From: Hannes Frederic Sowa <hannes@stressinduktion.org> >> Date: Tue, 31 Mar 2015 22:35:48 +0200 >>> Could you quickly comment on what you had in mind? I guess it is about >>> handling RA in user space on the end hosts and overwriting MTU during >>> insertion of the routes? >> >> Even after reading your email I have no idea why you can't just have >> RA provide a 1500 byte MTU, everything else uses the device's 9000 >> MTU, problem solved? > > Because the MTU (provided by RA) is assigned to the device. Ok, that severely limits the usefulness of this option I guess. The next question I have is about the behavior of the new setting in the presence of an RA MTU option. It seems like the sysctl doesn't override that RA MTU option, but rather just clamps it. And then if it's in range, this controls only whether the default route has it's MTU adjusted. That doesn't make any sense to me if we then go and do the rt6_mtu_change() call unconditionally. The route metric update and the rt6_mtu_change() go hand in hand. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 1, 2015, at 19:55, David Miller wrote: > From: Roman Gushchin <klamm@yandex-team.ru> > Date: Wed, 01 Apr 2015 12:58:50 +0300 > > > 31.03.2015, 23:49, "David Miller" <davem@davemloft.net>: > >> From: Hannes Frederic Sowa <hannes@stressinduktion.org> > >> Date: Tue, 31 Mar 2015 22:35:48 +0200 > >>> Could you quickly comment on what you had in mind? I guess it is about > >>> handling RA in user space on the end hosts and overwriting MTU during > >>> insertion of the routes? > >> > >> Even after reading your email I have no idea why you can't just have > >> RA provide a 1500 byte MTU, everything else uses the device's 9000 > >> MTU, problem solved? > > > > Because the MTU (provided by RA) is assigned to the device. > > Ok, that severely limits the usefulness of this option I guess. > > The next question I have is about the behavior of the new setting > in the presence of an RA MTU option. It seems like the sysctl > doesn't override that RA MTU option, but rather just clamps it. > > And then if it's in range, this controls only whether the default > route has it's MTU adjusted. > > That doesn't make any sense to me if we then go and do the > rt6_mtu_change() call unconditionally. The route metric update > and the rt6_mtu_change() go hand in hand. Agreed but that gets interesting: I guess during testing the cnf.mtu6 value was equal to the newly announced mtu value, so the rt6_mtu_change call does not happen. We update cnf.mtu6 so a second RA packet would actually bring the system into the desired state but we have a moment where the default route carries a too big MTU. That's not good. Easiest solution is to reorder those calls but that also leaves us with a time frame where we carry the incorrect MTU on the default route. Otherwise we must conditionally filter out the default routes. Roman, any ideas? Thanks, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> The next question I have is about the behavior of the new setting >> in the presence of an RA MTU option. It seems like the sysctl >> doesn't override that RA MTU option, but rather just clamps it. >> >> And then if it's in range, this controls only whether the default >> route has it's MTU adjusted. >> >> That doesn't make any sense to me if we then go and do the >> rt6_mtu_change() call unconditionally. The route metric update >> and the rt6_mtu_change() go hand in hand. > > Agreed but that gets interesting: > > I guess during testing the cnf.mtu6 value was equal to the newly > announced mtu value, so the rt6_mtu_change call does not happen. We > update cnf.mtu6 so a second RA packet would actually bring the system > into the desired state but we have a moment where the default route > carries a too big MTU. That's not good. Agreed. > Easiest solution is to reorder those calls but that also leaves us with > a time frame where we carry the incorrect MTU on the default route. > Otherwise we must conditionally filter out the default routes. > Roman, any ideas? I think, such approach will work on practise, but looks not very beatiful. May be, a better idea is to serarate per-route and per-device MTU, so an updating of per-device MTU will not affect per-route MTU. Actual MTU can always been calculated as min(route_mtu, device_mtu), but we wouldn't need to update mtu on each route on receiving RA MTU option, for instance. Do you see any problems with such approach? Thanks, Roman -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Do, 2015-04-02 at 21:08 +0300, Roman Gushchin wrote: > >> The next question I have is about the behavior of the new setting > >> in the presence of an RA MTU option. It seems like the sysctl > >> doesn't override that RA MTU option, but rather just clamps it. > >> > >> And then if it's in range, this controls only whether the default > >> route has it's MTU adjusted. > >> > >> That doesn't make any sense to me if we then go and do the > >> rt6_mtu_change() call unconditionally. The route metric update > >> and the rt6_mtu_change() go hand in hand. > > > > Agreed but that gets interesting: > > > > I guess during testing the cnf.mtu6 value was equal to the newly > > announced mtu value, so the rt6_mtu_change call does not happen. We > > update cnf.mtu6 so a second RA packet would actually bring the system > > into the desired state but we have a moment where the default route > > carries a too big MTU. That's not good. > > Agreed. > > > Easiest solution is to reorder those calls but that also leaves us with > > a time frame where we carry the incorrect MTU on the default route. > > Otherwise we must conditionally filter out the default routes. > > Roman, any ideas? > > I think, such approach will work on practise, but looks not very beatiful. > > May be, a better idea is to serarate per-route and per-device MTU, > so an updating of per-device MTU will not affect per-route MTU. > Actual MTU can always been calculated as min(route_mtu, device_mtu), > but we wouldn't need to update mtu on each route on receiving RA MTU option, > for instance. > > Do you see any problems with such approach? If I understood you correct this actually seems to be quite an intrusive change? :/ Can you show me some code how to do this? I would also dislike adding a filtering capability to the route mtu updates. Currently I don't have a god idea, sorry. Bye, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
07.04.2015, 18:58, "Hannes Frederic Sowa" <hannes@stressinduktion.org>: > On Do, 2015-04-02 at 21:08 +0300, Roman Gushchin wrote: >>>> The next question I have is about the behavior of the new setting >>>> in the presence of an RA MTU option. It seems like the sysctl >>>> doesn't override that RA MTU option, but rather just clamps it. >>>> >>>> And then if it's in range, this controls only whether the default >>>> route has it's MTU adjusted. >>>> >>>> That doesn't make any sense to me if we then go and do the >>>> rt6_mtu_change() call unconditionally. The route metric update >>>> and the rt6_mtu_change() go hand in hand. >>> Agreed but that gets interesting: >>> >>> I guess during testing the cnf.mtu6 value was equal to the newly >>> announced mtu value, so the rt6_mtu_change call does not happen. We >>> update cnf.mtu6 so a second RA packet would actually bring the system >>> into the desired state but we have a moment where the default route >>> carries a too big MTU. That's not good. >> Agreed. >>> Easiest solution is to reorder those calls but that also leaves us with >>> a time frame where we carry the incorrect MTU on the default route. >>> Otherwise we must conditionally filter out the default routes. >>> Roman, any ideas? >> I think, such approach will work on practise, but looks not very beatiful. >> >> May be, a better idea is to serarate per-route and per-device MTU, >> so an updating of per-device MTU will not affect per-route MTU. >> Actual MTU can always been calculated as min(route_mtu, device_mtu), >> but we wouldn't need to update mtu on each route on receiving RA MTU option, >> for instance. >> >> Do you see any problems with such approach? > If I understood you correct this actually seems to be quite an intrusive > change? :/ Can you show me some code how to do this? Too intrusive, really) > I would also dislike adding a filtering capability to the route mtu > updates. Currently I don't have a god idea, sorry. Hmm, I thought a bit more about this issue... And It seems to me now, that there is no issue at all. If RA MTU is larger than ra_default_route_mtu, rt6_mtu_change() will not update it, because dst_mtu(&rt->dst) != idev->cnf.mtu6 : if (rt->dst.dev == arg->dev && !dst_metric_locked(&rt->dst, RTAX_MTU) && (dst_mtu(&rt->dst) >= arg->mtu || (dst_mtu(&rt->dst) < arg->mtu && dst_mtu(&rt->dst) == idev->cnf.mtu6))) { dst_metric_set(&rt->dst, RTAX_MTU, arg->mtu); } So, it's ok. Otherwise, if RA MTU is lower than ra_default_route_mtu, rt6_mtu_change() will lower default route mtu, and it's ok too. There is a short period of time, when a newly created default route has too large MTU, but it's not scary. And it's exactly as it works now if new RA advertise MTU smaller than previous. Do I miss something? Thanks! Regards, Roman -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 071fb18..cf86729 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1349,6 +1349,11 @@ accept_ra_mtu - BOOLEAN Functional default: enabled if accept_ra is enabled. disabled if accept_ra is disabled. +ra_default_route_mtu - INTEGER + Define MTU for any new default route received by RA. + + Functional default: disabled (0). + accept_redirects - BOOLEAN Accept Redirects. diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 82806c6..c7727b5 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -53,6 +53,7 @@ struct ipv6_devconf { __s32 ndisc_notify; __s32 suppress_frag_ndisc; __s32 accept_ra_mtu; + __s32 ra_default_route_mtu; struct ipv6_stable_secret { bool initialized; struct in6_addr secret; diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h index 5efa54a..1d31d70 100644 --- a/include/uapi/linux/ipv6.h +++ b/include/uapi/linux/ipv6.h @@ -170,6 +170,7 @@ enum { DEVCONF_ACCEPT_RA_FROM_LOCAL, DEVCONF_USE_OPTIMISTIC, DEVCONF_ACCEPT_RA_MTU, + DEVCONF_RA_DEFAULT_ROUTE_MTU, DEVCONF_STABLE_SECRET, DEVCONF_MAX }; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 2660263..15528f7 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -209,6 +209,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = { .accept_dad = 1, .suppress_frag_ndisc = 1, .accept_ra_mtu = 1, + .ra_default_route_mtu = 0, .stable_secret = { .initialized = false, } @@ -250,6 +251,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { .accept_dad = 1, .suppress_frag_ndisc = 1, .accept_ra_mtu = 1, + .ra_default_route_mtu = 0, .stable_secret = { .initialized = false, }, @@ -4583,6 +4585,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf, array[DEVCONF_SUPPRESS_FRAG_NDISC] = cnf->suppress_frag_ndisc; array[DEVCONF_ACCEPT_RA_FROM_LOCAL] = cnf->accept_ra_from_local; array[DEVCONF_ACCEPT_RA_MTU] = cnf->accept_ra_mtu; + array[DEVCONF_RA_DEFAULT_ROUTE_MTU] = cnf->ra_default_route_mtu; /* we omit DEVCONF_STABLE_SECRET for now */ } @@ -5576,6 +5579,13 @@ static struct addrconf_sysctl_table .proc_handler = proc_dointvec, }, { + .procname = "ra_default_route_mtu", + .data = &ipv6_devconf.ra_default_route_mtu, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { .procname = "stable_secret", .data = &ipv6_devconf.stable_secret, .maxlen = IPV6_MAX_STRLEN, diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 247ad7c..2a3a564 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -1208,6 +1208,11 @@ static void ndisc_router_discovery(struct sk_buff *skb) "RA: %s failed to add default route\n", __func__); return; + } else { + u32 mtu = in6_dev->cnf.ra_default_route_mtu; + + if (mtu && mtu >= IPV6_MIN_MTU && mtu <= in6_dev->cnf.mtu6) + dst_metric_set(&rt->dst, RTAX_MTU, mtu); } neigh = dst_neigh_lookup(&rt->dst, &ipv6_hdr(skb)->saddr); @@ -1370,7 +1375,8 @@ skip_routeinfo: } else if (in6_dev->cnf.mtu6 != mtu) { in6_dev->cnf.mtu6 = mtu; - if (rt) + if (rt && (!in6_dev->cnf.ra_default_route_mtu || + mtu < in6_dev->cnf.ra_default_route_mtu)) dst_metric_set(&rt->dst, RTAX_MTU, mtu); rt6_mtu_change(skb->dev, mtu);