diff mbox series

[net-next,1/3] net: dsa: Configure the MTU for switch ports

Message ID 20191123194844.9508-2-olteanv@gmail.com
State Changes Requested
Delegated to: David Miller
Headers show
Series Configure the MTU on DSA switches | expand

Commit Message

Vladimir Oltean Nov. 23, 2019, 7:48 p.m. UTC
It is useful be able to configure port policers on a switch to accept
frames of various sizes:

- Increase the MTU for better throughput from the default of 1500 if it
  is known that there is no 10/100 Mbps device in the network.
- Decrease the MTU to limit the latency of high-priority frames under
  congestion.

For DSA slave ports, this is mostly a pass-through callback, called
through the regular ndo ops and at probe time (to ensure consistency
across all supported switches).

The CPU port is called with an MTU equal to the largest configured MTU
of the slave ports. The assumption is that the user might want to
sustain a bidirectional conversation with a partner over any switch
port.

The DSA master is configured the same as the CPU port, plus the tagger
overhead. Since the MTU is by definition L2 payload (sans Ethernet
header), it is up to each individual driver to figure out if it needs to
do anything special for its frame tags on the CPU port (it shouldn't
except in special cases). So the MTU does not contain the tagger
overhead on the CPU port.
However the MTU of the DSA master, minus the tagger overhead, is used as
a proxy for the MTU of the CPU port, which does not have a net device.
This is to avoid uselessly calling the .change_mtu function on the CPU
port when nothing should change.

So it is safe to assume that the DSA master and the CPU port MTUs are
apart by exactly the tagger's overhead in bytes.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 include/net/dsa.h  |  2 ++
 net/dsa/dsa_priv.h |  2 ++
 net/dsa/master.c   | 13 +++++-----
 net/dsa/slave.c    | 64 +++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 73 insertions(+), 8 deletions(-)

Comments

Florian Fainelli Nov. 23, 2019, 8:27 p.m. UTC | #1
Hi Vladimir,

On 11/23/2019 11:48 AM, Vladimir Oltean wrote:
> It is useful be able to configure port policers on a switch to accept
> frames of various sizes:
> 
> - Increase the MTU for better throughput from the default of 1500 if it
>   is known that there is no 10/100 Mbps device in the network.
> - Decrease the MTU to limit the latency of high-priority frames under
>   congestion.
> 
> For DSA slave ports, this is mostly a pass-through callback, called
> through the regular ndo ops and at probe time (to ensure consistency
> across all supported switches).
> 
> The CPU port is called with an MTU equal to the largest configured MTU
> of the slave ports. The assumption is that the user might want to
> sustain a bidirectional conversation with a partner over any switch
> port.
> 
> The DSA master is configured the same as the CPU port, plus the tagger
> overhead. Since the MTU is by definition L2 payload (sans Ethernet
> header), it is up to each individual driver to figure out if it needs to
> do anything special for its frame tags on the CPU port (it shouldn't
> except in special cases). So the MTU does not contain the tagger
> overhead on the CPU port.
> However the MTU of the DSA master, minus the tagger overhead, is used as
> a proxy for the MTU of the CPU port, which does not have a net device.
> This is to avoid uselessly calling the .change_mtu function on the CPU
> port when nothing should change.
> 
> So it is safe to assume that the DSA master and the CPU port MTUs are
> apart by exactly the tagger's overhead in bytes.
> 
> Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
> ---

[snip]
> +static int dsa_slave_change_mtu(struct net_device *dev, int new_mtu)
> +{
> +	struct net_device *master = dsa_slave_to_master(dev);
> +	struct dsa_slave_priv *p = netdev_priv(dev);
> +	struct dsa_switch *ds = p->dp->ds;
> +	struct dsa_port *cpu_dp;
> +	int port = p->dp->index;
> +	int max_mtu = 0;
> +	int cpu_mtu;
> +	int err, i;
> +
> +	if (!ds->ops->change_mtu)
> +		return -EOPNOTSUPP;
> +
> +	err = ds->ops->change_mtu(ds, port, new_mtu);
> +	if (err < 0)
> +		return err;
> +
> +	dev->mtu = new_mtu;
> +
> +	for (i = 0; i < ds->num_ports; i++) {
> +		if (!dsa_is_user_port(ds, i))
> +			continue;
> +
> +		/* During probe, this function will be called for each slave
> +		 * device, while not all of them have been allocated. That's
> +		 * ok, it doesn't change what the maximum is, so ignore it.
> +		 */
> +		if (!dsa_to_port(ds, i)->slave)
> +			continue;
> +
> +		if (max_mtu < dsa_to_port(ds, i)->slave->mtu)
> +			max_mtu = dsa_to_port(ds, i)->slave->mtu;
> +	}
> +
> +	cpu_dp = dsa_to_port(ds, port)->cpu_dp;
> +
> +	max_mtu += cpu_dp->tag_ops->overhead;
> +	cpu_mtu = master->mtu;
> +
> +	if (max_mtu != cpu_mtu) {
> +		err = ds->ops->change_mtu(ds, dsa_upstream_port(ds, port),
> +					  max_mtu - cpu_dp->tag_ops->overhead);
> +		if (err < 0)
> +			return err;

Before changing and committing the slave_dev's MTU you should actually
perform these two operations first to make sure that you can honor the
user port MTU that is requested. Here, you would possibly leave an user
port configured for a MTU value that is unsupported by the upstream
port(s) and/or the CPU port and/or the DSA master device, which could
possibly break frame forwarding depending on what the switch is willing
to accept.

I had prepared a patch series with Murali doing nearly the same thing
and targeting Broadcom switches nearly a year ago but since I never got
feedback whether this worked properly for the use case he was after, I
did not submit it since I did not need it personally and found it to be
a nice can of worms.

Another thing that I had not gotten around testing was making sure that
when a slave_dev gets enslaved as a bridge port member, that bridge MTU
normalization would kick in and make sure that if you have say: port 0
configured with MTU 1500 and port 1 configured with MTU 9000, the bridge
would normalize to MTU 1500 as you would expect.

https://github.com/ffainelli/linux/commits/dsa-mtu

This should be a DSA switch fabric notifier IMHO because changing the
MTU on an user port implies changing the MTU on every DSA port in
between plus the CPU port. Your approach here works for the first
upstream port, but not for the ones in between, and there can be more,
as is common with the ZII devel Rev. B and C boards.
Vladimir Oltean Nov. 23, 2019, 8:46 p.m. UTC | #2
On Sat, 23 Nov 2019 at 22:28, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> Hi Vladimir,
>
> On 11/23/2019 11:48 AM, Vladimir Oltean wrote:
> > It is useful be able to configure port policers on a switch to accept
> > frames of various sizes:
> >
> > - Increase the MTU for better throughput from the default of 1500 if it
> >   is known that there is no 10/100 Mbps device in the network.
> > - Decrease the MTU to limit the latency of high-priority frames under
> >   congestion.
> >
> > For DSA slave ports, this is mostly a pass-through callback, called
> > through the regular ndo ops and at probe time (to ensure consistency
> > across all supported switches).
> >
> > The CPU port is called with an MTU equal to the largest configured MTU
> > of the slave ports. The assumption is that the user might want to
> > sustain a bidirectional conversation with a partner over any switch
> > port.
> >
> > The DSA master is configured the same as the CPU port, plus the tagger
> > overhead. Since the MTU is by definition L2 payload (sans Ethernet
> > header), it is up to each individual driver to figure out if it needs to
> > do anything special for its frame tags on the CPU port (it shouldn't
> > except in special cases). So the MTU does not contain the tagger
> > overhead on the CPU port.
> > However the MTU of the DSA master, minus the tagger overhead, is used as
> > a proxy for the MTU of the CPU port, which does not have a net device.
> > This is to avoid uselessly calling the .change_mtu function on the CPU
> > port when nothing should change.
> >
> > So it is safe to assume that the DSA master and the CPU port MTUs are
> > apart by exactly the tagger's overhead in bytes.
> >
> > Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
> > ---
>
> [snip]
> > +static int dsa_slave_change_mtu(struct net_device *dev, int new_mtu)
> > +{
> > +     struct net_device *master = dsa_slave_to_master(dev);
> > +     struct dsa_slave_priv *p = netdev_priv(dev);
> > +     struct dsa_switch *ds = p->dp->ds;
> > +     struct dsa_port *cpu_dp;
> > +     int port = p->dp->index;
> > +     int max_mtu = 0;
> > +     int cpu_mtu;
> > +     int err, i;
> > +
> > +     if (!ds->ops->change_mtu)
> > +             return -EOPNOTSUPP;
> > +
> > +     err = ds->ops->change_mtu(ds, port, new_mtu);
> > +     if (err < 0)
> > +             return err;
> > +
> > +     dev->mtu = new_mtu;
> > +
> > +     for (i = 0; i < ds->num_ports; i++) {
> > +             if (!dsa_is_user_port(ds, i))
> > +                     continue;
> > +
> > +             /* During probe, this function will be called for each slave
> > +              * device, while not all of them have been allocated. That's
> > +              * ok, it doesn't change what the maximum is, so ignore it.
> > +              */
> > +             if (!dsa_to_port(ds, i)->slave)
> > +                     continue;
> > +
> > +             if (max_mtu < dsa_to_port(ds, i)->slave->mtu)
> > +                     max_mtu = dsa_to_port(ds, i)->slave->mtu;
> > +     }
> > +
> > +     cpu_dp = dsa_to_port(ds, port)->cpu_dp;
> > +
> > +     max_mtu += cpu_dp->tag_ops->overhead;
> > +     cpu_mtu = master->mtu;
> > +
> > +     if (max_mtu != cpu_mtu) {
> > +             err = ds->ops->change_mtu(ds, dsa_upstream_port(ds, port),
> > +                                       max_mtu - cpu_dp->tag_ops->overhead);
> > +             if (err < 0)
> > +                     return err;
>
> Before changing and committing the slave_dev's MTU you should actually
> perform these two operations first to make sure that you can honor the
> user port MTU that is requested. Here, you would possibly leave an user
> port configured for a MTU value that is unsupported by the upstream
> port(s) and/or the CPU port and/or the DSA master device, which could
> possibly break frame forwarding depending on what the switch is willing
> to accept.
>

Correct. I was actually held back a bit while looking at Andrew's
patch dc0fe7d47f9f ("net: dsa: Set the master device's MTU to account
for DSA overheads") where he basically discarded errors, so that's the
approach I took too (thinking that some DSA masters would not have ops
for changing or reporting the MTU).

> I had prepared a patch series with Murali doing nearly the same thing
> and targeting Broadcom switches nearly a year ago but since I never got
> feedback whether this worked properly for the use case he was after, I
> did not submit it since I did not need it personally and found it to be
> a nice can of worms.
>

Nice, do you mind if I take your series instead then?

> Another thing that I had not gotten around testing was making sure that
> when a slave_dev gets enslaved as a bridge port member, that bridge MTU
> normalization would kick in and make sure that if you have say: port 0
> configured with MTU 1500 and port 1 configured with MTU 9000, the bridge
> would normalize to MTU 1500 as you would expect.
>

Nope, that doesn't happen by default, at least in my implementation.
Is there code in the bridge core for it?

> https://github.com/ffainelli/linux/commits/dsa-mtu
>
> This should be a DSA switch fabric notifier IMHO because changing the
> MTU on an user port implies changing the MTU on every DSA port in
> between plus the CPU port. Your approach here works for the first
> upstream port, but not for the ones in between, and there can be more,
> as is common with the ZII devel Rev. B and C boards.

Yes, correct. Your patch implements notifiers which is definitely
good. I don't have a cascaded setup to test yet (my Turris Mox was
supposed to arrive but for some reason it was returned to the seller
by the shipping company...).

> --
> Florian

Thanks,
-Vladimir
Vladimir Oltean Nov. 23, 2019, 8:51 p.m. UTC | #3
On Sat, 23 Nov 2019 at 22:28, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> Hi Vladimir,
>
[snip]
>
> I had prepared a patch series with Murali doing nearly the same thing
> and targeting Broadcom switches nearly a year ago but since I never got
> feedback whether this worked properly for the use case he was after, I
> did not submit it since I did not need it personally and found it to be
> a nice can of worms.
>
[snip]

Also, is there another can of worms beyond what you've described here?

> --
> Florian

Thanks,
-Vladimir
Florian Fainelli Nov. 23, 2019, 9:09 p.m. UTC | #4
On 11/23/2019 12:51 PM, Vladimir Oltean wrote:
> On Sat, 23 Nov 2019 at 22:28, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>> Hi Vladimir,
>>
> [snip]
>>
>> I had prepared a patch series with Murali doing nearly the same thing
>> and targeting Broadcom switches nearly a year ago but since I never got
>> feedback whether this worked properly for the use case he was after, I
>> did not submit it since I did not need it personally and found it to be
>> a nice can of worms.
>>
> [snip]
> 
> Also, is there another can of worms beyond what you've described here?

Not that I can think of, the patch series I mentioned was done without
considering the bridge MTU normalization and that is where I left.
Florian Fainelli Nov. 23, 2019, 9:14 p.m. UTC | #5
On 11/23/2019 12:46 PM, Vladimir Oltean wrote:
> 
> Correct. I was actually held back a bit while looking at Andrew's
> patch dc0fe7d47f9f ("net: dsa: Set the master device's MTU to account
> for DSA overheads") where he basically discarded errors, so that's the
> approach I took too (thinking that some DSA masters would not have ops
> for changing or reporting the MTU).
> 
>> I had prepared a patch series with Murali doing nearly the same thing
>> and targeting Broadcom switches nearly a year ago but since I never got
>> feedback whether this worked properly for the use case he was after, I
>> did not submit it since I did not need it personally and found it to be
>> a nice can of worms.
>>
> 
> Nice, do you mind if I take your series instead then?

Not at all, if it works, please go ahead, not sure how hard it is going
to be to rebase.

> 
>> Another thing that I had not gotten around testing was making sure that
>> when a slave_dev gets enslaved as a bridge port member, that bridge MTU
>> normalization would kick in and make sure that if you have say: port 0
>> configured with MTU 1500 and port 1 configured with MTU 9000, the bridge
>> would normalize to MTU 1500 as you would expect.
>>
> 
> Nope, that doesn't happen by default, at least in my implementation.
> Is there code in the bridge core for it?

net/bridge/br_if.c::br_mtu_auto_adjust() takes care of adjusting the
bridge master device's MTU based on the minimum MTU of all ports within
the bridge, but what it seems to be missing is ensuring that if bridge
ports are enslaved, and those bridge ports happen to be part of the same
switch id (similar decision path to setting skb->fwd_offload_mark), then
the bridge port's MTU should also be auto adjusted. mlxsw also supports
changing the MTU, so I am surprised this is not something they fixed
already.

> 
>> https://github.com/ffainelli/linux/commits/dsa-mtu
>>
>> This should be a DSA switch fabric notifier IMHO because changing the
>> MTU on an user port implies changing the MTU on every DSA port in
>> between plus the CPU port. Your approach here works for the first
>> upstream port, but not for the ones in between, and there can be more,
>> as is common with the ZII devel Rev. B and C boards.
> 
> Yes, correct. Your patch implements notifiers which is definitely
> good. I don't have a cascaded setup to test yet (my Turris Mox was
> supposed to arrive but for some reason it was returned to the seller
> by the shipping company...).

Andrew and Vivien may know whether it is possible to get you a ZII Devel
rev. B or C since those have 3 switches in the fabric and have
constituted nice test platforms. Not sure if there is a version with a
CPU port that is Gigabit capable though.
Vladimir Oltean Nov. 23, 2019, 9:29 p.m. UTC | #6
On Sat, 23 Nov 2019 at 23:14, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 11/23/2019 12:46 PM, Vladimir Oltean wrote:
> >
> > Correct. I was actually held back a bit while looking at Andrew's
> > patch dc0fe7d47f9f ("net: dsa: Set the master device's MTU to account
> > for DSA overheads") where he basically discarded errors, so that's the
> > approach I took too (thinking that some DSA masters would not have ops
> > for changing or reporting the MTU).
> >
> >> I had prepared a patch series with Murali doing nearly the same thing
> >> and targeting Broadcom switches nearly a year ago but since I never got
> >> feedback whether this worked properly for the use case he was after, I
> >> did not submit it since I did not need it personally and found it to be
> >> a nice can of worms.
> >>
> >
> > Nice, do you mind if I take your series instead then?
>
> Not at all, if it works, please go ahead, not sure how hard it is going
> to be to rebase.
>
> >
> >> Another thing that I had not gotten around testing was making sure that
> >> when a slave_dev gets enslaved as a bridge port member, that bridge MTU
> >> normalization would kick in and make sure that if you have say: port 0
> >> configured with MTU 1500 and port 1 configured with MTU 9000, the bridge
> >> would normalize to MTU 1500 as you would expect.
> >>
> >
> > Nope, that doesn't happen by default, at least in my implementation.
> > Is there code in the bridge core for it?
>
> net/bridge/br_if.c::br_mtu_auto_adjust() takes care of adjusting the
> bridge master device's MTU based on the minimum MTU of all ports within
> the bridge, but what it seems to be missing is ensuring that if bridge
> ports are enslaved, and those bridge ports happen to be part of the same
> switch id (similar decision path to setting skb->fwd_offload_mark), then
> the bridge port's MTU should also be auto adjusted. mlxsw also supports
> changing the MTU, so I am surprised this is not something they fixed
> already.
>

But then how would you even change a bridged interface's MTU? Delete
bridge, change MTU of all ports to same value, create bridge again?

> >
> >> https://github.com/ffainelli/linux/commits/dsa-mtu
> >>
> >> This should be a DSA switch fabric notifier IMHO because changing the
> >> MTU on an user port implies changing the MTU on every DSA port in
> >> between plus the CPU port. Your approach here works for the first
> >> upstream port, but not for the ones in between, and there can be more,
> >> as is common with the ZII devel Rev. B and C boards.
> >
> > Yes, correct. Your patch implements notifiers which is definitely
> > good. I don't have a cascaded setup to test yet (my Turris Mox was
> > supposed to arrive but for some reason it was returned to the seller
> > by the shipping company...).
>
> Andrew and Vivien may know whether it is possible to get you a ZII Devel
> rev. B or C since those have 3 switches in the fabric and have
> constituted nice test platforms. Not sure if there is a version with a
> CPU port that is Gigabit capable though.

Well, I've already ordered it again, this time with more expensive
shipping... Hopefully I got their hint right.

> --
> Florian
Florian Fainelli Nov. 23, 2019, 9:47 p.m. UTC | #7
On 11/23/2019 1:29 PM, Vladimir Oltean wrote:
> On Sat, 23 Nov 2019 at 23:14, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>>
>>
>> On 11/23/2019 12:46 PM, Vladimir Oltean wrote:
>>>
>>> Correct. I was actually held back a bit while looking at Andrew's
>>> patch dc0fe7d47f9f ("net: dsa: Set the master device's MTU to account
>>> for DSA overheads") where he basically discarded errors, so that's the
>>> approach I took too (thinking that some DSA masters would not have ops
>>> for changing or reporting the MTU).
>>>
>>>> I had prepared a patch series with Murali doing nearly the same thing
>>>> and targeting Broadcom switches nearly a year ago but since I never got
>>>> feedback whether this worked properly for the use case he was after, I
>>>> did not submit it since I did not need it personally and found it to be
>>>> a nice can of worms.
>>>>
>>>
>>> Nice, do you mind if I take your series instead then?
>>
>> Not at all, if it works, please go ahead, not sure how hard it is going
>> to be to rebase.
>>
>>>
>>>> Another thing that I had not gotten around testing was making sure that
>>>> when a slave_dev gets enslaved as a bridge port member, that bridge MTU
>>>> normalization would kick in and make sure that if you have say: port 0
>>>> configured with MTU 1500 and port 1 configured with MTU 9000, the bridge
>>>> would normalize to MTU 1500 as you would expect.
>>>>
>>>
>>> Nope, that doesn't happen by default, at least in my implementation.
>>> Is there code in the bridge core for it?
>>
>> net/bridge/br_if.c::br_mtu_auto_adjust() takes care of adjusting the
>> bridge master device's MTU based on the minimum MTU of all ports within
>> the bridge, but what it seems to be missing is ensuring that if bridge
>> ports are enslaved, and those bridge ports happen to be part of the same
>> switch id (similar decision path to setting skb->fwd_offload_mark), then
>> the bridge port's MTU should also be auto adjusted. mlxsw also supports
>> changing the MTU, so I am surprised this is not something they fixed
>> already.
>>
> 
> But then how would you even change a bridged interface's MTU? Delete
> bridge, change MTU of all ports to same value, create bridge again?

I am afraid so, given that the NETDEV_CHANGEMTU even for which
br_device_event() listens to and processes with br_mtu_auto_adjust()
would lead to selecting the lowest MTU again. Unfortunately, I don't
really see a way to solve that other than walk all ports (which could be
any network device driver) and ask them if they support the new MTU of
that other port, and if so, commit, else rollback. Do you see another way?
Vladimir Oltean Nov. 23, 2019, 9:54 p.m. UTC | #8
On Sat, 23 Nov 2019 at 23:47, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 11/23/2019 1:29 PM, Vladimir Oltean wrote:
> > On Sat, 23 Nov 2019 at 23:14, Florian Fainelli <f.fainelli@gmail.com> wrote:
> >> On 11/23/2019 12:46 PM, Vladimir Oltean wrote:

[snip]

> >>>> Another thing that I had not gotten around testing was making sure that
> >>>> when a slave_dev gets enslaved as a bridge port member, that bridge MTU
> >>>> normalization would kick in and make sure that if you have say: port 0
> >>>> configured with MTU 1500 and port 1 configured with MTU 9000, the bridge
> >>>> would normalize to MTU 1500 as you would expect.
> >>>>
> >>>
> >>> Nope, that doesn't happen by default, at least in my implementation.
> >>> Is there code in the bridge core for it?
> >>
> >> net/bridge/br_if.c::br_mtu_auto_adjust() takes care of adjusting the
> >> bridge master device's MTU based on the minimum MTU of all ports within
> >> the bridge, but what it seems to be missing is ensuring that if bridge
> >> ports are enslaved, and those bridge ports happen to be part of the same
> >> switch id (similar decision path to setting skb->fwd_offload_mark), then
> >> the bridge port's MTU should also be auto adjusted. mlxsw also supports
> >> changing the MTU, so I am surprised this is not something they fixed
> >> already.
> >>
> >
> > But then how would you even change a bridged interface's MTU? Delete
> > bridge, change MTU of all ports to same value, create bridge again?
>
> I am afraid so, given that the NETDEV_CHANGEMTU even for which
> br_device_event() listens to and processes with br_mtu_auto_adjust()
> would lead to selecting the lowest MTU again. Unfortunately, I don't
> really see a way to solve that other than walk all ports (which could be
> any network device driver) and ask them if they support the new MTU of
> that other port, and if so, commit, else rollback. Do you see another way?
> --
> Florian

Something like that would be preferable. I think it would be
frustrating for the bridge to restore the old MTU right after you
change it. I have no idea how hard it would be to prevent the bridge
from doing this. I'll poke it with a stick and see what happens.

-Vladimir
Andrew Lunn Nov. 24, 2019, 10:43 p.m. UTC | #9
> Correct. I was actually held back a bit while looking at Andrew's
> patch dc0fe7d47f9f ("net: dsa: Set the master device's MTU to account
> for DSA overheads") where he basically discarded errors, so that's the
> approach I took too (thinking that some DSA masters would not have ops
> for changing or reporting the MTU).

Ignoring errors is deliberate because some master interfaces just
worked without having to set the MTU. I was worried that some that
just worked did not implement MTU changes, so i could not error out.

And my experience when things did not work was mostly the MTU did not
matter, but MRU did. The MAC would send frames with a header, but not
receive them with the header. Setting the MTU actually seems to set
the MRU on most MACs.

But when you are thinking about jumbo frames, i would not ignore the
error. We do need to be sure the master interface can support jumbo,
and big enough jumbo to support the header.

    Andrew
Andrew Lunn Nov. 24, 2019, 10:48 p.m. UTC | #10
> The CPU port is called with an MTU equal to the largest configured MTU
> of the slave ports. The assumption is that the user might want to
> sustain a bidirectional conversation with a partner over any switch
> port.

There is an assumption here that the MTU is enforced before the DSA
header is applied on the egress patch. And MRU is also applied once
the header is removed on ingress.  I've no idea if this is true for
all the switches we support in DSA.

We should clearly document that if the switch need the header to be
included, the function to set the MTU needs to add it.

	  Andrew
diff mbox series

Patch

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 6767dc3f66c0..c25eec2b8cc0 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -382,6 +382,8 @@  struct dsa_switch_ops {
 	int	(*setup)(struct dsa_switch *ds);
 	void	(*teardown)(struct dsa_switch *ds);
 	u32	(*get_phy_flags)(struct dsa_switch *ds, int port);
+	int	(*change_mtu)(struct dsa_switch *ds, int port, int new_mtu);
+	int	(*get_max_mtu)(struct dsa_switch *ds, int port);
 
 	/*
 	 * Access to the switch's PHY registers.
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 53e7577896b6..1501f06643ca 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -98,6 +98,8 @@  int dsa_legacy_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
 /* master.c */
 int dsa_master_setup(struct net_device *dev, struct dsa_port *cpu_dp);
 void dsa_master_teardown(struct net_device *dev);
+int dsa_master_set_mtu(struct net_device *dev, struct dsa_port *cpu_dp,
+		       int mtu);
 
 static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
 						       int device, int port)
diff --git a/net/dsa/master.c b/net/dsa/master.c
index 3255dfc97f86..7efcce6ef05d 100644
--- a/net/dsa/master.c
+++ b/net/dsa/master.c
@@ -284,18 +284,19 @@  static const struct attribute_group dsa_group = {
 	.attrs	= dsa_slave_attrs,
 };
 
-static void dsa_master_set_mtu(struct net_device *dev, struct dsa_port *cpu_dp)
+/* Needs to be called under rtnl_lock */
+int dsa_master_set_mtu(struct net_device *dev, struct dsa_port *cpu_dp,
+		       int mtu)
 {
-	unsigned int mtu = ETH_DATA_LEN + cpu_dp->tag_ops->overhead;
-	int err;
+	int err = -ERANGE;
 
-	rtnl_lock();
 	if (mtu <= dev->max_mtu) {
 		err = dev_set_mtu(dev, mtu);
 		if (err)
 			netdev_dbg(dev, "Unable to set MTU to include for DSA overheads\n");
 	}
-	rtnl_unlock();
+
+	return err;
 }
 
 static void dsa_master_reset_mtu(struct net_device *dev)
@@ -314,8 +315,6 @@  int dsa_master_setup(struct net_device *dev, struct dsa_port *cpu_dp)
 {
 	int ret;
 
-	dsa_master_set_mtu(dev,  cpu_dp);
-
 	/* If we use a tagging format that doesn't have an ethertype
 	 * field, make sure that all packets from this point on get
 	 * sent to the tag format's receive function.
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 78ffc87dc25e..acdeeee191d7 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -149,6 +149,60 @@  static void dsa_slave_change_rx_flags(struct net_device *dev, int change)
 	}
 }
 
+static int dsa_slave_change_mtu(struct net_device *dev, int new_mtu)
+{
+	struct net_device *master = dsa_slave_to_master(dev);
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_port *cpu_dp;
+	int port = p->dp->index;
+	int max_mtu = 0;
+	int cpu_mtu;
+	int err, i;
+
+	if (!ds->ops->change_mtu)
+		return -EOPNOTSUPP;
+
+	err = ds->ops->change_mtu(ds, port, new_mtu);
+	if (err < 0)
+		return err;
+
+	dev->mtu = new_mtu;
+
+	for (i = 0; i < ds->num_ports; i++) {
+		if (!dsa_is_user_port(ds, i))
+			continue;
+
+		/* During probe, this function will be called for each slave
+		 * device, while not all of them have been allocated. That's
+		 * ok, it doesn't change what the maximum is, so ignore it.
+		 */
+		if (!dsa_to_port(ds, i)->slave)
+			continue;
+
+		if (max_mtu < dsa_to_port(ds, i)->slave->mtu)
+			max_mtu = dsa_to_port(ds, i)->slave->mtu;
+	}
+
+	cpu_dp = dsa_to_port(ds, port)->cpu_dp;
+
+	max_mtu += cpu_dp->tag_ops->overhead;
+	cpu_mtu = master->mtu;
+
+	if (max_mtu != cpu_mtu) {
+		err = ds->ops->change_mtu(ds, dsa_upstream_port(ds, port),
+					  max_mtu - cpu_dp->tag_ops->overhead);
+		if (err < 0)
+			return err;
+
+		err = dsa_master_set_mtu(master, cpu_dp, max_mtu);
+		if (err < 0)
+			return err;
+	}
+
+	return err;
+}
+
 static void dsa_slave_set_rx_mode(struct net_device *dev)
 {
 	struct net_device *master = dsa_slave_to_master(dev);
@@ -1248,6 +1302,7 @@  static const struct net_device_ops dsa_slave_netdev_ops = {
 	.ndo_stop		= dsa_slave_close,
 	.ndo_start_xmit		= dsa_slave_xmit,
 	.ndo_change_rx_flags	= dsa_slave_change_rx_flags,
+	.ndo_change_mtu		= dsa_slave_change_mtu,
 	.ndo_set_rx_mode	= dsa_slave_set_rx_mode,
 	.ndo_set_mac_address	= dsa_slave_set_mac_address,
 	.ndo_fdb_add		= dsa_legacy_fdb_add,
@@ -1440,7 +1495,10 @@  int dsa_slave_create(struct dsa_port *port)
 	slave_dev->priv_flags |= IFF_NO_QUEUE;
 	slave_dev->netdev_ops = &dsa_slave_netdev_ops;
 	slave_dev->min_mtu = 0;
-	slave_dev->max_mtu = ETH_MAX_MTU;
+	if (ds->ops->get_max_mtu)
+		slave_dev->max_mtu = ds->ops->get_max_mtu(ds, port->index);
+	else
+		slave_dev->max_mtu = ETH_MAX_MTU;
 	SET_NETDEV_DEVTYPE(slave_dev, &dsa_type);
 
 	SET_NETDEV_DEV(slave_dev, port->ds->dev);
@@ -1460,6 +1518,10 @@  int dsa_slave_create(struct dsa_port *port)
 	p->xmit = cpu_dp->tag_ops->xmit;
 	port->slave = slave_dev;
 
+	rtnl_lock();
+	dsa_slave_change_mtu(slave_dev, ETH_DATA_LEN);
+	rtnl_unlock();
+
 	netif_carrier_off(slave_dev);
 
 	ret = dsa_slave_phy_setup(slave_dev);