diff mbox series

[net-next,3/3] net: dsa: mv88e6xxx: defautl to multicast and unicast flooding

Message ID E1gvNNd-0006YN-Rc@rmk-PC.armlinux.org.uk
State Superseded
Delegated to: David Miller
Headers show
Series net: dsa: mv88e6xxx: fix IPv6 | expand

Commit Message

Russell King (Oracle) Feb. 17, 2019, 2:25 p.m. UTC
Switches work by learning the MAC address for each attached station by
monitoring traffic from each station.  When a station sends a packet,
the switch records which port the MAC address is connected to.

With IPv4 networking, before communication commences with a neighbour,
an ARP packet is broadcasted to all stations asking for the MAC address
corresponding with the IPv4.  The desired station responds with an ARP
reply, and the ARP reply causes the switch to learn which port the
station is connected to.

With IPv6 networking, the situation is rather different.  Rather than
broadcasting ARP packets, a "neighbour solicitation" is multicasted
rather than broadcasted.  This multicast needs to reach the intended
station in order for the neighbour to be discovered.

Once a neighbour has been discovered, and entered into the sending
stations neighbour cache, communication can restart at a point later
without sending a new neighbour solicitation, even if the entry in
the neighbour cache is marked as stale.  This can be after the MAC
address has expired from the forwarding cache of the DSA switch -
when that occurs, there is a long pause in communication.

Our DSA implementation for mv88e6xxx switches has defaulted to having
multicast and unicast flooding disabled.  As per the above description,
this is fine for IPv4 networking, since the broadcasted ARP queries
will be sent to and received by all stations on the same network.
However, this breaks IPv6 very badly - blocking neighbour solicitations
and later causing connections to stall.

The defaults that the Linux bridge code expect from bridges are that
unknown unicast frames and unknown multicast frames are flooded to
all stations, which is at odds to the defaults adopted by our DSA
implementation for mv88e6xxx switches.

This commit enables by default flooding of both unknown unicast and
unknown multicast frames.  This means that mv88e6xxx DSA switches now
behave as per the bridge(8) man page, and IPv6 works flawlessly through
such a switch.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Russell King (Oracle) Feb. 17, 2019, 2:27 p.m. UTC | #1
On Sun, Feb 17, 2019 at 02:25:17PM +0000, Russell King wrote:
> Switches work by learning the MAC address for each attached station by
> monitoring traffic from each station.  When a station sends a packet,
> the switch records which port the MAC address is connected to.
> 
> With IPv4 networking, before communication commences with a neighbour,
> an ARP packet is broadcasted to all stations asking for the MAC address
> corresponding with the IPv4.  The desired station responds with an ARP
> reply, and the ARP reply causes the switch to learn which port the
> station is connected to.
> 
> With IPv6 networking, the situation is rather different.  Rather than
> broadcasting ARP packets, a "neighbour solicitation" is multicasted
> rather than broadcasted.  This multicast needs to reach the intended
> station in order for the neighbour to be discovered.
> 
> Once a neighbour has been discovered, and entered into the sending
> stations neighbour cache, communication can restart at a point later
> without sending a new neighbour solicitation, even if the entry in
> the neighbour cache is marked as stale.  This can be after the MAC
> address has expired from the forwarding cache of the DSA switch -
> when that occurs, there is a long pause in communication.
> 
> Our DSA implementation for mv88e6xxx switches has defaulted to having
> multicast and unicast flooding disabled.  As per the above description,
> this is fine for IPv4 networking, since the broadcasted ARP queries
> will be sent to and received by all stations on the same network.
> However, this breaks IPv6 very badly - blocking neighbour solicitations
> and later causing connections to stall.
> 
> The defaults that the Linux bridge code expect from bridges are that
> unknown unicast frames and unknown multicast frames are flooded to
> all stations, which is at odds to the defaults adopted by our DSA
> implementation for mv88e6xxx switches.
> 
> This commit enables by default flooding of both unknown unicast and
> unknown multicast frames.  This means that mv88e6xxx DSA switches now
> behave as per the bridge(8) man page, and IPv6 works flawlessly through
> such a switch.

Note that there is the open question whether this affects the case where
each port is used as a separate network interface: that case has not yet
been tested.

> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index b75a865a293d..eb5e3d88374f 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -2144,13 +2144,14 @@ static int mv88e6xxx_setup_message_port(struct mv88e6xxx_chip *chip, int port)
>  static int mv88e6xxx_setup_egress_floods(struct mv88e6xxx_chip *chip, int port)
>  {
>  	struct dsa_switch *ds = chip->ds;
> -	bool flood;
>  
> -	/* Upstream ports flood frames with unknown unicast or multicast DA */
> -	flood = dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port);
> +	/* Linux bridges are expected to flood unknown multicast and
> +	 * unicast frames to all ports - as per the defaults specified
> +	 * in the iproute2 bridge(8) man page. Not doing this causes
> +	 * stalls and failures with IPv6 over Marvell bridges. */
>  	if (chip->info->ops->port_set_egress_floods)
>  		return chip->info->ops->port_set_egress_floods(chip, port,
> -							       flood, flood);
> +							       true, true);
>  
>  	return 0;
>  }
> -- 
> 2.7.4
> 
>
Russell King (Oracle) Feb. 17, 2019, 4:34 p.m. UTC | #2
On Sun, Feb 17, 2019 at 02:27:16PM +0000, Russell King - ARM Linux admin wrote:
> On Sun, Feb 17, 2019 at 02:25:17PM +0000, Russell King wrote:
> > Switches work by learning the MAC address for each attached station by
> > monitoring traffic from each station.  When a station sends a packet,
> > the switch records which port the MAC address is connected to.
> > 
> > With IPv4 networking, before communication commences with a neighbour,
> > an ARP packet is broadcasted to all stations asking for the MAC address
> > corresponding with the IPv4.  The desired station responds with an ARP
> > reply, and the ARP reply causes the switch to learn which port the
> > station is connected to.
> > 
> > With IPv6 networking, the situation is rather different.  Rather than
> > broadcasting ARP packets, a "neighbour solicitation" is multicasted
> > rather than broadcasted.  This multicast needs to reach the intended
> > station in order for the neighbour to be discovered.
> > 
> > Once a neighbour has been discovered, and entered into the sending
> > stations neighbour cache, communication can restart at a point later
> > without sending a new neighbour solicitation, even if the entry in
> > the neighbour cache is marked as stale.  This can be after the MAC
> > address has expired from the forwarding cache of the DSA switch -
> > when that occurs, there is a long pause in communication.
> > 
> > Our DSA implementation for mv88e6xxx switches has defaulted to having
> > multicast and unicast flooding disabled.  As per the above description,
> > this is fine for IPv4 networking, since the broadcasted ARP queries
> > will be sent to and received by all stations on the same network.
> > However, this breaks IPv6 very badly - blocking neighbour solicitations
> > and later causing connections to stall.
> > 
> > The defaults that the Linux bridge code expect from bridges are that
> > unknown unicast frames and unknown multicast frames are flooded to
> > all stations, which is at odds to the defaults adopted by our DSA
> > implementation for mv88e6xxx switches.
> > 
> > This commit enables by default flooding of both unknown unicast and
> > unknown multicast frames.  This means that mv88e6xxx DSA switches now
> > behave as per the bridge(8) man page, and IPv6 works flawlessly through
> > such a switch.
> 
> Note that there is the open question whether this affects the case where
> each port is used as a separate network interface: that case has not yet
> been tested.

I've checked with a mv88e6131 on the clearfog gt8k board.  lan1
connected to my lan with plenty of traffic on, and configured as
part of a bridge.  lan2 connected to the zii board, but not part
of the bridge.  Monitoring lan2 from the zii board shows no traffic
that was received from lan1.

So it looks fine.

> 
> > 
> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> > ---
> >  drivers/net/dsa/mv88e6xxx/chip.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> > index b75a865a293d..eb5e3d88374f 100644
> > --- a/drivers/net/dsa/mv88e6xxx/chip.c
> > +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> > @@ -2144,13 +2144,14 @@ static int mv88e6xxx_setup_message_port(struct mv88e6xxx_chip *chip, int port)
> >  static int mv88e6xxx_setup_egress_floods(struct mv88e6xxx_chip *chip, int port)
> >  {
> >  	struct dsa_switch *ds = chip->ds;
> > -	bool flood;
> >  
> > -	/* Upstream ports flood frames with unknown unicast or multicast DA */
> > -	flood = dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port);
> > +	/* Linux bridges are expected to flood unknown multicast and
> > +	 * unicast frames to all ports - as per the defaults specified
> > +	 * in the iproute2 bridge(8) man page. Not doing this causes
> > +	 * stalls and failures with IPv6 over Marvell bridges. */
> >  	if (chip->info->ops->port_set_egress_floods)
> >  		return chip->info->ops->port_set_egress_floods(chip, port,
> > -							       flood, flood);
> > +							       true, true);
> >  
> >  	return 0;
> >  }
> > -- 
> > 2.7.4
> > 
> > 
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up
Florian Fainelli Feb. 17, 2019, 9:45 p.m. UTC | #3
On 2/17/2019 8:34 AM, Russell King - ARM Linux admin wrote:
> On Sun, Feb 17, 2019 at 02:27:16PM +0000, Russell King - ARM Linux admin wrote:
>> On Sun, Feb 17, 2019 at 02:25:17PM +0000, Russell King wrote:
>>> Switches work by learning the MAC address for each attached station by
>>> monitoring traffic from each station.  When a station sends a packet,
>>> the switch records which port the MAC address is connected to.
>>>
>>> With IPv4 networking, before communication commences with a neighbour,
>>> an ARP packet is broadcasted to all stations asking for the MAC address
>>> corresponding with the IPv4.  The desired station responds with an ARP
>>> reply, and the ARP reply causes the switch to learn which port the
>>> station is connected to.
>>>
>>> With IPv6 networking, the situation is rather different.  Rather than
>>> broadcasting ARP packets, a "neighbour solicitation" is multicasted
>>> rather than broadcasted.  This multicast needs to reach the intended
>>> station in order for the neighbour to be discovered.
>>>
>>> Once a neighbour has been discovered, and entered into the sending
>>> stations neighbour cache, communication can restart at a point later
>>> without sending a new neighbour solicitation, even if the entry in
>>> the neighbour cache is marked as stale.  This can be after the MAC
>>> address has expired from the forwarding cache of the DSA switch -
>>> when that occurs, there is a long pause in communication.
>>>
>>> Our DSA implementation for mv88e6xxx switches has defaulted to having
>>> multicast and unicast flooding disabled.  As per the above description,
>>> this is fine for IPv4 networking, since the broadcasted ARP queries
>>> will be sent to and received by all stations on the same network.
>>> However, this breaks IPv6 very badly - blocking neighbour solicitations
>>> and later causing connections to stall.
>>>
>>> The defaults that the Linux bridge code expect from bridges are that
>>> unknown unicast frames and unknown multicast frames are flooded to
>>> all stations, which is at odds to the defaults adopted by our DSA
>>> implementation for mv88e6xxx switches.
>>>
>>> This commit enables by default flooding of both unknown unicast and
>>> unknown multicast frames.  This means that mv88e6xxx DSA switches now
>>> behave as per the bridge(8) man page, and IPv6 works flawlessly through
>>> such a switch.
>>
>> Note that there is the open question whether this affects the case where
>> each port is used as a separate network interface: that case has not yet
>> been tested.
> 
> I've checked with a mv88e6131 on the clearfog gt8k board.  lan1
> connected to my lan with plenty of traffic on, and configured as
> part of a bridge.  lan2 connected to the zii board, but not part
> of the bridge.  Monitoring lan2 from the zii board shows no traffic
> that was received from lan1.
> 
> So it looks fine.

With the current state whereby we do not have the necessary hooks to
perform filtering on non-bridged/standalone ports, this is entirely fine
indeed.

In the future this is part of something I want to address because it is
IMHO highly undesirable to have non-bridged ports be flooded with
unknown multicast or unknown unicast for that matter because that makes
them deviate from a standard NIC interface. Unknown unicast is not
necessarily a low hanging fruit, but still, if we have switches capable
of filtering, we might as well make use of that. Of course, one
difficulty is that we must not break running tcpdump on those DSA slave
network interfaces.

> 
>>
>>>
>>> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
>>> ---
>>>  drivers/net/dsa/mv88e6xxx/chip.c | 9 +++++----
>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
>>> index b75a865a293d..eb5e3d88374f 100644
>>> --- a/drivers/net/dsa/mv88e6xxx/chip.c
>>> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
>>> @@ -2144,13 +2144,14 @@ static int mv88e6xxx_setup_message_port(struct mv88e6xxx_chip *chip, int port)
>>>  static int mv88e6xxx_setup_egress_floods(struct mv88e6xxx_chip *chip, int port)
>>>  {
>>>  	struct dsa_switch *ds = chip->ds;
>>> -	bool flood;
>>>  
>>> -	/* Upstream ports flood frames with unknown unicast or multicast DA */
>>> -	flood = dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port);
>>> +	/* Linux bridges are expected to flood unknown multicast and
>>> +	 * unicast frames to all ports - as per the defaults specified
>>> +	 * in the iproute2 bridge(8) man page. Not doing this causes
>>> +	 * stalls and failures with IPv6 over Marvell bridges. */
>>>  	if (chip->info->ops->port_set_egress_floods)
>>>  		return chip->info->ops->port_set_egress_floods(chip, port,
>>> -							       flood, flood);
>>> +							       true, true);
>>>  
>>>  	return 0;
>>>  }
>>> -- 
>>> 2.7.4
>>>
>>>
>>
>> -- 
>> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
>> According to speedtest.net: 11.9Mbps down 500kbps up
>
Russell King (Oracle) Feb. 17, 2019, 9:58 p.m. UTC | #4
On Sun, Feb 17, 2019 at 01:45:24PM -0800, Florian Fainelli wrote:
> 
> 
> On 2/17/2019 8:34 AM, Russell King - ARM Linux admin wrote:
> > On Sun, Feb 17, 2019 at 02:27:16PM +0000, Russell King - ARM Linux admin wrote:
> >> On Sun, Feb 17, 2019 at 02:25:17PM +0000, Russell King wrote:
> >>> Switches work by learning the MAC address for each attached station by
> >>> monitoring traffic from each station.  When a station sends a packet,
> >>> the switch records which port the MAC address is connected to.
> >>>
> >>> With IPv4 networking, before communication commences with a neighbour,
> >>> an ARP packet is broadcasted to all stations asking for the MAC address
> >>> corresponding with the IPv4.  The desired station responds with an ARP
> >>> reply, and the ARP reply causes the switch to learn which port the
> >>> station is connected to.
> >>>
> >>> With IPv6 networking, the situation is rather different.  Rather than
> >>> broadcasting ARP packets, a "neighbour solicitation" is multicasted
> >>> rather than broadcasted.  This multicast needs to reach the intended
> >>> station in order for the neighbour to be discovered.
> >>>
> >>> Once a neighbour has been discovered, and entered into the sending
> >>> stations neighbour cache, communication can restart at a point later
> >>> without sending a new neighbour solicitation, even if the entry in
> >>> the neighbour cache is marked as stale.  This can be after the MAC
> >>> address has expired from the forwarding cache of the DSA switch -
> >>> when that occurs, there is a long pause in communication.
> >>>
> >>> Our DSA implementation for mv88e6xxx switches has defaulted to having
> >>> multicast and unicast flooding disabled.  As per the above description,
> >>> this is fine for IPv4 networking, since the broadcasted ARP queries
> >>> will be sent to and received by all stations on the same network.
> >>> However, this breaks IPv6 very badly - blocking neighbour solicitations
> >>> and later causing connections to stall.
> >>>
> >>> The defaults that the Linux bridge code expect from bridges are that
> >>> unknown unicast frames and unknown multicast frames are flooded to
> >>> all stations, which is at odds to the defaults adopted by our DSA
> >>> implementation for mv88e6xxx switches.
> >>>
> >>> This commit enables by default flooding of both unknown unicast and
> >>> unknown multicast frames.  This means that mv88e6xxx DSA switches now
> >>> behave as per the bridge(8) man page, and IPv6 works flawlessly through
> >>> such a switch.
> >>
> >> Note that there is the open question whether this affects the case where
> >> each port is used as a separate network interface: that case has not yet
> >> been tested.
> > 
> > I've checked with a mv88e6131 on the clearfog gt8k board.  lan1
> > connected to my lan with plenty of traffic on, and configured as
> > part of a bridge.  lan2 connected to the zii board, but not part
> > of the bridge.  Monitoring lan2 from the zii board shows no traffic
> > that was received from lan1.
> > 
> > So it looks fine.
> 
> With the current state whereby we do not have the necessary hooks to
> perform filtering on non-bridged/standalone ports, this is entirely fine
> indeed.
> 
> In the future this is part of something I want to address because it is
> IMHO highly undesirable to have non-bridged ports be flooded with
> unknown multicast or unknown unicast for that matter because that makes
> them deviate from a standard NIC interface. Unknown unicast is not
> necessarily a low hanging fruit, but still, if we have switches capable
> of filtering, we might as well make use of that. Of course, one
> difficulty is that we must not break running tcpdump on those DSA slave
> network interfaces.

Sorry, I think you have the wrong end of the stick.

For a non-bridged port, I am seeing _no_ traffic apart from that
explicitly sent out through that port.  In other words, there are
_no_ flooded frames coming out of the non-bridged port.

This patch appears to have no material effect on non-bridged ports.

> 
> > 
> >>
> >>>
> >>> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> >>> ---
> >>>  drivers/net/dsa/mv88e6xxx/chip.c | 9 +++++----
> >>>  1 file changed, 5 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> >>> index b75a865a293d..eb5e3d88374f 100644
> >>> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> >>> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> >>> @@ -2144,13 +2144,14 @@ static int mv88e6xxx_setup_message_port(struct mv88e6xxx_chip *chip, int port)
> >>>  static int mv88e6xxx_setup_egress_floods(struct mv88e6xxx_chip *chip, int port)
> >>>  {
> >>>  	struct dsa_switch *ds = chip->ds;
> >>> -	bool flood;
> >>>  
> >>> -	/* Upstream ports flood frames with unknown unicast or multicast DA */
> >>> -	flood = dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port);
> >>> +	/* Linux bridges are expected to flood unknown multicast and
> >>> +	 * unicast frames to all ports - as per the defaults specified
> >>> +	 * in the iproute2 bridge(8) man page. Not doing this causes
> >>> +	 * stalls and failures with IPv6 over Marvell bridges. */
> >>>  	if (chip->info->ops->port_set_egress_floods)
> >>>  		return chip->info->ops->port_set_egress_floods(chip, port,
> >>> -							       flood, flood);
> >>> +							       true, true);
> >>>  
> >>>  	return 0;
> >>>  }
> >>> -- 
> >>> 2.7.4
> >>>
> >>>
> >>
> >> -- 
> >> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> >> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> >> According to speedtest.net: 11.9Mbps down 500kbps up
> > 
> 
> -- 
> Florian
>
Florian Fainelli Feb. 17, 2019, 10:03 p.m. UTC | #5
On 2/17/2019 1:58 PM, Russell King - ARM Linux admin wrote:
> On Sun, Feb 17, 2019 at 01:45:24PM -0800, Florian Fainelli wrote:
>>
>>
>> On 2/17/2019 8:34 AM, Russell King - ARM Linux admin wrote:
>>> On Sun, Feb 17, 2019 at 02:27:16PM +0000, Russell King - ARM Linux admin wrote:
>>>> On Sun, Feb 17, 2019 at 02:25:17PM +0000, Russell King wrote:
>>>>> Switches work by learning the MAC address for each attached station by
>>>>> monitoring traffic from each station.  When a station sends a packet,
>>>>> the switch records which port the MAC address is connected to.
>>>>>
>>>>> With IPv4 networking, before communication commences with a neighbour,
>>>>> an ARP packet is broadcasted to all stations asking for the MAC address
>>>>> corresponding with the IPv4.  The desired station responds with an ARP
>>>>> reply, and the ARP reply causes the switch to learn which port the
>>>>> station is connected to.
>>>>>
>>>>> With IPv6 networking, the situation is rather different.  Rather than
>>>>> broadcasting ARP packets, a "neighbour solicitation" is multicasted
>>>>> rather than broadcasted.  This multicast needs to reach the intended
>>>>> station in order for the neighbour to be discovered.
>>>>>
>>>>> Once a neighbour has been discovered, and entered into the sending
>>>>> stations neighbour cache, communication can restart at a point later
>>>>> without sending a new neighbour solicitation, even if the entry in
>>>>> the neighbour cache is marked as stale.  This can be after the MAC
>>>>> address has expired from the forwarding cache of the DSA switch -
>>>>> when that occurs, there is a long pause in communication.
>>>>>
>>>>> Our DSA implementation for mv88e6xxx switches has defaulted to having
>>>>> multicast and unicast flooding disabled.  As per the above description,
>>>>> this is fine for IPv4 networking, since the broadcasted ARP queries
>>>>> will be sent to and received by all stations on the same network.
>>>>> However, this breaks IPv6 very badly - blocking neighbour solicitations
>>>>> and later causing connections to stall.
>>>>>
>>>>> The defaults that the Linux bridge code expect from bridges are that
>>>>> unknown unicast frames and unknown multicast frames are flooded to
>>>>> all stations, which is at odds to the defaults adopted by our DSA
>>>>> implementation for mv88e6xxx switches.
>>>>>
>>>>> This commit enables by default flooding of both unknown unicast and
>>>>> unknown multicast frames.  This means that mv88e6xxx DSA switches now
>>>>> behave as per the bridge(8) man page, and IPv6 works flawlessly through
>>>>> such a switch.
>>>>
>>>> Note that there is the open question whether this affects the case where
>>>> each port is used as a separate network interface: that case has not yet
>>>> been tested.
>>>
>>> I've checked with a mv88e6131 on the clearfog gt8k board.  lan1
>>> connected to my lan with plenty of traffic on, and configured as
>>> part of a bridge.  lan2 connected to the zii board, but not part
>>> of the bridge.  Monitoring lan2 from the zii board shows no traffic
>>> that was received from lan1.
>>>
>>> So it looks fine.
>>
>> With the current state whereby we do not have the necessary hooks to
>> perform filtering on non-bridged/standalone ports, this is entirely fine
>> indeed.
>>
>> In the future this is part of something I want to address because it is
>> IMHO highly undesirable to have non-bridged ports be flooded with
>> unknown multicast or unknown unicast for that matter because that makes
>> them deviate from a standard NIC interface. Unknown unicast is not
>> necessarily a low hanging fruit, but still, if we have switches capable
>> of filtering, we might as well make use of that. Of course, one
>> difficulty is that we must not break running tcpdump on those DSA slave
>> network interfaces.
> 
> Sorry, I think you have the wrong end of the stick.
> 
> For a non-bridged port, I am seeing _no_ traffic apart from that
> explicitly sent out through that port.  In other words, there are
> _no_ flooded frames coming out of the non-bridged port.
> 
> This patch appears to have no material effect on non-bridged ports.

Presumably because that non-bridged port and the CPU port are part of
the same domain with only those 2 ports and that is what we want.

Now what happens if say you have a station that sends multicast traffic
through that port to e.g.: 226.94.1.1, I bet that port happily sends
that multicast traffic to the CPU port with no filtering what so ever
and this ends-up being dropped in the network stack because there is a
socket look up failure there. IMHO unless you have a receiver/server on
that network interface on the DSA network interface and a matching
socket you should not be receiving that multicast traffic and the switch
should be filtering it. Since the network stack will call into
ndo_set_rx_mode() for those cases, we really just need to make that
multicast traffic known, instead of unknown to the switch.
Russell King (Oracle) Feb. 17, 2019, 10:19 p.m. UTC | #6
On Sun, Feb 17, 2019 at 02:03:40PM -0800, Florian Fainelli wrote:
> 
> 
> On 2/17/2019 1:58 PM, Russell King - ARM Linux admin wrote:
> > On Sun, Feb 17, 2019 at 01:45:24PM -0800, Florian Fainelli wrote:
> >>
> >>
> >> On 2/17/2019 8:34 AM, Russell King - ARM Linux admin wrote:
> >>> On Sun, Feb 17, 2019 at 02:27:16PM +0000, Russell King - ARM Linux admin wrote:
> >>>> On Sun, Feb 17, 2019 at 02:25:17PM +0000, Russell King wrote:
> >>>>> Switches work by learning the MAC address for each attached station by
> >>>>> monitoring traffic from each station.  When a station sends a packet,
> >>>>> the switch records which port the MAC address is connected to.
> >>>>>
> >>>>> With IPv4 networking, before communication commences with a neighbour,
> >>>>> an ARP packet is broadcasted to all stations asking for the MAC address
> >>>>> corresponding with the IPv4.  The desired station responds with an ARP
> >>>>> reply, and the ARP reply causes the switch to learn which port the
> >>>>> station is connected to.
> >>>>>
> >>>>> With IPv6 networking, the situation is rather different.  Rather than
> >>>>> broadcasting ARP packets, a "neighbour solicitation" is multicasted
> >>>>> rather than broadcasted.  This multicast needs to reach the intended
> >>>>> station in order for the neighbour to be discovered.
> >>>>>
> >>>>> Once a neighbour has been discovered, and entered into the sending
> >>>>> stations neighbour cache, communication can restart at a point later
> >>>>> without sending a new neighbour solicitation, even if the entry in
> >>>>> the neighbour cache is marked as stale.  This can be after the MAC
> >>>>> address has expired from the forwarding cache of the DSA switch -
> >>>>> when that occurs, there is a long pause in communication.
> >>>>>
> >>>>> Our DSA implementation for mv88e6xxx switches has defaulted to having
> >>>>> multicast and unicast flooding disabled.  As per the above description,
> >>>>> this is fine for IPv4 networking, since the broadcasted ARP queries
> >>>>> will be sent to and received by all stations on the same network.
> >>>>> However, this breaks IPv6 very badly - blocking neighbour solicitations
> >>>>> and later causing connections to stall.
> >>>>>
> >>>>> The defaults that the Linux bridge code expect from bridges are that
> >>>>> unknown unicast frames and unknown multicast frames are flooded to
> >>>>> all stations, which is at odds to the defaults adopted by our DSA
> >>>>> implementation for mv88e6xxx switches.
> >>>>>
> >>>>> This commit enables by default flooding of both unknown unicast and
> >>>>> unknown multicast frames.  This means that mv88e6xxx DSA switches now
> >>>>> behave as per the bridge(8) man page, and IPv6 works flawlessly through
> >>>>> such a switch.
> >>>>
> >>>> Note that there is the open question whether this affects the case where
> >>>> each port is used as a separate network interface: that case has not yet
> >>>> been tested.
> >>>
> >>> I've checked with a mv88e6131 on the clearfog gt8k board.  lan1
> >>> connected to my lan with plenty of traffic on, and configured as
> >>> part of a bridge.  lan2 connected to the zii board, but not part
> >>> of the bridge.  Monitoring lan2 from the zii board shows no traffic
> >>> that was received from lan1.
> >>>
> >>> So it looks fine.
> >>
> >> With the current state whereby we do not have the necessary hooks to
> >> perform filtering on non-bridged/standalone ports, this is entirely fine
> >> indeed.
> >>
> >> In the future this is part of something I want to address because it is
> >> IMHO highly undesirable to have non-bridged ports be flooded with
> >> unknown multicast or unknown unicast for that matter because that makes
> >> them deviate from a standard NIC interface. Unknown unicast is not
> >> necessarily a low hanging fruit, but still, if we have switches capable
> >> of filtering, we might as well make use of that. Of course, one
> >> difficulty is that we must not break running tcpdump on those DSA slave
> >> network interfaces.
> > 
> > Sorry, I think you have the wrong end of the stick.
> > 
> > For a non-bridged port, I am seeing _no_ traffic apart from that
> > explicitly sent out through that port.  In other words, there are
> > _no_ flooded frames coming out of the non-bridged port.
> > 
> > This patch appears to have no material effect on non-bridged ports.
> 
> Presumably because that non-bridged port and the CPU port are part of
> the same domain with only those 2 ports and that is what we want.
> 
> Now what happens if say you have a station that sends multicast traffic
> through that port to e.g.: 226.94.1.1, I bet that port happily sends
> that multicast traffic to the CPU port with no filtering what so ever
> and this ends-up being dropped in the network stack because there is a
> socket look up failure there. IMHO unless you have a receiver/server on
> that network interface on the DSA network interface and a matching
> socket you should not be receiving that multicast traffic and the switch
> should be filtering it. Since the network stack will call into
> ndo_set_rx_mode() for those cases, we really just need to make that
> multicast traffic known, instead of unknown to the switch.

If the port is not bridged, then it's operating as network interface,
and traffic to/from that port needs to be routed to the CPU port so
that it appears as it would do from a real network interface.
Doing anything else makes breaks the idea that you can use a set
of DSA ports as individual interfaces and run anything but IPv4
non-multicast over them.
Florian Fainelli Feb. 17, 2019, 10:30 p.m. UTC | #7
On 2/17/2019 2:19 PM, Russell King - ARM Linux admin wrote:
> On Sun, Feb 17, 2019 at 02:03:40PM -0800, Florian Fainelli wrote:
>>
>>
>> On 2/17/2019 1:58 PM, Russell King - ARM Linux admin wrote:
>>> On Sun, Feb 17, 2019 at 01:45:24PM -0800, Florian Fainelli wrote:
>>>>
>>>>
>>>> On 2/17/2019 8:34 AM, Russell King - ARM Linux admin wrote:
>>>>> On Sun, Feb 17, 2019 at 02:27:16PM +0000, Russell King - ARM Linux admin wrote:
>>>>>> On Sun, Feb 17, 2019 at 02:25:17PM +0000, Russell King wrote:
>>>>>>> Switches work by learning the MAC address for each attached station by
>>>>>>> monitoring traffic from each station.  When a station sends a packet,
>>>>>>> the switch records which port the MAC address is connected to.
>>>>>>>
>>>>>>> With IPv4 networking, before communication commences with a neighbour,
>>>>>>> an ARP packet is broadcasted to all stations asking for the MAC address
>>>>>>> corresponding with the IPv4.  The desired station responds with an ARP
>>>>>>> reply, and the ARP reply causes the switch to learn which port the
>>>>>>> station is connected to.
>>>>>>>
>>>>>>> With IPv6 networking, the situation is rather different.  Rather than
>>>>>>> broadcasting ARP packets, a "neighbour solicitation" is multicasted
>>>>>>> rather than broadcasted.  This multicast needs to reach the intended
>>>>>>> station in order for the neighbour to be discovered.
>>>>>>>
>>>>>>> Once a neighbour has been discovered, and entered into the sending
>>>>>>> stations neighbour cache, communication can restart at a point later
>>>>>>> without sending a new neighbour solicitation, even if the entry in
>>>>>>> the neighbour cache is marked as stale.  This can be after the MAC
>>>>>>> address has expired from the forwarding cache of the DSA switch -
>>>>>>> when that occurs, there is a long pause in communication.
>>>>>>>
>>>>>>> Our DSA implementation for mv88e6xxx switches has defaulted to having
>>>>>>> multicast and unicast flooding disabled.  As per the above description,
>>>>>>> this is fine for IPv4 networking, since the broadcasted ARP queries
>>>>>>> will be sent to and received by all stations on the same network.
>>>>>>> However, this breaks IPv6 very badly - blocking neighbour solicitations
>>>>>>> and later causing connections to stall.
>>>>>>>
>>>>>>> The defaults that the Linux bridge code expect from bridges are that
>>>>>>> unknown unicast frames and unknown multicast frames are flooded to
>>>>>>> all stations, which is at odds to the defaults adopted by our DSA
>>>>>>> implementation for mv88e6xxx switches.
>>>>>>>
>>>>>>> This commit enables by default flooding of both unknown unicast and
>>>>>>> unknown multicast frames.  This means that mv88e6xxx DSA switches now
>>>>>>> behave as per the bridge(8) man page, and IPv6 works flawlessly through
>>>>>>> such a switch.
>>>>>>
>>>>>> Note that there is the open question whether this affects the case where
>>>>>> each port is used as a separate network interface: that case has not yet
>>>>>> been tested.
>>>>>
>>>>> I've checked with a mv88e6131 on the clearfog gt8k board.  lan1
>>>>> connected to my lan with plenty of traffic on, and configured as
>>>>> part of a bridge.  lan2 connected to the zii board, but not part
>>>>> of the bridge.  Monitoring lan2 from the zii board shows no traffic
>>>>> that was received from lan1.
>>>>>
>>>>> So it looks fine.
>>>>
>>>> With the current state whereby we do not have the necessary hooks to
>>>> perform filtering on non-bridged/standalone ports, this is entirely fine
>>>> indeed.
>>>>
>>>> In the future this is part of something I want to address because it is
>>>> IMHO highly undesirable to have non-bridged ports be flooded with
>>>> unknown multicast or unknown unicast for that matter because that makes
>>>> them deviate from a standard NIC interface. Unknown unicast is not
>>>> necessarily a low hanging fruit, but still, if we have switches capable
>>>> of filtering, we might as well make use of that. Of course, one
>>>> difficulty is that we must not break running tcpdump on those DSA slave
>>>> network interfaces.
>>>
>>> Sorry, I think you have the wrong end of the stick.
>>>
>>> For a non-bridged port, I am seeing _no_ traffic apart from that
>>> explicitly sent out through that port.  In other words, there are
>>> _no_ flooded frames coming out of the non-bridged port.
>>>
>>> This patch appears to have no material effect on non-bridged ports.
>>
>> Presumably because that non-bridged port and the CPU port are part of
>> the same domain with only those 2 ports and that is what we want.
>>
>> Now what happens if say you have a station that sends multicast traffic
>> through that port to e.g.: 226.94.1.1, I bet that port happily sends
>> that multicast traffic to the CPU port with no filtering what so ever
>> and this ends-up being dropped in the network stack because there is a
>> socket look up failure there. IMHO unless you have a receiver/server on
>> that network interface on the DSA network interface and a matching
>> socket you should not be receiving that multicast traffic and the switch
>> should be filtering it. Since the network stack will call into
>> ndo_set_rx_mode() for those cases, we really just need to make that
>> multicast traffic known, instead of unknown to the switch.
> 
> If the port is not bridged, then it's operating as network interface,
> and traffic to/from that port needs to be routed to the CPU port so
> that it appears as it would do from a real network interface.
> Doing anything else makes breaks the idea that you can use a set
> of DSA ports as individual interfaces and run anything but IPv4
> non-multicast over them.

I am not proposing changing any of that just making use of the switch's
ability to snoop management traffic (ARP, DHCP, MLD, etc.) and use of
the network stack's hints in order to avoid flooding unknown multicast.
This is starting to diverge a little bit from the original issue though.
diff mbox series

Patch

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index b75a865a293d..eb5e3d88374f 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2144,13 +2144,14 @@  static int mv88e6xxx_setup_message_port(struct mv88e6xxx_chip *chip, int port)
 static int mv88e6xxx_setup_egress_floods(struct mv88e6xxx_chip *chip, int port)
 {
 	struct dsa_switch *ds = chip->ds;
-	bool flood;
 
-	/* Upstream ports flood frames with unknown unicast or multicast DA */
-	flood = dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port);
+	/* Linux bridges are expected to flood unknown multicast and
+	 * unicast frames to all ports - as per the defaults specified
+	 * in the iproute2 bridge(8) man page. Not doing this causes
+	 * stalls and failures with IPv6 over Marvell bridges. */
 	if (chip->info->ops->port_set_egress_floods)
 		return chip->info->ops->port_set_egress_floods(chip, port,
-							       flood, flood);
+							       true, true);
 
 	return 0;
 }