diff mbox series

[net-next,v2,3/3] net: dsa: mv88e6xxx: default to multicast and unicast flooding

Message ID E1gvPMu-0000Ax-Ba@rmk-PC.armlinux.org.uk
State Superseded
Delegated to: David Miller
Headers show
Series net: dsa: mv88e6xxx: fix IPv6 | expand

Commit Message

Russell King (Oracle) Feb. 17, 2019, 4:32 p.m. UTC
Switches work by learning the MAC address for each attached station by
monitoring traffic from each station.  When a station sends a packet,
the switch records which port the MAC address is connected to.

With IPv4 networking, before communication commences with a neighbour,
an ARP packet is broadcasted to all stations asking for the MAC address
corresponding with the IPv4.  The desired station responds with an ARP
reply, and the ARP reply causes the switch to learn which port the
station is connected to.

With IPv6 networking, the situation is rather different.  Rather than
broadcasting ARP packets, a "neighbour solicitation" is multicasted
rather than broadcasted.  This multicast needs to reach the intended
station in order for the neighbour to be discovered.

Once a neighbour has been discovered, and entered into the sending
stations neighbour cache, communication can restart at a point later
without sending a new neighbour solicitation, even if the entry in
the neighbour cache is marked as stale.  This can be after the MAC
address has expired from the forwarding cache of the DSA switch -
when that occurs, there is a long pause in communication.

Our DSA implementation for mv88e6xxx switches has defaulted to having
multicast and unicast flooding disabled.  As per the above description,
this is fine for IPv4 networking, since the broadcasted ARP queries
will be sent to and received by all stations on the same network.
However, this breaks IPv6 very badly - blocking neighbour solicitations
and later causing connections to stall.

The defaults that the Linux bridge code expect from bridges are that
unknown unicast frames and unknown multicast frames are flooded to
all stations, which is at odds to the defaults adopted by our DSA
implementation for mv88e6xxx switches.

This commit enables by default flooding of both unknown unicast and
unknown multicast frames.  This means that mv88e6xxx DSA switches now
behave as per the bridge(8) man page, and IPv6 works flawlessly through
such a switch.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

Comments

Russell King (Oracle) Feb. 18, 2019, 12:53 p.m. UTC | #1
On Sun, Feb 17, 2019 at 04:32:40PM +0000, Russell King wrote:
> Switches work by learning the MAC address for each attached station by
> monitoring traffic from each station.  When a station sends a packet,
> the switch records which port the MAC address is connected to.
> 
> With IPv4 networking, before communication commences with a neighbour,
> an ARP packet is broadcasted to all stations asking for the MAC address
> corresponding with the IPv4.  The desired station responds with an ARP
> reply, and the ARP reply causes the switch to learn which port the
> station is connected to.
> 
> With IPv6 networking, the situation is rather different.  Rather than
> broadcasting ARP packets, a "neighbour solicitation" is multicasted
> rather than broadcasted.  This multicast needs to reach the intended
> station in order for the neighbour to be discovered.
> 
> Once a neighbour has been discovered, and entered into the sending
> stations neighbour cache, communication can restart at a point later
> without sending a new neighbour solicitation, even if the entry in
> the neighbour cache is marked as stale.  This can be after the MAC
> address has expired from the forwarding cache of the DSA switch -
> when that occurs, there is a long pause in communication.
> 
> Our DSA implementation for mv88e6xxx switches has defaulted to having
> multicast and unicast flooding disabled.  As per the above description,
> this is fine for IPv4 networking, since the broadcasted ARP queries
> will be sent to and received by all stations on the same network.
> However, this breaks IPv6 very badly - blocking neighbour solicitations
> and later causing connections to stall.
> 
> The defaults that the Linux bridge code expect from bridges are that
> unknown unicast frames and unknown multicast frames are flooded to
> all stations, which is at odds to the defaults adopted by our DSA
> implementation for mv88e6xxx switches.
> 
> This commit enables by default flooding of both unknown unicast and
> unknown multicast frames.  This means that mv88e6xxx DSA switches now
> behave as per the bridge(8) man page, and IPv6 works flawlessly through
> such a switch.

Thinking about this a bit more, this approach probably isn't the best.
If we have a port that goes through this life-cycle:

1. assigned to a bridge
2. configured not to flood
3. reassigned to a new bridge

the port will retain its settings from the first bridge, which will be
at odds with the settings that the Linux bridge code expects and the
settings visible to the user.

So, how about this, which basically reverts this patch and applies the
flood settings each time a port joins a bridge, and clears them when
the port leaves a bridge.

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 5ac2c93b7cee..6f68c35d416f 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1962,6 +1962,13 @@ static int mv88e6xxx_port_bridge_join(struct dsa_switch *ds, int port,
 
 	mutex_lock(&chip->reg_lock);
 	err = mv88e6xxx_bridge_map(chip, br);
+	/* Linux bridges are expected to flood unknown multicast and
+	 * unicast frames to all ports - as per the defaults specified
+	 * in the iproute2 bridge(8) man page. Not doing this causes
+	 * stalls and failures with IPv6 over Marvell bridges. */
+	if (err == 0 && chip->info->ops->port_set_egress_floods)
+		err = chip->info->ops->port_set_egress_floods(chip, port,
+							      true, true);
 	mutex_unlock(&chip->reg_lock);
 
 	return err;
@@ -1976,6 +1983,9 @@ static void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port,
 	if (mv88e6xxx_bridge_map(chip, br) ||
 	    mv88e6xxx_port_vlan_map(chip, port))
 		dev_err(ds->dev, "failed to remap in-chip Port VLAN\n");
+	if (chip->info->ops->port_set_egress_floods &&
+	    chip->info->ops->port_set_egress_floods(chip, port, false, false))
+		dev_err(ds->dev, "failed to disable port floods\n");
 	mutex_unlock(&chip->reg_lock);
 }
 
@@ -2134,13 +2144,14 @@ static int mv88e6xxx_setup_message_port(struct mv88e6xxx_chip *chip, int port)
 
 static int mv88e6xxx_setup_egress_floods(struct mv88e6xxx_chip *chip, int port)
 {
-	/* Linux bridges are expected to flood unknown multicast and
-	 * unicast frames to all ports - as per the defaults specified
-	 * in the iproute2 bridge(8) man page. Not doing this causes
-	 * stalls and failures with IPv6 over Marvell bridges. */
+	struct dsa_switch *ds = chip->ds;
+	bool flood;
+
+	/* Upstream ports flood frames with unknown unicast or multicast DA */
+	flood = dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port);
 	if (chip->info->ops->port_set_egress_floods)
 		return chip->info->ops->port_set_egress_floods(chip, port,
-							       true, true);
+							       flood, flood);
 
 	return 0;
 }
Vivien Didelot Feb. 19, 2019, 4:05 p.m. UTC | #2
Hi Russell,

On Mon, 18 Feb 2019 12:53:45 +0000, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
> On Sun, Feb 17, 2019 at 04:32:40PM +0000, Russell King wrote:
> > Switches work by learning the MAC address for each attached station by
> > monitoring traffic from each station.  When a station sends a packet,
> > the switch records which port the MAC address is connected to.
> > 
> > With IPv4 networking, before communication commences with a neighbour,
> > an ARP packet is broadcasted to all stations asking for the MAC address
> > corresponding with the IPv4.  The desired station responds with an ARP
> > reply, and the ARP reply causes the switch to learn which port the
> > station is connected to.
> > 
> > With IPv6 networking, the situation is rather different.  Rather than
> > broadcasting ARP packets, a "neighbour solicitation" is multicasted
> > rather than broadcasted.  This multicast needs to reach the intended
> > station in order for the neighbour to be discovered.
> > 
> > Once a neighbour has been discovered, and entered into the sending
> > stations neighbour cache, communication can restart at a point later
> > without sending a new neighbour solicitation, even if the entry in
> > the neighbour cache is marked as stale.  This can be after the MAC
> > address has expired from the forwarding cache of the DSA switch -
> > when that occurs, there is a long pause in communication.

Thank you for the very informative message above.

> > Our DSA implementation for mv88e6xxx switches has defaulted to having
> > multicast and unicast flooding disabled.  As per the above description,
> > this is fine for IPv4 networking, since the broadcasted ARP queries
> > will be sent to and received by all stations on the same network.
> > However, this breaks IPv6 very badly - blocking neighbour solicitations
> > and later causing connections to stall.
> > 
> > The defaults that the Linux bridge code expect from bridges are that
> > unknown unicast frames and unknown multicast frames are flooded to
> > all stations, which is at odds to the defaults adopted by our DSA
> > implementation for mv88e6xxx switches.
> > 
> > This commit enables by default flooding of both unknown unicast and
> > unknown multicast frames.  This means that mv88e6xxx DSA switches now
> > behave as per the bridge(8) man page, and IPv6 works flawlessly through
> > such a switch.
> 
> Thinking about this a bit more, this approach probably isn't the best.
> If we have a port that goes through this life-cycle:
> 
> 1. assigned to a bridge
> 2. configured not to flood
> 3. reassigned to a new bridge
> 
> the port will retain its settings from the first bridge, which will be
> at odds with the settings that the Linux bridge code expects and the
> settings visible to the user.
> 
> So, how about this, which basically reverts this patch and applies the
> flood settings each time a port joins a bridge, and clears them when
> the port leaves a bridge.

Isn't the bridge code programming flooding on the port correctly on leave/join,
because the BR_*FLOOD flags have been learned? I would expect that.


Thanks,

	Vivien
Russell King (Oracle) Feb. 19, 2019, 4:18 p.m. UTC | #3
On Tue, Feb 19, 2019 at 11:05:37AM -0500, Vivien Didelot wrote:
> Hi Russell,
> 
> On Mon, 18 Feb 2019 12:53:45 +0000, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
> > On Sun, Feb 17, 2019 at 04:32:40PM +0000, Russell King wrote:
> > > Switches work by learning the MAC address for each attached station by
> > > monitoring traffic from each station.  When a station sends a packet,
> > > the switch records which port the MAC address is connected to.
> > > 
> > > With IPv4 networking, before communication commences with a neighbour,
> > > an ARP packet is broadcasted to all stations asking for the MAC address
> > > corresponding with the IPv4.  The desired station responds with an ARP
> > > reply, and the ARP reply causes the switch to learn which port the
> > > station is connected to.
> > > 
> > > With IPv6 networking, the situation is rather different.  Rather than
> > > broadcasting ARP packets, a "neighbour solicitation" is multicasted
> > > rather than broadcasted.  This multicast needs to reach the intended
> > > station in order for the neighbour to be discovered.
> > > 
> > > Once a neighbour has been discovered, and entered into the sending
> > > stations neighbour cache, communication can restart at a point later
> > > without sending a new neighbour solicitation, even if the entry in
> > > the neighbour cache is marked as stale.  This can be after the MAC
> > > address has expired from the forwarding cache of the DSA switch -
> > > when that occurs, there is a long pause in communication.
> 
> Thank you for the very informative message above.
> 
> > > Our DSA implementation for mv88e6xxx switches has defaulted to having
> > > multicast and unicast flooding disabled.  As per the above description,
> > > this is fine for IPv4 networking, since the broadcasted ARP queries
> > > will be sent to and received by all stations on the same network.
> > > However, this breaks IPv6 very badly - blocking neighbour solicitations
> > > and later causing connections to stall.
> > > 
> > > The defaults that the Linux bridge code expect from bridges are that
> > > unknown unicast frames and unknown multicast frames are flooded to
> > > all stations, which is at odds to the defaults adopted by our DSA
> > > implementation for mv88e6xxx switches.
> > > 
> > > This commit enables by default flooding of both unknown unicast and
> > > unknown multicast frames.  This means that mv88e6xxx DSA switches now
> > > behave as per the bridge(8) man page, and IPv6 works flawlessly through
> > > such a switch.
> > 
> > Thinking about this a bit more, this approach probably isn't the best.
> > If we have a port that goes through this life-cycle:
> > 
> > 1. assigned to a bridge
> > 2. configured not to flood
> > 3. reassigned to a new bridge
> > 
> > the port will retain its settings from the first bridge, which will be
> > at odds with the settings that the Linux bridge code expects and the
> > settings visible to the user.
> > 
> > So, how about this, which basically reverts this patch and applies the
> > flood settings each time a port joins a bridge, and clears them when
> > the port leaves a bridge.
> 
> Isn't the bridge code programming flooding on the port correctly on leave/join,
> because the BR_*FLOOD flags have been learned? I would expect that.

If you're asking whether the bridge code sends a
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS message on leave/join, it seems
that it does not.

There is only one place in the bridge code that this message is
generated, that is in net/bridge/br_switchdev.c
br_switchdev_set_port_flag().  That is called from one place in the
bridge code, which is br_set_port_flag() in net/bridge/br_netlink.c,
which is in response to a RTM_SETLINK netlink message where the bridge
code processes all the various bridge link options.

There appears to be no call when adding or removing an interface to/
from a bridge.
diff mbox series

Patch

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 72db6e74be48..c1bcd13af13f 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2143,14 +2143,13 @@  static int mv88e6xxx_setup_message_port(struct mv88e6xxx_chip *chip, int port)
 
 static int mv88e6xxx_setup_egress_floods(struct mv88e6xxx_chip *chip, int port)
 {
-	struct dsa_switch *ds = chip->ds;
-	bool flood;
-
-	/* Upstream ports flood frames with unknown unicast or multicast DA */
-	flood = dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port);
+	/* Linux bridges are expected to flood unknown multicast and
+	 * unicast frames to all ports - as per the defaults specified
+	 * in the iproute2 bridge(8) man page. Not doing this causes
+	 * stalls and failures with IPv6 over Marvell bridges. */
 	if (chip->info->ops->port_set_egress_floods)
 		return chip->info->ops->port_set_egress_floods(chip, port,
-							       flood, flood);
+							       true, true);
 
 	return 0;
 }