
[RFC,4/4] net: dsa: tag_edsa: support reception of packets from lag devices

Message ID 20201027105117.23052-5-tobias@waldekranz.com
State RFC
Delegated to: David Miller
Series: net: dsa: link aggregation support

Commit Message

Tobias Waldekranz Oct. 27, 2020, 10:51 a.m. UTC
Packets ingressing on a LAG that egress on the CPU port, which are not
classified as management, will have a FORWARD tag that does not
contain the normal source device/port tuple. Instead, the trunk bit
will be set and the port field will hold the LAG ID.

Since the exact source port information is not available in the tag,
frames are injected directly on the LAG interface and thus never pass
through any DSA port interface on ingress.

Management frames (TO_CPU) are not affected and will pass through the
DSA port interface as usual.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 net/dsa/dsa.c      | 23 +++++++++++++----------
 net/dsa/tag_edsa.c | 12 +++++++++++-
 2 files changed, 24 insertions(+), 11 deletions(-)
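
For reference, the tag fields that this change keys on can be sketched
as follows. This is an illustrative decode only: the masks mirror the
ones used in the tag_edsa.c hunk below, but the helper itself and its
shape are made up for this example.

/* Illustrative decode of the EDSA FORWARD tag fields used by this
 * patch. edsa_header points at the 4-byte DSA portion of the tag,
 * as in tag_edsa.c; the function is hypothetical.
 */
static void edsa_forward_decode(const u8 *edsa_header, bool *trunk,
                                int *source_device, int *source_port)
{
        /* byte 0, bits 4:0: source switch (device) ID */
        *source_device = edsa_header[0] & 0x1f;

        /* byte 1, bits 7:3: source port, or the LAG ID when the
         * frame was forwarded from a trunk
         */
        *source_port = (edsa_header[1] >> 3) & 0x1f;

        /* byte 1, low bits: non-zero means the frame ingressed on a
         * trunk (LAG), so *source_port holds the LAG ID rather than
         * a physical port
         */
        *trunk = !!(edsa_header[1] & 7);
}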

Comments

Vladimir Oltean Oct. 28, 2020, 12:05 p.m. UTC | #1
On Tue, Oct 27, 2020 at 11:51:17AM +0100, Tobias Waldekranz wrote:
> Packets ingressing on a LAG that egress on the CPU port, which are not
> classified as management, will have a FORWARD tag that does not
> contain the normal source device/port tuple. Instead, the trunk bit
> will be set and the port field will hold the LAG ID.
> 
> Since the exact source port information is not available in the tag,
> frames are injected directly on the LAG interface and thus never pass
> through any DSA port interface on ingress.
> 
> Management frames (TO_CPU) are not affected and will pass through the
> DSA port interface as usual.
> 
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
> ---
>  net/dsa/dsa.c      | 23 +++++++++++++----------
>  net/dsa/tag_edsa.c | 12 +++++++++++-
>  2 files changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 2131bf2b3a67..b84e5f0be049 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -220,7 +220,6 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
>  	}
>  
>  	skb = nskb;
> -	p = netdev_priv(skb->dev);
>  	skb_push(skb, ETH_HLEN);
>  	skb->pkt_type = PACKET_HOST;
>  	skb->protocol = eth_type_trans(skb, skb->dev);
> @@ -234,17 +233,21 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
>  		skb = nskb;
>  	}
>  
> -	s = this_cpu_ptr(p->stats64);
> -	u64_stats_update_begin(&s->syncp);
> -	s->rx_packets++;
> -	s->rx_bytes += skb->len;
> -	u64_stats_update_end(&s->syncp);
> +	if (dsa_slave_dev_check(skb->dev)) {
> +		p = netdev_priv(skb->dev);
> +		s = this_cpu_ptr(p->stats64);
> +		u64_stats_update_begin(&s->syncp);
> +		s->rx_packets++;
> +		s->rx_bytes += skb->len;
> +		u64_stats_update_end(&s->syncp);
>  
> -	if (dsa_skb_defer_rx_timestamp(p, skb))
> -		return 0;
> -
> -	gro_cells_receive(&p->gcells, skb);
> +		if (dsa_skb_defer_rx_timestamp(p, skb))
> +			return 0;
>  
> +		gro_cells_receive(&p->gcells, skb);
> +	} else {
> +		netif_rx(skb);
> +	}
>  	return 0;
>  }
>  

On one hand, I feel pretty yucky about this change.
On the other hand, I wonder if it might be useful under some conditions
for drivers with DSA_TAG_PROTO_NONE? For example, once the user bridges
all slave interfaces, that bridge will start being able to send and
receive traffic, despite none of the individual switch ports being able
to do that. Then, you could even go off and bridge a "foreign" interface,
and that would again work properly. That use case is not supported today,
but would be very useful.

Thoughts?
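
In tagger terms, that scenario would amount to an rcv() that has no
source port to resolve at all and falls back to a bridge device. A
purely hypothetical sketch (DSA_TAG_PROTO_NONE has no rcv() today, and
get_spanning_bridge() is an assumed helper, not an existing API):

/* Hypothetical rcv() for a tagger without source information:
 * deliver everything on the bridge spanning the slave ports, if
 * one exists. Not real code; get_spanning_bridge() is assumed.
 */
static struct sk_buff *none_rcv(struct sk_buff *skb,
                                struct net_device *dev,
                                struct packet_type *pt)
{
        struct net_device *br = get_spanning_bridge(dev);

        if (!br)
                return NULL;

        /* The switch has already forwarded the frame in hardware */
        skb->offload_fwd_mark = 1;
        skb->dev = br;

        return skb;
}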
Ido Schimmel Nov. 1, 2020, 11:31 a.m. UTC | #2
On Wed, Oct 28, 2020 at 08:18:24PM +0200, Vladimir Oltean wrote:
> Yes, I expect that the bridge input would need to have one more entry
> path into it than just br_handle_frame.
> 
> I'm a bit confused and undecided right now, so let's look at it from a
> different perspective. Let's imagine a switchdev driver (DSA or not)
> which is able to offload IP forwarding. There are some interfaces that
> are bridged and one that is standalone. The setup looks as below.
> 
>  IP interfaces
>                 +---------------------------------------------------------+
>                 |                           br0                           |
>                 +---------------------------------------------------------+
> 
>  +------------+ +------------+ +------------+ +------------+ +------------+
>  |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    eth0    |
>  +------------+ +------------+ +------------+ +------------+ +------------+
> 
>  Hardware interfaces
> 
>  +------------+ +------------+ +------------+ +------------+ +------------+
>  | DSA port 0 | | DSA port 1 | | DSA port 2 | | DSA port 3 | |   e1000    |
>  +------------+ +------------+ +------------+ +------------+ +------------+
> 
> Let's say you receive a packet on the standalone swp0, and you need to
> perform IP routing towards the bridged domain br0. Some switchdev/DSA
> ports are bridged and some aren't.
> 
> The switchdev/DSA switch will attempt to do the IP routing step first,
> and it _can_ do that because it is aware of the br0 interface, so it
> will decrement the TTL and replace the L2 header.
> 
> At this stage we have a modified IP packet, which corresponds to what
> should be injected into the hardware's view of the br0 interface. The
> packet is still in the switchdev/DSA hardware data path.
> 
> But then, the switchdev/DSA hardware will look up the FDB in the name of
> br0, in an attempt to find the destination port for the packet. But
> the packet should be delivered to a station connected to eth0 (e1000, a
> foreign interface). So that is part of the exception path: the packet
> should be delivered to the CPU.
> 
> But the packet was already modified by the hardware data path (IP
> forwarding has already taken place)! So how should the DSA/switchdev
> hardware deliver the packet to the CPU? It has 2 options:
> 
> (a) unwind the entire packet modification, cancel the IP forwarding and
>     deliver the unmodified packet to the CPU on behalf of swp0, the
>     ingress port. Then let software IP forwarding plus software bridging
>     deal with it, so that it can reach the e1000.

This is what happens in the Spectrum ASICs. If a packet hits some
exception in the data path, it is trapped from the Rx port unmodified.

> (b) deliver the packet to the CPU in the middle of the hardware
>     forwarding data path, where the exception/miss occurred, aka deliver
>     it on behalf of br0. Modified by IP forwarding. This is where we'd
>     have to manually inject skb->dev into br0 somehow.
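
Option (b) is essentially what the dsa.c hunk below does for LAGs: the
tagger resolves skb->dev to a non-slave upper, and the receive path
injects the frame on that device's behalf. A minimal sketch of that
injection step, assuming the tagger has already set skb->dev to the
upper (br0 in the hypothetical above, the LAG in this series); the
function name is hypothetical:

/* Minimal sketch of option (b): hand the frame to the stack on
 * behalf of an upper device (bridge or LAG) rather than a DSA
 * slave port. skb->dev is assumed to have been set by the tagger.
 */
static void dsa_inject_on_upper(struct sk_buff *skb)
{
        /* No slave netdev to account stats or defer timestamps on,
         * so bypass the slave receive path entirely.
         */
        netif_rx(skb);
}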

Patch

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 2131bf2b3a67..b84e5f0be049 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -220,7 +220,6 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
 	}
 
 	skb = nskb;
-	p = netdev_priv(skb->dev);
 	skb_push(skb, ETH_HLEN);
 	skb->pkt_type = PACKET_HOST;
 	skb->protocol = eth_type_trans(skb, skb->dev);
@@ -234,17 +233,21 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
 		skb = nskb;
 	}
 
-	s = this_cpu_ptr(p->stats64);
-	u64_stats_update_begin(&s->syncp);
-	s->rx_packets++;
-	s->rx_bytes += skb->len;
-	u64_stats_update_end(&s->syncp);
+	if (dsa_slave_dev_check(skb->dev)) {
+		p = netdev_priv(skb->dev);
+		s = this_cpu_ptr(p->stats64);
+		u64_stats_update_begin(&s->syncp);
+		s->rx_packets++;
+		s->rx_bytes += skb->len;
+		u64_stats_update_end(&s->syncp);
 
-	if (dsa_skb_defer_rx_timestamp(p, skb))
-		return 0;
-
-	gro_cells_receive(&p->gcells, skb);
+		if (dsa_skb_defer_rx_timestamp(p, skb))
+			return 0;
 
+		gro_cells_receive(&p->gcells, skb);
+	} else {
+		netif_rx(skb);
+	}
 	return 0;
 }
 
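
The new branch hinges on dsa_slave_dev_check() to tell DSA slave ports
apart from other netdevs such as the LAG. For reference, that helper is
essentially an ops comparison along these lines (paraphrased from
net/dsa/slave.c, so treat it as a sketch):

/* A netdev is a DSA slave iff it uses the DSA slave netdev ops. */
bool dsa_slave_dev_check(const struct net_device *dev)
{
        return dev->netdev_ops == &dsa_slave_netdev_ops;
}

One consequence of the else branch is that frames delivered on the LAG
netdev skip the per-port stats and the RX timestamp deferral, since
there is no slave netdev to attribute them to.
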
diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
index 120614240319..800b02f04394 100644
--- a/net/dsa/tag_edsa.c
+++ b/net/dsa/tag_edsa.c
@@ -86,6 +86,7 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, struct net_device *dev)
 static struct sk_buff *edsa_rcv(struct sk_buff *skb, struct net_device *dev,
 				struct packet_type *pt)
 {
+	bool trunk = false;
 	u8 *edsa_header;
 	int frame_type;
 	int code;
@@ -120,6 +121,7 @@ static struct sk_buff *edsa_rcv(struct sk_buff *skb, struct net_device *dev,
 		break;
 
 	case FRAME_TYPE_FORWARD:
+		trunk = !!(edsa_header[1] & 7);
 		skb->offload_fwd_mark = 1;
 		break;
 
@@ -133,7 +135,15 @@ static struct sk_buff *edsa_rcv(struct sk_buff *skb, struct net_device *dev,
 	source_device = edsa_header[0] & 0x1f;
 	source_port = (edsa_header[1] >> 3) & 0x1f;
 
-	skb->dev = dsa_master_find_slave(dev, source_device, source_port);
+	if (trunk) {
+		struct dsa_port *cpu_dp = dev->dsa_ptr;
+
+		skb->dev = dsa_lag_dev_by_id(cpu_dp->dst, source_port);
+	} else {
+		skb->dev = dsa_master_find_slave(dev, source_device,
+						 source_port);
+	}
+
 	if (!skb->dev)
 		return NULL;
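
dsa_lag_dev_by_id() is introduced earlier in this series, so its
definition is not visible in this patch. A hypothetical sketch,
assuming the switch tree keeps an array of LAG netdevs indexed by
hardware LAG ID (dst->lags and DSA_MAX_LAGS are illustrative names
here, not the series' actual definitions):

/* Hypothetical LAG ID -> netdev lookup; the real helper is added
 * in an earlier patch of this series. Assumes dst->lags[] is kept
 * in sync as ports join and leave LAGs.
 */
static inline struct net_device *
dsa_lag_dev_by_id(struct dsa_switch_tree *dst, unsigned int id)
{
        return id < DSA_MAX_LAGS ? dst->lags[id] : NULL;
}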