Message ID | 1325475154-15997-1-git-send-email-david.ward@ll.mit.edu |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |

On Sun, Jan 01, 2012 at 10:32:34PM -0500, David Ward wrote:
> For IPsec tunnel mode (or BEET mode), after inbound packets are xfrm'ed,
> call the IPv4/IPv6 receive handler directly instead of calling netif_rx.
> In addition to avoiding unneeded re-processing of the MAC layer, packets
> will not be received a second time on network taps. (Note that outbound
> packets are only received on network taps post-xfrm, but inbound packets
> were being received both pre- and post-xfrm. So now network taps will
> receive packets in either direction only once, in the form that they go
> "over the wire".)
>
> Signed-off-by: David Ward <david.ward@ll.mit.edu>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>

You can't do this as this may cause stack overruns if we nest
too deeply.

Changing the existing tap processing behaviour will also break
existing setups.

Cheers,

On Monday, 02 January 2012 at 18:28 +1100, Herbert Xu wrote:
> You can't do this as this may cause stack overruns if we nest
> too deeply.

I was considering adding a generic helper for tunneling that takes
the nesting depth of the current packet into account.

[ It would call netif_receive_skb() instead of netif_rx(), to solve the
out-of-order (OOO) problem that occurs on SMP when interrupts are spread
across several CPUs. ]

We could use the delta between skb->data and skb->head as an estimate
of this depth, to avoid adding a new skb field?

#define DEPTH_THRESHOLD (NET_SKB_PAD + 64)

/* Re-inject directly while little headroom has been consumed, else queue. */
static inline void netif_reinject(struct sk_buff *skb)
{
	if (skb->data - skb->head < DEPTH_THRESHOLD)
		netif_receive_skb(skb);
	else
		netif_rx(skb);
}
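
To make the heuristic above concrete, here is a small user-space sketch of how the skb->data - skb->head delta grows as headers are pulled. It is only an illustration: struct fake_skb, fake_skb_pull() and choose_path() are hypothetical stand-ins, the NET_SKB_PAD value of 32 is assumed (the kernel derives it from the cache line size), and the sketch assumes the buffer is never reallocated between layers.

```c
#include <stddef.h>

/* Assumed value for illustration; the kernel derives it from the cache line size. */
#define NET_SKB_PAD 32
#define DEPTH_THRESHOLD (NET_SKB_PAD + 64)

/* Hypothetical stand-in for the two sk_buff pointers the heuristic reads. */
struct fake_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the current header */
};

enum reinject_path { PATH_DIRECT, PATH_QUEUED };

void fake_skb_pull(struct fake_skb *skb, size_t len);
enum reinject_path choose_path(const struct fake_skb *skb);

/* Strip a header: data advances, head stays put, so the delta grows. */
void fake_skb_pull(struct fake_skb *skb, size_t len)
{
	skb->data += len;
}

/* Same comparison as netif_reinject(), returning the choice instead of acting. */
enum reinject_path choose_path(const struct fake_skb *skb)
{
	if (skb->data - skb->head < DEPTH_THRESHOLD)
		return PATH_DIRECT;	/* would call netif_receive_skb() */
	return PATH_QUEUED;		/* would fall back to netif_rx() */
}
```

With these assumed numbers, a buffer that starts with 32 bytes of headroom stays on the direct path after a 14-byte Ethernet header and one 28-byte layer are pulled (delta 74), and switches to the queued path once a second 28-byte layer is pulled (delta 102).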

Hi Herbert,

On 01/02/2012 02:28 AM, Herbert Xu wrote:
> On Sun, Jan 01, 2012 at 10:32:34PM -0500, David Ward wrote:
>> For IPsec tunnel mode (or BEET mode), after inbound packets are xfrm'ed,
>> call the IPv4/IPv6 receive handler directly instead of calling netif_rx.
>> In addition to avoiding unneeded re-processing of the MAC layer, packets
>> will not be received a second time on network taps. (Note that outbound
>> packets are only received on network taps post-xfrm, but inbound packets
>> were being received both pre- and post-xfrm. So now network taps will
>> receive packets in either direction only once, in the form that they go
>> "over the wire".)
>>
>> Signed-off-by: David Ward <david.ward@ll.mit.edu>
>> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> You can't do this as this may cause stack overruns if we nest
> too deeply.

Sorry if I'm missing something, but how are such overruns avoided on the
outbound side?

> Changing the existing tap processing behaviour will also break
> existing setups.

Assuming there might be a better way to make this change, are there
examples of existing setups that would be negatively affected? From my
perspective this behavior is just an unintended artifact of xfrm'ed
packets being placed back into netif_rx, which only occurs for inbound
packets, and it complicates the usage of network taps on these
interfaces (i.e. how do you systematically determine whether any packet
is post-xfrm and was already seen in an earlier form?). It seems to me
that network taps operate at a lower layer than xfrm, and so xfrm should
be invisible to the network taps. If users are, for example, capturing
ESP packets from a PF_PACKET socket and want to examine the decrypted
payload, I think the capture application should be responsible for the
decryption, just as it would be at higher layers with something like
SSL/TLS (and again for example, both protocols can be decrypted by
Wireshark when provided the keys).

I would appreciate your feedback.

David

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 02 Jan 2012 09:18:02 +0100

> 	if (skb->data - skb->head < DEPTH_THRESHOLD)
> 		netif_receive_skb(skb);

Fundamentally I think such things are doomed to failure.

I encourage you to instead look into the idea proposed the other year
(but unfortunately I found no time to implement) wherein we have a
top-level looping structure.

The scheme was originally proposed for TX but we can do it just as
easily for RX too.

Essentially the entity that begins the traversal into the packet send
or receive path makes a mark in some per-cpu data structure. When we
return to the mark setting spot, we check if any "continued processing"
work got queued there, and run it if so, keeping the mark set.

Once the queued work is rechecked and found to be all clear, we clear
the mark and finish.

This has performance benefits too because on both the TX and RX side
we'll stop this whole dance where we schedule a SW interrupt and incur
all the overhead necessary to do that.

It's going to be faster than your threshold test scheme because we'll
be using a smaller stack frame and thus get better cache hits there.
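
The mark-and-loop scheme described above can be sketched in user space roughly as follows. This is a single-threaded illustration under stated assumptions, not kernel code: struct rx_state stands in for the per-cpu data structure, and rx_entry(), process(), struct packet and MAX_PENDING are hypothetical names invented for the sketch. A nested re-injection finds the mark already set, queues the packet and returns, so the top-level caller runs it from its loop instead of recursing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 16	/* hypothetical queue bound for this sketch */

/* Hypothetical packet: counts the encapsulation layers still to strip. */
struct packet {
	int layers_left;
};

/* Stand-in for the per-cpu data structure; one instance, single-threaded. */
struct rx_state {
	bool marked;				/* the mark set by the top level */
	struct packet *pending[MAX_PENDING];	/* queued "continued processing" */
	size_t head, count;			/* ring buffer position and fill */
	int depth, max_depth;			/* instrumentation only */
	int delivered;
};

struct rx_state rx;

void rx_entry(struct packet *pkt);
void process(struct packet *pkt);

/* One pass through the receive path for a single layer. */
void process(struct packet *pkt)
{
	rx.depth++;
	if (rx.depth > rx.max_depth)
		rx.max_depth = rx.depth;

	if (pkt->layers_left > 0) {
		pkt->layers_left--;	/* "decapsulate" one layer */
		rx_entry(pkt);		/* re-inject the inner packet */
	} else {
		rx.delivered++;		/* innermost packet reaches its handler */
	}

	rx.depth--;
}

/* Entry point used both by the "driver" and by nested re-injection. */
void rx_entry(struct packet *pkt)
{
	if (rx.marked) {
		/* Someone further up the stack owns the loop: queue and return. */
		assert(rx.count < MAX_PENDING);
		rx.pending[(rx.head + rx.count) % MAX_PENDING] = pkt;
		rx.count++;
		return;
	}

	rx.marked = true;	/* we are the top level: set the mark */
	process(pkt);

	/* Run queued work with the mark still set, until none is left. */
	while (rx.count > 0) {
		struct packet *next = rx.pending[rx.head];

		rx.head = (rx.head + 1) % MAX_PENDING;
		rx.count--;
		process(next);
	}

	rx.marked = false;	/* all clear: drop the mark and finish */
}
```

Feeding a packet with five nested layers through rx_entry() delivers it once while process() never nests more than one level deep, which is the stack-depth property the scheme is after.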

On Mon, Jan 02, 2012 at 02:52:36PM -0500, Ward, David - 0663 - MITLL wrote:
>
> Sorry if I'm missing something, but how are such overruns avoided on the
> outbound side?

We use tail calls on the output path.

> Assuming there might be a better way to make this change, are there
> examples of existing setups that would be negatively affected? From my
> perspective this behavior is just an unintended artifact of xfrm'ed
> packets being placed back into netif_rx, which only occurs for inbound
> packets, and it complicates the usage of network taps on these
> interfaces (i.e. how do you systematically determine whether any packet
> is post-xfrm and was already seen in an earlier form?). It seems to me
> that network taps operate at a lower layer than xfrm, and so xfrm should
> be invisible to the network taps. If users are, for example, capturing
> ESP packets from a PF_PACKET socket and want to examine the decrypted
> payload, I think the capture application should be responsible for the
> decryption, just as it would be at higher layers with something like
> SSL/TLS (and again for example, both protocols can be decrypted by
> Wireshark when provided the keys).

While I sympathise with your argument, doing it nearly 10 years
after this behaviour was implemented is just too dangerous IMHO.

Cheers,

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index b203e14..423a779 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -329,6 +329,7 @@ struct xfrm_state_afinfo {
 						 struct sk_buff *skb);
 	int			(*extract_output)(struct xfrm_state *x,
 						  struct sk_buff *skb);
+	int			(*tunnel_finish)(struct sk_buff *skb);
 	int			(*transport_finish)(struct sk_buff *skb,
 						    int async);
 };
@@ -1453,6 +1454,7 @@ extern int xfrm4_extract_header(struct sk_buff *skb);
 extern int xfrm4_extract_input(struct xfrm_state *x, struct sk_buff *skb);
 extern int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
 			   int encap_type);
+extern int xfrm4_tunnel_finish(struct sk_buff *skb);
 extern int xfrm4_transport_finish(struct sk_buff *skb, int async);
 extern int xfrm4_rcv(struct sk_buff *skb);
 
@@ -1470,6 +1472,7 @@ extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short f
 extern int xfrm6_extract_header(struct sk_buff *skb);
 extern int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
 extern int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
+extern int xfrm6_tunnel_finish(struct sk_buff *skb);
 extern int xfrm6_transport_finish(struct sk_buff *skb, int async);
 extern int xfrm6_rcv(struct sk_buff *skb);
 extern int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index 06814b6..4903a01 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -46,6 +46,11 @@ int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
 }
 EXPORT_SYMBOL(xfrm4_rcv_encap);
 
+int xfrm4_tunnel_finish(struct sk_buff *skb)
+{
+	return ip_rcv(skb, skb->dev, NULL, skb->dev);
+}
+
 int xfrm4_transport_finish(struct sk_buff *skb, int async)
 {
 	struct iphdr *iph = ip_hdr(skb);
diff --git a/net/ipv4/xfrm4_state.c b/net/ipv4/xfrm4_state.c
index 9258e75..1931c42 100644
--- a/net/ipv4/xfrm4_state.c
+++ b/net/ipv4/xfrm4_state.c
@@ -82,6 +82,7 @@ static struct xfrm_state_afinfo xfrm4_state_afinfo = {
 	.output_finish		= xfrm4_output_finish,
 	.extract_input		= xfrm4_extract_input,
 	.extract_output		= xfrm4_extract_output,
+	.tunnel_finish		= xfrm4_tunnel_finish,
 	.transport_finish	= xfrm4_transport_finish,
 };
 
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index f8c3cf8..dc898a8 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -29,6 +29,11 @@ int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 }
 EXPORT_SYMBOL(xfrm6_rcv_spi);
 
+int xfrm6_tunnel_finish(struct sk_buff *skb)
+{
+	return ipv6_rcv(skb, skb->dev, NULL, skb->dev);
+}
+
 int xfrm6_transport_finish(struct sk_buff *skb, int async)
 {
 	skb_network_header(skb)[IP6CB(skb)->nhoff] =
diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
index f2d72b8..51d31c3 100644
--- a/net/ipv6/xfrm6_state.c
+++ b/net/ipv6/xfrm6_state.c
@@ -182,6 +182,7 @@ static struct xfrm_state_afinfo xfrm6_state_afinfo = {
 	.output_finish		= xfrm6_output_finish,
 	.extract_input		= xfrm6_extract_input,
 	.extract_output		= xfrm6_extract_output,
+	.tunnel_finish		= xfrm6_tunnel_finish,
 	.transport_finish	= xfrm6_transport_finish,
 };
 
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 54a0dc2..571af71 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -262,7 +262,9 @@ resume:
 
 	if (decaps) {
 		skb_dst_drop(skb);
-		netif_rx(skb);
+		skb_reset_network_header(skb);
+		skb_reset_transport_header(skb);
+		x->inner_mode->afinfo->tunnel_finish(skb);
 		return 0;
 	} else {
 		return x->inner_mode->afinfo->transport_finish(skb, async);

For IPsec tunnel mode (or BEET mode), after inbound packets are xfrm'ed,
call the IPv4/IPv6 receive handler directly instead of calling netif_rx.
In addition to avoiding unneeded re-processing of the MAC layer, packets
will not be received a second time on network taps. (Note that outbound
packets are only received on network taps post-xfrm, but inbound packets
were being received both pre- and post-xfrm. So now network taps will
receive packets in either direction only once, in the form that they go
"over the wire".)

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
 include/net/xfrm.h     |    3 +++
 net/ipv4/xfrm4_input.c |    5 +++++
 net/ipv4/xfrm4_state.c |    1 +
 net/ipv6/xfrm6_input.c |    5 +++++
 net/ipv6/xfrm6_state.c |    1 +
 net/xfrm/xfrm_input.c  |    4 +++-
 6 files changed, 18 insertions(+), 1 deletions(-)