Message ID | 20120723.140737.822125500680433996.davem@davemloft.net |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
From: David Miller <davem@davemloft.net> Date: Mon, 23 Jul 2012 14:07:37 -0700 (PDT) > On input packet processing, rt->rt_iif will be zero if we should > use skb->dev->ifindex. > > Since we access rt->rt_iif consistently via inet_iif(), that is > the only spot whose interpretation have to adjust. > > Signed-off-by: David S. Miller <davem@davemloft.net> Julian, as you seem to have feared, this turns out to not work. We zap the skb->dev apparently, so I'll need to come up with a different scheme to handle this. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Julian Anastasov <ja@ssi.bg> Date: Tue, 24 Jul 2012 02:11:38 +0300 (EEST) > skb_iif: ifindex of device we arrived on > > If it has hidden semantic may be we can make this > comment more specific, does it survive some loops? Strangely I hadn't noticed this, and after the email I just sent out I was going to look into adding such a value :-) Great! I'll respin my patches to use that thing, thanks Julian. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Mon, 23 Jul 2012, David Miller wrote: > From: David Miller <davem@davemloft.net> > Date: Mon, 23 Jul 2012 14:07:37 -0700 (PDT) > > > On input packet processing, rt->rt_iif will be zero if we should > > use skb->dev->ifindex. > > > > Since we access rt->rt_iif consistently via inet_iif(), that is > > the only spot whose interpretation have to adjust. > > > > Signed-off-by: David S. Miller <davem@davemloft.net> > > Julian, as you seem to have feared, this turns out to not work. > > We zap the skb->dev apparently, so I'll need to come up with > a different scheme to handle this. I was just replying that I don't see any problems with the 4 patches but when you say it, may be the net/sched code will work with output device on forwarding, not input. Is skb_iif a good source? From include/linux/skbuff.h: skb_iif: ifindex of device we arrived on If it has hidden semantic may be we can make this comment more specific, does it survive some loops? Anyways, the idea for new encoding of rt_iif is very good. Regards -- Julian Anastasov <ja@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: David Miller <davem@davemloft.net> Date: Mon, 23 Jul 2012 16:05:41 -0700 (PDT) > From: Julian Anastasov <ja@ssi.bg> > Date: Tue, 24 Jul 2012 02:11:38 +0300 (EEST) > >> skb_iif: ifindex of device we arrived on >> >> If it has hidden semantic may be we can make this >> comment more specific, does it survive some loops? > > Strangely I hadn't noticed this, and after the email I just sent > out I was going to look into adding such a value :-) > > Great! > > I'll respin my patches to use that thing, thanks Julian. Hmmm, the problem is that when we decapsulate VLAN devices, we're left with the parent device's index in skb->skb_iif. That's what all of that "orig_dev" logic in __netif_receive_skb() is all about. The skb->dev is rewritten by vlan_do_receive() for that case. I wonder if we should just get rid of all of that orig_dev logic and simply update skb->skb_iif every time we hit the code starting at label "another_round" I'll keep looking into this. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Mon, 23 Jul 2012, David Miller wrote: > Hmmm, the problem is that when we decapsulate VLAN devices, we're left > with the parent device's index in skb->skb_iif. Not sure if it is a problem with VLANs, can be also with some virtual devices but may be they use dev_forward_skb() where skb_iif is zeroed. > That's what all of that "orig_dev" logic in __netif_receive_skb() is > all about. The skb->dev is rewritten by vlan_do_receive() for that > case. This is the part that I didn't know for skb_iif. I was also worrying about ip_mc_output looping packets with skb_iif because skb_clone copies the field but may be such loops happen only for locally originated traffic where skb_iif starts with 0. Still, the comment in ipmr_queue_xmit() and the IPSKB_FORWARDED bit puzzles me: * RFC1584 teaches, that DVMRP/PIM router must deliver packets locally * not only before forwarding, but after forwarding on all output * interfaces. It is clear, if mrouter runs a multicasting ip_mr_forward preserves one copy to local app, calls ipmr_queue_xmit() where packet can be looped? ip_mr_input(): /* Packet is looped back after forward, it should not be * forwarded second time, but still can be delivered locally. */ I really hope the case is for locally originated multicast, not for packets received from network. > I wonder if we should just get rid of all of that orig_dev logic and > simply update skb->skb_iif every time we hit the code starting at > label "another_round" I don't have opinion about this problem. I see your new change for __netif_receive_skb, the skb_iif places are something that I have to check for me... Regards -- Julian Anastasov <ja@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Julian Anastasov <ja@ssi.bg> Date: Tue, 24 Jul 2012 03:24:38 +0300 (EEST) > On Mon, 23 Jul 2012, David Miller wrote: > >> Hmmm, the problem is that when we decapsulate VLAN devices, we're left >> with the parent device's index in skb->skb_iif. > > Not sure if it is a problem with VLANs, can be also > with some virtual devices but may be they use dev_forward_skb() > where skb_iif is zeroed. dev_forward_skb() gives the packet to netif_rx() which will thus send the packet down to __netif_receive_skb() which will set the skb->skb_iif to the new device's ifindex. It will not stay at zero :-) > I was also worrying about ip_mc_output looping packets with > skb_iif because skb_clone copies the field but may be such loops > happen only for locally originated traffic where skb_iif starts with > 0. Loopback of multicast packets is done in ip_mc_output(), via the clone that you mention, via dev_loopback_xmit(). dev_loopback_xmit() gives the packet to netif_rx_ni() which again lands it back at __netif_receive_skb(), which (with my changes) will adjust the skb->skb_iif to match whatever sits in skb->dev at the time. It should be alright. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Mon, 23 Jul 2012, David Miller wrote: > From: Julian Anastasov <ja@ssi.bg> > Date: Tue, 24 Jul 2012 03:24:38 +0300 (EEST) > > > On Mon, 23 Jul 2012, David Miller wrote: > > > >> Hmmm, the problem is that when we decapsulate VLAN devices, we're left > >> with the parent device's index in skb->skb_iif. > > > > Not sure if it is a problem with VLANs, can be also > > with some virtual devices but may be they use dev_forward_skb() > > where skb_iif is zeroed. > > dev_forward_skb() gives the packet to netif_rx() which will thus send > the packet down to __netif_receive_skb() which will set the > skb->skb_iif to the new device's ifindex. It will not stay at zero > :-) > > > I was also worrying about ip_mc_output looping packets with > > skb_iif because skb_clone copies the field but may be such loops > > happen only for locally originated traffic where skb_iif starts with > > 0. > > Loopback of multicast packets is done in ip_mc_output(), via > the clone that you mention, via dev_loopback_xmit(). > > dev_loopback_xmit() gives the packet to netif_rx_ni() which again > lands it back at __netif_receive_skb(), which (with my changes) > will adjust the skb->skb_iif to match whatever sits in skb->dev > at the time. > > It should be alright. Yes, I know that now it is set in all cases, I was wondering why __netif_receive_skb was prepared to see non-zero value. Please also check xfrm4_fill_dst, the possible case where the new skb_iif usage can differ from rt_iif is for forwarded packets where skb->dst is changed with new version, for example: ip_forward -> *xfrm*_policy_check* -> __xfrm_decode_session -> _decode_session4 memsets flowi4_iif (now to get skb_iif). ip_forward -> xfrm4_route_forward -> ... xfrm_lookup -> ... -> xfrm4_fill_dst now will use skb_iif ? XFRM is still magic for me and I'm not sure if we need some other device instead of skb_iif. Should we prefer dst->dev for output routes? For example: static inline int inet_iif(const struct sk_buff *skb) { struct rtable *rt = skb_rtable(skb); int iif = rt->rt_iif; if (iif) return iif; if (rt_is_output_route(rt)) return rt->dst.dev->ifindex; return skb->skb_iif; } but may be it is same if we restore the rth->rt_iif = orig_oif ? : dev_out->ifindex; code in __mkroute_output? By this way we will have correct rt_iif at the time of routing call, before the packet is looped via this device. Then skb_iif will remain as default only for input routes. By this way we will be less worried for places like xfrm or when skb->dst is replaced in forwarding path. IPIP tunnels also replace skb->dst but this can hurt only code like route4_classify. Regards -- Julian Anastasov <ja@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le 24/07/2012 01:14, David Miller a écrit : [...] > I wonder if we should just get rid of all of that orig_dev logic and > simply update skb->skb_iif every time we hit the code starting at > label "another_round" It clearly depends on the exact meaning of orig_dev. When we studied the usage of orig_dev before removing it from bonding, it was clear that two different meanings existed: - From the bonding point of view, is was "the device one level below current device" (the slave, from the master's point of view). - From the af_packet point of view, is was "the real original device that received the packet". As bonding don't use orig_dev anymore, the remaining meaning should logically be "the real original device that received the packet". But as __netif_receive_skb() is recursively called in many cases, setting orig_dev to something new every time, this meaning is probably mostly inconsistent. As such, it sounds appropriate to remove orig_dev and use skb_iif instead. Nicolas. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> Date: Wed, 25 Jul 2012 00:13:55 +0200 > - From the af_packet point of view, is was "the real original device > - that received the packet". > > As bonding don't use orig_dev anymore, the remaining meaning should > logically be "the real original device that received the packet". But > as __netif_receive_skb() is recursively called in many cases, setting > orig_dev to something new every time, this meaning is probably mostly > inconsistent. As such, it sounds appropriate to remove orig_dev and > use skb_iif instead. I don't think we can, otherwise people who set po->origdev will no longer get what they expect. For the simpler cases of bonding and VLANs, it does currently behaved as expected. That's why I left it alone. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le 25/07/2012 00:18, David Miller a écrit : > From: Nicolas de Pesloüan<nicolas.2p.debian@gmail.com> > Date: Wed, 25 Jul 2012 00:13:55 +0200 > >> - From the af_packet point of view, is was "the real original device >> - that received the packet". >> >> As bonding don't use orig_dev anymore, the remaining meaning should >> logically be "the real original device that received the packet". But >> as __netif_receive_skb() is recursively called in many cases, setting >> orig_dev to something new every time, this meaning is probably mostly >> inconsistent. As such, it sounds appropriate to remove orig_dev and >> use skb_iif instead. > > I don't think we can, otherwise people who set po->origdev will no > longer get what they expect. > > For the simpler cases of bonding and VLANs, it does currently behaved > as expected. > > That's why I left it alone. Do they get what they expect when stacking interfaces? __netif_receive_skb starts with orig_dev = skb->dev. So when calling __netif_receive_skb recursively, after changing skb->dev, they get the packet several times, with a different orig_dev value? Any way, both looks good to me. Nicolas. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/route.h b/include/net/route.h index 60d611d..d418ae1 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -277,7 +277,11 @@ static inline struct rtable *ip_route_newports(struct flowi4 *fl4, struct rtable static inline int inet_iif(const struct sk_buff *skb) { - return skb_rtable(skb)->rt_iif; + int iif = skb_rtable(skb)->rt_iif; + + if (iif) + return iif; + return skb->dev->ifindex; } extern int sysctl_ip_default_ttl; diff --git a/net/ipv4/route.c b/net/ipv4/route.c index f6be781..6bcb8fc 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1309,7 +1309,7 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr, rth->rt_flags = RTCF_MULTICAST; rth->rt_type = RTN_MULTICAST; rth->rt_is_input= 1; - rth->rt_iif = dev->ifindex; + rth->rt_iif = 0; rth->rt_pmtu = 0; rth->rt_gateway = 0; if (our) { @@ -1435,7 +1435,7 @@ static int __mkroute_input(struct sk_buff *skb, rth->rt_flags = flags; rth->rt_type = res->type; rth->rt_is_input = 1; - rth->rt_iif = in_dev->dev->ifindex; + rth->rt_iif = 0; rth->rt_pmtu = 0; rth->rt_gateway = 0; @@ -1608,7 +1608,7 @@ local_input: rth->rt_flags = flags|RTCF_LOCAL; rth->rt_type = res.type; rth->rt_is_input = 1; - rth->rt_iif = dev->ifindex; + rth->rt_iif = 0; rth->rt_pmtu = 0; rth->rt_gateway = 0; if (res.type == RTN_UNREACHABLE) { @@ -1772,7 +1772,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res, rth->rt_flags = flags; rth->rt_type = type; rth->rt_is_input = 0; - rth->rt_iif = orig_oif ? : dev_out->ifindex; + rth->rt_iif = orig_oif ? : 0; rth->rt_pmtu = 0; rth->rt_gateway = 0;
On input packet processing, rt->rt_iif will be zero if we should use skb->dev->ifindex. Since we access rt->rt_iif consistently via inet_iif(), that is the only spot whose interpretation have to adjust. Signed-off-by: David S. Miller <davem@davemloft.net> --- include/net/route.h | 6 +++++- net/ipv4/route.c | 8 ++++---- 2 files changed, 9 insertions(+), 5 deletions(-)