diff mbox

[RFC,davem] revert: net: Make skb->skb_iif always track skb->dev

Message ID 50F1699E.1000200@hartkopp.net
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Oliver Hartkopp Jan. 12, 2013, 1:48 p.m. UTC
Hello Dave,

in your below patch from 23 Jul 2012 you removed the check for an already set
value of skb_iif in net/core/dev.c

I'm currently working on a solution to prevent some routed CAN frames to be
sent back onto the originating network device.

With your patch it is not possible anymore to check on which netdev the
CAN frame has originally been received, as for every routing the frame
goes through netif_receive_skb(), which hard sets

	skb->skb_iif = skb->dev->ifindex

and therefore kills the original incoming interface index.

To me it is not clear why skb_iff is needed anyway as the value should
always be available via skb->dev->ifindex, right?

But if skb_iff has any right to exist it should contain the first incoming
interface on the host IMO.

Please correct my if i'm wrong and/or tell me what your commit message means
in respect to my request and why skb->dev->ifindex is not used instead of
skb_iif. I feel somehow lost about the skb_iif intention ...

Best regards,
Oliver

---


http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commitdiff;h=b68581778cd0051a3fb9a2b614dee7eccb5127ff

net: Make skb->skb_iif always track skb->dev

Make it follow device decapsulation, from things such as VLAN and
bonding.

The stuff that actually cares about pre-demuxed device pointers, is
handled by the "orig_dev" variable in __netif_receive_skb().  And
the only consumer of that is the po->origdev feature of AF_PACKET
sockets.

Signed-off-by: David S. Miller <davem@davemloft.net>
---



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jiri Pirko Jan. 12, 2013, 6:13 p.m. UTC | #1
Sat, Jan 12, 2013 at 02:48:14PM CET, socketcan@hartkopp.net wrote:
>Hello Dave,
>
>in your below patch from 23 Jul 2012 you removed the check for an already set
>value of skb_iif in net/core/dev.c
>
>I'm currently working on a solution to prevent some routed CAN frames to be
>sent back onto the originating network device.

Hm, I'm not sure where exactly you want to use this information, but
can_rcv() can get it from orig_dev->ifindex

>
>With your patch it is not possible anymore to check on which netdev the
>CAN frame has originally been received, as for every routing the frame
>goes through netif_receive_skb(), which hard sets
>
>	skb->skb_iif = skb->dev->ifindex
>
>and therefore kills the original incoming interface index.
>
>To me it is not clear why skb_iff is needed anyway as the value should
>always be available via skb->dev->ifindex, right?
>
>But if skb_iff has any right to exist it should contain the first incoming
>interface on the host IMO.
>
>Please correct my if i'm wrong and/or tell me what your commit message means
>in respect to my request and why skb->dev->ifindex is not used instead of
>skb_iif. I feel somehow lost about the skb_iif intention ...

I believe that skb_iif should be removed.

>
>Best regards,
>Oliver
>
>---
>
>
>http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commitdiff;h=b68581778cd0051a3fb9a2b614dee7eccb5127ff
>
>net: Make skb->skb_iif always track skb->dev
>
>Make it follow device decapsulation, from things such as VLAN and
>bonding.
>
>The stuff that actually cares about pre-demuxed device pointers, is
>handled by the "orig_dev" variable in __netif_receive_skb().  And
>the only consumer of that is the po->origdev feature of AF_PACKET
>sockets.
>
>Signed-off-by: David S. Miller <davem@davemloft.net>
>---
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index cca02ae..0ebaea1 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -3173,8 +3173,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
> 	if (netpoll_receive_skb(skb))
> 		return NET_RX_DROP;
> 
>-	if (!skb->skb_iif)
>-		skb->skb_iif = skb->dev->ifindex;
> 	orig_dev = skb->dev;
> 
> 	skb_reset_network_header(skb);
>@@ -3186,6 +3184,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
> 	rcu_read_lock();
> 
> another_round:
>+	skb->skb_iif = skb->dev->ifindex;
> 
> 	__this_cpu_inc(softnet_data.processed);
> 
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Hartkopp Jan. 12, 2013, 6:40 p.m. UTC | #2
Hello Jiri,

On 12.01.2013 19:13, Jiri Pirko wrote:

> Sat, Jan 12, 2013 at 02:48:14PM CET, socketcan@hartkopp.net wrote:
>> Hello Dave,
>>
>> in your below patch from 23 Jul 2012 you removed the check for an already set
>> value of skb_iif in net/core/dev.c
>>
>> I'm currently working on a solution to prevent some routed CAN frames to be
>> sent back onto the originating network device.
> 
> Hm, I'm not sure where exactly you want to use this information, but
> can_rcv() can get it from orig_dev->ifindex


No - it's not in can_rcv() ...

Depending on the filter lists in can_rcv() the received skb is passed to the
function can_can_gw_rcv() in net/can/gw.c.

An there i wanted to add this code to omit sending the CAN frame on the
originating interface:

@@ .. @@ static void can_can_gw_rcv(struct sk_buff *skb, void *data

        if (!(gwj->dst.dev->flags & IFF_UP)) {
                gwj->dropped_frames++;
                return;
        }

+       /* is sending the skb back to the incoming interface allowed? */
+       if (!(gwj->flags & CGW_FLAGS_CAN_IIF_TX_OK) &&
+           skb->skb_iif == gwj->dst.dev->ifindex)
+               return;
+
        /*
         * clone the given skb, which has not been done in can_rcv()
         *

This works fine, when the patch from Dave is reverted.

I did not find any good solution to preserve the originating netdev over
several netif_receive_skb() calls - but this skb_iif which has been made
unusable ...

(..)

>>
>> To me it is not clear why skb_iff is needed anyway as the value should
>> always be available via skb->dev->ifindex, right?
>>
>> But if skb_iff has any right to exist it should contain the first incoming
>> interface on the host IMO.
>>
>> Please correct my if i'm wrong and/or tell me what your commit message means
>> in respect to my request and why skb->dev->ifindex is not used instead of
>> skb_iif. I feel somehow lost about the skb_iif intention ...
> 
> I believe that skb_iif should be removed.


AFAICS skb->skb_iif is used as some kind of cached variable for the original
skb->dev->ifindex - or is it really possible that skb->skb_iif is set while
skb->dev->ifindex is not accessible?

Btw. i my case skb_iif would make sense though.

Regards,
Oliver

>>
>> ---
>>
>>
>> http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commitdiff;h=b68581778cd0051a3fb9a2b614dee7eccb5127ff
>>
>> net: Make skb->skb_iif always track skb->dev
>>
>> Make it follow device decapsulation, from things such as VLAN and
>> bonding.
>>
>> The stuff that actually cares about pre-demuxed device pointers, is
>> handled by the "orig_dev" variable in __netif_receive_skb().  And
>> the only consumer of that is the po->origdev feature of AF_PACKET
>> sockets.
>>
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>> ---
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index cca02ae..0ebaea1 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3173,8 +3173,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
>> 	if (netpoll_receive_skb(skb))
>> 		return NET_RX_DROP;
>>
>> -	if (!skb->skb_iif)
>> -		skb->skb_iif = skb->dev->ifindex;
>> 	orig_dev = skb->dev;
>>
>> 	skb_reset_network_header(skb);
>> @@ -3186,6 +3184,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>> 	rcu_read_lock();
>>
>> another_round:
>> +	skb->skb_iif = skb->dev->ifindex;
>>
>> 	__this_cpu_inc(softnet_data.processed);
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Jan. 12, 2013, 7:37 p.m. UTC | #3
Sat, Jan 12, 2013 at 07:40:33PM CET, socketcan@hartkopp.net wrote:
>Hello Jiri,
>
>On 12.01.2013 19:13, Jiri Pirko wrote:
>
>> Sat, Jan 12, 2013 at 02:48:14PM CET, socketcan@hartkopp.net wrote:
>>> Hello Dave,
>>>
>>> in your below patch from 23 Jul 2012 you removed the check for an already set
>>> value of skb_iif in net/core/dev.c
>>>
>>> I'm currently working on a solution to prevent some routed CAN frames to be
>>> sent back onto the originating network device.
>> 
>> Hm, I'm not sure where exactly you want to use this information, but
>> can_rcv() can get it from orig_dev->ifindex
>
>
>No - it's not in can_rcv() ...
>
>Depending on the filter lists in can_rcv() the received skb is passed to the
>function can_can_gw_rcv() in net/can/gw.c.
>
>An there i wanted to add this code to omit sending the CAN frame on the
>originating interface:
>
>@@ .. @@ static void can_can_gw_rcv(struct sk_buff *skb, void *data
>
>        if (!(gwj->dst.dev->flags & IFF_UP)) {
>                gwj->dropped_frames++;
>                return;
>        }
>
>+       /* is sending the skb back to the incoming interface allowed? */
>+       if (!(gwj->flags & CGW_FLAGS_CAN_IIF_TX_OK) &&
>+           skb->skb_iif == gwj->dst.dev->ifindex)
>+               return;
>+
>        /*
>         * clone the given skb, which has not been done in can_rcv()
>         *
>
>This works fine, when the patch from Dave is reverted.
>
>I did not find any good solution to preserve the originating netdev over
>several netif_receive_skb() calls - but this skb_iif which has been made
>unusable ...



Well, look at struct receiver in net/can/af_can.h:

struct receiver {
        struct hlist_node list;
        struct rcu_head rcu;
        canid_t can_id;
        canid_t mask;
        unsigned long matches;
        void (*func)(struct sk_buff *, void *);
        void *data;
        char *ident;
};

your can_can_gw_rcv() is callback registered as ->func here.
This ->func is called from chain can_rcv->can_receive->can_rcv_filter->deliver

So just extend
->func to something like:

        void (*func)(struct sk_buff *, struct neti_device *orig_dev, void *);
and pass the orig_dev all the way through the chain.



>
>(..)
>
>>>
>>> To me it is not clear why skb_iff is needed anyway as the value should
>>> always be available via skb->dev->ifindex, right?
>>>
>>> But if skb_iff has any right to exist it should contain the first incoming
>>> interface on the host IMO.
>>>
>>> Please correct my if i'm wrong and/or tell me what your commit message means
>>> in respect to my request and why skb->dev->ifindex is not used instead of
>>> skb_iif. I feel somehow lost about the skb_iif intention ...
>> 
>> I believe that skb_iif should be removed.
>
>
>AFAICS skb->skb_iif is used as some kind of cached variable for the original
>skb->dev->ifindex - or is it really possible that skb->skb_iif is set while
>skb->dev->ifindex is not accessible?
>
>Btw. i my case skb_iif would make sense though.
>
>Regards,
>Oliver
>
>>>
>>> ---
>>>
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commitdiff;h=b68581778cd0051a3fb9a2b614dee7eccb5127ff
>>>
>>> net: Make skb->skb_iif always track skb->dev
>>>
>>> Make it follow device decapsulation, from things such as VLAN and
>>> bonding.
>>>
>>> The stuff that actually cares about pre-demuxed device pointers, is
>>> handled by the "orig_dev" variable in __netif_receive_skb().  And
>>> the only consumer of that is the po->origdev feature of AF_PACKET
>>> sockets.
>>>
>>> Signed-off-by: David S. Miller <davem@davemloft.net>
>>> ---
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index cca02ae..0ebaea1 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -3173,8 +3173,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>> 	if (netpoll_receive_skb(skb))
>>> 		return NET_RX_DROP;
>>>
>>> -	if (!skb->skb_iif)
>>> -		skb->skb_iif = skb->dev->ifindex;
>>> 	orig_dev = skb->dev;
>>>
>>> 	skb_reset_network_header(skb);
>>> @@ -3186,6 +3184,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>> 	rcu_read_lock();
>>>
>>> another_round:
>>> +	skb->skb_iif = skb->dev->ifindex;
>>>
>>> 	__this_cpu_inc(softnet_data.processed);
>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Hartkopp Jan. 12, 2013, 8:14 p.m. UTC | #4
On 12.01.2013 20:37, Jiri Pirko wrote:

> Sat, Jan 12, 2013 at 07:40:33PM CET, socketcan@hartkopp.net wrote:


>> An there i wanted to add this code to omit sending the CAN frame on the
>> originating interface:
>>
>> @@ .. @@ static void can_can_gw_rcv(struct sk_buff *skb, void *data
>>
>>        if (!(gwj->dst.dev->flags & IFF_UP)) {
>>                gwj->dropped_frames++;
>>                return;
>>        }
>>
>> +       /* is sending the skb back to the incoming interface allowed? */
>> +       if (!(gwj->flags & CGW_FLAGS_CAN_IIF_TX_OK) &&
>> +           skb->skb_iif == gwj->dst.dev->ifindex)
>> +               return;
>> +
>>        /*
>>         * clone the given skb, which has not been done in can_rcv()
>>         *
>>
>> This works fine, when the patch from Dave is reverted.
>>
>> I did not find any good solution to preserve the originating netdev over
>> several netif_receive_skb() calls - but this skb_iif which has been made
>> unusable ...
> 
> 
> 
> Well, look at struct receiver in net/can/af_can.h:
> 
> struct receiver {
>         struct hlist_node list;
>         struct rcu_head rcu;
>         canid_t can_id;
>         canid_t mask;
>         unsigned long matches;
>         void (*func)(struct sk_buff *, void *);
>         void *data;
>         char *ident;
> };
> 
> your can_can_gw_rcv() is callback registered as ->func here.
> This ->func is called from chain can_rcv->can_receive->can_rcv_filter->deliver
> 
> So just extend
> ->func to something like:
> 
>         void (*func)(struct sk_buff *, struct neti_device *orig_dev, void *);
> and pass the orig_dev all the way through the chain.
> 


Passing the information up to the can-gw once is not the problem.

But when this skb has to be routed to another CAN netdev it is cloned and goes
down from can_can_gw_rcv() to
-> can_send(cloned_skb) -> dev_queue_xmit(cloned_skb)

And when it has been sent successfully on the CAN bus the cloned_skb is echoed
back into the system via netif_rx_ni(cloned_skb).
(see http://lxr.linux.no/#linux+v3.7.2/Documentation/networking/can.txt#L177 )

This entire path - using dev_queue_xmit() / netif_rx_ni() - is reduced to a
skb structure and can not deal with any orig_dev pointer.

Therefore storing the first incoming interface in skb_iif is relevant for this
use-case.

Regards,
Oliver
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 12, 2013, 9:23 p.m. UTC | #5
From: Oliver Hartkopp <socketcan@hartkopp.net>
Date: Sat, 12 Jan 2013 14:48:14 +0100

> To me it is not clear why skb_iff is needed anyway as the value should
> always be available via skb->dev->ifindex, right?

But all the code uses skb_iif, in particular the ipv4 routing
lookups use that as the key.

It absolutely must follow whatever is skb->dev, it is a hard
invariant.

I am not reverting this change.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 12, 2013, 9:36 p.m. UTC | #6
From: David Miller <davem@davemloft.net>
Date: Sat, 12 Jan 2013 13:23:16 -0800 (PST)

> From: Oliver Hartkopp <socketcan@hartkopp.net>
> Date: Sat, 12 Jan 2013 14:48:14 +0100
> 
>> To me it is not clear why skb_iff is needed anyway as the value should
>> always be available via skb->dev->ifindex, right?
> 
> But all the code uses skb_iif, in particular the ipv4 routing
> lookups use that as the key.
> 
> It absolutely must follow whatever is skb->dev, it is a hard
> invariant.
> 
> I am not reverting this change.

More information, because I can't believe how idiotic and
ignorant people are being able this issue.

skb->dev->ifindex IS NOT the same as skb->skb_iif

Why don't you put a test into tcp_recvmsg() for packets being removed
from the socket's receive queue and see if skb->dev->ifindex is the
same as skb->skb_iif

Surprise, skb->dev is going to be NULL at that point.

Why?

Because on packet receive we don't take references on devices we hook
into skb->dev, therefore we cannot let that pointer escape the
software interrupt packet input paths.

Therefore, as a bug trap, TCP input will set skb->dev to NULL.

The only valid way to figure out the final demuxed device the packet
arrived on, is therefore, via skb->skb_iif.

As per your problem with CAN, that's also rediculous.  You have an SKB
control block in skb->cb[] that you can put whatever values with
whatever semantics you want.

Use it.

I'm not discussing this any further.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Hartkopp Jan. 13, 2013, 11:01 a.m. UTC | #7
On 12.01.2013 22:36, David Miller wrote:

> As per your problem with CAN, that's also rediculous.  You have an SKB
> control block in skb->cb[] that you can put whatever values with
> whatever semantics you want.
> 
> Use it.


I'm not writing a RFC to you, when i'm not sure having checked several options
before.

In the tx path already net/sched is using cb[] for it's purposes.
Adding the information somewhere at the end of cb[] to not interfere with
sched becomes tricky. See users of qdisc_cb_private_validate().

So the 'incoming interface index' could be stored safely in the tx path in

- skb->data
- skb_shared_info
- skb_iif

skb_iif was the obvious choice then. And the question why a seven year old
if-statement in net/core/dev.c has been removed must be allowed.

Networking is not only IP networking.

Oliver
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 13, 2013, 1:20 p.m. UTC | #8
From: Oliver Hartkopp <socketcan@hartkopp.net>
Date: Sun, 13 Jan 2013 12:01:53 +0100

> Networking is not only IP networking.

But it is who the majority of the infrastructure is goint to be
optimized for, and this is never going to change.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index cca02ae..0ebaea1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3173,8 +3173,6 @@  static int __netif_receive_skb(struct sk_buff *skb)
 	if (netpoll_receive_skb(skb))
 		return NET_RX_DROP;
 
-	if (!skb->skb_iif)
-		skb->skb_iif = skb->dev->ifindex;
 	orig_dev = skb->dev;
 
 	skb_reset_network_header(skb);
@@ -3186,6 +3184,7 @@  static int __netif_receive_skb(struct sk_buff *skb)
 	rcu_read_lock();
 
 another_round:
+	skb->skb_iif = skb->dev->ifindex;
 
 	__this_cpu_inc(softnet_data.processed);