[net-next,v2,2/2] netlink: specify netlink packet direction for nlmon

Message ID 1387788519-17722-3-git-send-email-dborkman@redhat.com
State Superseded, archived
Delegated to: David Miller

Commit Message

Daniel Borkmann Dec. 23, 2013, 8:48 a.m. UTC
To facilitate development of a netlink protocol dissector, fill the
otherwise unused skb->pkt_type field of the cloned skb with a hint
about the address space of the new owner (receiver) socket, i.e.
"to kernel" or "to user".

At the time we invoke __netlink_deliver_tap_skb(), we already have
set the new skb owner via netlink_skb_set_owner_r(), so we can use
that for netlink_is_kernel() probing.

In normal PF_PACKET network traffic, this field denotes if the
packet is destined for us (PACKET_HOST), if it's broadcast
(PACKET_BROADCAST), etc.

As pkt_type has only 3 bits, we reuse the value (= 6) of
PACKET_FASTROUTE: it is _not used_ anywhere in the whole kernel,
and packets of that type were never exposed to user space, so no
existing user can conflict with it. As requested, this seems to be
the only way to keep both new PACKET_* values from overlapping with
values already in use, and therefore device agnostic.
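
The arithmetic behind this can be sketched in user-space C. This is an
illustration only, not kernel code: the demo_* names are made up here,
and the list of already named values (0 through 6, with 6 being
PACKET_FASTROUTE) is taken from if_packet.h as it stood before this
patch.

```c
#include <assert.h>
#include <stddef.h>

#define PKT_TYPE_BITS 3

/* PACKET_* values named before this patch; 6 is PACKET_FASTROUTE. */
const unsigned int demo_defined[] = { 0, 1, 2, 3, 4, 5, 6 };

/* Count values that fit in PKT_TYPE_BITS bits but have no PACKET_* name. */
unsigned int demo_unnamed_values(const unsigned int *defined, size_t n)
{
	unsigned int free_count = 0;
	unsigned int v;
	size_t i;

	assert(defined || n == 0);
	for (v = 0; v < (1u << PKT_TYPE_BITS); v++) {
		int named = 0;

		for (i = 0; i < n; i++)
			if (defined[i] == v)
				named = 1;
		if (!named)
			free_count++;
	}
	return free_count;
}
```

With all seven names counted, only one value (7) is left, while two
directions are needed. Treating 6 as available, by passing only the
first six entries, yields the two values used here.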

By using those two flags for netlink skbs on nlmon devices, they
can be made available and picked up via sll_pkttype (previously
unused in netlink context) in struct sockaddr_ll. We now have
these two directions:

 - PACKET_USER (= 6)    ->  to user space
 - PACKET_KERNEL (= 7)  ->  to kernel space

Partial `ip a` example strace for sa_family=AF_NETLINK with
detected nl msg direction:

syscall:                     direction:
sendto(3,  ...) = 40         /* to kernel */
recvmsg(3, ...) = 3404       /* to user */
recvmsg(3, ...) = 1120       /* to user */
recvmsg(3, ...) = 20         /* to user */
sendto(3,  ...) = 40         /* to kernel */
recvmsg(3, ...) = 168        /* to user */
recvmsg(3, ...) = 144        /* to user */
recvmsg(3, ...) = 20         /* to user */
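
A dissector could pick up the direction roughly as follows. This is a
minimal sketch under stated assumptions: fd is a PF_PACKET socket that
has already been bound to an nlmon device, the nlmon_* helper names
are hypothetical, and the fallback defines simply repeat the values
from this patch for headers that do not carry them yet.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netpacket/packet.h>	/* struct sockaddr_ll */

/* Fallbacks for headers that predate this patch (values from the patch). */
#ifndef PACKET_USER
#define PACKET_USER	6
#endif
#ifndef PACKET_KERNEL
#define PACKET_KERNEL	7
#endif

/* Map sll_pkttype of a frame captured on an nlmon device to a label. */
const char *nlmon_direction(unsigned char pkttype)
{
	switch (pkttype) {
	case PACKET_USER:
		return "to user";
	case PACKET_KERNEL:
		return "to kernel";
	default:
		return "unknown";
	}
}

/*
 * Read one frame from fd and print its length and direction.
 * Returns the frame length, or -1 if recvfrom() fails.
 */
ssize_t nlmon_print_one(int fd)
{
	unsigned char buf[8192];
	struct sockaddr_ll sll;
	socklen_t slen = sizeof(sll);
	ssize_t len;

	assert(fd >= 0);
	len = recvfrom(fd, buf, sizeof(buf), 0,
		       (struct sockaddr *)&sll, &slen);
	if (len < 0)
		return -1;

	printf("%zd bytes, %s\n", len, nlmon_direction(sll.sll_pkttype));
	return len;
}
```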

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
---
 v1->v2:
  - let PACKET_* values not overlap as wished by Dave

 include/uapi/linux/if_packet.h | 4 +++-
 net/netlink/af_netlink.c       | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

Comments

Nicolas Dichtel Dec. 23, 2013, 10:43 a.m. UTC | #1
On 23/12/2013 09:48, Daniel Borkmann wrote:
> diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
> index e9d844c..06e2a28 100644
> --- a/include/uapi/linux/if_packet.h
> +++ b/include/uapi/linux/if_packet.h
> @@ -26,8 +26,10 @@ struct sockaddr_ll {
>   #define PACKET_MULTICAST	2		/* To group		*/
>   #define PACKET_OTHERHOST	3		/* To someone else 	*/
>   #define PACKET_OUTGOING		4		/* Outgoing of any type */
> -/* These ones are invisible by user level */
>   #define PACKET_LOOPBACK		5		/* MC/BRD frame looped back */
> +#define PACKET_USER		6		/* To user space	*/
Reusing this value is like changing the API. If some userland apps and external
modules rely on it, this patch may break them.

> +#define PACKET_KERNEL		7		/* To kernel space	*/
> +/* Unused, PACKET_FASTROUTE and PACKET_LOOPBACK are invisble to user space */
nitpicking: s/invisble/invisible

>   #define PACKET_FASTROUTE	6		/* Fastrouted frame	*/
>
>   /* Packet socket options */
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 56e09d8..3f75f1c 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -204,6 +204,8 @@ static int __netlink_deliver_tap_skb(struct sk_buff *skb,
>   	if (nskb) {
>   		nskb->dev = dev;
>   		nskb->protocol = htons((u16) sk->sk_protocol);
> +		nskb->pkt_type = netlink_is_kernel(sk) ?
> +				 PACKET_KERNEL : PACKET_USER;
>
>   		ret = dev_queue_xmit(nskb);
>   		if (unlikely(ret > 0))
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Dec. 23, 2013, 10:46 a.m. UTC | #2
On 12/23/2013 11:43 AM, Nicolas Dichtel wrote:
> On 23/12/2013 09:48, Daniel Borkmann wrote:
>> diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
>> index e9d844c..06e2a28 100644
>> --- a/include/uapi/linux/if_packet.h
>> +++ b/include/uapi/linux/if_packet.h
>> @@ -26,8 +26,10 @@ struct sockaddr_ll {
>>   #define PACKET_MULTICAST    2        /* To group        */
>>   #define PACKET_OTHERHOST    3        /* To someone else     */
>>   #define PACKET_OUTGOING        4        /* Outgoing of any type */
>> -/* These ones are invisible by user level */
>>   #define PACKET_LOOPBACK        5        /* MC/BRD frame looped back */
>> +#define PACKET_USER        6        /* To user space    */
> Reusing this value is like changing the API. If some userland apps and external
> modules rely on it, this patch may break them.

Sorry, but I thought I made it clear in the commit message that
PACKET_FASTROUTE is *not* used anywhere in the whole kernel tree.
And as the comment said as well, this type was never exposed to
user land.

>> +#define PACKET_KERNEL        7        /* To kernel space    */
>> +/* Unused, PACKET_FASTROUTE and PACKET_LOOPBACK are invisble to user space */
> nitpicking: s/invisble/invisible
>
>>   #define PACKET_FASTROUTE    6        /* Fastrouted frame    */
>>
>>   /* Packet socket options */
>> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
>> index 56e09d8..3f75f1c 100644
>> --- a/net/netlink/af_netlink.c
>> +++ b/net/netlink/af_netlink.c
>> @@ -204,6 +204,8 @@ static int __netlink_deliver_tap_skb(struct sk_buff *skb,
>>       if (nskb) {
>>           nskb->dev = dev;
>>           nskb->protocol = htons((u16) sk->sk_protocol);
>> +        nskb->pkt_type = netlink_is_kernel(sk) ?
>> +                 PACKET_KERNEL : PACKET_USER;
>>
>>           ret = dev_queue_xmit(nskb);
>>           if (unlikely(ret > 0))
>>
Nicolas Dichtel Dec. 23, 2013, 11:03 a.m. UTC | #3
On 23/12/2013 11:46, Daniel Borkmann wrote:
> On 12/23/2013 11:43 AM, Nicolas Dichtel wrote:
>> On 23/12/2013 09:48, Daniel Borkmann wrote:
>>> diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
>>> index e9d844c..06e2a28 100644
>>> --- a/include/uapi/linux/if_packet.h
>>> +++ b/include/uapi/linux/if_packet.h
>>> @@ -26,8 +26,10 @@ struct sockaddr_ll {
>>>   #define PACKET_MULTICAST    2        /* To group        */
>>>   #define PACKET_OTHERHOST    3        /* To someone else     */
>>>   #define PACKET_OUTGOING        4        /* Outgoing of any type */
>>> -/* These ones are invisible by user level */
>>>   #define PACKET_LOOPBACK        5        /* MC/BRD frame looped back */
>>> +#define PACKET_USER        6        /* To user space    */
>> Reusing this value is like changing the API. If some userland apps and external
>> modules rely on it, this patch may break them.
>
> Sorry, but I thought I made it clear in the commit message that
> PACKET_FASTROUTE is *not* used anywhere in the whole kernel tree.
Yes, that is why I am talking about *external* modules, which are in fact
allowed to use the existing API.

> And as the comment said as well, this type was never exposed to
> user land.
The fact is that the value is in include/uapi/*, hence it's exposed to userland.
Daniel Borkmann Dec. 23, 2013, 11:11 a.m. UTC | #4
On 12/23/2013 12:03 PM, Nicolas Dichtel wrote:
> On 23/12/2013 11:46, Daniel Borkmann wrote:
>> On 12/23/2013 11:43 AM, Nicolas Dichtel wrote:
>>> On 23/12/2013 09:48, Daniel Borkmann wrote:
>>>> diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
>>>> index e9d844c..06e2a28 100644
>>>> --- a/include/uapi/linux/if_packet.h
>>>> +++ b/include/uapi/linux/if_packet.h
>>>> @@ -26,8 +26,10 @@ struct sockaddr_ll {
>>>>   #define PACKET_MULTICAST    2        /* To group        */
>>>>   #define PACKET_OTHERHOST    3        /* To someone else     */
>>>>   #define PACKET_OUTGOING        4        /* Outgoing of any type */
>>>> -/* These ones are invisible by user level */
>>>>   #define PACKET_LOOPBACK        5        /* MC/BRD frame looped back */
>>>> +#define PACKET_USER        6        /* To user space    */
>>> Reusing this value is like changing the API. If some userland apps and external
>>> modules rely on it, this patch may break them.
>>
>> Sorry, but I thought I made it clear in the commit message that
>> PACKET_FASTROUTE is *not* used anywhere in the whole kernel tree.
> Yes, it's why I talk about *external* modules, which in fact are allowed
> to use existing API.

Sorry, but we *never* cared about external out-of-tree modules! If
out-of-tree modules want to use kernel APIs and stay updated then
people should start submitting them to the kernel. I thought that
this is clear as this is the default policy here on netdev!

>> And as the comment said as well, this type was never exposed to
>> user land.
> The fact is that the value is in include/uapi/*, hence it's exposed to userland.

Ok, let me explain once more ... no packet *what-so-ever* will ever
go up to user space with PACKET_FASTROUTE in sll_pkttype. 1) because
if you grep the kernel tree then you'll see that this is _not used
anywhere_, 2) as the comment says, skbs of such type were invisible
to user land, hence _never_ exposed through PF_PACKET in user space.

Thanks !
Nicolas Dichtel Dec. 23, 2013, 1:08 p.m. UTC | #5
On 23/12/2013 12:11, Daniel Borkmann wrote:
> On 12/23/2013 12:03 PM, Nicolas Dichtel wrote:
>> On 23/12/2013 11:46, Daniel Borkmann wrote:
>>> On 12/23/2013 11:43 AM, Nicolas Dichtel wrote:
>>>> On 23/12/2013 09:48, Daniel Borkmann wrote:
>>>>> diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
>>>>> index e9d844c..06e2a28 100644
>>>>> --- a/include/uapi/linux/if_packet.h
>>>>> +++ b/include/uapi/linux/if_packet.h
>>>>> @@ -26,8 +26,10 @@ struct sockaddr_ll {
>>>>>   #define PACKET_MULTICAST    2        /* To group        */
>>>>>   #define PACKET_OTHERHOST    3        /* To someone else     */
>>>>>   #define PACKET_OUTGOING        4        /* Outgoing of any type */
>>>>> -/* These ones are invisible by user level */
>>>>>   #define PACKET_LOOPBACK        5        /* MC/BRD frame looped back */
>>>>> +#define PACKET_USER        6        /* To user space    */
>>>> Reusing this value is like changing the API. If some userland apps and external
>>>> modules rely on it, this patch may break them.
>>>
>>> Sorry, but I thought I made it clear in the commit message that
>>> PACKET_FASTROUTE is *not* used anywhere in the whole kernel tree.
>> Yes, it's why I talk about *external* modules, which in fact are allowed
>> to use existing API.
>
> Sorry, but we *never* cared about external out-of-tree modules! If
> out-of-tree modules want to use kernel APIs and stay updated then
> people should start submitting them to the kernel. I thought that
> this is clear as this is the default policy here on netdev!
Yes, this is perfectly clear. But I thought that not changing/breaking an API
was a MUST too.

>
>>> And as the comment said as well, this type was never exposed to
>>> user land.
>> The fact is that the value is in include/uapi/*, hence it's exposed to userland.
>
> Ok, let me explain once more ... no packet *what-so-ever* will ever
> go up to user space with PACKET_FASTROUTE in sll_pkttype. 1) because
> if you grep the kernel tree then you'll see that this is _not used
> anywhere_, 2) as the comment says, skbs of such type were invisible
> to user land, hence _never_ exposed through PF_PACKET in user space.
Why keep PACKET_FASTROUTE then?
Daniel Borkmann Dec. 23, 2013, 1:21 p.m. UTC | #6
On 12/23/2013 02:08 PM, Nicolas Dichtel wrote:
> On 23/12/2013 12:11, Daniel Borkmann wrote:
>> On 12/23/2013 12:03 PM, Nicolas Dichtel wrote:
>>> On 23/12/2013 11:46, Daniel Borkmann wrote:
>>>> On 12/23/2013 11:43 AM, Nicolas Dichtel wrote:
>>>>> On 23/12/2013 09:48, Daniel Borkmann wrote:
>>>>>> diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
>>>>>> index e9d844c..06e2a28 100644
>>>>>> --- a/include/uapi/linux/if_packet.h
>>>>>> +++ b/include/uapi/linux/if_packet.h
>>>>>> @@ -26,8 +26,10 @@ struct sockaddr_ll {
>>>>>>   #define PACKET_MULTICAST    2        /* To group        */
>>>>>>   #define PACKET_OTHERHOST    3        /* To someone else     */
>>>>>>   #define PACKET_OUTGOING        4        /* Outgoing of any type */
>>>>>> -/* These ones are invisible by user level */
>>>>>>   #define PACKET_LOOPBACK        5        /* MC/BRD frame looped back */
>>>>>> +#define PACKET_USER        6        /* To user space    */
>>>>> Reusing this value is like changing the API. If some userland apps and external
>>>>> modules rely on it, this patch may break them.
>>>>
>>>> Sorry, but I thought I made it clear in the commit message that
>>>> PACKET_FASTROUTE is *not* used anywhere in the whole kernel tree.
>>> Yes, it's why I talk about *external* modules, which in fact are allowed
>>> to use existing API.
>>
>> Sorry, but we *never* cared about external out-of-tree modules! If
>> out-of-tree modules want to use kernel APIs and stay updated then
>> people should start submitting them to the kernel. I thought that
>> this is clear as this is the default policy here on netdev!
> Yes, this is perfectly clear. But I was thinking not changing/breaking an API
> was a MUST too.

Please *explicitly* point out to me where I break something in
user space!

With this patch, on nlmon devices the only skbs sent up to user
space via PF_PACKET will have *either* PACKET_USER *or*
PACKET_KERNEL set as sll_pkttype, _nothing_ else, and netlink
is the only user of this!

The rest (e.g. traditional PF_PACKET path of network traffic) is
unchanged, plus *nobody ever* sets PACKET_FASTROUTE WHAT-SO-EVER!

Nothing breaks ...

>>>> And as the comment said as well, this type was never exposed to
>>>> user land.
>>> The fact is that the value is in include/uapi/*, hence it's exposed to userland.
>>
>> Ok, let me explain once more ... no packet *what-so-ever* will ever
>> go up to user space with PACKET_FASTROUTE in sll_pkttype. 1) because
>> if you grep the kernel tree then you'll see that this is _not used
>> anywhere_, 2) as the comment says, skbs of such type were invisible
>> to user land, hence _never_ exposed through PF_PACKET in user space.
> Why keeping PACKET_FASTROUTE then?
David Miller Dec. 31, 2013, 6:48 p.m. UTC | #7
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Mon, 23 Dec 2013 11:43:52 +0100

> On 23/12/2013 09:48, Daniel Borkmann wrote:
>> diff --git a/include/uapi/linux/if_packet.h
>> b/include/uapi/linux/if_packet.h
>> index e9d844c..06e2a28 100644
>> --- a/include/uapi/linux/if_packet.h
>> +++ b/include/uapi/linux/if_packet.h
>> @@ -26,8 +26,10 @@ struct sockaddr_ll {
>>   #define PACKET_MULTICAST	2		/* To group		*/
>>   #define PACKET_OTHERHOST	3		/* To someone else 	*/
>>   #define PACKET_OUTGOING 4 /* Outgoing of any type */
>> -/* These ones are invisible by user level */
>>   #define PACKET_LOOPBACK 5 /* MC/BRD frame looped back */
>> +#define PACKET_USER 6 /* To user space */
> Reusing this value is like changing the API. If some userland apps and
> external
> modules rely on it, this patch may break them.

Interpretation of a value is contextual, please stop ignoring this
fact.

And in this context, Daniel's reappropriation of the PACKET_FASTROUTE
value causes no harm to anyone.
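
As a rough illustration of "contextual", a reader of sll_pkttype could
first look at the device type. The sketch below is not taken from the
patch: demo_describe() is a hypothetical helper, and it assumes that
nlmon devices report ARPHRD_NETLINK (824 in linux/if_arp.h) in
sll_hatype. The fallback defines repeat the values for headers that
lack them.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netpacket/packet.h>	/* struct sockaddr_ll */

#ifndef ARPHRD_NETLINK
#define ARPHRD_NETLINK	824	/* value from linux/if_arp.h */
#endif
#ifndef PACKET_USER
#define PACKET_USER	6
#endif
#ifndef PACKET_KERNEL
#define PACKET_KERNEL	7
#endif

/* Describe sll_pkttype according to the kind of device it came from. */
const char *demo_describe(const struct sockaddr_ll *sll)
{
	static const char *const classic[] = {
		"host", "broadcast", "multicast", "otherhost", "outgoing",
	};

	assert(sll != NULL);
	if (sll->sll_hatype == ARPHRD_NETLINK) {
		/* Netlink context: the value is a direction. */
		if (sll->sll_pkttype == PACKET_USER)
			return "to user";
		if (sll->sll_pkttype == PACKET_KERNEL)
			return "to kernel";
		return "unknown";
	}
	/* Any other device: the traditional PF_PACKET meaning. */
	if (sll->sll_pkttype < sizeof(classic) / sizeof(classic[0]))
		return classic[sll->sll_pkttype];
	return "unknown";
}
```

The same number 6 is a direction on a netlink device and has no
meaning to user space on any other device.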

Values that were never used by us at any point in the past, yet were
nevertheless placed in a user visible header file, have to be kept
around.  But that doesn't mean that we can't reuse them for other
meanings.

The worst thing that could exist is that some code out there builds,
for example, string tables using PACKET_* values, and whoever wrote it
decided to be complete enough to build entries even for effectively
unused values such as PACKET_FASTROUTE.

If you actually do searches for code making reference to
PACKET_FASTROUTE or other similarly unused defines, that's what you
find.  There is nothing looking semantically at this value at all.
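
To make that worst case concrete, here is a hedged sketch of such a
string table in user-space C. The demo_* names are hypothetical and not
taken from any real project; only the numeric values come from
if_packet.h.

```c
#include <assert.h>
#include <stddef.h>

/* Numeric values as in if_packet.h before the patch. */
enum {
	DEMO_PACKET_HOST      = 0,
	DEMO_PACKET_BROADCAST = 1,
	DEMO_PACKET_MULTICAST = 2,
	DEMO_PACKET_OTHERHOST = 3,
	DEMO_PACKET_OUTGOING  = 4,
	DEMO_PACKET_LOOPBACK  = 5,
	DEMO_PACKET_FASTROUTE = 6,
	DEMO_PACKET_MAX       = 8,	/* 3-bit field: values 0..7 */
};

/* A "complete" name table written before the patch. */
const char *const demo_old_names[DEMO_PACKET_MAX] = {
	[DEMO_PACKET_HOST]      = "host",
	[DEMO_PACKET_BROADCAST] = "broadcast",
	[DEMO_PACKET_MULTICAST] = "multicast",
	[DEMO_PACKET_OTHERHOST] = "otherhost",
	[DEMO_PACKET_OUTGOING]  = "outgoing",
	[DEMO_PACKET_LOOPBACK]  = "loopback",
	[DEMO_PACKET_FASTROUTE] = "fastroute",
};

/* Look up a name; values without an entry map to "unknown". */
const char *demo_table_lookup(const char *const *names, unsigned int type)
{
	assert(names != NULL);
	if (type >= DEMO_PACKET_MAX || names[type] == NULL)
		return "unknown";
	return names[type];
}
```

Such a table would label a to-user netlink frame (6) captured on nlmon
as "fastroute" and a to-kernel frame (7) as "unknown". That is a
cosmetic mislabel; nothing stops compiling or running.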

Any objections to Daniel's changes will fall on deaf ears until
someone can really provide an actual existing way that things can
break.  Because I'm very sure that they cannot.
Patch

diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index e9d844c..06e2a28 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -26,8 +26,10 @@  struct sockaddr_ll {
 #define PACKET_MULTICAST	2		/* To group		*/
 #define PACKET_OTHERHOST	3		/* To someone else 	*/
 #define PACKET_OUTGOING		4		/* Outgoing of any type */
-/* These ones are invisible by user level */
 #define PACKET_LOOPBACK		5		/* MC/BRD frame looped back */
+#define PACKET_USER		6		/* To user space	*/
+#define PACKET_KERNEL		7		/* To kernel space	*/
+/* Unused, PACKET_FASTROUTE and PACKET_LOOPBACK are invisble to user space */
 #define PACKET_FASTROUTE	6		/* Fastrouted frame	*/
 
 /* Packet socket options */
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 56e09d8..3f75f1c 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -204,6 +204,8 @@  static int __netlink_deliver_tap_skb(struct sk_buff *skb,
 	if (nskb) {
 		nskb->dev = dev;
 		nskb->protocol = htons((u16) sk->sk_protocol);
+		nskb->pkt_type = netlink_is_kernel(sk) ?
+				 PACKET_KERNEL : PACKET_USER;
 
 		ret = dev_queue_xmit(nskb);
 		if (unlikely(ret > 0))