diff mbox

Packet mmap : allow the user to choose the offset of the tx payload.

Message ID 1349442629-10960-1-git-send-email-paul.chavent@onera.fr
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Paul Chavent Oct. 5, 2012, 1:10 p.m. UTC
The tx offset of packet mmap tx ring used to be :
(TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))

The problem is that depending on the usage of SOCK_DGRAM or
SOCK_RAW, the payload could be aligned or not.

This patch allow to let the user give an offset for it's tx
payload if he desires.

Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
---
 net/packet/af_packet.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Comments

danborkmann@iogearbox.net Oct. 5, 2012, 2:17 p.m. UTC | #1
On Fri, Oct 5, 2012 at 3:10 PM, Paul Chavent <Paul.Chavent@onera.fr> wrote:
> The tx offset of packet mmap tx ring used to be :
> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>
> The problem is that depending on the usage of SOCK_DGRAM or
> SOCK_RAW, the payload could be aligned or not.
>
> This patch allow to let the user give an offset for it's tx
> payload if he desires.
>
> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>

Can you provide an example when it doesn't hit TPACKET_ALIGNMENT?

On the first look, could it be that your patch currently enforces the
use of your tp_net's offset for *all* TX_RING users, even if they
don't care about it? So in case off==0, you probably get a negative
offset in case of SOCK_RAW, thus it won't hit the second if-statement.
Sure, but this does not look intuitive in my opinion. Maybe, it's
better to only enter this path if the offset *is* used by someone.

Also, to my knowledge tp_net is currently only applied in receive
path. So, if for whatever reason people did not explicitly set tp_net
to 0, it might break their code, if there's a random offset in it,
right?

Best,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Chavent Oct. 5, 2012, 7:21 p.m. UTC | #2
Hi

On Fri, 5 Oct 2012 16:17:12 +0200, Daniel Borkmann wrote:
> On Fri, Oct 5, 2012 at 3:10 PM, Paul Chavent <Paul.Chavent@onera.fr> 
> wrote:
>> The tx offset of packet mmap tx ring used to be :
>> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>>
>> The problem is that depending on the usage of SOCK_DGRAM or
>> SOCK_RAW, the payload could be aligned or not.
>>
>> This patch allow to let the user give an offset for it's tx
>> payload if he desires.
>>
>> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
>
> Can you provide an example when it doesn't hit TPACKET_ALIGNMENT?

When we use tx ring, the user have to write at (TPACKET_HDRLEN - 
sizeof(struct sockaddr_ll))

This adress is aligned on TPACKET_ALIGNMENT since
TPACKET_HDRLEN = (TPACKET_ALIGN(sizeof(struct tpacket_hdr)) + 
sizeof(struct sockaddr_ll))

When we use the tx ring with SOCK_RAW option, the mac header is aligned 
on TPACKET_ALIGNMENT, but not the payload (14 bytes away).


>
> On the first look, could it be that your patch currently enforces the
> use of your tp_net's offset for *all* TX_RING users, even if they
> don't care about it? So in case off==0, you probably get a negative
> offset in case of SOCK_RAW, thus it won't hit the second 
> if-statement.
> Sure, but this does not look intuitive in my opinion. Maybe, it's
> better to only enter this path if the offset *is* used by someone.

Yes, moreover i had a problem with the signed/unsigned comparison.
I've fixed all those problems for the next submission.


>
> Also, to my knowledge tp_net is currently only applied in receive
> path. So, if for whatever reason people did not explicitly set tp_net
> to 0, it might break their code, if there's a random offset in it,
> right?

I haven't found where all frames were initialized to 
TP_STATUS_AVAILABLE, so i have supposed that the frames were initialized 
to zero.

So your are right, if tp_net is not forced to zero before sending the 
packet it could break the legacy code :(


>
> Best,
> Daniel

Thank for your review.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
danborkmann@iogearbox.net Oct. 5, 2012, 7:37 p.m. UTC | #3
On Fri, Oct 5, 2012 at 9:21 PM, pchavent <Paul.Chavent@onera.fr> wrote:
> On Fri, 5 Oct 2012 16:17:12 +0200, Daniel Borkmann wrote:
>> On Fri, Oct 5, 2012 at 3:10 PM, Paul Chavent <Paul.Chavent@onera.fr>
>> wrote:
>>>
>>> The tx offset of packet mmap tx ring used to be :
>>> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>>>
>>> The problem is that depending on the usage of SOCK_DGRAM or
>>> SOCK_RAW, the payload could be aligned or not.
>>>
>>> This patch allow to let the user give an offset for it's tx
>>> payload if he desires.
>>>
>>> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
>>
>> Can you provide an example when it doesn't hit TPACKET_ALIGNMENT?
>
> When we use tx ring, the user have to write at (TPACKET_HDRLEN -
> sizeof(struct sockaddr_ll))
>
> This adress is aligned on TPACKET_ALIGNMENT since
> TPACKET_HDRLEN = (TPACKET_ALIGN(sizeof(struct tpacket_hdr)) + sizeof(struct
> sockaddr_ll))
>
> When we use the tx ring with SOCK_RAW option, the mac header is aligned on
> TPACKET_ALIGNMENT, but not the payload (14 bytes away).

Okay, I'm confused about your intentions, maybe I'm missing something.
The man-page of packet(7) clearly says:

The socket_type is either SOCK_RAW for raw packets *including* the
link level header or SOCK_DGRAM for cooked packets with  the link
level header *removed*.

So this is perfectly intended behavior of PF_PACKET.

Cheers,

Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Chavent Oct. 6, 2012, 7:43 a.m. UTC | #4
On Fri, 5 Oct 2012 21:37:58 +0200, Daniel Borkmann wrote:
> On Fri, Oct 5, 2012 at 9:21 PM, pchavent <Paul.Chavent@onera.fr> 
> wrote:
>> On Fri, 5 Oct 2012 16:17:12 +0200, Daniel Borkmann wrote:
>>> On Fri, Oct 5, 2012 at 3:10 PM, Paul Chavent 
>>> <Paul.Chavent@onera.fr>
>>> wrote:
>>>>
>>>> The tx offset of packet mmap tx ring used to be :
>>>> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>>>>
>>>> The problem is that depending on the usage of SOCK_DGRAM or
>>>> SOCK_RAW, the payload could be aligned or not.
>>>>
>>>> This patch allow to let the user give an offset for it's tx
>>>> payload if he desires.
>>>>
>>>> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
>>>
>>> Can you provide an example when it doesn't hit TPACKET_ALIGNMENT?
>>
>> When we use tx ring, the user have to write at (TPACKET_HDRLEN -
>> sizeof(struct sockaddr_ll))
>>
>> This adress is aligned on TPACKET_ALIGNMENT since
>> TPACKET_HDRLEN = (TPACKET_ALIGN(sizeof(struct tpacket_hdr)) + 
>> sizeof(struct
>> sockaddr_ll))
>>
>> When we use the tx ring with SOCK_RAW option, the mac header is 
>> aligned on
>> TPACKET_ALIGNMENT, but not the payload (14 bytes away).
>
> Okay, I'm confused about your intentions, maybe I'm missing 
> something.
> The man-page of packet(7) clearly says:
>
> The socket_type is either SOCK_RAW for raw packets *including* the
> link level header or SOCK_DGRAM for cooked packets with  the link
> level header *removed*.
>
> So this is perfectly intended behavior of PF_PACKET.
>
> Cheers,
>
> Daniel

Yes, i also expect to be able to include the link level header when i 
use SOCK_RAW.

My intention is to send a frame with this payload (for example) :
typedef struct
{
   double   ts;
   uint64_t foo;
} test_t;

So i get a pointer to the raw packet :
void * raw_packet = frame_base + (TPACKET_HDRLEN - sizeof(struct 
sockaddr_ll));

I cook the header :
memcpy(raw_packet +  0, dst_addr, sizeof(dst_addr));
memcpy(raw_packet +  6, src_addr, sizeof(src_addr));
memcpy(raw_packet + 12, type    , sizeof(type));

Then i get a pointer to the beginning of payload :
test_t * payload = raw_packet + 14;

Here payload is at 58 bytes from the beginning of the frame.

Then i fill the payload :
payload->ts = 1.0;
payload->foo = 2;
...

These are misaligned accesses.

I don't care to fill the cooked header if it's misaligned, but i would 
like to be able to fill the frame directly in the ring buffer being on 
aligned boundaries.
It would allow to avoid to memcpy an instance of the payload 
structure...

Regards.

Paul.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
danborkmann@iogearbox.net Oct. 7, 2012, 10:50 a.m. UTC | #5
On Sat, Oct 6, 2012 at 9:43 AM, pchavent <Paul.Chavent@onera.fr> wrote:
> On Fri, 5 Oct 2012 21:37:58 +0200, Daniel Borkmann wrote:
>> On Fri, Oct 5, 2012 at 9:21 PM, pchavent <Paul.Chavent@onera.fr> wrote:
>>> On Fri, 5 Oct 2012 16:17:12 +0200, Daniel Borkmann wrote:
>>>> On Fri, Oct 5, 2012 at 3:10 PM, Paul Chavent <Paul.Chavent@onera.fr>
>>>> wrote:
>>>>>
>>>>>
>>>>> The tx offset of packet mmap tx ring used to be :
>>>>> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>>>>>
>>>>> The problem is that depending on the usage of SOCK_DGRAM or
>>>>> SOCK_RAW, the payload could be aligned or not.
>>>>>
>>>>> This patch allow to let the user give an offset for it's tx
>>>>> payload if he desires.
>>>>>
>>>>> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
>>>>
>>>>
>>>> Can you provide an example when it doesn't hit TPACKET_ALIGNMENT?
>>>
>>>
>>> When we use tx ring, the user have to write at (TPACKET_HDRLEN -
>>> sizeof(struct sockaddr_ll))
>>>
>>> This adress is aligned on TPACKET_ALIGNMENT since
>>> TPACKET_HDRLEN = (TPACKET_ALIGN(sizeof(struct tpacket_hdr)) +
>>> sizeof(struct
>>> sockaddr_ll))
>>>
>>> When we use the tx ring with SOCK_RAW option, the mac header is aligned
>>> on
>>> TPACKET_ALIGNMENT, but not the payload (14 bytes away).
>>
>>
>> Okay, I'm confused about your intentions, maybe I'm missing something.
>> The man-page of packet(7) clearly says:
>>
>> The socket_type is either SOCK_RAW for raw packets *including* the
>> link level header or SOCK_DGRAM for cooked packets with  the link
>> level header *removed*.
>>
>> So this is perfectly intended behavior of PF_PACKET.
>>
>> Cheers,
>>
>> Daniel
>
>
> Yes, i also expect to be able to include the link level header when i use
> SOCK_RAW.
>
> My intention is to send a frame with this payload (for example) :
> typedef struct
> {
>   double   ts;
>   uint64_t foo;
> } test_t;
>
> So i get a pointer to the raw packet :
> void * raw_packet = frame_base + (TPACKET_HDRLEN - sizeof(struct
> sockaddr_ll));
>
> I cook the header :
> memcpy(raw_packet +  0, dst_addr, sizeof(dst_addr));
> memcpy(raw_packet +  6, src_addr, sizeof(src_addr));
> memcpy(raw_packet + 12, type    , sizeof(type));
>
> Then i get a pointer to the beginning of payload :
> test_t * payload = raw_packet + 14;
>
> Here payload is at 58 bytes from the beginning of the frame.
>
> Then i fill the payload :
> payload->ts = 1.0;
> payload->foo = 2;
> ...
>
> These are misaligned accesses.
>
> I don't care to fill the cooked header if it's misaligned, but i would like
> to be able to fill the frame directly in the ring buffer being on aligned
> boundaries.

Okay.

Maybe what you could do in a new version of your patch is to introduce
a TP_STATUS flag, e.g. TP_STATUS_SEND_HAS_OFF that you pass along
binary or'ed with the commonly used flags, and then you can fill
tp_mac resp. tp_net with offsets. By that, you won't break legacy
stuff.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
danborkmann@iogearbox.net Oct. 7, 2012, 12:44 p.m. UTC | #6
On Sun, Oct 7, 2012 at 12:50 PM, Daniel Borkmann
<danborkmann@iogearbox.net> wrote:
> On Sat, Oct 6, 2012 at 9:43 AM, pchavent <Paul.Chavent@onera.fr> wrote:
>> On Fri, 5 Oct 2012 21:37:58 +0200, Daniel Borkmann wrote:
>>> On Fri, Oct 5, 2012 at 9:21 PM, pchavent <Paul.Chavent@onera.fr> wrote:
>>>> On Fri, 5 Oct 2012 16:17:12 +0200, Daniel Borkmann wrote:
>>>>> On Fri, Oct 5, 2012 at 3:10 PM, Paul Chavent <Paul.Chavent@onera.fr>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> The tx offset of packet mmap tx ring used to be :
>>>>>> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>>>>>>
>>>>>> The problem is that depending on the usage of SOCK_DGRAM or
>>>>>> SOCK_RAW, the payload could be aligned or not.
>>>>>>
>>>>>> This patch allow to let the user give an offset for it's tx
>>>>>> payload if he desires.
>>>>>>
>>>>>> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
>>>>>
>>>>>
>>>>> Can you provide an example when it doesn't hit TPACKET_ALIGNMENT?
>>>>
>>>>
>>>> When we use tx ring, the user have to write at (TPACKET_HDRLEN -
>>>> sizeof(struct sockaddr_ll))
>>>>
>>>> This adress is aligned on TPACKET_ALIGNMENT since
>>>> TPACKET_HDRLEN = (TPACKET_ALIGN(sizeof(struct tpacket_hdr)) +
>>>> sizeof(struct
>>>> sockaddr_ll))
>>>>
>>>> When we use the tx ring with SOCK_RAW option, the mac header is aligned
>>>> on
>>>> TPACKET_ALIGNMENT, but not the payload (14 bytes away).
>>>
>>>
>>> Okay, I'm confused about your intentions, maybe I'm missing something.
>>> The man-page of packet(7) clearly says:
>>>
>>> The socket_type is either SOCK_RAW for raw packets *including* the
>>> link level header or SOCK_DGRAM for cooked packets with  the link
>>> level header *removed*.
>>>
>>> So this is perfectly intended behavior of PF_PACKET.
>>>
>>> Cheers,
>>>
>>> Daniel
>>
>>
>> Yes, i also expect to be able to include the link level header when i use
>> SOCK_RAW.
>>
>> My intention is to send a frame with this payload (for example) :
>> typedef struct
>> {
>>   double   ts;
>>   uint64_t foo;
>> } test_t;
>>
>> So i get a pointer to the raw packet :
>> void * raw_packet = frame_base + (TPACKET_HDRLEN - sizeof(struct
>> sockaddr_ll));
>>
>> I cook the header :
>> memcpy(raw_packet +  0, dst_addr, sizeof(dst_addr));
>> memcpy(raw_packet +  6, src_addr, sizeof(src_addr));
>> memcpy(raw_packet + 12, type    , sizeof(type));
>>
>> Then i get a pointer to the beginning of payload :
>> test_t * payload = raw_packet + 14;
>>
>> Here payload is at 58 bytes from the beginning of the frame.
>>
>> Then i fill the payload :
>> payload->ts = 1.0;
>> payload->foo = 2;
>> ...
>>
>> These are misaligned accesses.
>>
>> I don't care to fill the cooked header if it's misaligned, but i would like
>> to be able to fill the frame directly in the ring buffer being on aligned
>> boundaries.

Just a minor remark: speaking about your patch, in the case of
SOCK_RAW I think you rather might want to care whether your *mac
header* starts at an aligned offset or not, since the kernel doesn't
care about your particular payload, but about the frame as a whole
that it needs to process.

> Okay.
>
> Maybe what you could do in a new version of your patch is to introduce
> a TP_STATUS flag, e.g. TP_STATUS_SEND_HAS_OFF that you pass along
> binary or'ed with the commonly used flags, and then you can fill
> tp_mac resp. tp_net with offsets. By that, you won't break legacy
> stuff.

Also, note that you should wait with your submission until net-next is reopened.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 94060ed..5bf3100 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1851,7 +1851,7 @@  static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 		struct tpacket2_hdr *h2;
 		void *raw;
 	} ph;
-	int to_write, offset, len, tp_len, nr_frags, len_max;
+	int to_write, offset, len, tp_len, nr_frags, len_max, off;
 	struct socket *sock = po->sk.sk_socket;
 	struct page *page;
 	void *data;
@@ -1868,9 +1868,11 @@  static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 	switch (po->tp_version) {
 	case TPACKET_V2:
 		tp_len = ph.h2->tp_len;
+		off = ph.h2->tp_net;
 		break;
 	default:
 		tp_len = ph.h1->tp_len;
+		off = ph.h1->tp_net;
 		break;
 	}
 	if (unlikely(tp_len > size_max)) {
@@ -1882,6 +1884,16 @@  static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 	skb_reset_network_header(skb);
 
 	data = ph.raw + po->tp_hdrlen - sizeof(struct sockaddr_ll);
+	if (sock->type != SOCK_DGRAM)
+		off -= dev->hard_header_len;
+	if (likely((po->tp_hdrlen - sizeof(struct sockaddr_ll)) < off)) {
+		if (unlikely((off + tp_len) > size_max)) {
+			pr_err("packet size is too long (%d + %d > %d)\n",
+				off, tp_len, size_max);
+			return -EMSGSIZE;
+		}
+		data = ph.raw + off;
+	}
 	to_write = tp_len;
 
 	if (sock->type == SOCK_DGRAM) {