[bpf-next,4/7] net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG

Message ID 20180112181055.21531.95322.stgit@john-Precision-Tower-5810
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series
  • Add BPF_PROG_TYPE_SK_MSG and attach pt
Related show

Commit Message

John Fastabend Jan. 12, 2018, 6:10 p.m.
When calling do_tcp_sendpages() from in kernel and we know the data
has no references from user side we can omit SKBTX_SHARED_FRAG flag.
This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used
to omit setting SKBTX_SHARED_FRAG.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/socket.h |    1 +
 net/ipv4/tcp.c         |    4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

Comments

Eric Dumazet Jan. 12, 2018, 8:10 p.m. | #1
On Fri, 2018-01-12 at 10:10 -0800, John Fastabend wrote:
> When calling do_tcp_sendpages() from in kernel and we know the data
> has no references from user side we can omit SKBTX_SHARED_FRAG flag.
> This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used
> to omit setting SKBTX_SHARED_FRAG.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  include/linux/socket.h |    1 +
>  net/ipv4/tcp.c         |    4 +++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/socket.h b/include/linux/socket.h
> index 9286a5a..add9360 100644
> --- a/include/linux/socket.h
> +++ b/include/linux/socket.h
> @@ -287,6 +287,7 @@ struct ucred {
>  #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
>  #define MSG_BATCH	0x40000 /* sendmmsg(): more messages coming */
>  #define MSG_EOF         MSG_FIN
> +#define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */
>  
>  #define MSG_ZEROCOPY	0x4000000	/* Use user data in kernel path */
>  #define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 7ac583a..56c6f49 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -995,7 +995,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>  			get_page(page);
>  			skb_fill_page_desc(skb, i, page, offset, copy);
>  		}
> -		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
> +
> +		if (!(flags & MSG_NO_SHARED_FRAGS))
> +			skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
>  
>  		skb->len += copy;
>  		skb->data_len += copy;

What would prevent user space from using this flag ?
John Fastabend Jan. 12, 2018, 8:26 p.m. | #2
On 01/12/2018 12:10 PM, Eric Dumazet wrote:
> On Fri, 2018-01-12 at 10:10 -0800, John Fastabend wrote:
>> When calling do_tcp_sendpages() from in kernel and we know the data
>> has no references from user side we can omit SKBTX_SHARED_FRAG flag.
>> This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used
>> to omit setting SKBTX_SHARED_FRAG.
>>
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> ---
>>  include/linux/socket.h |    1 +
>>  net/ipv4/tcp.c         |    4 +++-
>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/socket.h b/include/linux/socket.h
>> index 9286a5a..add9360 100644
>> --- a/include/linux/socket.h
>> +++ b/include/linux/socket.h
>> @@ -287,6 +287,7 @@ struct ucred {
>>  #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
>>  #define MSG_BATCH	0x40000 /* sendmmsg(): more messages coming */
>>  #define MSG_EOF         MSG_FIN
>> +#define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */
>>  
>>  #define MSG_ZEROCOPY	0x4000000	/* Use user data in kernel path */
>>  #define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 7ac583a..56c6f49 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -995,7 +995,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>>  			get_page(page);
>>  			skb_fill_page_desc(skb, i, page, offset, copy);
>>  		}
>> -		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
>> +
>> +		if (!(flags & MSG_NO_SHARED_FRAGS))
>> +			skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
>>  
>>  		skb->len += copy;
>>  		skb->data_len += copy;
> 
> What would prevent user space from using this flag ?
> 

Nothing in the current patches. So user could set this, change the data,
and then presumably get incorrect checksums with bad timing. Seems like
this should be blocked so we don't allow users to try and send bad csums.

How about masking the flags coming from userland? Alternatively could add
a bool to do_tcp_sendpages().

.John
Eric Dumazet Jan. 12, 2018, 8:46 p.m. | #3
On Fri, 2018-01-12 at 12:26 -0800, John Fastabend wrote:
> On 01/12/2018 12:10 PM, Eric Dumazet wrote:
> > On Fri, 2018-01-12 at 10:10 -0800, John Fastabend wrote:
> > > When calling do_tcp_sendpages() from in kernel and we know the data
> > > has no references from user side we can omit SKBTX_SHARED_FRAG flag.
> > > This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used
> > > to omit setting SKBTX_SHARED_FRAG.
> > > 
> > > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> > > ---
> > >  include/linux/socket.h |    1 +
> > >  net/ipv4/tcp.c         |    4 +++-
> > >  2 files changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/linux/socket.h b/include/linux/socket.h
> > > index 9286a5a..add9360 100644
> > > --- a/include/linux/socket.h
> > > +++ b/include/linux/socket.h
> > > @@ -287,6 +287,7 @@ struct ucred {
> > >  #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
> > >  #define MSG_BATCH	0x40000 /* sendmmsg(): more messages coming */
> > >  #define MSG_EOF         MSG_FIN
> > > +#define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */
> > >  
> > >  #define MSG_ZEROCOPY	0x4000000	/* Use user data in kernel path */
> > >  #define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index 7ac583a..56c6f49 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -995,7 +995,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
> > >  			get_page(page);
> > >  			skb_fill_page_desc(skb, i, page, offset, copy);
> > >  		}
> > > -		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
> > > +
> > > +		if (!(flags & MSG_NO_SHARED_FRAGS))
> > > +			skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
> > >  
> > >  		skb->len += copy;
> > >  		skb->data_len += copy;
> > 
> > What would prevent user space from using this flag ?
> > 
> 
> Nothing in the current patches. So user could set this, change the data,
> and then presumably get incorrect checksums with bad timing. Seems like
> this should be blocked so we don't allow users to try and send bad csums.

Are you sure user can set it ? How would this happen ?

It would be nice to check (sorry I was lazy/busy and did not check
before asking)

> How about masking the flags coming from userland? Alternatively could add
> a bool to do_tcp_sendpages().
>
John Fastabend Jan. 12, 2018, 9:11 p.m. | #4
On 01/12/2018 12:46 PM, Eric Dumazet wrote:
> On Fri, 2018-01-12 at 12:26 -0800, John Fastabend wrote:
>> On 01/12/2018 12:10 PM, Eric Dumazet wrote:
>>> On Fri, 2018-01-12 at 10:10 -0800, John Fastabend wrote:
>>>> When calling do_tcp_sendpages() from in kernel and we know the data
>>>> has no references from user side we can omit SKBTX_SHARED_FRAG flag.
>>>> This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used
>>>> to omit setting SKBTX_SHARED_FRAG.
>>>>
>>>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>>>> ---
>>>>  include/linux/socket.h |    1 +
>>>>  net/ipv4/tcp.c         |    4 +++-
>>>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/socket.h b/include/linux/socket.h
>>>> index 9286a5a..add9360 100644
>>>> --- a/include/linux/socket.h
>>>> +++ b/include/linux/socket.h
>>>> @@ -287,6 +287,7 @@ struct ucred {
>>>>  #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
>>>>  #define MSG_BATCH	0x40000 /* sendmmsg(): more messages coming */
>>>>  #define MSG_EOF         MSG_FIN
>>>> +#define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */
>>>>  
>>>>  #define MSG_ZEROCOPY	0x4000000	/* Use user data in kernel path */
>>>>  #define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */
>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>> index 7ac583a..56c6f49 100644
>>>> --- a/net/ipv4/tcp.c
>>>> +++ b/net/ipv4/tcp.c
>>>> @@ -995,7 +995,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>>>>  			get_page(page);
>>>>  			skb_fill_page_desc(skb, i, page, offset, copy);
>>>>  		}
>>>> -		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
>>>> +
>>>> +		if (!(flags & MSG_NO_SHARED_FRAGS))
>>>> +			skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
>>>>  
>>>>  		skb->len += copy;
>>>>  		skb->data_len += copy;
>>>
>>> What would prevent user space from using this flag ?
>>>
>>
>> Nothing in the current patches. So user could set this, change the data,
>> and then presumably get incorrect checksums with bad timing. Seems like
>> this should be blocked so we don't allow users to try and send bad csums.
> 
> Are you sure user can set it ? How would this happen ?
> 

Ah OK I thought you might have a path that I missed. Just
rechecked and I don't see any paths where user flags get
to sendpage without being masked.

> It would be nice to check (sorry I was lazy/busy and did not check
> before asking)

No problem.

The splice path using pipe_to_sendpage() already masks the
flags before sendpage is called. The only other call sites I
see are in o2net and lowcomms both places flags are hard coded
in-kernel.

So we should be safe.


>> How about masking the flags coming from userland? Alternatively could add
>> a bool to do_tcp_sendpages().
>>

Patch

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 9286a5a..add9360 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -287,6 +287,7 @@  struct ucred {
 #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
 #define MSG_BATCH	0x40000 /* sendmmsg(): more messages coming */
 #define MSG_EOF         MSG_FIN
+#define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */
 
 #define MSG_ZEROCOPY	0x4000000	/* Use user data in kernel path */
 #define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7ac583a..56c6f49 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -995,7 +995,9 @@  ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 			get_page(page);
 			skb_fill_page_desc(skb, i, page, offset, copy);
 		}
-		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+
+		if (!(flags & MSG_NO_SHARED_FRAGS))
+			skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
 
 		skb->len += copy;
 		skb->data_len += copy;