Message ID | 20180112181055.21531.95322.stgit@john-Precision-Tower-5810 |
---|---|
State | Changes Requested, archived |
Delegated to: | BPF Maintainers |
Headers | show |
Series | Add BPF_PROG_TYPE_SK_MSG and attach pt | expand |
On Fri, 2018-01-12 at 10:10 -0800, John Fastabend wrote: > When calling do_tcp_sendpages() from in kernel and we know the data > has no references from user side we can omit SKBTX_SHARED_FRAG flag. > This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used > to omit setting SKBTX_SHARED_FRAG. > > Signed-off-by: John Fastabend <john.fastabend@gmail.com> > --- > include/linux/socket.h | 1 + > net/ipv4/tcp.c | 4 +++- > 2 files changed, 4 insertions(+), 1 deletion(-) > > diff --git a/include/linux/socket.h b/include/linux/socket.h > index 9286a5a..add9360 100644 > --- a/include/linux/socket.h > +++ b/include/linux/socket.h > @@ -287,6 +287,7 @@ struct ucred { > #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */ > #define MSG_BATCH 0x40000 /* sendmmsg(): more messages coming */ > #define MSG_EOF MSG_FIN > +#define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */ > > #define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */ > #define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */ > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index 7ac583a..56c6f49 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -995,7 +995,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, > get_page(page); > skb_fill_page_desc(skb, i, page, offset, copy); > } > - skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; > + > + if (!(flags & MSG_NO_SHARED_FRAGS)) > + skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; > > skb->len += copy; > skb->data_len += copy; What would prevent user space from using this flag ?
On 01/12/2018 12:10 PM, Eric Dumazet wrote: > On Fri, 2018-01-12 at 10:10 -0800, John Fastabend wrote: >> When calling do_tcp_sendpages() from in kernel and we know the data >> has no references from user side we can omit SKBTX_SHARED_FRAG flag. >> This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used >> to omit setting SKBTX_SHARED_FRAG. >> >> Signed-off-by: John Fastabend <john.fastabend@gmail.com> >> --- >> include/linux/socket.h | 1 + >> net/ipv4/tcp.c | 4 +++- >> 2 files changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/socket.h b/include/linux/socket.h >> index 9286a5a..add9360 100644 >> --- a/include/linux/socket.h >> +++ b/include/linux/socket.h >> @@ -287,6 +287,7 @@ struct ucred { >> #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */ >> #define MSG_BATCH 0x40000 /* sendmmsg(): more messages coming */ >> #define MSG_EOF MSG_FIN >> +#define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */ >> >> #define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */ >> #define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */ >> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c >> index 7ac583a..56c6f49 100644 >> --- a/net/ipv4/tcp.c >> +++ b/net/ipv4/tcp.c >> @@ -995,7 +995,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, >> get_page(page); >> skb_fill_page_desc(skb, i, page, offset, copy); >> } >> - skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; >> + >> + if (!(flags & MSG_NO_SHARED_FRAGS)) >> + skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; >> >> skb->len += copy; >> skb->data_len += copy; > > What would prevent user space from using this flag ? > Nothing in the current patches. So user could set this, change the data, and then presumably get incorrect checksums with bad timing. Seems like this should be blocked so we don't allow users to try and send bad csums. How about masking the flags coming from userland? Alternatively could add a bool to do_tcp_sendpages(). .John
On Fri, 2018-01-12 at 12:26 -0800, John Fastabend wrote: > On 01/12/2018 12:10 PM, Eric Dumazet wrote: > > On Fri, 2018-01-12 at 10:10 -0800, John Fastabend wrote: > > > When calling do_tcp_sendpages() from in kernel and we know the data > > > has no references from user side we can omit SKBTX_SHARED_FRAG flag. > > > This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used > > > to omit setting SKBTX_SHARED_FRAG. > > > > > > Signed-off-by: John Fastabend <john.fastabend@gmail.com> > > > --- > > > include/linux/socket.h | 1 + > > > net/ipv4/tcp.c | 4 +++- > > > 2 files changed, 4 insertions(+), 1 deletion(-) > > > > > > diff --git a/include/linux/socket.h b/include/linux/socket.h > > > index 9286a5a..add9360 100644 > > > --- a/include/linux/socket.h > > > +++ b/include/linux/socket.h > > > @@ -287,6 +287,7 @@ struct ucred { > > > #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */ > > > #define MSG_BATCH 0x40000 /* sendmmsg(): more messages coming */ > > > #define MSG_EOF MSG_FIN > > > +#define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */ > > > > > > #define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */ > > > #define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */ > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > > index 7ac583a..56c6f49 100644 > > > --- a/net/ipv4/tcp.c > > > +++ b/net/ipv4/tcp.c > > > @@ -995,7 +995,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, > > > get_page(page); > > > skb_fill_page_desc(skb, i, page, offset, copy); > > > } > > > - skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; > > > + > > > + if (!(flags & MSG_NO_SHARED_FRAGS)) > > > + skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; > > > > > > skb->len += copy; > > > skb->data_len += copy; > > > > What would prevent user space from using this flag ? > > > > Nothing in the current patches. So user could set this, change the data, > and then presumably get incorrect checksums with bad timing. Seems like > this should be blocked so we don't allow users to try and send bad csums. Are you sure user can set it ? How would this happen ? It would be nice to check (sorry I was lazy/busy and did not check before asking) > How about masking the flags coming from userland? Alternatively could add > a bool to do_tcp_sendpages(). >
On 01/12/2018 12:46 PM, Eric Dumazet wrote: > On Fri, 2018-01-12 at 12:26 -0800, John Fastabend wrote: >> On 01/12/2018 12:10 PM, Eric Dumazet wrote: >>> On Fri, 2018-01-12 at 10:10 -0800, John Fastabend wrote: >>>> When calling do_tcp_sendpages() from in kernel and we know the data >>>> has no references from user side we can omit SKBTX_SHARED_FRAG flag. >>>> This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used >>>> to omit setting SKBTX_SHARED_FRAG. >>>> >>>> Signed-off-by: John Fastabend <john.fastabend@gmail.com> >>>> --- >>>> include/linux/socket.h | 1 + >>>> net/ipv4/tcp.c | 4 +++- >>>> 2 files changed, 4 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/include/linux/socket.h b/include/linux/socket.h >>>> index 9286a5a..add9360 100644 >>>> --- a/include/linux/socket.h >>>> +++ b/include/linux/socket.h >>>> @@ -287,6 +287,7 @@ struct ucred { >>>> #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */ >>>> #define MSG_BATCH 0x40000 /* sendmmsg(): more messages coming */ >>>> #define MSG_EOF MSG_FIN >>>> +#define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */ >>>> >>>> #define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */ >>>> #define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */ >>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c >>>> index 7ac583a..56c6f49 100644 >>>> --- a/net/ipv4/tcp.c >>>> +++ b/net/ipv4/tcp.c >>>> @@ -995,7 +995,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, >>>> get_page(page); >>>> skb_fill_page_desc(skb, i, page, offset, copy); >>>> } >>>> - skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; >>>> + >>>> + if (!(flags & MSG_NO_SHARED_FRAGS)) >>>> + skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; >>>> >>>> skb->len += copy; >>>> skb->data_len += copy; >>> >>> What would prevent user space from using this flag ? >>> >> >> Nothing in the current patches. So user could set this, change the data, >> and then presumably get incorrect checksums with bad timing. Seems like >> this should be blocked so we don't allow users to try and send bad csums. > > Are you sure user can set it ? How would this happen ? > Ah OK I thought you might have a path that I missed. Just rechecked and I don't see any paths where user flags get to sendpage without being masked. > It would be nice to check (sorry I was lazy/busy and did not check > before asking) No problem. The splice path using pipe_to_sendpage() already masks the flags before sendpage is called. The only other call sites I see are in o2net and lowcomms both places flags are hard coded in-kernel. So we should be safe. >> How about masking the flags coming from userland? Alternatively could add >> a bool to do_tcp_sendpages(). >>
diff --git a/include/linux/socket.h b/include/linux/socket.h index 9286a5a..add9360 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -287,6 +287,7 @@ struct ucred { #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */ #define MSG_BATCH 0x40000 /* sendmmsg(): more messages coming */ #define MSG_EOF MSG_FIN +#define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */ #define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */ #define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 7ac583a..56c6f49 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -995,7 +995,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, get_page(page); skb_fill_page_desc(skb, i, page, offset, copy); } - skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; + + if (!(flags & MSG_NO_SHARED_FRAGS)) + skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; skb->len += copy; skb->data_len += copy;
When calling do_tcp_sendpages() from in kernel and we know the data has no references from user side we can omit SKBTX_SHARED_FRAG flag. This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used to omit setting SKBTX_SHARED_FRAG. Signed-off-by: John Fastabend <john.fastabend@gmail.com> --- include/linux/socket.h | 1 + net/ipv4/tcp.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-)