Message ID | 20090122090442.GB11139@ff.dom.local |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 22 Jan 2009 09:04:42 +0000

> It seems this sk_sndmsg_page usage (refcounting) isn't consistent.
> I used here tcp_sndmsg() way, but I think I'll go back to this question
> soon.

Indeed, it is something to look into, as well as locking.

I'll try to find some time for this, thanks Jarek.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller <davem@davemloft.net> wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 22 Jan 2009 09:04:42 +0000
>
>> It seems this sk_sndmsg_page usage (refcounting) isn't consistent.
>> I used here tcp_sndmsg() way, but I think I'll go back to this question
>> soon.
>
> Indeed, it is something to look into, as well as locking.
>
> I'll try to find some time for this, thanks Jarek.

After a quick look it seems to be OK to me. The code in the patch
is called from tcp_splice_read, which holds the socket lock. So as
long as the patch uses the usual TCP convention it should work.

Cheers,
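The "usual TCP convention" can be modelled in userspace to make the refcounting explicit. This is a minimal sketch, not kernel code: `page_model`, `sock_model`, `pm_alloc()`, `pm_get()`, `pm_put()`, `tcp_style_frag()` and `sock_model_destroy()` are hypothetical names, and the rule is a simplification of what tcp_sendmsg() does. The socket owns one reference to its cached page, each fragment placed in that page takes its own, and socket teardown drops only the socket's reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page: a refcount plus a
 * flag set when the last reference is dropped. Pages come from a fixed
 * pool and are never really freed, so a test can inspect them later. */
struct page_model {
	int refcount;
	int freed;
};

/* Hypothetical stand-in for the two socket fields discussed here. */
struct sock_model {
	struct page_model *sndmsg_page;	/* like sk->sk_sndmsg_page */
	unsigned int sndmsg_off;	/* like sk->sk_sndmsg_off */
};

#define PAGE_MODEL_SIZE 4096u
#define POOL_PAGES 8

static struct page_model pool[POOL_PAGES];
static int pool_used;

static struct page_model *pm_alloc(void)
{
	struct page_model *p;

	if (pool_used == POOL_PAGES)
		return NULL;		/* models alloc_pages() failing */
	p = &pool[pool_used++];
	p->refcount = 1;		/* the caller owns this first ref */
	p->freed = 0;
	return p;
}

static void pm_get(struct page_model *p)
{
	assert(!p->freed);		/* get_page() after free */
	p->refcount++;
}

static void pm_put(struct page_model *p)
{
	assert(!p->freed && p->refcount > 0);	/* double put */
	if (--p->refcount == 0)
		p->freed = 1;
}

/* Simplified TCP-style rule: the socket owns one reference to its
 * cached page, and every fragment placed in that page takes another. */
static struct page_model *tcp_style_frag(struct sock_model *sk,
					 unsigned int len)
{
	struct page_model *p = sk->sndmsg_page;

	assert(len <= PAGE_MODEL_SIZE);
	if (p && PAGE_MODEL_SIZE - sk->sndmsg_off < len) {
		pm_put(p);		/* page too full: drop the socket's ref */
		sk->sndmsg_page = p = NULL;
	}
	if (!p) {
		p = pm_alloc();
		if (!p)
			return NULL;
		sk->sndmsg_page = p;	/* the allocation ref stays with sk */
		sk->sndmsg_off = 0;
	}
	sk->sndmsg_off += len;
	pm_get(p);			/* separate ref for the fragment */
	return p;
}

/* Teardown drops only the socket's own reference to the cache page. */
static void sock_model_destroy(struct sock_model *sk)
{
	if (sk->sndmsg_page)
		pm_put(sk->sndmsg_page);
	sk->sndmsg_page = NULL;
	sk->sndmsg_off = 0;
}
```

For example, two 1000-byte fragments taken from a fresh socket share one page with a refcount of 3 (socket plus two fragments); once both fragments are put, `sock_model_destroy()` drops the last reference.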
On Tue, Jan 27, 2009 at 06:11:30PM +1100, Herbert Xu wrote:
> After a quick look it seems to be OK to me. The code in the patch
> is called from tcp_splice_read, which holds the socket lock. So as
> long as the patch uses the usual TCP convention it should work.

Yes, but ip_append_data() (and skb_append_datato_frags(), though only
for NETIF_F_UFO, so currently not a problem) uses this differently,
and these pages in sk->sk_sndmsg_page could leak or be used after
kfree. (I didn't track locking in these other places.)

Thanks,
Jarek P.
On Tue, Jan 27, 2009 at 07:54:18AM +0000, Jarek Poplawski wrote:
> Yes, but ip_append_data() (and skb_append_datato_frags() for
> NETIF_F_UFO only, so currently not a problem), uses this differently,
> and these pages in sk->sk_sndmsg_page could leak or be used after
> kfree. (I didn't track locking in these other places).

It'll be freed when the socket is freed so that should be fine.

Cheers,
On Tue, Jan 27, 2009 at 09:09:58PM +1100, Herbert Xu wrote:
> On Tue, Jan 27, 2009 at 07:54:18AM +0000, Jarek Poplawski wrote:
> >
> > Yes, but ip_append_data() (and skb_append_datato_frags() for
> > NETIF_F_UFO only, so currently not a problem), uses this differently,
> > and these pages in sk->sk_sndmsg_page could leak or be used after
> > kfree. (I didn't track locking in these other places).
>
> It'll be freed when the socket is freed so that should be fine.

I don't think so: these places can overwrite sk->sk_sndmsg_page left
after tcp_sendmsg(), or skb_splice_bits() now, with NULL or a new
pointer without put_page() (they only reference copied chunks and
expect auto freeing). On the other hand, if tcp_sendmsg() reads after
them it could use a pointer after the page is freed, I guess.

Cheers,
Jarek P.
On Tue, Jan 27, 2009 at 10:35:11AM +0000, Jarek Poplawski wrote:
> I don't think so: these places can overwrite sk->sk_sndmsg_page left
> after tcp_sendmsg(), or skb_splice_bits() now, with NULL or a new
> pointer without put_page() (they only reference copied chunks and
> expect auto freeing). On the other hand, if tcp_sendmsg() reads after
> them it could use a pointer after the page is freed, I guess.

tcp_v4_destroy_sock() looks vulnerable too.

BTW, skb_append_datato_frags() currently doesn't need to use
sk->sk_sndmsg_page at all: it doesn't cache pages between calls.

Jarek P.
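The two failure modes described here can be sketched with a small hypothetical model (`page_model`, `sock_model`, `cache_owned()`, `cache_borrowed()` and `destroy_owned()` are illustrative names, not kernel functions). It assumes a single socket mixes an owning cache convention, as in tcp_sendmsg(), with a borrowing one, as in ip_append_data(). The rest of the thread establishes that TCP sockets never reach ip_append_data(), so this only shows what would go wrong if the two conventions did meet on one socket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page (not kernel API). */
struct page_model {
	int refcount;
	int freed;		/* set when the last reference is dropped */
};

/* Hypothetical stand-in for sk->sk_sndmsg_page / sk->sk_sndmsg_off. */
struct sock_model {
	struct page_model *sndmsg_page;
	unsigned int sndmsg_off;
};

static void pm_get(struct page_model *p)
{
	assert(!p->freed);
	p->refcount++;
}

static void pm_put(struct page_model *p)
{
	assert(!p->freed && p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = 1;
}

/* Owning convention (TCP style): the socket takes its own reference. */
static void cache_owned(struct sock_model *sk, struct page_model *p)
{
	pm_get(p);
	sk->sndmsg_page = p;
	sk->sndmsg_off = 0;
}

/* Borrowing convention (ip_append_data() style): the only reference
 * belongs to an skb fragment, and the old pointer is just overwritten. */
static void cache_borrowed(struct sock_model *sk, struct page_model *p)
{
	sk->sndmsg_page = p;	/* no put_page() on the previous page */
	sk->sndmsg_off = 0;
}

/* Teardown that assumes the owning convention, as a TCP destructor would. */
static void destroy_owned(struct sock_model *sk)
{
	if (sk->sndmsg_page)
		pm_put(sk->sndmsg_page);
	sk->sndmsg_page = NULL;
}
```

Caching a page with `cache_owned()`, overwriting the pointer with `cache_borrowed()`, and then calling `destroy_owned()` leaves the first page holding a reference nobody will ever drop (a leak), and drops the only reference to the second page while an skb fragment still points at it (a use-after-free).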
On Tue, Jan 27, 2009 at 10:35:11AM +0000, Jarek Poplawski wrote:
> > >
> > > Yes, but ip_append_data() (and skb_append_datato_frags() for
> > > NETIF_F_UFO only, so currently not a problem), uses this differently,
> > > and these pages in sk->sk_sndmsg_page could leak or be used after
> > > kfree. (I didn't track locking in these other places).
> >
> > It'll be freed when the socket is freed so that should be fine.
>
> I don't think so: these places can overwrite sk->sk_sndmsg_page left
> after tcp_sendmsg(), or skb_splice_bits() now, with NULL or a new
> pointer without put_page() (they only reference copied chunks and
> expect auto freeing). On the other hand, if tcp_sendmsg() reads after
> them it could use a pointer after the page is freed, I guess.

I wasn't referring to the first part of your sentence. That can't
happen because they're only used for UDP sockets, this is a TCP
socket.

Cheers,
On Tue, Jan 27, 2009 at 10:48:05PM +1100, Herbert Xu wrote:
> I wasn't referring to the first part of your sentence. That can't
> happen because they're only used for UDP sockets, this is a TCP
> socket.

Do you mean this part from ip_append_data() isn't used for TCP?:

```c
	if (page && (left = PAGE_SIZE - off) > 0) {
		if (copy >= left)
			copy = left;
		if (page != frag->page) {
			if (i == MAX_SKB_FRAGS) {
				err = -EMSGSIZE;
				goto error;
			}
			get_page(page);
			skb_fill_page_desc(skb, i, page, sk->sk_sndmsg_off, 0);
			frag = &skb_shinfo(skb)->frags[i];
		}
	} else if (i < MAX_SKB_FRAGS) {
		if (copy > PAGE_SIZE)
			copy = PAGE_SIZE;
		page = alloc_pages(sk->sk_allocation, 0);
		if (page == NULL) {
			err = -ENOMEM;
			goto error;
		}
		sk->sk_sndmsg_page = page;
		sk->sk_sndmsg_off = 0;

		skb_fill_page_desc(skb, i, page, 0, 0);
		frag = &skb_shinfo(skb)->frags[i];
	} else {
```

Jarek P.
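The quoted branch can be restated as a small userspace model to show why ip_append_data() does not keep a socket-owned reference: a newly allocated page's single reference belongs to the skb fragment, and the page is only get_page()'d again when it starts a new fragment. A minimal sketch; `append_chunk()`, `add_frag()` and the `*_model` types are hypothetical, `MAX_FRAGS_MODEL` is an arbitrary small stand-in for MAX_SKB_FRAGS, and the data copy that getfrag() performs is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_MODEL_SIZE 4096u
#define MAX_FRAGS_MODEL 4	/* hypothetical; MAX_SKB_FRAGS is larger */
#define POOL_PAGES 8

/* Hypothetical stand-ins; none of these are kernel types. */
struct page_model { int refcount; };
struct frag_model { struct page_model *page; unsigned int off, size; };
struct skb_model { struct frag_model frags[MAX_FRAGS_MODEL]; int nr_frags; };
struct sock_model { struct page_model *sndmsg_page; unsigned int sndmsg_off; };

static struct page_model pool[POOL_PAGES];
static int pool_used;

static struct page_model *pm_alloc(void)
{
	if (pool_used == POOL_PAGES)
		return NULL;
	pool[pool_used].refcount = 1;	/* this ref goes to the new fragment */
	return &pool[pool_used++];
}

static struct frag_model *add_frag(struct skb_model *skb,
				   struct page_model *page, unsigned int off)
{
	struct frag_model *f = &skb->frags[skb->nr_frags++];

	f->page = page;
	f->off = off;
	f->size = 0;
	return f;
}

/* Mirrors the quoted branch: returns bytes appended, or -1 on error. */
static int append_chunk(struct sock_model *sk, struct skb_model *skb,
			unsigned int copy)
{
	int i = skb->nr_frags;
	struct frag_model *frag = i ? &skb->frags[i - 1] : NULL;
	struct page_model *page = sk->sndmsg_page;
	unsigned int left = PAGE_MODEL_SIZE - sk->sndmsg_off;

	if (page && left > 0) {
		if (copy >= left)
			copy = left;
		if (!frag || page != frag->page) {
			if (i == MAX_FRAGS_MODEL)
				return -1;		/* -EMSGSIZE */
			page->refcount++;		/* get_page() for the new frag */
			frag = add_frag(skb, page, sk->sndmsg_off);
		}
	} else if (i < MAX_FRAGS_MODEL) {
		if (copy > PAGE_MODEL_SIZE)
			copy = PAGE_MODEL_SIZE;
		page = pm_alloc();
		if (!page)
			return -1;			/* -ENOMEM */
		sk->sndmsg_page = page;		/* borrowed: no ref for the socket */
		sk->sndmsg_off = 0;
		frag = add_frag(skb, page, 0);
	} else {
		return -1;				/* -EMSGSIZE */
	}
	/* the real code copies the data via getfrag() at this point */
	frag->size += copy;
	sk->sndmsg_off += copy;
	return (int)copy;
}
```

Appending 1000 and then 500 bytes to one skb grows a single fragment while the page's refcount stays at 1; once the page is full, the next chunk allocates a fresh page, and a second skb that reuses that cached page takes a new reference for its own fragment.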
On Tue, Jan 27, 2009 at 12:16:42PM +0000, Jarek Poplawski wrote:
> On Tue, Jan 27, 2009 at 10:48:05PM +1100, Herbert Xu wrote:
> > I wasn't referring to the first part of your sentence. That can't
> > happen because they're only used for UDP sockets, this is a TCP
> > socket.
>
> Do you mean this part from ip_append_data() isn't used for TCP?:

Actually, the beginning part of ip_append_data() should be enough too.
So I guess I missed your point...

Jarek P.
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 27 Jan 2009 12:31:11 +0000

> On Tue, Jan 27, 2009 at 12:16:42PM +0000, Jarek Poplawski wrote:
> > Do you mean this part from ip_append_data() isn't used for TCP?:
>
> Actually, the beginning part of ip_append_data() should be enough too.
> So I guess I missed your point...

TCP doesn't use ip_append_data(), period.
On Tue, Jan 27, 2009 at 09:06:51AM -0800, David Miller wrote:
> TCP doesn't use ip_append_data(), period.

Hmm... I see: TCP does use ip_send_reply(), so ip_append_data() too,
but with a special socket.

Thanks for the explanations,
Jarek P.
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 27 Jan 2009 18:11:30 +1100

> After a quick look it seems to be OK to me. The code in the patch
> is called from tcp_splice_read, which holds the socket lock. So as
> long as the patch uses the usual TCP convention it should work.

I've tossed Jarek's patch into net-next-2.6, thanks everyone.
```diff
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2e5f2ca..2e64c1b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1333,14 +1333,39 @@ static void sock_spd_release(struct splice_pipe_desc *spd, unsigned int i)
 	put_page(spd->pages[i]);
 }
 
-static inline struct page *linear_to_page(struct page *page, unsigned int len,
-					  unsigned int offset)
-{
-	struct page *p = alloc_pages(GFP_KERNEL, 0);
+static inline struct page *linear_to_page(struct page *page, unsigned int *len,
+					  unsigned int *offset,
+					  struct sk_buff *skb)
+{
+	struct sock *sk = skb->sk;
+	struct page *p = sk->sk_sndmsg_page;
+	unsigned int off;
+
+	if (!p) {
+new_page:
+		p = sk->sk_sndmsg_page = alloc_pages(sk->sk_allocation, 0);
+		if (!p)
+			return NULL;
 
-	if (!p)
-		return NULL;
-	memcpy(page_address(p) + offset, page_address(page) + offset, len);
+		off = sk->sk_sndmsg_off = 0;
+		/* hold one ref to this page until it's full */
+	} else {
+		unsigned int mlen;
+
+		off = sk->sk_sndmsg_off;
+		mlen = PAGE_SIZE - off;
+		if (mlen < 64 && mlen < *len) {
+			put_page(p);
+			goto new_page;
+		}
+
+		*len = min_t(unsigned int, *len, mlen);
+	}
+
+	memcpy(page_address(p) + off, page_address(page) + *offset, *len);
+	sk->sk_sndmsg_off += *len;
+	*offset = off;
+	get_page(p);
 
 	return p;
 }
@@ -1349,21 +1374,21 @@ static inline struct page *linear_to_page(struct page *page, unsigned int len,
  * Fill page/offset/length into spd, if it can hold more pages.
  */
 static inline int spd_fill_page(struct splice_pipe_desc *spd, struct page *page,
-				unsigned int len, unsigned int offset,
+				unsigned int *len, unsigned int offset,
 				struct sk_buff *skb, int linear)
 {
 	if (unlikely(spd->nr_pages == PIPE_BUFFERS))
 		return 1;
 
 	if (linear) {
-		page = linear_to_page(page, len, offset);
+		page = linear_to_page(page, len, &offset, skb);
 		if (!page)
 			return 1;
 	} else
 		get_page(page);
 
 	spd->pages[spd->nr_pages] = page;
-	spd->partial[spd->nr_pages].len = len;
+	spd->partial[spd->nr_pages].len = *len;
 	spd->partial[spd->nr_pages].offset = offset;
 	spd->nr_pages++;
 
@@ -1405,7 +1430,7 @@ static inline int __splice_segment(struct page *page, unsigned int poff,
 		/* the linear region may spread across several pages */
 		flen = min_t(unsigned int, flen, PAGE_SIZE - poff);
 
-		if (spd_fill_page(spd, page, flen, poff, skb, linear))
+		if (spd_fill_page(spd, page, &flen, poff, skb, linear))
 			return 1;
 
 		__segment_seek(&page, &poff, &plen, flen);
```
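The chunking logic that the patch adds to linear_to_page() can be exercised outside the kernel with a small model. A minimal sketch under stated assumptions: `page_model` carries a 4096-byte data array in place of a real page, `pm_alloc()` draws from a fixed pool instead of calling alloc_pages(), and `linear_to_page_model()` is a hypothetical name. It follows the patch: reuse the socket's cached page and shorten `*len` to the space left, switch to a new page only when fewer than 64 bytes remain and the request does not fit, and take one extra reference for the pipe buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_MODEL_SIZE 4096u
#define POOL_PAGES 4

/* Hypothetical userspace stand-in for struct page. */
struct page_model {
	int refcount;
	unsigned char data[PAGE_MODEL_SIZE];
};

/* Hypothetical stand-in for sk->sk_sndmsg_page / sk->sk_sndmsg_off. */
struct sock_model {
	struct page_model *sndmsg_page;
	unsigned int sndmsg_off;
};

static struct page_model pool[POOL_PAGES];
static int pool_used;

static struct page_model *pm_alloc(void)
{
	if (pool_used == POOL_PAGES)
		return NULL;
	pool[pool_used].refcount = 1;
	return &pool[pool_used++];
}

/* Copy *len bytes from src + *offset into the socket's cached page,
 * possibly shortening *len, and return that page with one extra ref.
 * On return *offset is where the copy starts inside the cached page. */
static struct page_model *linear_to_page_model(const unsigned char *src,
					       unsigned int *len,
					       unsigned int *offset,
					       struct sock_model *sk)
{
	struct page_model *p = sk->sndmsg_page;
	unsigned int off = 0;

	assert(*len <= PAGE_MODEL_SIZE);	/* caller clamps to one page */
	if (p) {
		unsigned int mlen = PAGE_MODEL_SIZE - sk->sndmsg_off;

		if (mlen < 64 && mlen < *len) {
			p->refcount--;		/* put_page(): drop the cache ref */
			p = NULL;
		} else {
			off = sk->sndmsg_off;
			if (*len > mlen)
				*len = mlen;	/* caller handles the remainder */
		}
	}
	if (!p) {
		p = pm_alloc();
		sk->sndmsg_page = p;
		if (!p)
			return NULL;
		sk->sndmsg_off = 0;		/* allocation ref stays with sk */
	}
	memcpy(p->data + off, src + *offset, *len);
	sk->sndmsg_off += *len;
	*offset = off;
	p->refcount++;				/* get_page() for the pipe buffer */
	return p;
}
```

Copying 3000 and then 2000 bytes from a fresh socket places both in the same page at offsets 0 and 3000, with the second length shortened to 1096; a further 100-byte request finds no room, drops the socket's reference to the full page and moves on to a new one.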