tcp: detect use of sendpage for slab-based objects

Message ID: a8655149-80b9-c75d-6528-0b851ea85de8@virtuozzo.com
State: Rejected
Delegated to: David Miller
Series: tcp: detect use of sendpage for slab-based objects

Commit Message

Vasily Averin Feb. 21, 2019, 3:30 p.m. UTC
There were a few incidents in which XFS over a network block device
generated IO requests with slab-based metadata. If these requests are
processed via the sendpage path, tcp_sendpage() calls skb_can_coalesce()
and merges neighbouring slab objects into one skb fragment.

If the receiving side is located on the same host, tcp_recvmsg() can
trigger the following BUG_ON:
usercopy: kernel memory exposure attempt detected
		from XXXXXX (kmalloc-512) (1024 bytes)

This patch helps to detect the cause of similar incidents on the sending side.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
 net/ipv4/tcp.c | 1 +
 1 file changed, 1 insertion(+)
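The "usercopy: kernel memory exposure attempt detected" report quoted above comes from the CONFIG_HARDENED_USERCOPY checks: on a local receive, tcp_recvmsg() copies skb fragments to user space, and the hardened copy path rejects a kernel source buffer that spans more than one slab object. A conceptual sketch of the failing condition (paraphrased from mm/usercopy.c / mm/slub.c of that era, not verbatim kernel code; the helper name is illustrative only):

/*
 * Conceptual sketch, not verbatim kernel code. After tcp_sendpage()
 * has coalesced two adjacent kmalloc-512 objects, one skb fragment
 * describes a 1024-byte span, so the copy crosses an object boundary.
 */
static void check_slab_copy_sketch(struct kmem_cache *s, const void *ptr,
				   unsigned long n, struct page *page,
				   bool to_user)
{
	/* offset of the copied range inside its slab object */
	unsigned long offset = (ptr - page_address(page)) % s->size;

	/* a copy contained in a single object is allowed ... */
	if (offset + n <= s->size)
		return;

	/* ... one spilling into the neighbouring object is fatal:
	 * "usercopy: kernel memory exposure attempt detected" + BUG()
	 */
	usercopy_abort(s->name, NULL, to_user, offset, n);
}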

Comments

Eric Dumazet Feb. 21, 2019, 4 p.m. UTC | #1
On Thu, Feb 21, 2019 at 7:30 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>
> There were a few incidents in which XFS over a network block device
> generated IO requests with slab-based metadata. If these requests are
> processed via the sendpage path, tcp_sendpage() calls skb_can_coalesce()
> and merges neighbouring slab objects into one skb fragment.
>
> If the receiving side is located on the same host, tcp_recvmsg() can
> trigger the following BUG_ON:
> usercopy: kernel memory exposure attempt detected
>                 from XXXXXX (kmalloc-512) (1024 bytes)
>
> This patch helps to detect the cause of similar incidents on the sending side.
>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---
>  net/ipv4/tcp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 2079145a3b7c..cf9572f4fc0f 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -996,6 +996,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>                         goto wait_for_memory;
>
>                 if (can_coalesce) {
> +                       WARN_ON_ONCE(PageSlab(page));

Please use VM_WARN_ON_ONCE() to make this a nop for CONFIG_DEBUG_VM=n.

Also, the whole tcp_sendpage() should be protected, not only the coalescing part.

(The get_page() done a few lines later should not be attempted either.)

>                         skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
>                 } else {
>                         get_page(page);
> --
> 2.17.1
>

It seems the bug has nothing to do with TCP, and belongs to the caller.

Otherwise you need to add the check to every existing .sendpage() /
.sendpage_locked() handler out there.
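For reference, such a caller-side guard is tiny. A sketch along the lines suggested above (the helper name here is hypothetical, though later kernels did add a sendpage_ok() helper of essentially this shape):

/*
 * Sketch of a per-caller guard: refuse the zero-copy sendpage path for
 * slab pages (and for pages with a zero refcount), so the caller falls
 * back to a copying send instead. Hypothetical helper name; later
 * kernels added sendpage_ok() for this purpose.
 */
static inline bool page_ok_for_sendpage(struct page *page)
{
	return !PageSlab(page) && page_count(page) >= 1;
}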
Vasily Averin Feb. 22, 2019, 2:02 p.m. UTC | #2
On 2/21/19 7:00 PM, Eric Dumazet wrote:
> On Thu, Feb 21, 2019 at 7:30 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>> index 2079145a3b7c..cf9572f4fc0f 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -996,6 +996,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>>                         goto wait_for_memory;
>>
>>                 if (can_coalesce) {
>> +                       WARN_ON_ONCE(PageSlab(page));
> 
> Please use VM_WARN_ON_ONCE() to make this a nop for CONFIG_DEBUG_VM=n.
> 
> Also, the whole tcp_sendpage() should be protected, not only the coalescing part.
> 
> (The get_page() done a few lines later should not be attempted either.)
> 
>>                         skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
>>                 } else {
>>                         get_page(page);
>> --
> 
> It seems the bug has nothing to do with TCP, and belongs to the caller.
> 
> Otherwise you need to add the check to every existing .sendpage() /
> .sendpage_locked() handler out there.
 
Eric, could you please elaborate once again why tcp_sendpage() should not handle slab objects?

There is a known restriction: sendpage should not be called for pages with refcount == 0,
because the internal put_page() releases the page. All sendpage callers I know of have such a check.

However, why should they add one more check, for PageSlab?

Let me explain the problem once again:
I do not see any bugs in TCP or in any of the sendpage callers;
there is a false alert on the receiving side that crashes a correctly working host.

There is a network block device with XFS on top.
XFS submits IO requests with slab objects,
the block device driver checks that the page count is positive and decides to use sendpage,
and sendpage calls tcp_sendpage(), which can merge two neighbouring slab objects into one tcp fragment.

If the data is transferred to a remote host, nothing bad happens: the network device successfully sends the data out.
However, if the data is received locally, tcp_recvmsg() detects a strange vector with "merged" slab objects.
That is not a real problem, the data can be accessed correctly, but the check calls BUG_ON and crashes the host.
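The merge itself happens because the coalescing test compares only the page and the offsets; roughly (paraphrased from include/linux/skbuff.h of that era, not verbatim):

/*
 * Paraphrased from include/linux/skbuff.h (~v5.0), not verbatim.
 * Two adjacent kmalloc-512 objects sit on the same slab page at
 * adjacent offsets, so both conditions hold and the objects are
 * merged into a single fragment.
 */
static inline bool skb_can_coalesce(struct sk_buff *skb, int i,
				    const struct page *page, int off)
{
	if (i) {
		const struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i - 1];

		return page == skb_frag_page(frag) &&
		       off == frag->page_offset + skb_frag_size(frag);
	}
	return false;
}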

In this way, a recently added hardening check forces all .sendpage callers to modify code that has worked correctly for ages.

That looks abnormal to me, but I do not understand how to fix the problem correctly.

I do not like the idea of keeping the current state -- it can crash correctly working hosts in some rare corner cases.
I do not like the idea of fixing all callers -- why should they modify correctly working code to protect against a false positive?
I do not like the idea of modifying TCP -- to block merging of fragments with slab objects, as I proposed earlier.
We could trigger a warning in TCP code -- to inform .sendpage callers that they are under fire --
but I agree with your "bug has nothing to do with TCP" and do not understand why we would need to modify tcp_sendpage().

Maybe it is better to replace the BUG_ON with a WARN_ON in the hardening check?
Could you perhaps advise some other solution?

Thank you,
	Vasily Averin
Eric Dumazet Feb. 22, 2019, 4:39 p.m. UTC | #3
On Fri, Feb 22, 2019 at 6:02 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>
> On 2/21/19 7:00 PM, Eric Dumazet wrote:
> > On Thu, Feb 21, 2019 at 7:30 AM Vasily Averin <vvs@virtuozzo.com> wrote:
> >> index 2079145a3b7c..cf9572f4fc0f 100644
> >> --- a/net/ipv4/tcp.c
> >> +++ b/net/ipv4/tcp.c
> >> @@ -996,6 +996,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
> >>                         goto wait_for_memory;
> >>
> >>                 if (can_coalesce) {
> >> +                       WARN_ON_ONCE(PageSlab(page));
> >
> > Please use VM_WARN_ON_ONCE() to make this a nop for CONFIG_VM_DEBUG=n
> >
> > Also the whole tcp_sendpage() should be protected, not only the coalescing part.
> >
> > (The get_page()  done few lines later should not be attempted either)
> >
> >>                         skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
> >>                 } else {
> >>                         get_page(page);
> >> --
> >
> > It seems the bug has nothing to do with TCP, and belongs to the caller.
> >
> > Otherwise you need to add the check to all existing .sendpage() /
> > .sendpage_locked() handler out there.
>
> Eric, could you please elaborate once again why tcp_sendpage() should not handle slab objects?

Simply because SLAB has its own way to manage objects from a page, and
does not care about the underlying page having its refcount elevated.

ptr = kmalloc(xx);
...  < here you can attempt cheating and add one to the underlying page >
kfree(ptr); // SLAB does not care about the page count; it will effectively
put ptr on the free list.

ptr2 = kmalloc(xx); //

ptr2 can be the same as ptr (the object was kfree()d earlier).

This means that some other stuff will happily reuse the piece of
memory that you wanted to use for zero-copy.

This is a serious bug IMO, since it would allow data corruption.
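A minimal sketch of that hazard (hypothetical kernel snippet, not code from this thread):

#include <linux/slab.h>
#include <linux/mm.h>

/* Hypothetical illustration of the lifetime bug described above. */
static void slab_pin_does_not_work(void)
{
	void *ptr = kmalloc(512, GFP_KERNEL);
	struct page *page = virt_to_head_page(ptr);

	get_page(page);	/* "pin" the underlying page ...              */
	kfree(ptr);	/* ... but the object still goes straight
			 * back to the slab freelist                  */

	/* The very next kmalloc(512) may hand ptr's address to a new
	 * owner while an in-flight zero-copy skb still points into it,
	 * so the new owner's writes corrupt the old network data.    */
	put_page(page);
}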



>
> There is a known restriction: sendpage should not be called for pages with refcount == 0,
> because the internal put_page() releases the page. All sendpage callers I know of have such a check.
>
> However, why should they add one more check, for PageSlab?
>
> Let me explain the problem once again:
> I do not see any bugs in TCP or in any of the sendpage callers;
> there is a false alert on the receiving side that crashes a correctly working host.

This is not a false alert, but a very fundamental issue.

We cannot mix kmalloc() and page fragments; this is simply buggy.

>
> There is a network block device with XFS on top.
> XFS submits IO requests with slab objects,
> the block device driver checks that the page count is positive and decides to use sendpage,
> and sendpage calls tcp_sendpage(), which can merge two neighbouring slab objects into one tcp fragment.
>
> If the data is transferred to a remote host, nothing bad happens: the network device successfully sends the data out.
> However, if the data is received locally, tcp_recvmsg() detects a strange vector with "merged" slab objects.
> That is not a real problem, the data can be accessed correctly, but the check calls BUG_ON and crashes the host.
>
> In this way, a recently added hardening check forces all .sendpage callers to modify code that has worked correctly for ages.
>
> That looks abnormal to me, but I do not understand how to fix the problem correctly.
>
> I do not like the idea of keeping the current state -- it can crash correctly working hosts in some rare corner cases.
> I do not like the idea of fixing all callers -- why should they modify correctly working code to protect against a false positive?
> I do not like the idea of modifying TCP -- to block merging of fragments with slab objects, as I proposed earlier.
> We could trigger a warning in TCP code -- to inform .sendpage callers that they are under fire --
> but I agree with your "bug has nothing to do with TCP" and do not understand why we would need to modify tcp_sendpage().
>
> Maybe it is better to replace the BUG_ON with a WARN_ON in the hardening check?
> Could you perhaps advise some other solution?
>
> Thank you,
>         Vasily Averin
Vasily Averin Feb. 25, 2019, 9:15 a.m. UTC | #4
On 2/22/19 7:39 PM, Eric Dumazet wrote:
> On Fri, Feb 22, 2019 at 6:02 AM Vasily Averin <vvs@virtuozzo.com> wrote:

>> Eric, could you please elaborate once again why tcp_sendpage() should not handle slab objects?
> 
> Simply because SLAB has its own way to manage objects from a page, and
> does not care about the underlying page having its refcount elevated.
> 
> ptr = kmalloc(xx);
> ...  < here you can attempt cheating and add one to the underlying page >
> kfree(ptr); // SLAB does not care about the page count; it will effectively
> put ptr on the free list.
> 
> ptr2 = kmalloc(xx); //
> 
> ptr2 can be the same as ptr (the object was kfree()d earlier).
> 
> This means that some other stuff will happily reuse the piece of
> memory that you wanted to use for zero-copy.
> 
> This is a serious bug IMO, since it would allow data corruption.

Thank you for the explanation, however I still have some doubts.

Yes, it is strange to use sendpage to send some small 8-byte slab-based object;
it is better to use sendmsg instead.

Yes, using sendpage for slab-based objects may require special attention
to guarantee that the slab object will not be freed until the end of IO.
However, IMO this would have to be guaranteed even if the caller used sendmsg instead of sendpage.
Btw, as far as I understand, in my example XFS did it correctly: the submitted slab objects were kept in use,
and it seems they are freed only after the end of IO, via the end_io callback.
At least I did not find any bugs in the sendpage callers.

And most importantly, it seems to me that switching from sendpage to sendmsg does not resolve the problem completely:
tcp_sendmsg_locked() can under some conditions merge neighbouring slab-based tcp fragments,
so a local tcp_recvmsg() could trigger the BUG_ON in this case too.

Have I missed something?

>> There is a known restriction: sendpage should not be called for pages with refcount == 0,
>> because the internal put_page() releases the page. All sendpage callers I know of have such a check.
>>
>> However, why should they add one more check, for PageSlab?
>>
>> Let me explain the problem once again:
>> I do not see any bugs in TCP or in any of the sendpage callers;
>> there is a false alert on the receiving side that crashes a correctly working host.
> 
> This is not a false alert, but a very fundamental issue.
> 
> We cannot mix kmalloc() and page fragments; this is simply buggy.
Vasily Averin Feb. 25, 2019, 9:32 a.m. UTC | #5
On 2/25/19 12:15 PM, Vasily Averin wrote:
> On 2/22/19 7:39 PM, Eric Dumazet wrote:
>> On Fri, Feb 22, 2019 at 6:02 AM Vasily Averin <vvs@virtuozzo.com> wrote:
> 
>>> Eric, could you please elaborate once again why tcp_sendpage() should not handle slab objects?
>>
>> Simply because SLAB has its own way to manage objects from a page, and
>> does not care about the underlying page having its refcount elevated.
>>
>> ptr = kmalloc(xx);
>> ...  < here you can attempt cheating and add one to the underlying page >
>> kfree(ptr); // SLAB does not care about the page count; it will effectively
>> put ptr on the free list.
>>
>> ptr2 = kmalloc(xx); //
>>
>> ptr2 can be the same as ptr (the object was kfree()d earlier).
>>
>> This means that some other stuff will happily reuse the piece of
>> memory that you wanted to use for zero-copy.
>>
>> This is a serious bug IMO, since it would allow data corruption.
> 
> Thank you for the explanation, however I still have some doubts.
> 
> Yes, it is strange to use sendpage to send some small 8-byte slab-based object;
> it is better to use sendmsg instead.
> 
> Yes, using sendpage for slab-based objects may require special attention
> to guarantee that the slab object will not be freed until the end of IO.
> However, IMO this would have to be guaranteed even if the caller used sendmsg instead of sendpage.
> Btw, as far as I understand, in my example XFS did it correctly: the submitted slab objects were kept in use,
> and it seems they are freed only after the end of IO, via the end_io callback.
> At least I did not find any bugs in the sendpage callers.
> 
> And most importantly, it seems to me that switching from sendpage to sendmsg does not resolve the problem completely:
> tcp_sendmsg_locked() can under some conditions merge neighbouring slab-based tcp fragments,
> so a local tcp_recvmsg() could trigger the BUG_ON in this case too.
> 
> Have I missed something?

It seems I missed that skb_copy_to_page_nocache() in tcp_sendmsg_locked() copies data from the original slab object,
and merges fragments that contain the copied data.
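That is, the sendmsg path coalesces only the socket's own page fragment, never the caller's (possibly slab-backed) buffer; roughly (paraphrased from net/ipv4/tcp.c:tcp_sendmsg_locked() of that era, not verbatim):

/* Paraphrased excerpt from tcp_sendmsg_locked() (~v5.0). */
struct page_frag *pfrag = sk_page_frag(sk);
int i = skb_shinfo(skb)->nr_frags;
bool merge = skb_can_coalesce(skb, i, pfrag->page, pfrag->offset);

/* data is copied out of the caller's buffer first ... */
err = skb_copy_to_page_nocache(sk, &msg->msg_iter, skb,
			       pfrag->page, pfrag->offset, copy);

/* ... so only pfrag->page, never a slab page, lands in a fragment */
if (merge) {
	skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
} else {
	skb_fill_page_desc(skb, i, pfrag->page, pfrag->offset, copy);
	page_ref_inc(pfrag->page);
}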
Vasily Averin March 4, 2019, 12:58 p.m. UTC | #6
On 2/21/19 7:00 PM, Eric Dumazet wrote:
> On Thu, Feb 21, 2019 at 7:30 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>>
>> There were a few incidents in which XFS over a network block device
>> generated IO requests with slab-based metadata. If these requests are
>> processed via the sendpage path, tcp_sendpage() calls skb_can_coalesce()
>> and merges neighbouring slab objects into one skb fragment.
>>
>> If the receiving side is located on the same host, tcp_recvmsg() can
>> trigger the following BUG_ON:
>> usercopy: kernel memory exposure attempt detected
>>                 from XXXXXX (kmalloc-512) (1024 bytes)
>>
>> This patch helps to detect the cause of similar incidents on the sending side.
>>
>> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
>> ---
>>  net/ipv4/tcp.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 2079145a3b7c..cf9572f4fc0f 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -996,6 +996,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>>                         goto wait_for_memory;
>>
>>                 if (can_coalesce) {
>> +                       WARN_ON_ONCE(PageSlab(page));
> 
> Please use VM_WARN_ON_ONCE() to make this a nop for CONFIG_DEBUG_VM=n.
> Also, the whole tcp_sendpage() should be protected, not only the coalescing part.
> (The get_page() done a few lines later should not be attempted either.)

Eric, what do you think about the following patch?
I validated a backported version on a RHEL7-based OpenVZ kernel before sending it to mainline. 

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index cf3c5095c10e..7be7b6abe8b5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -943,6 +943,11 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 	ssize_t copied;
 	long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
 
+	if (PageSlab(page)) {
+		VM_WARN_ONCE(true, "sendpage should not handle Slab objects,"
+				   " please fix callers\n");
+		return sock_no_sendpage_locked(sk, page, offset, size, flags);
+	}
 	/* Wait for a connection to finish. One exception is TCP Fast Open
 	 * (passive side) where data is allowed to be sent before a connection
 	 * is fully established.
Eric Dumazet March 4, 2019, 3:51 p.m. UTC | #7
On 03/04/2019 04:58 AM, Vasily Averin wrote:
> On 2/21/19 7:00 PM, Eric Dumazet wrote:
>> On Thu, Feb 21, 2019 at 7:30 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>>>
>>> There were a few incidents in which XFS over a network block device
>>> generated IO requests with slab-based metadata. If these requests are
>>> processed via the sendpage path, tcp_sendpage() calls skb_can_coalesce()
>>> and merges neighbouring slab objects into one skb fragment.
>>>
>>> If the receiving side is located on the same host, tcp_recvmsg() can
>>> trigger the following BUG_ON:
>>> usercopy: kernel memory exposure attempt detected
>>>                 from XXXXXX (kmalloc-512) (1024 bytes)
>>>
>>> This patch helps to detect the cause of similar incidents on the sending side.
>>>
>>> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
>>> ---
>>>  net/ipv4/tcp.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 2079145a3b7c..cf9572f4fc0f 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -996,6 +996,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>>>                         goto wait_for_memory;
>>>
>>>                 if (can_coalesce) {
>>> +                       WARN_ON_ONCE(PageSlab(page));
>>
>> Please use VM_WARN_ON_ONCE() to make this a nop for CONFIG_DEBUG_VM=n.
>> Also, the whole tcp_sendpage() should be protected, not only the coalescing part.
>> (The get_page() done a few lines later should not be attempted either.)
> 
> Eric, what do you think about the following patch?
> I validated a backported version on a RHEL7-based OpenVZ kernel before sending it to mainline. 
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index cf3c5095c10e..7be7b6abe8b5 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -943,6 +943,11 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>  	ssize_t copied;
>  	long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>  
> +	if (PageSlab(page)) {
> +		VM_WARN_ONCE(true, "sendpage should not handle Slab objects,"
> +				   " please fix callers\n");
> +		return sock_no_sendpage_locked(sk, page, offset, size, flags);
> +	}
>  	/* Wait for a connection to finish. One exception is TCP Fast Open
>  	 * (passive side) where data is allowed to be sent before a connection
>  	 * is fully established.
> 

There are at least four problems with this approach:

1) VM_WARN_ONCE() might be a NOP, and if not, it is simply some lines in syslog,
among thousands.

2) Falling back will give callers no incentive to fix their code.

3) It slows down TCP just because of some weird kernel users.
   I agree to add sanity checks for everything user space can think of (aka syzbot),
   but kernel users need to be fixed, without adding code in TCP.

4) The sendpage() API provides one page at a time.
   We therefore call the very expensive lock_sock() and release_sock() for every page.
   sendfile() is suboptimal (compared to sendmsg(MSG_ZEROCOPY)).
   There is an effort to provide batches of pages per round.
   Your patch would cancel this effort, or make it very complicated.
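For point 4, the per-page cost is visible in the sendpage entry point itself (paraphrased from net/ipv4/tcp.c of that era, not verbatim):

/* Paraphrased from net/ipv4/tcp.c (~v5.0): every page pushed through
 * ->sendpage() pays a full socket lock/unlock round trip. */
int tcp_sendpage(struct sock *sk, struct page *page, int offset,
		 size_t size, int flags)
{
	int ret;

	lock_sock(sk);
	ret = tcp_sendpage_locked(sk, page, offset, size, flags);
	release_sock(sk);

	return ret;
}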
Vasily Averin March 5, 2019, 2:24 p.m. UTC | #8
On 3/4/19 6:51 PM, Eric Dumazet wrote:
> On 03/04/2019 04:58 AM, Vasily Averin wrote:
>> Eric, what do you think about the following patch?
>> I validated a backported version on a RHEL7-based OpenVZ kernel before sending it to mainline. 
>>
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index cf3c5095c10e..7be7b6abe8b5 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -943,6 +943,11 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>>  	ssize_t copied;
>>  	long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>>  
>> +	if (PageSlab(page)) {
>> +		VM_WARN_ONCE(true, "sendpage should not handle Slab objects,"
>> +				   " please fix callers\n");
>> +		return sock_no_sendpage_locked(sk, page, offset, size, flags);
>> +	}
>>  	/* Wait for a connection to finish. One exception is TCP Fast Open
>>  	 * (passive side) where data is allowed to be sent before a connection
>>  	 * is fully established.
>>
> 
> There are at least four problems with this approach:
> 
> 1) VM_WARN_ONCE() might be a NOP, and if not, it is simply some lines in syslog,
> among thousands.
> 
> 2) Falling back will give callers no incentive to fix their code.

We can return an error instead of falling back,
but yes, it means an extra (almost unneeded) check in the TCP code. 
 
> 3) It slows down TCP just because of some weird kernel users.
>    I agree to add sanity checks for everything user space can think of (aka syzbot),
>    but kernel users need to be fixed, without adding code in TCP.

Do you advise adding a PageSlab check to all .sendpage / .sendpage_locked /
tcp_sendpage / do_tcp_sendpages callers instead?

> 4) The sendpage() API provides one page at a time.
>    We therefore call the very expensive lock_sock() and release_sock() for every page.
>    sendfile() is suboptimal (compared to sendmsg(MSG_ZEROCOPY)).
>    There is an effort to provide batches of pages per round.
>    Your patch would cancel this effort, or make it very complicated.
Eric Dumazet March 5, 2019, 3:11 p.m. UTC | #9
Resent in plain text mode for the lists.

On Tue, Mar 5, 2019 at 7:08 AM Eric Dumazet <edumazet@google.com> wrote:
>
>
>
> On Tue, Mar 5, 2019 at 6:24 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>>
>> On 3/4/19 6:51 PM, Eric Dumazet wrote:
>> > On 03/04/2019 04:58 AM, Vasily Averin wrote:
>> >> Eric, what do you think about the following patch?
>> >> I validated a backported version on a RHEL7-based OpenVZ kernel before sending it to mainline.
>> >>
>> >> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> >> index cf3c5095c10e..7be7b6abe8b5 100644
>> >> --- a/net/ipv4/tcp.c
>> >> +++ b/net/ipv4/tcp.c
>> >> @@ -943,6 +943,11 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>> >>      ssize_t copied;
>> >>      long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>> >>
>> >> +    if (PageSlab(page)) {
>> >> +            VM_WARN_ONCE(true, "sendpage should not handle Slab objects,"
>> >> +                               " please fix callers\n");
>> >> +            return sock_no_sendpage_locked(sk, page, offset, size, flags);
>> >> +    }
>> >>      /* Wait for a connection to finish. One exception is TCP Fast Open
>> >>       * (passive side) where data is allowed to be sent before a connection
>> >>       * is fully established.
>> >>
>> >
>> > There are at least four problems with this approach:
>> >
>> > 1) VM_WARN_ONCE() might be a NOP, and if not, it is simply some lines in syslog,
>> > among thousands.
>> >
>> > 2) Falling back will give callers no incentive to fix their code.
>>
>> We can return an error instead of falling back,
>> but yes, it means an extra (almost unneeded) check in the TCP code.
>>
>> > 3) It slows down TCP just because of some weird kernel users.
>> >    I agree to add sanity checks for everything user space can think of (aka syzbot),
>> >    but kernel users need to be fixed, without adding code in TCP.
>>
>> Do you advise adding a PageSlab check to all .sendpage / .sendpage_locked /
>> tcp_sendpage / do_tcp_sendpages callers instead?
>>
>
>
> My original suggestion was to use VM_WARN_ONCE() so that the debug checks would
> be compiled out by the compiler, unless you compile a debug kernel.
>
> Something like :
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index ad07dd71063da09843ccfbd3e00d3f41567e1205..889563a66dde2a41c198d92a80183cf5382f632d 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -943,6 +943,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>         ssize_t copied;
>         long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>
> +       if (VM_WARN_ONCE(PageSlab(page), "page must not be a Slab one"))
> +               return -EINVAL;
> +
>         /* Wait for a connection to finish. One exception is TCP Fast Open
>          * (passive side) where data is allowed to be sent before a connection
>          * is fully established.
>
>
> For some reason you added a fallback :/
>
Eric Dumazet March 5, 2019, 4:44 p.m. UTC | #10
On Tue, Mar 5, 2019 at 7:11 AM Eric Dumazet <edumazet@google.com> wrote:
>

> >
> > My original suggestion was to use VM_WARN_ONCE() so that the debug checks would
> > be compiled out by the compiler, unless you compile a debug kernel.
> >
> > Something like :
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index ad07dd71063da09843ccfbd3e00d3f41567e1205..889563a66dde2a41c198d92a80183cf5382f632d 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -943,6 +943,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
> >         ssize_t copied;
> >         long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
> >
> > +       if (VM_WARN_ONCE(PageSlab(page), "page must not be a Slab one"))
> > +               return -EINVAL;
> > +

Well, VM_WARN_ONCE() returns 1 if !CONFIG_DEBUG_VM,
so it is not behaving like WARN_ONCE(), which is unfortunate.

But you get the idea ;)


> >         /* Wait for a connection to finish. One exception is TCP Fast Open
> >          * (passive side) where data is allowed to be sent before a connection
> >          * is fully established.
> >
> >
> > For some reason you added a fallback :/
> >
Vasily Averin March 5, 2019, 6:35 p.m. UTC | #11
On 3/5/19 7:44 PM, Eric Dumazet wrote:
> On Tue, Mar 5, 2019 at 7:11 AM Eric Dumazet <edumazet@google.com> wrote:
>>> My original suggestion was to use VM_WARN_ONCE() so that the debug checks would
>>> be compiled out by the compiler, unless you compile a debug kernel.
>>>
>>> Something like :
>>>
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index ad07dd71063da09843ccfbd3e00d3f41567e1205..889563a66dde2a41c198d92a80183cf5382f632d 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -943,6 +943,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>>>         ssize_t copied;
>>>         long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>>>
>>> +       if (VM_WARN_ONCE(PageSlab(page), "page must not be a Slab one"))
>>> +               return -EINVAL;
>>> +
> 
> Well, VM_WARN_ONCE() returns 1 if !CONFIG_DEBUG_VM,
> so it is not behaving like WARN_ONCE(), which is unfortunate.
> 
> But you get the idea ;)

Thanks, I did not know about this trick. 
It seems it can be replaced by the following construction; I'll check it tomorrow morning.

#ifdef CONFIG_DEBUG_VM
	if (WARN_ONCE(PageSlab(page), "page must not be a Slab one"))
		return -EINVAL;
#endif

I expect it to affect debug kernels only,
fail all incorrect requests, taint the kernel, and generate a warning only once.

>>>          * (passive side) where data is allowed to be sent before a connection
>>>          * is fully established.
>>>
>>>
>>> For some reason you added a fallback :/
>>>
>

Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 2079145a3b7c..cf9572f4fc0f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -996,6 +996,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 			goto wait_for_memory;
 
 		if (can_coalesce) {
+			WARN_ON_ONCE(PageSlab(page));
 			skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
 		} else {
 			get_page(page);