diff mbox

[net-next,01/12] tipc: change socket buffer overflow control to respect sk_rcvbuf

Message ID 1369942577-39563-2-git-send-email-paul.gortmaker@windriver.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Paul Gortmaker May 30, 2013, 7:36 p.m. UTC
From: Jon Maloy <jon.maloy@ericsson.com>

As per feedback from the netdev community, we change the buffer
overflow protection algorithm in receiving sockets so that it
always respects the nominal upper limit set in sk_rcvbuf.

Instead of scaling up from a small sk_rcvbuf value, which leads to
violation of the configured sk_rcvbuf limit, we now calculate the
weighted per-message limit by scaling down from a much bigger value,
still in the same field, according to the importance priority of the
received message.

Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/socket.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Comments

Neil Horman May 31, 2013, 1:36 p.m. UTC | #1
On Thu, May 30, 2013 at 03:36:06PM -0400, Paul Gortmaker wrote:
> From: Jon Maloy <jon.maloy@ericsson.com>
> 
> As per feedback from the netdev community, we change the buffer
> overflow protection algorithm in receiving sockets so that it
> always respects the nominal upper limit set in sk_rcvbuf.
> 
> Instead of scaling up from a small sk_rcvbuf value, which leads to
> violation of the configured sk_rcvbuf limit, we now calculate the
> weighted per-message limit by scaling down from a much bigger value,
> still in the same field, according to the importance priority of the
> received message.
> 
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>  net/tipc/socket.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 515ce38..2dfabc7 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1,7 +1,7 @@
>  /*
>   * net/tipc/socket.c: TIPC socket API
>   *
> - * Copyright (c) 2001-2007, 2012 Ericsson AB
> + * Copyright (c) 2001-2007, 2012-2013, Ericsson AB
>   * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
>   * All rights reserved.
>   *
> @@ -203,6 +203,7 @@ static int tipc_create(struct net *net, struct socket *sock, int protocol,
>  
>  	sock_init_data(sock, sk);
>  	sk->sk_backlog_rcv = backlog_rcv;
> +	sk->sk_rcvbuf = CONN_OVERLOAD_LIMIT;
The last time Jon and I discussed this, I thought the consensus was to export
sk_rcvbuf via its own sysctl, or tie it to sysctl_rmem (while requiring a
protocol specific minimum on top of that), so administrators on memory
constrained systems didn't wonder why their sysctl changes weren't being
honored.

>  	sk->sk_data_ready = tipc_data_ready;
>  	sk->sk_write_space = tipc_write_space;
>  	tipc_sk(sk)->p = tp_ptr;
> @@ -1233,10 +1234,10 @@ static u32 filter_connect(struct tipc_sock *tsock, struct sk_buff **buf)
>   * For all connectionless messages, by default new queue limits are
>   * as belows:
>   *
> - * TIPC_LOW_IMPORTANCE       (5MB)
> - * TIPC_MEDIUM_IMPORTANCE    (10MB)
> - * TIPC_HIGH_IMPORTANCE      (20MB)
> - * TIPC_CRITICAL_IMPORTANCE  (40MB)
> + * TIPC_LOW_IMPORTANCE       (4 MB)
> + * TIPC_MEDIUM_IMPORTANCE    (8 MB)
> + * TIPC_HIGH_IMPORTANCE      (16 MB)
> + * TIPC_CRITICAL_IMPORTANCE  (32 MB)
>   *
>   * Returns overload limit according to corresponding message importance
>   */
> @@ -1248,7 +1249,7 @@ static unsigned int rcvbuf_limit(struct sock *sk, struct sk_buff *buf)
>  	if (msg_connected(msg))
>  		limit = CONN_OVERLOAD_LIMIT;
>  	else
> -		limit = sk->sk_rcvbuf << (msg_importance(msg) + 5);
> +		limit = sk->sk_rcvbuf >> 4 << msg_importance(msg);
I still don't like this.  I would much prefer that the minimum sk_rcvbuf value
were defaulted to a value such that:
sk->sk_rcvbuf >> 4 << msg_importance(TIPC_CRITICAL_IMPORTANCE) = sk->sk_rcvbuf
i.e. that the minimum sk_rcvbuf size allowed was equal to the size needed to
hold the maximum number of critical messages TIPC required, and have less
important messages be a fraction of that.  that, in conjunction with the above
default setting would allow for administrative tunability, while still giving
you the receive space you need I think.

This is much better than what you have there currently though.

Regards
Neil
>  	return limit;
>  }
>  
> -- 
> 1.8.1.2
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ying Xue June 3, 2013, 9:55 a.m. UTC | #2
On 05/31/2013 09:36 PM, Neil Horman wrote:
> On Thu, May 30, 2013 at 03:36:06PM -0400, Paul Gortmaker wrote:
>> From: Jon Maloy <jon.maloy@ericsson.com>
>>
>> As per feedback from the netdev community, we change the buffer
>> overflow protection algorithm in receiving sockets so that it
>> always respects the nominal upper limit set in sk_rcvbuf.
>>
>> Instead of scaling up from a small sk_rcvbuf value, which leads to
>> violation of the configured sk_rcvbuf limit, we now calculate the
>> weighted per-message limit by scaling down from a much bigger value,
>> still in the same field, according to the importance priority of the
>> received message.
>>
>> Cc: Neil Horman <nhorman@tuxdriver.com>
>> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
>> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
>> ---
>>  net/tipc/socket.c | 13 +++++++------
>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
>> index 515ce38..2dfabc7 100644
>> --- a/net/tipc/socket.c
>> +++ b/net/tipc/socket.c
>> @@ -1,7 +1,7 @@
>>  /*
>>   * net/tipc/socket.c: TIPC socket API
>>   *
>> - * Copyright (c) 2001-2007, 2012 Ericsson AB
>> + * Copyright (c) 2001-2007, 2012-2013, Ericsson AB
>>   * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
>>   * All rights reserved.
>>   *
>> @@ -203,6 +203,7 @@ static int tipc_create(struct net *net, struct socket *sock, int protocol,
>>  
>>  	sock_init_data(sock, sk);
>>  	sk->sk_backlog_rcv = backlog_rcv;
>> +	sk->sk_rcvbuf = CONN_OVERLOAD_LIMIT;
> The last time Jon and I discussed this, I thought the consensus was to export
> sk_rcvbuf via its own sysctl, or tie it to sysctl_rmem (while requiring a
> protocol specific minimum on top of that), so administrators on memory
> constrained systems didn't wonder why their sysctl changes weren't being
> honored.

Yes, your suggestion is reasonable, and I prefer to involve
net.tipc.sysctl_rmem. But I have one question about it:

As you suggested as belows, the default value of sk->sk_rcvbuf is set to
sk->sk_rcvbuf >> 4 << msg_importance(TIPC_CRITICAL_IMPORTANCE), that is,
sk->sk_rcvbuf is about 32MB.

However, please see below code:

int sock_setsockopt()
{
...
	        case SO_RCVBUF:
                /* Don't error on this BSD doesn't and if you think
                 * about it this is right. Otherwise apps have to
                 * play 'guess the biggest size' games. RCVBUF/SNDBUF
                 * are treated in BSD as hints
                 */
                val = min_t(u32, val, sysctl_rmem_max);
set_rcvbuf:
                sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
                /*
                 * We double it on the way in to account for
                 * "struct sk_buff" etc. overhead.   Applications
                 * assume that the SO_RCVBUF setting they make will
                 * allow that much actual data to be received on that
                 * socket.
                 *
                 * Applications are unaware that "struct sk_buff" and
                 * other overheads allocate from the receive buffer
                 * during socket buffer allocation.
                 *
                 * And after considering the possible alternatives,
                 * returning the value we actually used in getsockopt
                 * is the most desirable behavior.
                 */
                sk->sk_rcvbuf = max_t(u32, val * 2, SOCK_MIN_RCVBUF);
                break;
...
}

From above logic of setting sk->sk_rcvbuf with SO_RCVBUF, it only
permits the maximum value of sk->sk_rcvbuf to sysctl_rmem_max * 2(ie,
about 400KB normally).

So, even if the default value of sk->sk_rcvbuf is set to 32MB with
net.tipc.sysctl_rmem, a bit smaller value than the default value can
never be set to sk->sk_rcvbuf successfully with SO_RCVBUF option.

How can we avoid the limit?

Regards,
Ying

> 
>>  	sk->sk_data_ready = tipc_data_ready;
>>  	sk->sk_write_space = tipc_write_space;
>>  	tipc_sk(sk)->p = tp_ptr;
>> @@ -1233,10 +1234,10 @@ static u32 filter_connect(struct tipc_sock *tsock, struct sk_buff **buf)
>>   * For all connectionless messages, by default new queue limits are
>>   * as belows:
>>   *
>> - * TIPC_LOW_IMPORTANCE       (5MB)
>> - * TIPC_MEDIUM_IMPORTANCE    (10MB)
>> - * TIPC_HIGH_IMPORTANCE      (20MB)
>> - * TIPC_CRITICAL_IMPORTANCE  (40MB)
>> + * TIPC_LOW_IMPORTANCE       (4 MB)
>> + * TIPC_MEDIUM_IMPORTANCE    (8 MB)
>> + * TIPC_HIGH_IMPORTANCE      (16 MB)
>> + * TIPC_CRITICAL_IMPORTANCE  (32 MB)
>>   *
>>   * Returns overload limit according to corresponding message importance
>>   */
>> @@ -1248,7 +1249,7 @@ static unsigned int rcvbuf_limit(struct sock *sk, struct sk_buff *buf)
>>  	if (msg_connected(msg))
>>  		limit = CONN_OVERLOAD_LIMIT;
>>  	else
>> -		limit = sk->sk_rcvbuf << (msg_importance(msg) + 5);
>> +		limit = sk->sk_rcvbuf >> 4 << msg_importance(msg);
> I still don't like this.  I would much prefer that the minimum sk_rcvbuf value
> were defaulted to a value such that:
> sk->sk_rcvbuf >> 4 << msg_importance(TIPC_CRITICAL_IMPORTANCE) = sk->sk_rcvbuf
> i.e. that the minimum sk_rcvbuf size allowed was equal to the size needed to
> hold the maximum number of critical messages TIPC required, and have less
> important messages be a fraction of that.  that, in conjunction with the above
> default setting would allow for administrative tunability, while still giving
> you the receive space you need I think.
> 
> This is much better than what you have there currently though.
> 
> Regards
> Neil
>>  	return limit;
>>  }
>>  
>> -- 
>> 1.8.1.2
>>
>>
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman June 3, 2013, 1:16 p.m. UTC | #3
On Mon, Jun 03, 2013 at 05:55:06PM +0800, Ying Xue wrote:
> On 05/31/2013 09:36 PM, Neil Horman wrote:
> > On Thu, May 30, 2013 at 03:36:06PM -0400, Paul Gortmaker wrote:
> >> From: Jon Maloy <jon.maloy@ericsson.com>
> >>
> >> As per feedback from the netdev community, we change the buffer
> >> overflow protection algorithm in receiving sockets so that it
> >> always respects the nominal upper limit set in sk_rcvbuf.
> >>
> >> Instead of scaling up from a small sk_rcvbuf value, which leads to
> >> violation of the configured sk_rcvbuf limit, we now calculate the
> >> weighted per-message limit by scaling down from a much bigger value,
> >> still in the same field, according to the importance priority of the
> >> received message.
> >>
> >> Cc: Neil Horman <nhorman@tuxdriver.com>
> >> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> >> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> >> ---
> >>  net/tipc/socket.c | 13 +++++++------
> >>  1 file changed, 7 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> >> index 515ce38..2dfabc7 100644
> >> --- a/net/tipc/socket.c
> >> +++ b/net/tipc/socket.c
> >> @@ -1,7 +1,7 @@
> >>  /*
> >>   * net/tipc/socket.c: TIPC socket API
> >>   *
> >> - * Copyright (c) 2001-2007, 2012 Ericsson AB
> >> + * Copyright (c) 2001-2007, 2012-2013, Ericsson AB
> >>   * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
> >>   * All rights reserved.
> >>   *
> >> @@ -203,6 +203,7 @@ static int tipc_create(struct net *net, struct socket *sock, int protocol,
> >>  
> >>  	sock_init_data(sock, sk);
> >>  	sk->sk_backlog_rcv = backlog_rcv;
> >> +	sk->sk_rcvbuf = CONN_OVERLOAD_LIMIT;
> > The last time Jon and I discussed this, I thought the consensus was to export
> > sk_rcvbuf via its own sysctl, or tie it to sysctl_rmem (while requiring a
> > protocol specific minimum on top of that), so administrators on memory
> > constrained systems didn't wonder why their sysctl changes weren't being
> > honored.
> 
> Yes, your suggestion is reasonable, and I prefer to involve
> net.tipc.sysctl_rmem. But I have one question about it:
> 
> As you suggested as belows, the default value of sk->sk_rcvbuf is set to
> sk->sk_rcvbuf >> 4 << msg_importance(TIPC_CRITICAL_IMPORTANCE), that is,
> sk->sk_rcvbuf is about 32MB.
> 
> However, please see below code:
> 
> int sock_setsockopt()
> {
> ...
> 	        case SO_RCVBUF:
>                 /* Don't error on this BSD doesn't and if you think
>                  * about it this is right. Otherwise apps have to
>                  * play 'guess the biggest size' games. RCVBUF/SNDBUF
>                  * are treated in BSD as hints
>                  */
>                 val = min_t(u32, val, sysctl_rmem_max);
> set_rcvbuf:
>                 sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
>                 /*
>                  * We double it on the way in to account for
>                  * "struct sk_buff" etc. overhead.   Applications
>                  * assume that the SO_RCVBUF setting they make will
>                  * allow that much actual data to be received on that
>                  * socket.
>                  *
>                  * Applications are unaware that "struct sk_buff" and
>                  * other overheads allocate from the receive buffer
>                  * during socket buffer allocation.
>                  *
>                  * And after considering the possible alternatives,
>                  * returning the value we actually used in getsockopt
>                  * is the most desirable behavior.
>                  */
>                 sk->sk_rcvbuf = max_t(u32, val * 2, SOCK_MIN_RCVBUF);
>                 break;
> ...
> }
> 
> From above logic of setting sk->sk_rcvbuf with SO_RCVBUF, it only
> permits the maximum value of sk->sk_rcvbuf to sysctl_rmem_max * 2(ie,
> about 400KB normally).
> 
> So, even if the default value of sk->sk_rcvbuf is set to 32MB with
> net.tipc.sysctl_rmem, a bit smaller value than the default value can
> never be set to sk->sk_rcvbuf successfully with SO_RCVBUF option.
> 
> How can we avoid the limit?
> 
By administratively adjusting sysctl_rmem_max to be a sufficiently large value
such that using SO_RCVBUF won't be clamed to a lower limit.

If you don't want to force users to have to manually adjust the sysctl, there
might be support for you to automatically update sysctl_rmem_max in your
tipc_init routine, and print an informational message indicating that tipc
requires the additional space (although I still maintain its not strictly
needed, but thats another argument).

Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ying Xue June 4, 2013, 1:37 a.m. UTC | #4
On 06/03/2013 09:16 PM, Neil Horman wrote:
> On Mon, Jun 03, 2013 at 05:55:06PM +0800, Ying Xue wrote:
>> On 05/31/2013 09:36 PM, Neil Horman wrote:
>>> On Thu, May 30, 2013 at 03:36:06PM -0400, Paul Gortmaker wrote:
>>>> From: Jon Maloy <jon.maloy@ericsson.com>
>>>>
>>>> As per feedback from the netdev community, we change the buffer
>>>> overflow protection algorithm in receiving sockets so that it
>>>> always respects the nominal upper limit set in sk_rcvbuf.
>>>>
>>>> Instead of scaling up from a small sk_rcvbuf value, which leads to
>>>> violation of the configured sk_rcvbuf limit, we now calculate the
>>>> weighted per-message limit by scaling down from a much bigger value,
>>>> still in the same field, according to the importance priority of the
>>>> received message.
>>>>
>>>> Cc: Neil Horman <nhorman@tuxdriver.com>
>>>> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
>>>> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
>>>> ---
>>>>  net/tipc/socket.c | 13 +++++++------
>>>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
>>>> index 515ce38..2dfabc7 100644
>>>> --- a/net/tipc/socket.c
>>>> +++ b/net/tipc/socket.c
>>>> @@ -1,7 +1,7 @@
>>>>  /*
>>>>   * net/tipc/socket.c: TIPC socket API
>>>>   *
>>>> - * Copyright (c) 2001-2007, 2012 Ericsson AB
>>>> + * Copyright (c) 2001-2007, 2012-2013, Ericsson AB
>>>>   * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
>>>>   * All rights reserved.
>>>>   *
>>>> @@ -203,6 +203,7 @@ static int tipc_create(struct net *net, struct socket *sock, int protocol,
>>>>  
>>>>  	sock_init_data(sock, sk);
>>>>  	sk->sk_backlog_rcv = backlog_rcv;
>>>> +	sk->sk_rcvbuf = CONN_OVERLOAD_LIMIT;
>>> The last time Jon and I discussed this, I thought the consensus was to export
>>> sk_rcvbuf via its own sysctl, or tie it to sysctl_rmem (while requiring a
>>> protocol specific minimum on top of that), so administrators on memory
>>> constrained systems didn't wonder why their sysctl changes weren't being
>>> honored.
>>
>> Yes, your suggestion is reasonable, and I prefer to involve
>> net.tipc.sysctl_rmem. But I have one question about it:
>>
>> As you suggested as belows, the default value of sk->sk_rcvbuf is set to
>> sk->sk_rcvbuf >> 4 << msg_importance(TIPC_CRITICAL_IMPORTANCE), that is,
>> sk->sk_rcvbuf is about 32MB.
>>
>> However, please see below code:
>>
>> int sock_setsockopt()
>> {
>> ...
>> 	        case SO_RCVBUF:
>>                 /* Don't error on this BSD doesn't and if you think
>>                  * about it this is right. Otherwise apps have to
>>                  * play 'guess the biggest size' games. RCVBUF/SNDBUF
>>                  * are treated in BSD as hints
>>                  */
>>                 val = min_t(u32, val, sysctl_rmem_max);
>> set_rcvbuf:
>>                 sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
>>                 /*
>>                  * We double it on the way in to account for
>>                  * "struct sk_buff" etc. overhead.   Applications
>>                  * assume that the SO_RCVBUF setting they make will
>>                  * allow that much actual data to be received on that
>>                  * socket.
>>                  *
>>                  * Applications are unaware that "struct sk_buff" and
>>                  * other overheads allocate from the receive buffer
>>                  * during socket buffer allocation.
>>                  *
>>                  * And after considering the possible alternatives,
>>                  * returning the value we actually used in getsockopt
>>                  * is the most desirable behavior.
>>                  */
>>                 sk->sk_rcvbuf = max_t(u32, val * 2, SOCK_MIN_RCVBUF);
>>                 break;
>> ...
>> }
>>
>> From above logic of setting sk->sk_rcvbuf with SO_RCVBUF, it only
>> permits the maximum value of sk->sk_rcvbuf to sysctl_rmem_max * 2(ie,
>> about 400KB normally).
>>
>> So, even if the default value of sk->sk_rcvbuf is set to 32MB with
>> net.tipc.sysctl_rmem, a bit smaller value than the default value can
>> never be set to sk->sk_rcvbuf successfully with SO_RCVBUF option.
>>
>> How can we avoid the limit?
>>
> By administratively adjusting sysctl_rmem_max to be a sufficiently large value
> such that using SO_RCVBUF won't be clamed to a lower limit.
> 
> If you don't want to force users to have to manually adjust the sysctl, there
> might be support for you to automatically update sysctl_rmem_max in your
> tipc_init routine, and print an informational message indicating that tipc
> requires the additional space (although I still maintain its not strictly
> needed, but thats another argument).
> 

Thanks for your clear clarification.

I also have the same concern. If we override sysctl_rmem_max in
tipc_init() with a larger value, I am afraid that other guys will oppose
the behaviour.

The truth is that little TIPC user adjusts the sk->sk_rcvbuf with
SO_RCVBUF option in practice. If he really wants to do, he should follow
your suggestion he manually enlarges the sysctl.

OK, I will rewrite the patch with your suggestion.

Regards,
Ying

> Neil
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman June 4, 2013, 1:40 p.m. UTC | #5
On Tue, Jun 04, 2013 at 09:37:54AM +0800, Ying Xue wrote:
> On 06/03/2013 09:16 PM, Neil Horman wrote:
> > On Mon, Jun 03, 2013 at 05:55:06PM +0800, Ying Xue wrote:
> >> On 05/31/2013 09:36 PM, Neil Horman wrote:
> >>> On Thu, May 30, 2013 at 03:36:06PM -0400, Paul Gortmaker wrote:
> >>>> From: Jon Maloy <jon.maloy@ericsson.com>
> >>>>
> >>>> As per feedback from the netdev community, we change the buffer
> >>>> overflow protection algorithm in receiving sockets so that it
> >>>> always respects the nominal upper limit set in sk_rcvbuf.
> >>>>
> >>>> Instead of scaling up from a small sk_rcvbuf value, which leads to
> >>>> violation of the configured sk_rcvbuf limit, we now calculate the
> >>>> weighted per-message limit by scaling down from a much bigger value,
> >>>> still in the same field, according to the importance priority of the
> >>>> received message.
> >>>>
> >>>> Cc: Neil Horman <nhorman@tuxdriver.com>
> >>>> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> >>>> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> >>>> ---
> >>>>  net/tipc/socket.c | 13 +++++++------
> >>>>  1 file changed, 7 insertions(+), 6 deletions(-)
> >>>>
> >>>> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> >>>> index 515ce38..2dfabc7 100644
> >>>> --- a/net/tipc/socket.c
> >>>> +++ b/net/tipc/socket.c
> >>>> @@ -1,7 +1,7 @@
> >>>>  /*
> >>>>   * net/tipc/socket.c: TIPC socket API
> >>>>   *
> >>>> - * Copyright (c) 2001-2007, 2012 Ericsson AB
> >>>> + * Copyright (c) 2001-2007, 2012-2013, Ericsson AB
> >>>>   * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
> >>>>   * All rights reserved.
> >>>>   *
> >>>> @@ -203,6 +203,7 @@ static int tipc_create(struct net *net, struct socket *sock, int protocol,
> >>>>  
> >>>>  	sock_init_data(sock, sk);
> >>>>  	sk->sk_backlog_rcv = backlog_rcv;
> >>>> +	sk->sk_rcvbuf = CONN_OVERLOAD_LIMIT;
> >>> The last time Jon and I discussed this, I thought the consensus was to export
> >>> sk_rcvbuf via its own sysctl, or tie it to sysctl_rmem (while requiring a
> >>> protocol specific minimum on top of that), so administrators on memory
> >>> constrained systems didn't wonder why their sysctl changes weren't being
> >>> honored.
> >>
> >> Yes, your suggestion is reasonable, and I prefer to involve
> >> net.tipc.sysctl_rmem. But I have one question about it:
> >>
> >> As you suggested as belows, the default value of sk->sk_rcvbuf is set to
> >> sk->sk_rcvbuf >> 4 << msg_importance(TIPC_CRITICAL_IMPORTANCE), that is,
> >> sk->sk_rcvbuf is about 32MB.
> >>
> >> However, please see below code:
> >>
> >> int sock_setsockopt()
> >> {
> >> ...
> >> 	        case SO_RCVBUF:
> >>                 /* Don't error on this BSD doesn't and if you think
> >>                  * about it this is right. Otherwise apps have to
> >>                  * play 'guess the biggest size' games. RCVBUF/SNDBUF
> >>                  * are treated in BSD as hints
> >>                  */
> >>                 val = min_t(u32, val, sysctl_rmem_max);
> >> set_rcvbuf:
> >>                 sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> >>                 /*
> >>                  * We double it on the way in to account for
> >>                  * "struct sk_buff" etc. overhead.   Applications
> >>                  * assume that the SO_RCVBUF setting they make will
> >>                  * allow that much actual data to be received on that
> >>                  * socket.
> >>                  *
> >>                  * Applications are unaware that "struct sk_buff" and
> >>                  * other overheads allocate from the receive buffer
> >>                  * during socket buffer allocation.
> >>                  *
> >>                  * And after considering the possible alternatives,
> >>                  * returning the value we actually used in getsockopt
> >>                  * is the most desirable behavior.
> >>                  */
> >>                 sk->sk_rcvbuf = max_t(u32, val * 2, SOCK_MIN_RCVBUF);
> >>                 break;
> >> ...
> >> }
> >>
> >> From above logic of setting sk->sk_rcvbuf with SO_RCVBUF, it only
> >> permits the maximum value of sk->sk_rcvbuf to sysctl_rmem_max * 2(ie,
> >> about 400KB normally).
> >>
> >> So, even if the default value of sk->sk_rcvbuf is set to 32MB with
> >> net.tipc.sysctl_rmem, a bit smaller value than the default value can
> >> never be set to sk->sk_rcvbuf successfully with SO_RCVBUF option.
> >>
> >> How can we avoid the limit?
> >>
> > By administratively adjusting sysctl_rmem_max to be a sufficiently large value
> > such that using SO_RCVBUF won't be clamed to a lower limit.
> > 
> > If you don't want to force users to have to manually adjust the sysctl, there
> > might be support for you to automatically update sysctl_rmem_max in your
> > tipc_init routine, and print an informational message indicating that tipc
> > requires the additional space (although I still maintain its not strictly
> > needed, but thats another argument).
> > 
> 
> Thanks for your clear clarification.
> 
> I also have the same concern. If we override sysctl_rmem_max in
> tipc_init() with a larger value, I am afraid that other guys will oppose
> the behaviour.
> 
> The truth is that little TIPC user adjusts the sk->sk_rcvbuf with
> SO_RCVBUF option in practice. If he really wants to do, he should follow
> your suggestion he manually enlarges the sysctl.
> 
> OK, I will rewrite the patch with your suggestion.
> 
> Regards,
> Ying
> 
> > Neil
> > 
> > 
> > 
> 
> 
Sounds good, thanks!
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 515ce38..2dfabc7 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1,7 +1,7 @@ 
 /*
  * net/tipc/socket.c: TIPC socket API
  *
- * Copyright (c) 2001-2007, 2012 Ericsson AB
+ * Copyright (c) 2001-2007, 2012-2013, Ericsson AB
  * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
  * All rights reserved.
  *
@@ -203,6 +203,7 @@  static int tipc_create(struct net *net, struct socket *sock, int protocol,
 
 	sock_init_data(sock, sk);
 	sk->sk_backlog_rcv = backlog_rcv;
+	sk->sk_rcvbuf = CONN_OVERLOAD_LIMIT;
 	sk->sk_data_ready = tipc_data_ready;
 	sk->sk_write_space = tipc_write_space;
 	tipc_sk(sk)->p = tp_ptr;
@@ -1233,10 +1234,10 @@  static u32 filter_connect(struct tipc_sock *tsock, struct sk_buff **buf)
  * For all connectionless messages, by default new queue limits are
  * as belows:
  *
- * TIPC_LOW_IMPORTANCE       (5MB)
- * TIPC_MEDIUM_IMPORTANCE    (10MB)
- * TIPC_HIGH_IMPORTANCE      (20MB)
- * TIPC_CRITICAL_IMPORTANCE  (40MB)
+ * TIPC_LOW_IMPORTANCE       (4 MB)
+ * TIPC_MEDIUM_IMPORTANCE    (8 MB)
+ * TIPC_HIGH_IMPORTANCE      (16 MB)
+ * TIPC_CRITICAL_IMPORTANCE  (32 MB)
  *
  * Returns overload limit according to corresponding message importance
  */
@@ -1248,7 +1249,7 @@  static unsigned int rcvbuf_limit(struct sock *sk, struct sk_buff *buf)
 	if (msg_connected(msg))
 		limit = CONN_OVERLOAD_LIMIT;
 	else
-		limit = sk->sk_rcvbuf << (msg_importance(msg) + 5);
+		limit = sk->sk_rcvbuf >> 4 << msg_importance(msg);
 	return limit;
 }