[net-next,03/10] tipc: sk_recv_queue size check only for connectionless sockets

Message ID 1354890498-6448-4-git-send-email-paul.gortmaker@windriver.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Paul Gortmaker Dec. 7, 2012, 2:28 p.m. UTC
From: Ying Xue <ying.xue@windriver.com>

The sk_receive_queue limit control is currently performed for
all arriving messages, disregarding socket and message type.
But for connected sockets this check is redundant, since the protocol
flow control already makes queue overflow impossible.

We move the sk_receive_queue limit control so that it is only performed
for connectionless sockets, i.e. SOCK_RDM and SOCK_DGRAM type sockets.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/socket.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

Comments

Neil Horman Dec. 7, 2012, 7:20 p.m. UTC | #1
On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
> From: Ying Xue <ying.xue@windriver.com>
> 
> The sk_receive_queue limit control is currently performed for
> all arriving messages, disregarding socket and message type.
> But for connected sockets this check is redundant, since the protocol
> flow control already makes queue overflow impossible.
> 
Can you explain where that occurs?  I see where the tipc dispatch function calls
sk_add_backlog, which checks the per-socket receive queue (regardless of whether
the receiving socket is connection oriented or connectionless), but if the
receiver doesn't call receive very often, this just adds a check against your
global limit, doing nothing for your per-socket limits.  In fact it seems to
repeat the same check twice, as in the worst case of the incoming message being
TIPC_LOW_IMPORTANCE, it's just going to check the queue length against
OVERLOAD_LIMIT_BASE/2 again.

Neil

Jon Maloy Dec. 7, 2012, 10:30 p.m. UTC | #2
On 12/07/2012 02:20 PM, Neil Horman wrote:
> On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
>> From: Ying Xue <ying.xue@windriver.com>
>>
>> The sk_receive_queue limit control is currently performed for
>> all arriving messages, disregarding socket and message type.
>> But for connected sockets this check is redundant, since the protocol
>> flow control already makes queue overflow impossible.
>>
> Can you explain where that occurs?  

It happens in the functions port_dispatcher_sigh() and tipc_send(), 
among other places. Both are to be found in the file port.c, which 
was supposed to contain the 'generic' (i.e., API independent) part 
of the send/receive code.
Now that we have only one API left, the socket API, we are 
planning to merge the code in socket.c and port.c, and get rid of 
some code overhead.

The flow control in TIPC is message-based: the sender must receive
an explicit acknowledgement message for every 512 messages the
receiver reads to user space.
If the sender has more than 1024 messages outstanding without having
received an acknowledgement, he will be suspended or receive EAGAIN
until he does.
The plan going forward is to replace this mechanism with a more
standard-looking byte-based flow control, while maintaining
backwards compatibility.
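
Schematically, the scheme looks something like this (an illustrative
sketch only, with made-up names -- the real logic lives in port.c and
is spread across several functions):

#define CONN_ACK_INTERVAL	512	/* receiver acks every 512 reads      */
#define CONN_WINDOW_LIMIT	1024	/* max unacked messages at the sender */

struct conn_state {
	u32 reads_since_ack;	/* messages read since we last sent an ack */
	u32 sent_unacked;	/* messages sent since we last got an ack  */
};

static void conn_send_ack(struct conn_state *c);	/* made-up helper */

/* Receiver side: emit an explicit acknowledge message for every
 * CONN_ACK_INTERVAL messages delivered to user space.
 */
static void conn_read_to_user(struct conn_state *c)
{
	if (++c->reads_since_ack >= CONN_ACK_INTERVAL) {
		conn_send_ack(c);
		c->reads_since_ack = 0;
	}
}

/* Sender side: once CONN_WINDOW_LIMIT messages are outstanding, a
 * blocking sender is suspended and a non-blocking one gets -EAGAIN.
 */
static int conn_send_allowed(const struct conn_state *c)
{
	return c->sent_unacked < CONN_WINDOW_LIMIT;
}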


> I see where the tipc dispatch function calls
> sk_add_backlog, which checks the per-socket receive queue (regardless of whether
> the receiving socket is connection oriented or connectionless), but if the
> receiver doesn't call receive very often, this just adds a check against your
> global limit, doing nothing for your per-socket limits. 

OVERLOAD_LIMIT_BASE is tested against a per-socket message counter, so it _is_
our per-socket limit. In fact, TIPC connectionless overflow control currently
is a kind of hybrid, based on a message counter when the socket is not locked,
and based on sk_rcv_queue's byte limit when a message has to be added to the
backlog.
We are planning to fix this inconsistency too.
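
In rough outline, the receive dispatch does the following (a simplified
sketch of the code under discussion, not verbatim):

static u32 dispatch(struct tipc_port *tport, struct sk_buff *buf)
{
	struct sock *sk = (struct sock *)tport->usr_handle;
	u32 res;

	bh_lock_sock(sk);
	if (!sock_owned_by_user(sk)) {
		/* unlocked: filter_rcv() applies the per-socket
		 * message-count limit based on OVERLOAD_LIMIT_BASE */
		res = filter_rcv(sk, buf);
	} else {
		/* locked: the backlog enforces sk_rcvbuf, a byte limit */
		if (sk_add_backlog(sk, buf, sk->sk_rcvbuf))
			res = TIPC_ERR_OVERLOAD;
		else
			res = TIPC_OK;
	}
	bh_unlock_sock(sk);
	return res;
}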

> In fact it seems to
> repeat the same check twice, as in the worst case of the incoming message being
> TIPC_LOW_IMPORTANCE, it's just going to check the queue length against
> OVERLOAD_LIMIT_BASE/2 again.

Yes, you are right. The intention is that only the first test, 
if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {..}
will be run for the vast majority of messages, since we must assume
that there is no overload most of the time.
An inelegant optimization, perhaps, but not logically wrong.

///jon

> 
> Neil
> 

Neil Horman Dec. 9, 2012, 4:50 p.m. UTC | #3
On Fri, Dec 07, 2012 at 05:30:11PM -0500, Jon Maloy wrote:
> On 12/07/2012 02:20 PM, Neil Horman wrote:
> > On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
> >> From: Ying Xue <ying.xue@windriver.com>
> >>
> >> The sk_receive_queue limit control is currently performed for
> >> all arriving messages, disregarding socket and message type.
> >> But for connected sockets this check is redundant, since the protocol
> >> flow control already makes queue overflow impossible.
> >>
> > Can you explain where that occurs?  
> 
> It happens in the functions port_dispatcher_sigh() and tipc_send(), 
> among other places. Both are to be found in the file port.c, which 
> was supposed to contain the 'generic' (i.e., API independent) part 
> of the send/receive code.
> Now that we have only one API left, the socket API, we are 
> planning to merge the code in socket.c and port.c, and get rid of 
> some code overhead.
> 
> The flow control in TIPC is message-based: the sender must receive
> an explicit acknowledgement message for every 512 messages the
> receiver reads to user space.
> If the sender has more than 1024 messages outstanding without having
> received an acknowledgement, he will be suspended or receive EAGAIN
> until he does.
> The plan going forward is to replace this mechanism with a more
> standard-looking byte-based flow control, while maintaining
> backwards compatibility.
> 
OK, that makes more sense, thank you.  Although I still don't think this is
safe (but the problem may not be solely introduced by this patch).  Using a
global limit that assumes the sender will block when the congestion window is
reached just doesn't seem sane to me.  It clearly works with the Linux
implementation, as it conforms to your expectations, but an alternate
implementation could create a DoS situation by simply ignoring the window limit,
and continuing to send.  I see that we drop frames over the global limit in
filter_rcv, but the check in rx_queue_full bumps up that limit based on the
value of msg_importance(msg), and that threshold is ignored if the value of
msg_importance is invalid.  All a sender needs to do is flood a receiver with
frames containing an invalid set of message importance bits, and you will queue
frames indefinitely.  In fact that will also happen if you send messages of
CRITICAL importance as well, so you don't even need to supply an invalid value
here.
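
To spell it out, the check after this patch is shaped roughly like this
(a reconstruction from the diff at the bottom of this page; the exact
multipliers are approximate):

static int rx_queue_full(struct tipc_msg *msg, u32 queue_size, u32 base)
{
	u32 threshold;
	u32 imp = msg_importance(msg);

	if (imp == TIPC_LOW_IMPORTANCE)
		threshold = base;
	else if (imp == TIPC_MEDIUM_IMPORTANCE)
		threshold = base * 2;
	else if (imp == TIPC_HIGH_IMPORTANCE)
		threshold = base * 100;
	else
		return 0;	/* CRITICAL or invalid bits: never "full" */

	return queue_size >= threshold;
}

That final "return 0" is the hole: CRITICAL and out-of-range importance
values bypass the limit entirely.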

> 
> > I see where the tipc dispatch function calls
> > sk_add_backlog, which checks the per-socket receive queue (regardless of whether
> > the receiving socket is connection oriented or connectionless), but if the
> > receiver doesn't call receive very often, this just adds a check against your
> > global limit, doing nothing for your per-socket limits. 
> 
> OVERLOAD_LIMIT_BASE is tested against a per-socket message counter, so it _is_
> our per-socket limit. In fact, TIPC connectionless overflow control currently 
> is a kind of hybrid, based on a message counter when the socket is not locked,
> and based on sk_rcv_queue's byte limit when a message has to be added to the 
> backlog.
> We are planning to fix this inconsistency too.
Good, thank you; that seemed quite wrong to me.

> 
> > In fact it seems to
> > repeat the same check twice, as in the worst case of the incoming message being
> > TIPC_LOW_IMPORTANCE, it's just going to check the queue length against
> > OVERLOAD_LIMIT_BASE/2 again.
> 
> Yes, you are right. The intention is that only the first test, 
> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {..}
> will be run for the vast majority of messages, since we must assume
> that there is no overload most of the time.
> An inelegant optimization, perhaps, but not logically wrong.
No, not logically wrong, but not an optimization either.  With this change,
your only use of rx_queue_full passes OVERLOAD_LIMIT_BASE/2 as the base value to
rx_queue_full, and then you do some multiplication based on that.  If you really
want to optimize this, leave OVERLOAD_LIMIT_BASE where it is (rather than
doubling it like this patch series does), mark rx_queue_full as inline, and just
pass OVERLOAD_LIMIT_BASE as the argument; it will save you a division operation,
the conditional branch and a call instruction.  If you add a multiplication
factor table, you can eliminate the if/else clauses in rx_queue_full as well.
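
Something like the following, as an untested sketch (the CRITICAL
multiplier is invented here -- today that case has no limit at all,
which is the problem discussed above):

static const u32 rx_limit_factor[TIPC_CRITICAL_IMPORTANCE + 1] = {
	[TIPC_LOW_IMPORTANCE]		= 1,
	[TIPC_MEDIUM_IMPORTANCE]	= 2,
	[TIPC_HIGH_IMPORTANCE]		= 100,
	[TIPC_CRITICAL_IMPORTANCE]	= 100,	/* invented value */
};

static inline int rx_queue_full(struct tipc_msg *msg, u32 queue_size)
{
	u32 imp = msg_importance(msg);

	/* clamp junk importance bits instead of skipping the check */
	if (imp > TIPC_CRITICAL_IMPORTANCE)
		imp = TIPC_CRITICAL_IMPORTANCE;

	return queue_size >= OVERLOAD_LIMIT_BASE * rx_limit_factor[imp];
}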

Neil

> 
> ///jon
> 
> > 
> > Neil
> > 
Ying Xue Dec. 10, 2012, 6:27 a.m. UTC | #4
Neil Horman wrote:
> On Fri, Dec 07, 2012 at 05:30:11PM -0500, Jon Maloy wrote:
>   
>> On 12/07/2012 02:20 PM, Neil Horman wrote:
>>     
>>> On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
>>>       
>>>> From: Ying Xue <ying.xue@windriver.com>
>>>>
>>>> The sk_receive_queue limit control is currently performed for
>>>> all arriving messages, disregarding socket and message type.
>>>> But for connected sockets this check is redundant, since the protocol
>>>> flow control already makes queue overflow impossible.
>>>>
>>>>         
>>> Can you explain where that occurs?  
>>>       
>> It happens in the functions port_dispatcher_sigh() and tipc_send(), 
>> among other places. Both are to be found in the file port.c, which 
>> was supposed to contain the 'generic' (i.e., API independent) part 
>> of the send/receive code.
>> Now that we have only one API left, the socket API, we are 
>> planning to merge the code in socket.c and port.c, and get rid of 
>> some code overhead.
>>
>> The flow control in TIPC is message-based: the sender must receive
>> an explicit acknowledgement message for every 512 messages the
>> receiver reads to user space.
>> If the sender has more than 1024 messages outstanding without having
>> received an acknowledgement, he will be suspended or receive EAGAIN
>> until he does.
>> The plan going forward is to replace this mechanism with a more
>> standard-looking byte-based flow control, while maintaining
>> backwards compatibility.
>>
>>     
> OK, that makes more sense, thank you.  Although I still don't think this is
> safe (but the problem may not be solely introduced by this patch).  Using a
> global limit that assumes the sender will block when the congestion window is
> reached just doesn't seem sane to me.  It clearly works with the Linux
> implementation, as it conforms to your expectations, but an alternate
> implementation could create a DoS situation by simply ignoring the window limit,
> and continuing to send.  I see that we drop frames over the global limit in
> filter_rcv, but the check in rx_queue_full bumps up that limit based on the
> value of msg_importance(msg), and that threshold is ignored if the value of
> msg_importance is invalid.  All a sender needs to do is flood a receiver with
> frames containing an invalid set of message importance bits, and you will queue
> frames indefinitely.  In fact that will also happen if you send messages of
> CRITICAL importance as well, so you don't even need to supply an invalid value
> here.
>
>   

You are absolutely right. I will correct these drawbacks in the next version.

>>> I see where the tipc dispatch function calls
>>> sk_add_backlog, which checks the per-socket receive queue (regardless of whether
>>> the receiving socket is connection oriented or connectionless), but if the
>>> receiver doesn't call receive very often, this just adds a check against your
>>> global limit, doing nothing for your per-socket limits. 
>>>       
>> OVERLOAD_LIMIT_BASE is tested against a per-socket message counter, so it _is_
>> our per-socket limit. In fact, TIPC connectionless overflow control currently 
>> is a kind of hybrid, based on a message counter when the socket is not locked,
>> and based on sk_rcv_queue's byte limit when a message has to be added to the 
>> backlog.
>> We are planning to fix this inconsistency too.
>>     
> Good, thank you; that seemed quite wrong to me.
>
>   
>>> In fact it seems to
>>> repeat the same check twice, as in the worst case of the incoming message being
>>> TIPC_LOW_IMPORTANCE, it's just going to check the queue length against
>>> OVERLOAD_LIMIT_BASE/2 again.
>>>       
>> Yes, you are right. The intention is that only the first test, 
>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {..}
>> will be run for the vast majority of messages, since we must assume
>> that there is no overload most of the time.
>> An inelegant optimization, perhaps, but not logically wrong.
>>     
> No, not logically wrong, but not an optimization either.  With this change,
> your only use of rx_queue_full passes OVERLOAD_LIMIT_BASE/2 as the base value to
> rx_queue_full, and then you do some multiplication based on that.  If you really
> want to optimize this, leave OVERLOAD_LIMIT_BASE where it is (rather than
> doubling it like this patch series does), mark rx_queue_full as inline, and just
> pass OVERLOAD_LIMIT_BASE as the argument; it will save you a division operation,
> the conditional branch and a call instruction.  If you add a multiplication
> factor table, you can eliminate the if/else clauses in rx_queue_full as well.
>
>   

Good suggestion with a factor table. Maybe it's unnecessary to
explicitly mark rx_queue_full as inline. Currently it sounds like we
let the compiler decide whether a function should be inlined or not.

Regards,
Ying

> Neil
>
>   
>> ///jon
>>
>>     
>>> Neil
>>>

Jon Maloy Dec. 10, 2012, 8:46 a.m. UTC | #5
On 12/10/2012 01:27 AM, Ying Xue wrote:
> Neil Horman wrote:
>> On Fri, Dec 07, 2012 at 05:30:11PM -0500, Jon Maloy wrote:
>>   
>>> On 12/07/2012 02:20 PM, Neil Horman wrote:
>>>     
>>>> On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
>>>>       
>>>>> From: Ying Xue <ying.xue@windriver.com>
>>>>>
>>>>> The sk_receive_queue limit control is currently performed for
>>>>> all arriving messages, disregarding socket and message type.
>>>>> But for connected sockets this check is redundant, since the protocol
>>>>> flow control already makes queue overflow impossible.
>>>>>
>>>>>         
>>>> Can you explain where that occurs?  
>>>>       
>>> It happens in the functions port_dispatcher_sigh() and tipc_send(), 
>>> among other places. Both are to be found in the file port.c, which 
>>> was supposed to contain the 'generic' (i.e., API independent) part 
>>> of the send/receive code.
>>> Now that we have only one API left, the socket API, we are 
>>> planning to merge the code in socket.c and port.c, and get rid of 
>>> some code overhead.
>>>
>>> The flow control in TIPC is message-based: the sender must receive
>>> an explicit acknowledgement message for every 512 messages the
>>> receiver reads to user space.
>>> If the sender has more than 1024 messages outstanding without having
>>> received an acknowledgement, he will be suspended or receive EAGAIN
>>> until he does.
>>> The plan going forward is to replace this mechanism with a more
>>> standard-looking byte-based flow control, while maintaining
>>> backwards compatibility.
>>>
>>>     
>> OK, that makes more sense, thank you.  Although I still don't think this is
>> safe (but the problem may not be solely introduced by this patch).  Using a
>> global limit that assumes the sender will block when the congestion window is
>> reached just doesn't seem sane to me.  It clearly works with the Linux
>> implementation, as it conforms to your expectations, but an alternate
>> implementation could create a DoS situation by simply ignoring the window limit,
>> and continuing to send.  I see that we drop frames over the global limit in
>> filter_rcv, but the check in rx_queue_full bumps up that limit based on the
>> value of msg_importance(msg), and that threshold is ignored if the value of
>> msg_importance is invalid.  All a sender needs to do is flood a receiver with
>> frames containing an invalid set of message importance bits, and you will queue
>> frames indefinitely.  In fact that will also happen if you send messages of
>> CRITICAL importance as well, so you don't even need to supply an invalid value
>> here.
>>
>>   
> 
> You are absolutely right. I will correct these drawbacks in the next version.

I think we should rather just drop this patch. It introduces a major vulnerability,
as Neil correctly points out. We will have to rework this code anyway.

> 
>>>> I see where the tipc dispatch function calls
>>>> sk_add_backlog, which checks the per-socket receive queue (regardless of whether
>>>> the receiving socket is connection oriented or connectionless), but if the
>>>> receiver doesn't call receive very often, this just adds a check against your
>>>> global limit, doing nothing for your per-socket limits. 
>>>>       
>>> OVERLOAD_LIMIT_BASE is tested against a per-socket message counter, so it _is_
>>> our per-socket limit. In fact, TIPC connectionless overflow control currently 
>>> is a kind of hybrid, based on a message counter when the socket is not locked,
>>> and based on sk_rcv_queue's byte limit when a message has to be added to the 
>>> backlog.
>>> We are planning to fix this inconsistency too.
>>>     
>> Good, thank you; that seemed quite wrong to me.
>>
>>   
>>>> In fact it seems to
>>>> repeat the same check twice, as in the worst case of the incoming message being
>>>> TIPC_LOW_IMPORTANCE, it's just going to check the queue length against
>>>> OVERLOAD_LIMIT_BASE/2 again.
>>>>       
>>> Yes, you are right. The intention is that only the first test, 
>>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {..}
>>> will be run for the vast majority of messages, since we must assume
>>> that there is no overload most of the time.
>>> An inelegant optimization, perhaps, but not logically wrong.
>>>     
>> No, not logically wrong, but not an optimization either.  With this change,
>> your only use of rx_queue_full passes OVERLOAD_LIMIT_BASE/2 as the base value to
>> rx_queue_full, and then you do some multiplication based on that.  

It is still in the "unlikely" (in fact, very unlikely) branch. And the multiplication
is by two, i.e. just a left-shift operation. Our approach was rather to let the
compiler decide about inlining, which in this case may be suboptimal.

>> If you really
>> want to optimize this, leave OVERLOAD_LIMIT_BASE where it is (rather than
>> doubling it like this patch series does), mark rx_queue_full as inline, and just
>> pass OVERLOAD_LIMIT_BASE as the argument; it will save you a division operation,
>> the conditional branch and a call instruction.  If you add a multiplication
>> factor table, you can eliminate the if/else clauses in rx_queue_full as well.
>>
>>   
> 
> Good suggestion with a factor table. Maybe it's unnecessary to
> explicitly mark rx_queue_full as inline. Currently it sounds like we let
> the compiler decide whether a function should be inlined or not.

One approach I had in mind was to just left-shift OVERLOAD_LIMIT_BASE by the
message priority, and compare that to the per-socket counter. This way,
we obtain the limit set [10000,20000,40000,80000] without having to read
data memory. The limits will not be the same as now, but probably good
enough. We don't even need a separate function for this check.
Something we should look into when we move on to make this mechanism
byte-based.
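
I.e. something like this inside filter_rcv() (untested, and assuming
the importance bits are first clamped to the valid 0-3 range):

recv_q_len = skb_queue_len(&sk->sk_receive_queue);
if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE << msg_importance(msg))))
	return TIPC_ERR_OVERLOAD;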

> 
> Regards,
> Ying
> 
>> Neil
>>
>>   
>>> ///jon
>>>
>>>     
>>>> Neil
>>>>

Neil Horman Dec. 10, 2012, 2:22 p.m. UTC | #6
On Mon, Dec 10, 2012 at 02:27:50PM +0800, Ying Xue wrote:
> Neil Horman wrote:
> >On Fri, Dec 07, 2012 at 05:30:11PM -0500, Jon Maloy wrote:
> >>On 12/07/2012 02:20 PM, Neil Horman wrote:
> >>>On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
> >>>>From: Ying Xue <ying.xue@windriver.com>
> >>>>
> >>>>The sk_receive_queue limit control is currently performed for
> >>>>all arriving messages, disregarding socket and message type.
> >>>>But for connected sockets this check is redundant, since the protocol
> >>>>flow control already makes queue overflow impossible.
> >>>>
> >>>Can you explain where that occurs?
> >>It happens in the functions port_dispatcher_sigh() and
> >>tipc_send(), among other places. Both are to be found in the
> >>file port.c, which was supposed to contain the 'generic' (i.e.,
> >>API independent) part of the send/receive code.
> >>Now that we have only one API left, the socket API, we are
> >>planning to merge the code in socket.c and port.c, and get rid
> >>of some code overhead.
> >>
> >>The flow control in TIPC is message-based: the sender must receive
> >>an explicit acknowledgement message for every 512 messages the
> >>receiver reads to user space.
> >>If the sender has more than 1024 messages outstanding without having
> >>received an acknowledgement, he will be suspended or receive EAGAIN
> >>until he does.
> >>The plan going forward is to replace this mechanism with a more
> >>standard-looking byte-based flow control, while maintaining
> >>backwards compatibility.
> >>
> >OK, that makes more sense, thank you.  Although I still don't think this is
> >safe (but the problem may not be solely introduced by this patch).  Using a
> >global limit that assumes the sender will block when the congestion window is
> >reached just doesn't seem sane to me.  It clearly works with the Linux
> >implementation, as it conforms to your expectations, but an alternate
> >implementation could create a DoS situation by simply ignoring the window limit,
> >and continuing to send.  I see that we drop frames over the global limit in
> >filter_rcv, but the check in rx_queue_full bumps up that limit based on the
> >value of msg_importance(msg), and that threshold is ignored if the value of
> >msg_importance is invalid.  All a sender needs to do is flood a receiver with
> >frames containing an invalid set of message importance bits, and you will queue
> >frames indefinitely.  In fact that will also happen if you send messages of
> >CRITICAL importance as well, so you don't even need to supply an invalid value
> >here.
> >
> 
> You are absolutely right. I will correct these drawbacks in the next version.
> 
> >>>I see where the tipc dispatch function calls
> >>>sk_add_backlog, which checks the per-socket receive queue (regardless of whether
> >>>the receiving socket is connection oriented or connectionless), but if the
> >>>receiver doesn't call receive very often, this just adds a check against your
> >>>global limit, doing nothing for your per-socket limits.
> >>OVERLOAD_LIMIT_BASE is tested against a per-socket message counter, so it _is_
> >>our per-socket limit. In fact, TIPC connectionless overflow
> >>control currently is a kind of hybrid, based on a message
> >>counter when the socket is not locked, and based on
> >>sk_rcv_queue's byte limit when a message has to be added to the
> >>backlog.
> >>We are planning to fix this inconsistency too.
> >Good, thank you; that seemed quite wrong to me.
> >
> >>>In fact it seems to
> >>>repeat the same check twice, as in the worst case of the incoming message being
> >>>TIPC_LOW_IMPORTANCE, it's just going to check the queue length against
> >>>OVERLOAD_LIMIT_BASE/2 again.
> >>Yes, you are right. The intention is that only the first test,
> >>if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {..}
> >>will be run for the vast majority of messages, since we must assume
> >>that there is no overload most of the time.
> >>An inelegant optimization, perhaps, but not logically wrong.
> >No, not logically wrong, but not an optimization either.  With this change,
> >your only use of rx_queue_full passes OVERLOAD_LIMIT_BASE/2 as the base value to
> >rx_queue_full, and then you do some multiplication based on that.  If you really
> >want to optimize this, leave OVERLOAD_LIMIT_BASE where it is (rather than
> >doubling it like this patch series does), mark rx_queue_full as inline, and just
> >pass OVERLOAD_LIMIT_BASE as the argument; it will save you a division operation,
> >the conditional branch and a call instruction.  If you add a multiplication
> >factor table, you can eliminate the if/else clauses in rx_queue_full as well.
> >
> 
> Good suggestion with a factor table. Maybe it's unnecessary to
> explicitly mark rx_queue_full as inline. Currently it sounds like we
> let the compiler decide whether a function should be inlined or not.
> 
That's correct, the compiler usually decides if something should be inlined
(unless you use the __always_inline attribute).  In this case, given a single
call site, it most likely will just inline it anyway.  But if you're interested
in optimizing here, it might be worth taking the extra steps to make sure.  In
fact, since this is your only call site, it may be worthwhile to just remove the
function entirely and manually inline the check.
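
I.e. roughly this at the call site (untested, reusing a factor table
like the one suggested earlier in the thread):

u32 imp = min_t(u32, msg_importance(msg), TIPC_CRITICAL_IMPORTANCE);

recv_q_len = skb_queue_len(&sk->sk_receive_queue);
if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE * rx_limit_factor[imp]))
	return TIPC_ERR_OVERLOAD;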
Neil

> Regards,
> Ying
> 
> >Neil
> >
> >>///jon
> >>
> >>>Neil
> >>>

Patch

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index a059ed0..4d6a448 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1188,9 +1188,6 @@  static int rx_queue_full(struct tipc_msg *msg, u32 queue_size, u32 base)
 	else
 		return 0;
 
-	if (msg_connected(msg))
-		threshold *= 4;
-
 	return queue_size >= threshold;
 }
 
@@ -1219,6 +1216,15 @@  static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 	if (sock->state == SS_READY) {
 		if (msg_connected(msg))
 			return TIPC_ERR_NO_PORT;
+		/* Reject SOCK_DGRAM and SOCK_RDM messages if there isn't room
+		 * to queue it.
+		 */
+		recv_q_len = skb_queue_len(&sk->sk_receive_queue);
+		if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
+			if (rx_queue_full(msg, recv_q_len,
+			    OVERLOAD_LIMIT_BASE / 2))
+				return TIPC_ERR_OVERLOAD;
+		}
 	} else {
 		if (msg_mcast(msg))
 			return TIPC_ERR_NO_PORT;
@@ -1240,13 +1246,6 @@  static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 		}
 	}
 
-	/* Reject message if there isn't room to queue it */
-	recv_q_len = skb_queue_len(&sk->sk_receive_queue);
-	if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
-		if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
-			return TIPC_ERR_OVERLOAD;
-	}
-
 	/* Enqueue message (finally!) */
 	TIPC_SKB_CB(buf)->handle = 0;
 	atomic_inc(&tipc_queue_size);