diff mbox

[V2,5/7] sctp: use limited socket backlog

Message ID 1267605389-7369-5-git-send-email-yi.zhu@intel.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Zhu Yi March 3, 2010, 8:36 a.m. UTC
Make sctp adapt to the limited socket backlog change.

Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/sctp/input.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

Comments

Vlad Yasevich March 3, 2010, 2:10 p.m. UTC | #1
Zhu Yi wrote:
> Make sctp adapt to the limited socket backlog change.
> 
> Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
> Cc: Sridhar Samudrala <sri@us.ibm.com>
> Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> ---
>  net/sctp/input.c |   12 ++++++++----
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sctp/input.c b/net/sctp/input.c
> index c0c973e..20e69c3 100644
> --- a/net/sctp/input.c
> +++ b/net/sctp/input.c
> @@ -75,7 +75,7 @@ static struct sctp_association *__sctp_lookup_association(
>  					const union sctp_addr *peer,
>  					struct sctp_transport **pt);
>  
> -static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb);
> +static int sctp_add_backlog(struct sock *sk, struct sk_buff *skb);
>  
>  
>  /* Calculate the SCTP checksum of an SCTP packet.  */
> @@ -265,8 +265,12 @@ int sctp_rcv(struct sk_buff *skb)
>  	}
>  
>  	if (sock_owned_by_user(sk)) {
> +		if (sctp_add_backlog(sk, skb)) {
> +			sctp_bh_unlock_sock(sk);
> +			sctp_chunk_free(chunk);
> +			goto discard_release;
> +		}

I think this will result in a double-free of the skb, because sctp_chunk_free
attempts to free the skb that's been assigned to the chunk.  You can set the skb
to NULL to get around that.

>  		SCTP_INC_STATS_BH(SCTP_MIB_IN_PKT_BACKLOG);
> -		sctp_add_backlog(sk, skb);
>  	} else {
>  		SCTP_INC_STATS_BH(SCTP_MIB_IN_PKT_SOFTIRQ);
>  		sctp_inq_push(&chunk->rcvr->inqueue, chunk);
> @@ -362,7 +366,7 @@ done:
>  	return 0;
>  }
>  
> -static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
> +static int sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
>  {
>  	struct sctp_chunk *chunk = SCTP_INPUT_CB(skb)->chunk;
>  	struct sctp_ep_common *rcvr = chunk->rcvr;
> @@ -377,7 +381,7 @@ static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
>  	else
>  		BUG();
>  
> -	sk_add_backlog(sk, skb);
> +	return sk_add_backlog_limited(sk, skb);
>  }

You also leak the ref counts here since now it's possible to not add a packet to
 the backlog queue.  That means you'll take refs, but never drop them because
the receive routing will never run.

-vlad
>  
>  /* Handle icmp frag needed error. */
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhu Yi March 3, 2010, 2:19 p.m. UTC | #2
Vlad Yasevich <vladislav.yasevich@hp.com> wrote:

> I think this will result in a double-free of the skb, because sctp_chunk_free
> attempts to free the skb that's been assigned to the chunk.  You can set the skb
> to NULL to get around that.

Ah, I missed that. Thanks!

<...>

> You also leak the ref counts here since now it's possible to not add a packet to
> the backlog queue.  That means you'll take refs, but never drop them because
> the receive routing will never run.

Good catch. I'll fix it.

BTW, does the current backlog limit (sysctl_rmem_default[1] << 1) enough for sctp?
I noticed the sysctl_sctp_rmem[1] is set to 373500 in my box.

Thanks,
-yi

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vlad Yasevich March 3, 2010, 2:36 p.m. UTC | #3
Zhu, Yi wrote:
> Vlad Yasevich <vladislav.yasevich@hp.com> wrote:
> 
>> I think this will result in a double-free of the skb, because sctp_chunk_free
>> attempts to free the skb that's been assigned to the chunk.  You can set the skb
>> to NULL to get around that.
> 
> Ah, I missed that. Thanks!
> 
> <...>
> 
>> You also leak the ref counts here since now it's possible to not add a packet to
>> the backlog queue.  That means you'll take refs, but never drop them because
>> the receive routing will never run.
> 
> Good catch. I'll fix it.
> 
> BTW, does the current backlog limit (sysctl_rmem_default[1] << 1) enough for sctp?
> I noticed the sysctl_sctp_rmem[1] is set to 373500 in my box.
> 

sctp uses the same algorithm as TCP to figure out the memory values.
I guess the issue with using the smaller value that it would be possible to
queue more the socket receive buffer then to the backlog.  Thus backlog would
start dropping packets even though receive buffer would still accept them.

-vlad

> Thanks,
> -yi
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhu Yi March 4, 2010, 2 a.m. UTC | #4
Vlad Yasevich <vladislav.yasevich@hp.com> wrote:

>> BTW, does the current backlog limit (sysctl_rmem_default[1] << 1) enough for sctp?
>> I noticed the sysctl_sctp_rmem[1] is set to 373500 in my box.
>> 

> sctp uses the same algorithm as TCP to figure out the memory values.
> I guess the issue with using the smaller value that it would be possible to
> queue more the socket receive buffer then to the backlog.  Thus backlog would
> start dropping packets even though receive buffer would still accept them.

Sysctl_tcp_rmem[1] = 87380 in my box which is much smaller than sctp. The backlog
limit is set to 258048 by default which is smaller than sysctl_sctp_rmem[1] = 373500.
I don't think backlog starts to drop packets before receive queue is correct. Fortunately
we can set the limit by individual protocols. I'll set the sk->sk_backlog.limit to
sysctl_sctp_rmem[1] in sctp_init_sock() for sctp socks in the next version.

Thanks,
-yi
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sctp/input.c b/net/sctp/input.c
index c0c973e..20e69c3 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -75,7 +75,7 @@  static struct sctp_association *__sctp_lookup_association(
 					const union sctp_addr *peer,
 					struct sctp_transport **pt);
 
-static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb);
+static int sctp_add_backlog(struct sock *sk, struct sk_buff *skb);
 
 
 /* Calculate the SCTP checksum of an SCTP packet.  */
@@ -265,8 +265,12 @@  int sctp_rcv(struct sk_buff *skb)
 	}
 
 	if (sock_owned_by_user(sk)) {
+		if (sctp_add_backlog(sk, skb)) {
+			sctp_bh_unlock_sock(sk);
+			sctp_chunk_free(chunk);
+			goto discard_release;
+		}
 		SCTP_INC_STATS_BH(SCTP_MIB_IN_PKT_BACKLOG);
-		sctp_add_backlog(sk, skb);
 	} else {
 		SCTP_INC_STATS_BH(SCTP_MIB_IN_PKT_SOFTIRQ);
 		sctp_inq_push(&chunk->rcvr->inqueue, chunk);
@@ -362,7 +366,7 @@  done:
 	return 0;
 }
 
-static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
+static int sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
 	struct sctp_chunk *chunk = SCTP_INPUT_CB(skb)->chunk;
 	struct sctp_ep_common *rcvr = chunk->rcvr;
@@ -377,7 +381,7 @@  static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
 	else
 		BUG();
 
-	sk_add_backlog(sk, skb);
+	return sk_add_backlog_limited(sk, skb);
 }
 
 /* Handle icmp frag needed error. */