From patchwork Fri May 9 13:13:23 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jon Maloy X-Patchwork-Id: 347407 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 85BA1140122 for ; Fri, 9 May 2014 23:20:56 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755254AbaEINUw (ORCPT ); Fri, 9 May 2014 09:20:52 -0400 Received: from smtp105.biz.mail.ne1.yahoo.com ([98.138.207.12]:37126 "HELO smtp105.biz.mail.ne1.yahoo.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1751748AbaEINUp (ORCPT ); Fri, 9 May 2014 09:20:45 -0400 Received: (qmail 95329 invoked from network); 9 May 2014 13:14:05 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1399641245; bh=SZ7EWNN6DX8k3FvjsySzWKBvaYqXCU2M5WWsU/W2+/s=; h=X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:X-Rocket-Received:From:To:Cc:Subject:Date:Message-Id:X-Mailer:In-Reply-To:References; b=EXp8eNqSDzx5Xb8+3MItTeq6UaFC6n5uhZAoNwRUZtjtzzodA4K49ti/x+4ClJdZnq+n6ow8dFi0CnsRNuTOR/t9gDbxjj6isA8a5pl5rDddNdKI3bE0xEcoKPTzTxQ185ZuBk1SPJIHMGpIUm3oZWprVOXHxPq/wZ0Ix/sJ2Go= X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: LeCCM9oVM1luZbbedOJFRrR_nK842C16JPKxxxLKYCGKo0p EAh4B_UJOpWM6qOgX5xTfGjXAi9.0ogYrWRQdRrdW8f.4_MH2MA4Qt7flbCO Nf9oBChmeH1CbdgLWD_B5NzfpnIjPwHTH3_lr3ofvcFqZQ0MwQS.OR8iP6Xo 2fwq.ragEjj7pZdTk4wvfs6msrR_8U0POfKYTT.dTv3.3FsUU_ODrv6jRbx6 UqFcFW6VtGqvf4AiMW8cyCIelf3eprX8nZvgSuSh603itckuBu.jnzP7P23S VU9WcuScx0.NMrDM0JspKCQyRHGBHWsPhVwB.YXw.noHBdNo0eGtJ_MMmWkK QhQ0w5Qt.FId2zQNcqHZoQUI6ZdK6Uv5bRmZtS1a6H062PXafMpy5VRv9vGE ieioT0FlPYxQ5H5OiRARGPbG6NjWzgF3tV0XLpPNybLbXBzLlnxkvBKJOwe0 K3cMCBtsAyba4HsUF8r3uMeOkKKIT_moVknO.buG6oEgeXPeis1eNdlCZMAo Y9sMCNuEKwOpMcaUqottKYNOT6.GJ5A-- X-Yahoo-SMTP: gPXIZm2swBAFQJ_Vx0CebjUfUdhJ X-Rocket-Received: from goethe.lan (jon.maloy@65.93.115.57 with plain [98.138.105.25]) by smtp105.biz.mail.ne1.yahoo.com with SMTP; 09 May 2014 06:14:05 -0700 PDT From: Jon Maloy To: davem@davemloft.net Cc: netdev@vger.kernel.org, Paul Gortmaker , erik.hugne@ericsson.com, ying.xue@windriver.com, maloy@donjonn.com, tipc-discussion@lists.sourceforge.net, Jon Maloy Subject: [PATCH net-next 2/8] tipc: compensate for double accounting in socket rcv buffer Date: Fri, 9 May 2014 09:13:23 -0400 Message-Id: <1399641209-26112-3-git-send-email-jon.maloy@ericsson.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1399641209-26112-1-git-send-email-jon.maloy@ericsson.com> References: <1399641209-26112-1-git-send-email-jon.maloy@ericsson.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The function net/core/sock.c::__release_sock() runs a tight loop to move buffers from the socket backlog queue to the receive queue. As a security measure, sk_backlog.len of the receiving socket is not set to zero until after the loop is finished, i.e., until the whole backlog queue has been transferred to the receive queue. During this transfer, the data that has already been moved is counted both in the backlog queue and the receive queue, hence giving an incorrect picture of the available queue space for new arriving buffers. This leads to unnecessary rejection of buffers by sk_add_backlog(), which in TIPC leads to unnecessarily broken connections. In this commit, we compensate for this double accounting by adding a counter that keeps track of it. The function socket.c::backlog_rcv() receives buffers one by one from __release_sock(), and adds them to the socket receive queue. If the transfer is successful, it increases a new atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the transferred buffer. If a new buffer arrives during this transfer and finds the socket busy (owned), we attempt to add it to the backlog. However, when sk_add_backlog() is called, we adjust the 'limit' parameter with the value of the new counter, so that the risk of inadvertent rejection is eliminated. It should be noted that this change does not invalidate the original purpose of zeroing 'sk_backlog.len' after the full transfer. We set an upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that doesn't respect the send window) keeps pumping in buffers to sk_add_backlog(), he will eventually reach an upper limit, (2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added to the backlog, and the connection will be broken. Ordinary, well- behaved senders will never will never reach the buffer limit at all. Signed-off-by: Jon Maloy Reviewed-by: Ying Xue --- net/tipc/socket.c | 28 +++++++++++++++++++--------- net/tipc/socket.h | 2 ++ 2 files changed, 21 insertions(+), 9 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 8685daf..2495006 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -1,5 +1,5 @@ /* - * net/tipc/socket.c: TIPC socket API +* net/tipc/socket.c: TIPC socket API * * Copyright (c) 2001-2007, 2012-2014, Ericsson AB * Copyright (c) 2004-2008, 2010-2013, Wind River Systems @@ -45,7 +45,7 @@ #define CONN_TIMEOUT_DEFAULT 8000 /* default connect timeout = 8s */ -static int backlog_rcv(struct sock *sk, struct sk_buff *skb); +static int tipc_backlog_rcv(struct sock *sk, struct sk_buff *skb); static void tipc_data_ready(struct sock *sk); static void tipc_write_space(struct sock *sk); static int tipc_release(struct socket *sock); @@ -196,11 +196,12 @@ static int tipc_sk_create(struct net *net, struct socket *sock, sock->state = state; sock_init_data(sock, sk); - sk->sk_backlog_rcv = backlog_rcv; + sk->sk_backlog_rcv = tipc_backlog_rcv; sk->sk_rcvbuf = sysctl_tipc_rmem[1]; sk->sk_data_ready = tipc_data_ready; sk->sk_write_space = tipc_write_space; - tipc_sk(sk)->conn_timeout = CONN_TIMEOUT_DEFAULT; + tsk->conn_timeout = CONN_TIMEOUT_DEFAULT; + atomic_set(&tsk->dupl_rcvcnt, 0); tipc_port_unlock(port); if (sock->state == SS_READY) { @@ -1416,7 +1417,7 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf) } /** - * backlog_rcv - handle incoming message from backlog queue + * tipc_backlog_rcv - handle incoming message from backlog queue * @sk: socket * @buf: message * @@ -1424,13 +1425,18 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf) * * Returns 0 */ -static int backlog_rcv(struct sock *sk, struct sk_buff *buf) +static int tipc_backlog_rcv(struct sock *sk, struct sk_buff *buf) { u32 res; + struct tipc_sock *tsk = tipc_sk(sk); res = filter_rcv(sk, buf); - if (res) + if (unlikely(res)) tipc_reject_msg(buf, res); + + if (atomic_read(&tsk->dupl_rcvcnt) < TIPC_CONN_OVERLOAD_LIMIT) + atomic_add(buf->truesize, &tsk->dupl_rcvcnt); + return 0; } @@ -1445,8 +1451,9 @@ static int backlog_rcv(struct sock *sk, struct sk_buff *buf) */ u32 tipc_sk_rcv(struct sock *sk, struct sk_buff *buf) { + struct tipc_sock *tsk = tipc_sk(sk); u32 res; - + uint limit; /* * Process message if socket is unlocked; otherwise add to backlog queue * @@ -1457,7 +1464,10 @@ u32 tipc_sk_rcv(struct sock *sk, struct sk_buff *buf) if (!sock_owned_by_user(sk)) { res = filter_rcv(sk, buf); } else { - if (sk_add_backlog(sk, buf, rcvbuf_limit(sk, buf))) + if (sk->sk_backlog.len == 0) + atomic_set(&tsk->dupl_rcvcnt, 0); + limit = rcvbuf_limit(sk, buf) + atomic_read(&tsk->dupl_rcvcnt); + if (sk_add_backlog(sk, buf, limit)) res = TIPC_ERR_OVERLOAD; else res = TIPC_OK; diff --git a/net/tipc/socket.h b/net/tipc/socket.h index 74e5c7f..86c27cc 100644 --- a/net/tipc/socket.h +++ b/net/tipc/socket.h @@ -44,12 +44,14 @@ * @port: port - interacts with 'sk' and with the rest of the TIPC stack * @peer_name: the peer of the connection, if any * @conn_timeout: the time we can wait for an unresponded setup request + * @dupl_rcvcnt: number of bytes counted twice, in both backlog and rcv queue */ struct tipc_sock { struct sock sk; struct tipc_port port; unsigned int conn_timeout; + atomic_t dupl_rcvcnt; }; static inline struct tipc_sock *tipc_sk(const struct sock *sk)