From patchwork Sat Oct 6 05:42:37 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "jeff.liu" X-Patchwork-Id: 189664 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id DC2062C020B for ; Sat, 6 Oct 2012 15:43:52 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752057Ab2JFFnu (ORCPT ); Sat, 6 Oct 2012 01:43:50 -0400 Received: from rcsinet15.oracle.com ([148.87.113.117]:30155 "EHLO rcsinet15.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750835Ab2JFFnt (ORCPT ); Sat, 6 Oct 2012 01:43:49 -0400 Received: from ucsinet21.oracle.com (ucsinet21.oracle.com [156.151.31.93]) by rcsinet15.oracle.com (Sentrion-MTA-4.2.2/Sentrion-MTA-4.2.2) with ESMTP id q965hkiG016005 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Sat, 6 Oct 2012 05:43:47 GMT Received: from acsmt358.oracle.com (acsmt358.oracle.com [141.146.40.158]) by ucsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id q965hjbF002551 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 6 Oct 2012 05:43:46 GMT Received: from abhmt112.oracle.com (abhmt112.oracle.com [141.146.116.64]) by acsmt358.oracle.com (8.12.11.20060308/8.12.11) with ESMTP id q965hjfb028341; Sat, 6 Oct 2012 00:43:45 -0500 Received: from [10.191.10.132] (/10.191.10.132) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Fri, 05 Oct 2012 22:43:44 -0700 Message-ID: <506FC4CD.7070509@oracle.com> Date: Sat, 06 Oct 2012 13:42:37 +0800 From: Jeff Liu Reply-To: jeff.liu@oracle.com Organization: Oracle User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.28) Gecko/20120313 Thunderbird/3.1.20 MIME-Version: 1.0 To: rds-devel@oss.oracle.com CC: Venkat Venkatsubra , Dan Carpenter , davem@davemloft.net, James Morris , netdev@vger.kernel.org Subject: [PATCH] RDS: Fix spinlock recursion for rds over tcp transmit X-Source-IP: ucsinet21.oracle.com [156.151.31.93] Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Hello, RDS ping/pong over TCP feature has broke for years(2.6.39 to 3.6.0) since we have to set TCP cork and call kerenel_sendmsg() to reply a ping requirement which both need to lock "struct sock *sk". However, this lock has already been hold before our rda_tcp_data_ready() callback is triggerred. As a result, we always facing spinlock recursion which would resulting in system panic... Given that RDS ping is a special kind of message, we don't need to reply it as soon as possible, IMHO, we can schedule it to work queue as a delayed response to make TCP transport totally works. Also, I think we can using the system default work queue to serve it to reduce the possible impact on general TCP transmit. With below patch, I have run rds-ping(run multiple ping between two hosts at the same time) and rds-stress(hostA listen, hostB send packets) for half day, it works to me. Thanks, -Jeff Reported-by: Dan Carpenter CC: Venkat Venkatsubra CC: David S. Miller CC: James Morris Signed-off-by: Jie Liu --- net/rds/send.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++---- net/rds/tcp.h | 7 +++++ 2 files changed, 82 insertions(+), 5 deletions(-) diff --git a/net/rds/send.c b/net/rds/send.c index 96531d4..011006e 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -38,8 +38,10 @@ #include #include #include +#include #include "rds.h" +#include "tcp.h" /* When transmitting messages in rds_send_xmit, we need to emerge from * time to time and briefly release the CPU. Otherwise the softlock watchdog @@ -55,6 +57,12 @@ static int send_batch_count = 64; module_param(send_batch_count, int, 0444); MODULE_PARM_DESC(send_batch_count, " batch factor when working the send queue"); +/* RDS over TCP ping/pong */ +static void rds_tcp_pong_worker(struct work_struct *work); +static DEFINE_SPINLOCK(rds_tcp_pong_lock); +static LIST_HEAD(rds_tcp_pong_list); +static DECLARE_WORK(rds_tcp_pong_work, rds_tcp_pong_worker); + static void rds_send_remove_from_sock(struct list_head *messages, int status); /* @@ -1082,11 +1090,7 @@ out: return ret; } -/* - * Reply to a ping packet. - */ -int -rds_send_pong(struct rds_connection *conn, __be16 dport) +static int rds_tcp_send_pong(struct rds_connection *conn, __be16 dport) { struct rds_message *rm; unsigned long flags; @@ -1132,3 +1136,69 @@ out: rds_message_put(rm); return ret; } + +static void rds_tcp_pong_worker(struct work_struct *work) +{ + struct rds_tcp_pong *tp; + struct rds_connection *conn; + __be16 dport; + + spin_lock(&rds_tcp_pong_lock); + if (list_empty(&rds_tcp_pong_list)) + goto out_unlock; + + /* + * Process on tcp pong once one time to reduce the possbile impact + * on normal transmit. + */ + tp = list_entry(rds_tcp_pong_list.next, struct rds_tcp_pong, tp_node); + conn = tp->tp_conn; + dport = tp->tp_dport; + list_del(&tp->tp_node); + spin_unlock(&rds_tcp_pong_lock); + + kfree(tp); + rds_tcp_send_pong(conn, dport); + goto out; + +out_unlock: + spin_unlock(&rds_tcp_pong_lock); +out: + return; +} + +/* + * RDS over TCP transport support ping/pong message. However, it + * always resulting in sock spinlock recursion up to 3.7.0. To solve + * this issue, we can shedule it to work queue as a delayed response. + * Here we using the system default work queue. + */ +int rds_tcp_pong(struct rds_connection *conn, __be16 dport) +{ + struct rds_tcp_pong *tp; + int ret = 0; + + tp = kmalloc(sizeof(*tp), GFP_ATOMIC); + if (!tp) { + ret = -ENOMEM; + goto out; + } + + tp->tp_conn = conn; + tp->tp_dport = dport; + spin_lock(&rds_tcp_pong_lock); + list_add_tail(&tp->tp_node, &rds_tcp_pong_list); + spin_unlock(&rds_tcp_pong_lock); + schedule_work(&rds_tcp_pong_work); + +out: + return ret; +} + +/* + * Reply to a ping package, TCP only. + */ +int rds_send_pong(struct rds_connection *conn, __be16 dport) +{ + return rds_tcp_pong(conn, dport); +} diff --git a/net/rds/tcp.h b/net/rds/tcp.h index 9cf2927..c4c7e01 100644 --- a/net/rds/tcp.h +++ b/net/rds/tcp.h @@ -3,6 +3,13 @@ #define RDS_TCP_PORT 16385 +/* RDS over TCP ping/pong message entry */ +struct rds_tcp_pong { + struct list_head tp_node; + struct rds_connection *tp_conn; + __be16 tp_dport; +}; + struct rds_tcp_incoming { struct rds_incoming ti_inc; struct sk_buff_head ti_skb_list;