From patchwork Fri Jan 30 18:41:59 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 21251
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.176.167])
	by ozlabs.org (Postfix) with ESMTP id 88548DE0A6
	for ; Sat, 31 Jan 2009 05:43:34 +1100 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752321AbZA3Sn3 (ORCPT );
	Fri, 30 Jan 2009 13:43:29 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1752258AbZA3Sn3 (ORCPT );
	Fri, 30 Jan 2009 13:43:29 -0500
Received: from gw1.cosmosbay.com ([212.99.114.194]:38708 "EHLO
	gw1.cosmosbay.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751716AbZA3Sn3 convert rfc822-to-8bit (ORCPT );
	Fri, 30 Jan 2009 13:43:29 -0500
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by gw1.cosmosbay.com (8.13.7/8.13.7) with ESMTP id n0UIfxRR003909;
	Fri, 30 Jan 2009 19:41:59 +0100
Message-ID: <498349F7.4050300@cosmosbay.com>
Date: Fri, 30 Jan 2009 19:41:59 +0100
From: Eric Dumazet
User-Agent: Thunderbird 2.0.0.19 (Windows/20081209)
MIME-Version: 1.0
To: Stephen Hemminger
CC: Herbert Xu, Evgeniy Polyakov, berrange@redhat.com,
	et-mgmt-tools@redhat.com, davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: virt-manager broken by bind(0) in net-next.
References: <20090130112125.GA9908@ioremap.net>
	<20090130125337.GA7155@gondor.apana.org.au>
	<20090130095737.103edbff@extreme>
In-Reply-To: <20090130095737.103edbff@extreme>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6
	(gw1.cosmosbay.com [0.0.0.0]); Fri, 30 Jan 2009 19:42:03 +0100 (CET)
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

Stephen Hemminger wrote:
> On Fri, 30 Jan 2009 23:53:37 +1100
> Herbert Xu wrote:
>
>> Evgeniy Polyakov wrote:
>>> So it is not explicit bind call, but port autoselection in the
>>> connect(). Can you check what errno is returned?
>>> Did I understand it right, that connect fails, you try different
>>> address, but then suddenly all those sockets become 'alive'?
>> Yes, I think a good strace vs. a bad strace would be really helpful
>> in these cases.
>>
>> Thanks,
>
> I have the strace but it comes up no different.
> What is different is that in the broken case (net-next), I see
> IPV6 being used:
>
> State Recv-Q Send-Q Local Address:Port Peer Address:Port
> ESTAB 23769 0 ::ffff:127.0.0.1:5900 ::ffff:127.0.0.1:55987
> ESTAB 0 0 127.0.0.1:55987 127.0.0.1:5900
>
> and in the working case (2.6.29-rc3), IPV4 is being used
> State Recv-Q Send-Q Local Address:Port Peer Address:Port
> ESTAB 0 0 127.0.0.1:58894 127.0.0.1:5901
> ESTAB 0 0 127.0.0.1:5901 127.0.0.1:58894
>

Reviewing commit a9d8f9110d7e953c2f2b521087a4179677843c2a

I see that it uses a hashinfo->bsockets field that:

- lacks proper locking/synchronization
- suffers from cache line ping-pong on SMP

There might also be a problem at line 175:

	if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
		spin_unlock(&head->lock);
		goto again;

If we entered inet_csk_get_port() with a nonzero snum, we can still
"goto again", even though no retry is expected in that case.
---
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index df8e72f..752c6b2 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -172,7 +172,8 @@ tb_found:
 		} else {
 			ret = 1;
 			if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) {
-				if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
+				if (sk->sk_reuse && sk->sk_state != TCP_LISTEN &&
+				    smallest_size == -1 && --attempts >= 0) {
 					spin_unlock(&head->lock);
 					goto again;
 				}
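
To illustrate the "goto again" concern raised in this message, here is a minimal userspace sketch. It is a toy model, not kernel code: toy_get_port(), toy_conflict(), the guard flag and the first_free parameter are all hypothetical names invented for this example, and the port search is reduced to a simple counter. It only shows why retrying the automatic search is surprising when the caller passed an explicit port, and how a guard that limits the retry to automatically selected ports avoids that.

```c
#include <stdbool.h>

/* Hypothetical stand-in for bind_conflict(): every port below first_free is taken. */
static bool toy_conflict(int port, int first_free)
{
	return port < first_free;
}

/*
 * Toy model (not kernel code) of the retry decision.
 * snum == 0 asks for automatic selection; any other value is an explicit port.
 * Returns the chosen port, or -1 on failure; *searches counts automatic searches.
 */
static int toy_get_port(int snum, int first_free, bool guard, int *searches)
{
	const bool autoselect = (snum == 0);
	int attempts = 5;
	int candidate = 1;	/* next port the toy search will propose */
	int port = snum;

	*searches = 0;
	if (autoselect) {
again:
		port = candidate++;
		(*searches)++;
	}

	if (toy_conflict(port, first_free)) {
		/* With the guard, only an automatic selection may be retried. */
		bool may_retry = guard ? autoselect : true;

		if (may_retry && --attempts >= 0)
			goto again;
		return -1;
	}
	return port;
}
```

In this model, toy_get_port(2, 4, false, &n) returns 4 after four searches although the caller asked for port 2, while the same call with guard set to true fails immediately with -1. An automatic request such as toy_get_port(0, 3, true, &n) still retries until it finds a free port.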