From patchwork Tue Mar 2 12:11:45 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: jamal X-Patchwork-Id: 46633 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id D9A4EB7D2D for ; Tue, 2 Mar 2010 23:12:01 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752478Ab0CBML4 (ORCPT ); Tue, 2 Mar 2010 07:11:56 -0500 Received: from mail-vw0-f46.google.com ([209.85.212.46]:59462 "EHLO mail-vw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752123Ab0CBMLz (ORCPT ); Tue, 2 Mar 2010 07:11:55 -0500 Received: by vws9 with SMTP id 9so57112vws.19 for ; Tue, 02 Mar 2010 04:11:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:subject:from:reply-to :to:cc:in-reply-to:references:content-type:date:message-id :mime-version:x-mailer:content-transfer-encoding; bh=cMf1tt32+cjsAEZ10LM0cFSDU6raH9O5VpmLefGsaNY=; b=pYlIKZevKuzWNrphtlsyrHwKAN2UnLBZsHk88hCD6FZpVb4SxCyNFVrz0J8I2m4Ptg 9WQ7srBqLKJ7HyZ9ARZKUUuAFzCv7nd9E5wUZHLqfGeIlwr0Tb+w8fk7FmluWgMk6Plj kYD4gYOVVpzmTPn58ZxgVounEMYuAHqGGW12g= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:subject:from:reply-to:to:cc:in-reply-to:references :content-type:date:message-id:mime-version:x-mailer :content-transfer-encoding; b=RDZYW9FnuFFG6Y8eOMq6IcfJddj1NjUfWqXWWG1M2EG+2Ehyll8alKAXi0I2ouya7q ZWCU2fG47B0n7Tq66A5wm1SD8hP4+UKejmTZqphffImoNQZUMfFB8jPsL7XzbpKYzpIg 4GOB0uWt0eAD8BfAjGselTGu7KdMcgSEQ5jig= Received: by 10.220.124.15 with SMTP id s15mr4095433vcr.60.1267531914235; Tue, 02 Mar 2010 04:11:54 -0800 (PST) Received: from ?10.0.0.26? (CPE0030ab124d2f-CM001bd7a7f1a0.cpe.net.cable.rogers.com [99.240.66.42]) by mx.google.com with ESMTPS id 22sm36532597vws.14.2010.03.02.04.11.52 (version=SSLv3 cipher=RC4-MD5); Tue, 02 Mar 2010 04:11:53 -0800 (PST) Subject: Re: [RFC PATCH]xfrm: fix perpetual bundles From: jamal Reply-To: hadi@cyberus.ca To: Herbert Xu Cc: davem@davemloft.net, kaber@trash.net, yoshfuji@linux-ipv6.org, nakam@linux-ipv6.org, eric.dumazet@gmail.com, netdev@vger.kernel.org, Steffen Klassert In-Reply-To: <20100302112754.GA1513@gondor.apana.org.au> References: <1267017564.3973.790.camel@bigi> <20100302112754.GA1513@gondor.apana.org.au> Date: Tue, 02 Mar 2010 07:11:45 -0500 Message-Id: <1267531905.21749.21.camel@bigi> Mime-Version: 1.0 X-Mailer: Evolution 2.26.1 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Tue, 2010-03-02 at 19:27 +0800, Herbert Xu wrote: > On Wed, Feb 24, 2010 at 08:19:24AM -0500, jamal wrote: > > 1)In the connect() stage, in the slow path a route cache is > > created with the rth->fl.fl4_src of 0.0.0.0... > > ==> policy->bundles is empty, so we do a lookup, fail, create > > one.. (remember rth->fl.fl4_src of 0.0.0.0 at this stage and > > thats what we end storing in the bundle/xdst for later comparison > > instead of the skb's fl) > > So this is root number 1. When this stuff was first written this > case simply wasn't possible. So I think the question we need to > ask here is can we get a valid address there at the connect stage? fl->fl4_src is valid non-zero. But in xfrm4_fill_dst() we do wholesale copy of xdst->u.rt.fl = rt->fl; and rt->fl.fl4_src is 0.0.0.0. > After all, for non-IPsec connect(2)s, you do get a valid IP address. > So I don't see why the IPsec case should be different. > > Creating a bundle with a zero source address is just a hack to > make connect(2) succeed immediately. AFAICS getting a valid IP > address can also be done without waiting for the whole IPsec state > to be created. > I did try to "fix it" above via: + if (!xdst->u.rt.fl.fl4_src) { + xdst->u.rt.fl.fl4_src = fl->fl4_src; + } But this breaks again later in sendmsg bundle lookup because of XFRM_SUB_POLICY. If i turned off config XFRM_SUB_POLICY, then all works. I didnt look closely, but SUB_POLICY does do a memcpy or two off the dst passed in connect() - which has the wrong src. So i would have to "fix" a few more spots for it to work. This is where i gave up concluding that i was just plugging with band-aids. > Of course if anybody is still interested we could also revisit > the neighbouresque queueing idea. not plugged into that discussion.. > > 2)ping sends a packet (does a sendmsg) > > ==> xfrm_find_bundle() ends up comparing skb's->fl (non-zero > > fl->fl4_src) with what we stored from #1b. Fails. > > ==> we create a new bundle at attach the old one at the end of it. > > ...and now policy->bundles has two xdst entries > > This is the way it's supposed to work. > > 3) Repeat #2, and now we have 3 xdsts in policy bundles > > This is what I don't understand. The code is supposed to look > at every bundle attached to the policy. So why doesn't it find > the one we created at step #2? The issue is that the comparison is between xdst->u.rt.fl.fl4_src and fl->fl4_src. fl->fl4_src is always non-zero. stored xdst->u.rt.fl.fl4_src is always zero > In conclusion, I think we have two problems, with the second > one being the most immediate cause of your symptoms. Remember the route cache (refer to dst copying above) is created at connect time;-> So Steffen (on CC) tried to "fix it" by fixing at route cache creation time. His approach: sk, flags ? XFRM_LOOKUP_WAIT : 0); if (err == -EREMOTE) --- I was worried about the impact of this on something else that expects the behavior. cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2778,15 +2778,26 @@ int ip_route_output_flow(struct net *net, struct rtable **rp, struct flowi *flp, struct sock *sk, int flags) { int err; + int update_route = 0; if ((err = __ip_route_output_key(net, rp, flp)) != 0) return err; if (flp->proto) { - if (!flp->fl4_src) + if (!flp->fl4_src) { flp->fl4_src = (*rp)->rt_src; - if (!flp->fl4_dst) + update_route = 1; + } + if (!flp->fl4_dst) { flp->fl4_dst = (*rp)->rt_dst; + update_route = 1; + } + if (update_route) { + dst_release(&(*rp)->u.dst); + if ((err = __ip_route_output_key(net, rp, flp)) != 0) + return err; + } + err = __xfrm_lookup(net, (struct dst_entry **)rp, flp,