From patchwork Tue Mar 15 04:42:46 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 86852 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 0FE28B6FEF for ; Tue, 15 Mar 2011 15:42:59 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751276Ab1COEmv (ORCPT ); Tue, 15 Mar 2011 00:42:51 -0400 Received: from mail-ww0-f44.google.com ([74.125.82.44]:40834 "EHLO mail-ww0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750808Ab1COEmu (ORCPT ); Tue, 15 Mar 2011 00:42:50 -0400 Received: by wwa36 with SMTP id 36so266128wwa.1 for ; Mon, 14 Mar 2011 21:42:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:subject:from:to:cc:in-reply-to:references :content-type:date:message-id:mime-version:x-mailer :content-transfer-encoding; bh=zK2x67En9QlB1qgUZ9RvPiXgfhYjDc9NOQ+6t7pxS68=; b=HFATlKIvYWmU72k1UtuS8uoSugM7t5kOqok8pensCAxlUI2TxtmVKZnjNBA5DVH88I N3DALdZZI89V5CKjjnOx6O+WI2oSDg3fCAQFCVBG6BVjd70U6XgXgAJwqqk06t1Bp3PT TTh8a2rEZlsL0lFnu+OP/XA2TwdyTpUhE1a9g= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=subject:from:to:cc:in-reply-to:references:content-type:date :message-id:mime-version:x-mailer:content-transfer-encoding; b=p93+1QK2fFrrYGUB2EZHDPlMgEXyEeXPJbs/mFHOhveJOpIwKm8EHxrULdXsaisTmO 8X0NBpVhmHel8oWBlS7S3g6HHUKZ5lDlui7EFJbwkBY26sbUjP99jRlrAa57/F4zO8zH yO2iosNjTZCjwRrKXQpHaMf7vZgTuuHzTDZmc= Received: by 10.227.195.129 with SMTP id ec1mr446343wbb.180.1300164169424; Mon, 14 Mar 2011 21:42:49 -0700 (PDT) Received: from [192.168.1.21] (167.237.66-86.rev.gaoland.net [86.66.237.167]) by mx.google.com with ESMTPS id y29sm6765916wbd.4.2011.03.14.21.42.48 (version=SSLv3 cipher=OTHER); Mon, 14 Mar 2011 21:42:48 -0700 (PDT) Subject: Re: [Bloat] shaper team forming up From: Eric Dumazet To: Jonathan Morton , David Miller Cc: Dave =?ISO-8859-1?Q?T=E4ht?= , netdev In-Reply-To: <1300134277.2649.19.camel@edumazet-laptop> References: <87wrk1a4gx.fsf@cruithne.co.teklibre.org> <5BC42741-852B-4699-BA5D-D70B8D610D96@gmail.com> <1300134277.2649.19.camel@edumazet-laptop> Date: Tue, 15 Mar 2011 05:42:46 +0100 Message-ID: <1300164166.2649.70.camel@edumazet-laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.30.3 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Le lundi 14 mars 2011 à 21:24 +0100, Eric Dumazet a écrit : remove CC to bloat lists for now, adding David Miller to thread. > Le lundi 14 mars 2011 à 21:55 +0200, Jonathan Morton a écrit : > > On 14 Mar, 2011, at 9:26 pm, Dave Täht wrote: > > > > > Over the weekend, Dan Siemons uncovered a possible bad interaction > > > between ECN and the default pfifo_fast qdisc in Linux. > > > > > > http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/ > > > > This seems to be more complicated that it appears. It looks as though > > Linux has re-used the LSB of the old TOS field for some "link local" > > flag which is used by routing. > > > > It's not immediately obvious whether pfifo_fast is using this new > > interpretation though. If it isn't, the fix should be to remove the > > RTO_ONLINK bit from the mask it's using on the tos field. The other > > half of the mask correctly excludes the ECN bits from the field. > > > > CC netdev, where linux network dev can take a look. > > I would say that this is a wrong analysis : > > 1) ECN uses two low order bits of TOS byte > > 2) pfifo_fast uses skb->priority > > > skb->priority = rt_tos2priority(iph->tos); > > #define IPTOS_TOS_MASK 0x1E > #define IPTOS_TOS(tos) ((tos)&IPTOS_TOS_MASK) > > static inline char rt_tos2priority(u8 tos) > { > return ip_tos2prio[IPTOS_TOS(tos)>>1]; > } > > No interference between two mechanisms, unless sysadmin messed up things > (skb_edit) > > David, it seems ip_tos2prio is wrong on its 2nd entry : #define TC_PRIO_BESTEFFORT 0 #define TC_PRIO_FILLER 1 #define TC_PRIO_BULK 2 #define TC_PRIO_INTERACTIVE_BULK 4 #define TC_PRIO_INTERACTIVE 6 #define TC_PRIO_CONTROL 7 #define TC_PRIO_MAX 15 net/ipv4/route.c:170:#define ECN_OR_COST(class) TC_PRIO_##class const __u8 ip_tos2prio[16] = { TC_PRIO_BESTEFFORT, /* 0 : for flow without ECN */ ECN_OR_COST(FILLER), /* 1 : flow with ECN */ ... }; This means ECN enabled flows got TC_PRIO_FILLER (what the hell is that ?) pfifo_fast has : static const u8 prio2band[TC_PRIO_MAX+1] = { 1, 2, 2, 2, 1, 2, 0, 0 , 1, 1, 1, 1, 1, 1, 1, 1 }; So a non ECN enabled flow goes to band 1, while an ECN enabled one is in band 2 (!). Thus, ECN enabled flows have a chance being droped more often than non ECN flows. Thats not fair... What do you think ? Thanks --- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 6ed6603..fabfe81 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -171,7 +171,7 @@ static struct dst_ops ipv4_dst_ops = { const __u8 ip_tos2prio[16] = { TC_PRIO_BESTEFFORT, - ECN_OR_COST(FILLER), + ECN_OR_COST(BESTEFFORT), TC_PRIO_BESTEFFORT, ECN_OR_COST(BESTEFFORT), TC_PRIO_BULK,