Message ID | 1389375935.31367.102.camel@edumazet-glaptop2.roam.corp.google.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Fri, Jan 10, 2014 at 9:45 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Fri, 2014-01-10 at 08:27 -0800, Tom Herbert wrote: >> On Thu, Jan 9, 2014 at 9:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> > On Thu, 2014-01-09 at 20:54 -0800, Tom Herbert wrote: >> >> When initializing a gro_list for a packet, first check the rxhash of >> >> the incoming skb against that of the skb's in the list. This should be >> >> a very strong inidicator of whether the flow is going to be matched, >> >> and potentially allows a lot of other checks to be short circuited. >> >> >> > >> > Hmm... this idea was discussed in the past. I used it when attempting to >> > use a hash table instead of a gro_list last year. >> > >> > Unfortunately this added lot of cycles when rxhash is not provided by >> > hardware, and my tests found it was not a win. >> > >> > Remember : in most cases, gro_list contains one flow, so this test does >> > nothing special but adds overhead. >> >> I don't understand what your basis is that gro_list in most cases >> contains one flow, but assuming that were true, maybe we should make >> the it only contain one flow eliminating the complexity of multiple >> flows (same_flow logic is convoluted and layers of encapsulation is >> not going to simplify things). >> > > The complexity of GRO is the aggregation itself. You wont avoid it. > >> If we are doing RPS or RFS, rxhash will be computed anyway, so the >> case your optimizing is pretty narrow: no RPS, no RFS, no hardware >> hash, and a single flow in gro_list. Nevertheless, if this is really >> an important concern, we can make the check directly against >> skb->rxhash so to be opportunistic and avoid the possibility of >> computation. > > > We'll compute rxhash once per GRO packet, containing up to 40 MSS > packets. > > Thats a big difference. > The objective is to compute the rxhash at most once per packet. > If your patch was doing this, I would have no complain. > > (No new conditional branch, and skb->rxhash, if provided by the NIC, > can give a hint.) > > diff --git a/net/core/dev.c b/net/core/dev.c > index ce01847793c0..c9f7a26d7ce7 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3798,7 +3798,8 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb) > for (p = napi->gro_list; p; p = p->next) { > unsigned long diffs; > > - diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev; > + diffs = p->rxhash ^ skb->rxhash; > + diffs |= (unsigned long)p->dev ^ (unsigned long)skb->dev; > diffs |= p->vlan_tci ^ skb->vlan_tci; > if (maclen == ETH_HLEN) > diffs |= compare_ether_header(skb_mac_header(p), > As I said in the commit log, the skb->rxhash should be a very strong indicator of that flows will match (maybe >99% ?), so putting that first potentially short circuits a lot of work even in just this function. > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2014-01-10 at 10:15 -0800, Tom Herbert wrote: > The objective is to compute the rxhash at most once per packet. Once per GRO packet ( up to 45 MSS) or once per incoming frame ( 1 MSS ) ? Your patch computes rxhash for every incoming frame. > > > If your patch was doing this, I would have no complain. > > > > (No new conditional branch, and skb->rxhash, if provided by the NIC, > > can give a hint.) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index ce01847793c0..c9f7a26d7ce7 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -3798,7 +3798,8 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb) > > for (p = napi->gro_list; p; p = p->next) { > > unsigned long diffs; > > > > - diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev; > > + diffs = p->rxhash ^ skb->rxhash; > > + diffs |= (unsigned long)p->dev ^ (unsigned long)skb->dev; > > diffs |= p->vlan_tci ^ skb->vlan_tci; > > if (maclen == ETH_HLEN) > > diffs |= compare_ether_header(skb_mac_header(p), > > > As I said in the commit log, the skb->rxhash should be a very strong > indicator of that flows will match (maybe >99% ?), so putting that > first potentially short circuits a lot of work even in just this > function. Are you speaking of your "goto skip;" ? compare_ether_header() is done with 10 instructions on x86_64 There is no point trying to avoid it. Really, 99% of the time gro_list contains zero or one single slot, I have hard data saying so. If you want to optimize the case where list is fully populated (because of yet another synthetic benchmark you use), you really need to build a temporary list so that all layers do not even have to check NAPI_GRO_CB(p)->same_flow Each gro handler would remove non matching flow from this temp list. -> when we finally reach tcp_gro_receive(), list would contain a single element. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Really, 99% of the time gro_list contains zero or one single slot, I > have hard data saying so. > Please provide the data. > If you want to optimize the case where list is fully populated (because > of yet another synthetic benchmark you use), you really need to build a > temporary list so that all layers do not even have to check > NAPI_GRO_CB(p)->same_flow > Well if you prefer, you can think of the "synthetic benchmark" as emulating an obvious DOS attack by pumping MSS sized TCP segments with random ports to a server. The stack needs to be resilient to such things, an O(n*m) algorithm in the data path is a red flag. > Each gro handler would remove non matching flow from this temp list. > > -> when we finally reach tcp_gro_receive(), list would contain a single > element. > > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/dev.c b/net/core/dev.c index ce01847793c0..c9f7a26d7ce7 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3798,7 +3798,8 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb) for (p = napi->gro_list; p; p = p->next) { unsigned long diffs; - diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev; + diffs = p->rxhash ^ skb->rxhash; + diffs |= (unsigned long)p->dev ^ (unsigned long)skb->dev; diffs |= p->vlan_tci ^ skb->vlan_tci; if (maclen == ETH_HLEN) diffs |= compare_ether_header(skb_mac_header(p),