Message ID:   20100519075721.GA23926@gondor.apana.org.au
State:        Superseded, archived
Delegated to: David Miller
On Wed, 19 May 2010 at 17:57 +1000, Herbert Xu wrote:
> Hi:
>
> tun: Use netif_receive_skb instead of netif_rx
>
> First a bit of history as I recall, Dave can correct me where
> he recalls differently :)
>
> 1) There was netif_rx and everyone had to use that.
> 2) Everyone had to use that, including drivers/net/tun.c.
> 3) NAPI brings us netif_receive_skb.
> 4) About the same time people noticed that tun.c can cause wild
>    fluctuations in latency because of its use of netif_rx with IRQs
>    enabled.
> 5) netif_rx_ni was added to address this.

6) netif_rx()'s pro is that packet processing is done while stack usage is
guaranteed to be low (from process_backlog, using a special softirq stack,
instead of the current stack).

After your patch, tun will use more stack. Is it safe in all contexts?

Another concern I have is about RPS.

netif_receive_skb() must be called from process_backlog() context, or there
is no guarantee the IPI will be sent if this skb is enqueued for another cpu.

> However, netif_rx_ni was really a bit of a roundabout way of
> injecting a packet if you think about it. What ends up happening
> is that we always queue the packet into the backlog, and then
> immediately process it. Which is what would happen if we simply
> called netif_receive_skb directly.
>
> So this patch just does the obvious thing and makes tun.c call
> netif_receive_skb, albeit through the netif_receive_skb_ni wrapper
> which does the necessary things for calling it in process context.
>
> Now apart from potential performance gains from eliminating
> unnecessary steps in the process, this has the benefit of keeping
> the process context for the packet processing. This is needed
> by cgroups to shape network traffic based on the original process.
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 4326520..0eed49f 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -667,7 +667,7 @@ static __inline__ ssize_t tun_get_user(struct tun_struct *tun,
>  		skb_shinfo(skb)->gso_segs = 0;
>  	}
>
> -	netif_rx_ni(skb);
> +	netif_receive_skb_ni(skb);
>
>  	tun->dev->stats.rx_packets++;
>  	tun->dev->stats.rx_bytes += len;
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index fa8b476..34bb405 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1562,6 +1562,18 @@ extern int netif_rx(struct sk_buff *skb);
>  extern int netif_rx_ni(struct sk_buff *skb);
>  #define HAVE_NETIF_RECEIVE_SKB 1
>  extern int netif_receive_skb(struct sk_buff *skb);
> +
> +static inline int netif_receive_skb_ni(struct sk_buff *skb)
> +{
> +	int err;
> +
> +	local_bh_disable();
> +	err = netif_receive_skb(skb);
> +	local_bh_enable();
> +
> +	return err;
> +}
> +
>  extern gro_result_t dev_gro_receive(struct napi_struct *napi,
>  				    struct sk_buff *skb);
>  extern gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb);
>
> Cheers,

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:
> Another concern I have is about RPS.
>
> netif_receive_skb() must be called from process_backlog() context, or
> there is no guarantee the IPI will be sent if this skb is enqueued for
> another cpu.

Hmm, I just checked again, and this is wrong.

In case we enqueue the skb on a remote cpu backlog, we also do
__raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done later.
On Wed, May 19, 2010 at 10:09:42AM +0200, Eric Dumazet wrote:
>
> 6) netif_rx() pro is that packet processing is done while stack usage is
> guaranteed to be low (from process_backlog, using a special softirq
> stack, instead of current stack)
>
> After your patch, tun will use more stack. Is it safe on all contexts ?

Dave also raised this but I believe nothing changes with regards to the
stack. We currently call do_softirq which does not switch stacks.

Only a real interrupt would switch stacks.

> Another concern I have is about RPS.
>
> netif_receive_skb() must be called from process_backlog() context, or
> there is no guarantee the IPI will be sent if this skb is enqueued for
> another cpu.

Can you explain this a bit more?

Thanks,
On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> On Wed, 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:
> > Another concern I have is about RPS.
> >
> > netif_receive_skb() must be called from process_backlog() context, or
> > there is no guarantee the IPI will be sent if this skb is enqueued for
> > another cpu.
>
> Hmm, I just checked again, and this is wrong.
>
> In case we enqueue skb on a remote cpu backlog, we also
> do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> later.

OK, your concern is only with the stack usage, right?

Thanks,
On Wed, 19 May 2010 at 18:20 +1000, Herbert Xu wrote:
> Dave also raised this but I believe nothing changes with regards
> to the stack. We currently call do_softirq which does not switch
> stacks.
>
> Only a real interrupt would switch stacks.

This is a bit wrong, at least here (CONFIG_4KSTACKS=y).

Some people still use 32 bits these days ;)

Please check arch/x86/kernel/irq_32.c:

	asmlinkage void do_softirq(void)
	{
		unsigned long flags;
		struct thread_info *curctx;
		union irq_ctx *irqctx;
		u32 *isp;

		if (in_interrupt())
			return;

		local_irq_save(flags);

		if (local_softirq_pending()) {
			curctx = current_thread_info();
			irqctx = __get_cpu_var(softirq_ctx);
			irqctx->tinfo.task = curctx->task;
			irqctx->tinfo.previous_esp = current_stack_pointer;

			/* build the stack frame on the softirq stack */
			isp = (u32 *) ((char *)irqctx + sizeof(*irqctx));

			call_on_stack(__do_softirq, isp);
			/*
			 * Shouldnt happen, we returned above if in_interrupt():
			 */
			WARN_ON_ONCE(softirq_count());
		}

		local_irq_restore(flags);
	}
On Wed, May 19, 2010 at 10:27:17AM +0200, Eric Dumazet wrote:
>
> This is a bit wrong, at least here (CONFIG_4KSTACKS=y)
>
> Some people still use 32bits these days ;)
>
> Please check arch/x86/kernel/irq_32.c

Right, I was looking at the generic version.

Still, as I'm only changing tun.c, where we know the call comes from a
syscall, I don't think the stack is a great issue.

Besides, for TX we already perform everything from the same stack depth,
and the TX path isn't that much shallower compared to the RX path.

Cheers,
On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> On Wed, 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:
> > Another concern I have is about RPS.
> >
> > netif_receive_skb() must be called from process_backlog() context, or
> > there is no guarantee the IPI will be sent if this skb is enqueued for
> > another cpu.
>
> Hmm, I just checked again, and this is wrong.
>
> In case we enqueue skb on a remote cpu backlog, we also
> do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> later.

But if this happens, then we lose the connection between the packet being
received and the process doing the reception, so the network cgroup
classifier breaks again.

Performance gains are still a big advantage here, of course.

Neil
On Wed, May 19, 2010 at 08:05:47AM -0400, Neil Horman wrote:
> But if this happens, then we lose the connection between the packet being
> received and the process doing the reception, so the network cgroup
> classifier breaks again.
>
> Performance gains are still a big advantage here, of course.

Scratch what I said here; Herbert corrected me on this, and we're OK, as
tun has no rps map.

I'll test this patch out in just a bit.

Neil
On Wed, 19 May 2010 at 08:05 -0400, Neil Horman wrote:
> But if this happens, then we lose the connection between the packet being
> received and the process doing the reception, so the network cgroup
> classifier breaks again.
>
> Performance gains are still a big advantage here of course.

RPS is enabled on a per-device (or more precisely, per-subqueue) basis,
and disabled by default, so if the cgroup classifier is needed, it should
work as is.

Speaking of net/sched/cls_cgroup.c, I am contemplating the following
sequence:

	rcu_read_lock();
	classid = task_cls_state(current)->classid;
	rcu_read_unlock();

RCU is definitely so special (should I say magic?) that we use it even if
not needed. It makes us happy...
On Wed, May 19, 2010 at 04:10:29PM +0200, Eric Dumazet wrote:
> RPS is enabled on a per device (or more precisely per subqueue) basis,
> and disabled by default, so if cgroup classifier is needed, it should
> work as is.
>
> Speaking of net/sched/cls_cgroup.c, I am contemplating following
> sequence :
>
> 	rcu_read_lock();
> 	classid = task_cls_state(current)->classid;
> 	rcu_read_unlock();

Yeah, I'd noticed there was no write side to that, but hadn't quite gotten
to investigating it further :)
On Wed, May 19, 2010 at 08:55:43AM -0400, Neil Horman wrote:
> Scratch what I said here; Herbert corrected me on this, and we're OK, as
> tun has no rps map.
>
> I'll test this patch out in just a bit.

I'm currently testing this; unfortunately, while it's not breaking
anything, it doesn't allow cgroups to classify frames coming from tun
interfaces. I'm still investigating, but I think the issue is that,
because we call local_bh_disable with this patch, we wind up raising the
count at SOFTIRQ_OFFSET in preempt_count for the task. Since the cgroup
classifier has this check:

	if (softirq_count() != SOFTIRQ_OFFSET)
		return -1;

we still fail to classify the frame. The cgroup classifier is assuming
that any frame arriving with a softirq count of 1 means we came directly
from the dev_queue_xmit routine and it is safe to check current(). Any
less than that and something is wrong (as we at least need the
local_bh_disable in dev_queue_xmit), and any more implies that we have
nested calls to local_bh_disable, meaning we're really handling a softirq
context.
Neil
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 19 May 2010 18:20:47 +1000

> On Wed, May 19, 2010 at 10:09:42AM +0200, Eric Dumazet wrote:
>>
>> 6) netif_rx() pro is that packet processing is done while stack usage is
>> guaranteed to be low (from process_backlog, using a special softirq
>> stack, instead of current stack)
>>
>> After your patch, tun will use more stack. Is it safe on all contexts ?
>
> Dave also raised this but I believe nothing changes with regards
> to the stack. We currently call do_softirq which does not switch
> stacks.

do_softirq() _does_ switch stacks; it's a per-arch function that does the
stack switch and calls __do_softirq() on the softirq stack.
On Wed, May 19, 2010 at 02:00:53PM -0400, Neil Horman wrote:
> I'm currently testing this; unfortunately, while it's not breaking
> anything, it doesn't allow cgroups to classify frames coming from tun
> interfaces. I'm still investigating, but I think the issue is that,
> because we call local_bh_disable with this patch, we wind up raising the
> count at SOFTIRQ_OFFSET in preempt_count for the task. Since the cgroup
> classifier has this check:
>
> 	if (softirq_count() != SOFTIRQ_OFFSET)
> 		return -1;
>
> we still fail to classify the frame. The cgroup classifier is assuming
> that any frame arriving with a softirq count of 1 means we came directly
> from the dev_queue_xmit routine and it is safe to check current().
> Any less than that and something is wrong (as we at least need the
> local_bh_disable in dev_queue_xmit), and any more implies that we have
> nested calls to local_bh_disable, meaning we're really handling a softirq
> context.

Just out of curiosity, how unsavory would it be if we were to dedicate the
upper bit in the SOFTIRQ_BITS range to be an indicator of whether we were
actually executing softirqs? As noted above, we're tripping over the
ambiguity here between running in softirq context and actually just having
softirqs disabled. Would it be against anyone's sensibilities if we were to
dedicate the upper bit in the softirq_count range to disambiguate the two
conditions (or use a separate flag for that matter)?

Neil
On Wed, 2010-05-19 at 14:00 -0400, Neil Horman wrote:
> I'm currently testing this; unfortunately, while it's not breaking
> anything, it doesn't allow cgroups to classify frames coming from tun
> interfaces. [...] Any less than that and something is wrong (as we at
> least need the local_bh_disable in dev_queue_xmit), and any more implies
> that we have nested calls to local_bh_disable, meaning we're really
> handling a softirq context.

It is a hack, but the only method to check for softirq context I found. I
would favor using a flag if there was one.
On 05/19/2010 04:49 PM, Thomas Graf wrote:
> It is a hack, but the only method to check for softirq context I found. I
> would favor using a flag if there was one.

Eric probably has some thoughts on this -- his scheduler-batching patch
RFC from last year needed the same bit of info:

  http://patchwork.ozlabs.org/patch/24536/

(see the changes to trace_softirq_context).
Since it seems now inevitable to me that we'll need to store the 'classid'
in struct sk_buff, I've prepared two patches to do that at zero cost.

Someone please go with this.
On Wed, May 19, 2010 at 07:55:33PM -0700, David Miller wrote:
>
> Since it seems now inevitable to me that we'll need to store the
> 'classid' in struct sk_buff, I've prepared two patches to do that at
> zero cost.

Actually, I'm currently working on a patch to store this in struct sock.
The reason I prefer struct sock is that we can then centralise the place
where we set the classid, rather than having it spread out through every
place that creates an skb.

I'll post it soon.

Cheers,
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 20 May 2010 12:57:42 +1000

> Actually I'm currently working on a patch to store this in struct
> sock. The reason I prefer struct sock is because we can then
> centralise the place where we set the classid rather than having
> it spread out through every place that creates an skb.
>
> I'll post it soon.

OK, looking forward to it.
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4326520..0eed49f 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -667,7 +667,7 @@ static __inline__ ssize_t tun_get_user(struct tun_struct *tun,
 		skb_shinfo(skb)->gso_segs = 0;
 	}
 
-	netif_rx_ni(skb);
+	netif_receive_skb_ni(skb);
 
 	tun->dev->stats.rx_packets++;
 	tun->dev->stats.rx_bytes += len;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fa8b476..34bb405 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1562,6 +1562,18 @@ extern int netif_rx(struct sk_buff *skb);
 extern int netif_rx_ni(struct sk_buff *skb);
 #define HAVE_NETIF_RECEIVE_SKB 1
 extern int netif_receive_skb(struct sk_buff *skb);
+
+static inline int netif_receive_skb_ni(struct sk_buff *skb)
+{
+	int err;
+
+	local_bh_disable();
+	err = netif_receive_skb(skb);
+	local_bh_enable();
+
+	return err;
+}
+
 extern gro_result_t dev_gro_receive(struct napi_struct *napi,
 				    struct sk_buff *skb);
 extern gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb);
Hi:

tun: Use netif_receive_skb instead of netif_rx

First a bit of history as I recall, Dave can correct me where
he recalls differently :)

1) There was netif_rx and everyone had to use that.
2) Everyone had to use that, including drivers/net/tun.c.
3) NAPI brings us netif_receive_skb.
4) About the same time people noticed that tun.c can cause wild
   fluctuations in latency because of its use of netif_rx with IRQs
   enabled.
5) netif_rx_ni was added to address this.

However, netif_rx_ni was really a bit of a roundabout way of
injecting a packet if you think about it. What ends up happening
is that we always queue the packet into the backlog, and then
immediately process it. Which is what would happen if we simply
called netif_receive_skb directly.

So this patch just does the obvious thing and makes tun.c call
netif_receive_skb, albeit through the netif_receive_skb_ni wrapper
which does the necessary things for calling it in process context.

Now apart from potential performance gains from eliminating
unnecessary steps in the process, this has the benefit of keeping
the process context for the packet processing. This is needed
by cgroups to shape network traffic based on the original process.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,