diff mbox

tun: Use netif_receive_skb instead of netif_rx

Message ID 20100519075721.GA23926@gondor.apana.org.au
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Herbert Xu May 19, 2010, 7:57 a.m. UTC
Hi:

tun: Use netif_receive_skb instead of netif_rx

First a bit of history as I recall, Dave can correct me where
he recalls differently :)

1) There was netif_rx and everyone had to use that.
2) Everyone had to use that, including drivers/net/tun.c.
3) NAPI brings us netif_receive_skb.
4) About the same time people noticed that tun.c can cause wild
   fluctuations in latency because of its use of netif_rx with IRQs
   enabled.
5) netif_rx_ni was added to address this.

However, netif_rx_ni was really a bit of a roundabout way of
injecting a packet if you think about it.  What ends up happening
is that we always queue the packet into the backlog, and then
immediately process it.  Which is what would happen if we simply
called netif_receive_skb directly.

So this patch just does the obvious thing and makes tun.c call
netif_receive_skb, albeit through the netif_receive_skb_ni wrapper
which does the necessary things for calling it in process context.

Now apart from potential performance gains from eliminating
unnecessary steps in the process, this has the benefit of keeping
the process context for the packet processing.  This is needed
by cgroups to shape network traffic based on the original process.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


Cheers,

Comments

Eric Dumazet May 19, 2010, 8:09 a.m. UTC | #1
Le mercredi 19 mai 2010 à 17:57 +1000, Herbert Xu a écrit :
> Hi:
> 
> tun: Use netif_receive_skb instead of netif_rx
> 
> First a bit of history as I recall, Dave can correct me where
> he recalls differently :)
> 
> 1) There was netif_rx and everyone had to use that.
> 2) Everyone had to use that, including drivers/net/tun.c.
> 3) NAPI brings us netif_receive_skb.
> 4) About the same time people noticed that tun.c can cause wild
>    fluctuations in latency because of its use of netif_rx with IRQs
>    enabled.
> 5) netif_rx_ni was added to address this.
> 

6) netif_rx() pro is that packet processing is done while stack usage is
guaranteed to be low (from process_backlog, using a special softirq
stack, instead of current stack)

After your patch, tun will use more stack. Is it safe on all contexts ?

Another concern I have is about RPS.

netif_receive_skb() must be called from process_backlog() context, or
there is no guarantee the IPI will be sent if this skb is enqueued for
another cpu.

> However, netif_rx_ni
>  was really a bit of a roundabout way of
> injecting a packet if you think about it.  What ends up happening
> is that we always queue the packet into the backlog, and then
> immediately process it.  Which is what would happen if we simply
> called netif_receive_skb directly.
> 
> So this patch just does the obvious thing and makes tun.c call
> netif_receive_skb, albeit through the netif_receive_skb_ni wrapper
> which does the necessary things for calling it in process context.
> 
> Now apart from potential performance gains from eliminating
> unnecessary steps in the process, this has the benefit of keeping
> the process context for the packet processing.  This is needed
> by cgroups to shape network traffic based on the original process.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 4326520..0eed49f 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -667,7 +667,7 @@ static __inline__ ssize_t tun_get_user(struct tun_struct *tun,
>  		skb_shinfo(skb)->gso_segs = 0;
>  	}
>  
> -	netif_rx_ni(skb);
> +	netif_receive_skb_ni(skb);
>  
>  	tun->dev->stats.rx_packets++;
>  	tun->dev->stats.rx_bytes += len;
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index fa8b476..34bb405 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1562,6 +1562,18 @@ extern int		netif_rx(struct sk_buff *skb);
>  extern int		netif_rx_ni(struct sk_buff *skb);
>  #define HAVE_NETIF_RECEIVE_SKB 1
>  extern int		netif_receive_skb(struct sk_buff *skb);
> +
> +static inline int netif_receive_skb_ni(struct sk_buff *skb)
> +{
> +	int err;
> +
> +	local_bh_disable();
> +	err = netif_receive_skb(skb);
> +	local_bh_enable();
> +
> +	return err;
> +}
> +
>  extern gro_result_t	dev_gro_receive(struct napi_struct *napi,
>  					struct sk_buff *skb);
>  extern gro_result_t	napi_skb_finish(gro_result_t ret, struct sk_buff *skb);
> 
> Cheers,


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 19, 2010, 8:18 a.m. UTC | #2
Le mercredi 19 mai 2010 à 10:09 +0200, Eric Dumazet a écrit :

> Another concern I have is about RPS.
> 
> netif_receive_skb() must be called from process_backlog() context, or
> there is no guarantee the IPI will be sent if this skb is enqueued for
> another cpu.

Hmm, I just checked again, and this is wrong.

In case we enqueue skb on a remote cpu backlog, we also
do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
later.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu May 19, 2010, 8:20 a.m. UTC | #3
On Wed, May 19, 2010 at 10:09:42AM +0200, Eric Dumazet wrote:
> 
> 6) netif_rx() pro is that packet processing is done while stack usage is
> guaranteed to be low (from process_backlog, using a special softirq
> stack, instead of current stack)
> 
> After your patch, tun will use more stack. Is it safe on all contexts ?

Dave also raised this but I believe nothing changes with regards
to the stack.  We currently call do_softirq which does not switch
stacks.

Only a real interrupt would switch stacks.

> Another concern I have is about RPS.
> 
> netif_receive_skb() must be called from process_backlog() context, or
> there is no guarantee the IPI will be sent if this skb is enqueued for
> another cpu.

Can you explain this a bit more?

Thanks,
Herbert Xu May 19, 2010, 8:21 a.m. UTC | #4
On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> Le mercredi 19 mai 2010 à 10:09 +0200, Eric Dumazet a écrit :
> 
> > Another concern I have is about RPS.
> > 
> > netif_receive_skb() must be called from process_backlog() context, or
> > there is no guarantee the IPI will be sent if this skb is enqueued for
> > another cpu.
> 
> Hmm, I just checked again, and this is wrong.
> 
> In case we enqueue skb on a remote cpu backlog, we also
> do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> later.

OK your concern is only with the stack usage, right?

Thanks,
Eric Dumazet May 19, 2010, 8:27 a.m. UTC | #5
Le mercredi 19 mai 2010 à 18:20 +1000, Herbert Xu a écrit :
> On Wed, May 19, 2010 at 10:09:42AM +0200, Eric Dumazet wrote:
> > 
> > 6) netif_rx() pro is that packet processing is done while stack usage is
> > guaranteed to be low (from process_backlog, using a special softirq
> > stack, instead of current stack)
> > 
> > After your patch, tun will use more stack. Is it safe on all contexts ?
> 
> Dave also raised this but I believe nothing changes with regards
> to the stack.  We currently call do_softirq which does not switch
> stacks.
> 
> Only a real interrupt would switch stacks.

This is a bit wrong, at least here (CONFIG_4KSTACKS=y)

Some people still use 32bits these days ;)

Please check arch/x86/kernel/irq_32.c

asmlinkage void do_softirq(void)
{
        unsigned long flags;
        struct thread_info *curctx;
        union irq_ctx *irqctx;
        u32 *isp;

        if (in_interrupt())
                return;

        local_irq_save(flags);

        if (local_softirq_pending()) {
                curctx = current_thread_info();
                irqctx = __get_cpu_var(softirq_ctx);
                irqctx->tinfo.task = curctx->task;
                irqctx->tinfo.previous_esp = current_stack_pointer;

                /* build the stack frame on the softirq stack */
                isp = (u32 *) ((char *)irqctx + sizeof(*irqctx));

                call_on_stack(__do_softirq, isp);
                /*
                 * Shouldnt happen, we returned above if in_interrupt():
                 */
                WARN_ON_ONCE(softirq_count());
        }

        local_irq_restore(flags);
}



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu May 19, 2010, 8:44 a.m. UTC | #6
On Wed, May 19, 2010 at 10:27:17AM +0200, Eric Dumazet wrote:
> 
> This is a bit wrong, at least here (CONFIG_4KSTACKS=y)
> 
> Some people still use 32bits these days ;)
> 
> Please check arch/x86/kernel/irq_32.c

Right, I was looking at the generic version.

Still, as I'm only changing tun.c where we know the call comes
from a syscall, I don't think the stack is a great issue.

Besides, for TX we already perform everything from the same stack
depth and the TX path isn't that much shallower compared to the
RX path.

Cheers,
Neil Horman May 19, 2010, 12:05 p.m. UTC | #7
On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> Le mercredi 19 mai 2010 à 10:09 +0200, Eric Dumazet a écrit :
> 
> > Another concern I have is about RPS.
> > 
> > netif_receive_skb() must be called from process_backlog() context, or
> > there is no guarantee the IPI will be sent if this skb is enqueued for
> > another cpu.
> 
> Hmm, I just checked again, and this is wrong.
> 
> In case we enqueue skb on a remote cpu backlog, we also
> do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> later.
> 
But if this happens, then we loose the connection between the packet being
received and the process doing the reception, so the network cgroup classifier
breaks again.

Performance gains are still a big advantage here of course.
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman May 19, 2010, 12:55 p.m. UTC | #8
On Wed, May 19, 2010 at 08:05:47AM -0400, Neil Horman wrote:
> On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> > Le mercredi 19 mai 2010 à 10:09 +0200, Eric Dumazet a écrit :
> > 
> > > Another concern I have is about RPS.
> > > 
> > > netif_receive_skb() must be called from process_backlog() context, or
> > > there is no guarantee the IPI will be sent if this skb is enqueued for
> > > another cpu.
> > 
> > Hmm, I just checked again, and this is wrong.
> > 
> > In case we enqueue skb on a remote cpu backlog, we also
> > do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> > later.
> > 
> But if this happens, then we loose the connection between the packet being
> received and the process doing the reception, so the network cgroup classifier
> breaks again.
> 
> Performance gains are still a big advantage here of course.
> Neil
> 
Scratch what I said here, Herbert corrected me on this, and we're ok, as tun has
no rps map.

I'll test this patch out in just a bit
Neil

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 19, 2010, 2:10 p.m. UTC | #9
Le mercredi 19 mai 2010 à 08:05 -0400, Neil Horman a écrit :
> On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> > Le mercredi 19 mai 2010 à 10:09 +0200, Eric Dumazet a écrit :
> > 
> > > Another concern I have is about RPS.
> > > 
> > > netif_receive_skb() must be called from process_backlog() context, or
> > > there is no guarantee the IPI will be sent if this skb is enqueued for
> > > another cpu.
> > 
> > Hmm, I just checked again, and this is wrong.
> > 
> > In case we enqueue skb on a remote cpu backlog, we also
> > do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> > later.
> > 
> But if this happens, then we loose the connection between the packet being
> received and the process doing the reception, so the network cgroup classifier
> breaks again.
> 
> Performance gains are still a big advantage here of course.

RPS is enabled on a per device (or more precisely per subqueue) basis, and disabled
by default, so if cgroup classifier is needed, it should work as is.

Speaking of net/sched/cls_cgroup.c, I am contemplating following
sequence :

rcu_read_lock();
classid = task_cls_state(current)->classid;
rcu_read_unlock();

RCU is definitly so special (should I say magic ?), that we use it even
if not needed. It makes us happy...




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman May 19, 2010, 2:31 p.m. UTC | #10
On Wed, May 19, 2010 at 04:10:29PM +0200, Eric Dumazet wrote:
> Le mercredi 19 mai 2010 à 08:05 -0400, Neil Horman a écrit :
> > On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> > > Le mercredi 19 mai 2010 à 10:09 +0200, Eric Dumazet a écrit :
> > > 
> > > > Another concern I have is about RPS.
> > > > 
> > > > netif_receive_skb() must be called from process_backlog() context, or
> > > > there is no guarantee the IPI will be sent if this skb is enqueued for
> > > > another cpu.
> > > 
> > > Hmm, I just checked again, and this is wrong.
> > > 
> > > In case we enqueue skb on a remote cpu backlog, we also
> > > do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> > > later.
> > > 
> > But if this happens, then we loose the connection between the packet being
> > received and the process doing the reception, so the network cgroup classifier
> > breaks again.
> > 
> > Performance gains are still a big advantage here of course.
> 
> RPS is enabled on a per device (or more precisely per subqueue) basis, and disabled
> by default, so if cgroup classifier is needed, it should work as is.
> 
> Speaking of net/sched/cls_cgroup.c, I am contemplating following
> sequence :
> 
> rcu_read_lock();
> classid = task_cls_state(current)->classid;
> rcu_read_unlock();
> 
Yeah, I'd noticed there was no write side to that, but hadn't quite gotten to
investigating  that further :)

> RCU is definitly so special (should I say magic ?), that we use it even
> if not needed. It makes us happy...
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman May 19, 2010, 6 p.m. UTC | #11
On Wed, May 19, 2010 at 08:55:43AM -0400, Neil Horman wrote:
> On Wed, May 19, 2010 at 08:05:47AM -0400, Neil Horman wrote:
> > On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> > > Le mercredi 19 mai 2010 à 10:09 +0200, Eric Dumazet a écrit :
> > > 
> > > > Another concern I have is about RPS.
> > > > 
> > > > netif_receive_skb() must be called from process_backlog() context, or
> > > > there is no guarantee the IPI will be sent if this skb is enqueued for
> > > > another cpu.
> > > 
> > > Hmm, I just checked again, and this is wrong.
> > > 
> > > In case we enqueue skb on a remote cpu backlog, we also
> > > do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> > > later.
> > > 
> > But if this happens, then we loose the connection between the packet being
> > received and the process doing the reception, so the network cgroup classifier
> > breaks again.
> > 
> > Performance gains are still a big advantage here of course.
> > Neil
> > 
> Scratch what I said here, Herbert corrected me on this, and we're ok, as tun has
> no rps map.
> 
> I'll test this patch out in just a bit
> Neil
> 

I'm currently testing this, unfortunately, and its not breaking anything, but it
doesn't allow cgroups to classify frames comming from tun interfaces.  I'm still
investigating, but I think the issue is that, because we call local_bh_disable
with this patch, we wind up raising the count at SOFTIRQ_OFFSET in preempt_count
for the task.  Since the cgroup classifier has this check:

if (softirq_count() != SOFTIRQ_OFFSET))
	return -1;

We still fail to classify the frame.  the cgroup classifier is assuming that any
frame arriving with a softirq count of 1 means we came directly from the
dev_queue_xmit routine and is safe to check current().  Any less than that, and
something is wrong (as we at least need the local_bh_disable in dev_queue_xmit),
and any more implies that we have nested calls to local_bh_disable, meaning
we're really handling a softirq context.

Neil

> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller May 19, 2010, 8:14 p.m. UTC | #12
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 19 May 2010 18:20:47 +1000

> On Wed, May 19, 2010 at 10:09:42AM +0200, Eric Dumazet wrote:
>> 
>> 6) netif_rx() pro is that packet processing is done while stack usage is
>> guaranteed to be low (from process_backlog, using a special softirq
>> stack, instead of current stack)
>> 
>> After your patch, tun will use more stack. Is it safe on all contexts ?
> 
> Dave also raised this but I believe nothing changes with regards
> to the stack.  We currently call do_softirq which does not switch
> stacks.

do_softirq() _does_ switch stacks, it's a per-arch function that
does the stack switch and calls __do_softirq() on the softirq
stack.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman May 19, 2010, 8:24 p.m. UTC | #13
On Wed, May 19, 2010 at 02:00:53PM -0400, Neil Horman wrote:
> On Wed, May 19, 2010 at 08:55:43AM -0400, Neil Horman wrote:
> > On Wed, May 19, 2010 at 08:05:47AM -0400, Neil Horman wrote:
> > > On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> > > > Le mercredi 19 mai 2010 à 10:09 +0200, Eric Dumazet a écrit :
> > > > 
> > > > > Another concern I have is about RPS.
> > > > > 
> > > > > netif_receive_skb() must be called from process_backlog() context, or
> > > > > there is no guarantee the IPI will be sent if this skb is enqueued for
> > > > > another cpu.
> > > > 
> > > > Hmm, I just checked again, and this is wrong.
> > > > 
> > > > In case we enqueue skb on a remote cpu backlog, we also
> > > > do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> > > > later.
> > > > 
> > > But if this happens, then we loose the connection between the packet being
> > > received and the process doing the reception, so the network cgroup classifier
> > > breaks again.
> > > 
> > > Performance gains are still a big advantage here of course.
> > > Neil
> > > 
> > Scratch what I said here, Herbert corrected me on this, and we're ok, as tun has
> > no rps map.
> > 
> > I'll test this patch out in just a bit
> > Neil
> > 
> 
> I'm currently testing this, unfortunately, and its not breaking anything, but it
> doesn't allow cgroups to classify frames comming from tun interfaces.  I'm still
> investigating, but I think the issue is that, because we call local_bh_disable
> with this patch, we wind up raising the count at SOFTIRQ_OFFSET in preempt_count
> for the task.  Since the cgroup classifier has this check:
> 
> if (softirq_count() != SOFTIRQ_OFFSET))
> 	return -1;
> 
> We still fail to classify the frame.  the cgroup classifier is assuming that any
> frame arriving with a softirq count of 1 means we came directly from the
> dev_queue_xmit routine and is safe to check current().  Any less than that, and
> something is wrong (as we at least need the local_bh_disable in dev_queue_xmit),
> and any more implies that we have nested calls to local_bh_disable, meaning
> we're really handling a softirq context.
> 
> Neil
> 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Just out of curiosity, how unsavory would it be if we were to dedicate the upper
bit in the SOFTIRQ_BITS range to be an indicator of weather we were actually
executing softirqs?  As noted above, we're tripping over the ambiguity here
between running in softirq context and actually just having softirqs disabled.
Would it be against anyones sensibilities if we were to dedicate the upper bit
in the softirq_count range to disambiguate the two conitions (or use a separate
flag for that matter)?

Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Graf May 19, 2010, 8:49 p.m. UTC | #14
On Wed, 2010-05-19 at 14:00 -0400, Neil Horman wrote: 
> I'm currently testing this, unfortunately, and its not breaking anything, but it
> doesn't allow cgroups to classify frames comming from tun interfaces.  I'm still
> investigating, but I think the issue is that, because we call local_bh_disable
> with this patch, we wind up raising the count at SOFTIRQ_OFFSET in preempt_count
> for the task.  Since the cgroup classifier has this check:
> 
> if (softirq_count() != SOFTIRQ_OFFSET))
> 	return -1;
> 
> We still fail to classify the frame.  the cgroup classifier is assuming that any
> frame arriving with a softirq count of 1 means we came directly from the
> dev_queue_xmit routine and is safe to check current().  Any less than that, and
> something is wrong (as we at least need the local_bh_disable in dev_queue_xmit),
> and any more implies that we have nested calls to local_bh_disable, meaning
> we're really handling a softirq context.

It is a hack but the only method to check for softirq context I found. I
would favor using a flag if there was one.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Bloniarz May 19, 2010, 9 p.m. UTC | #15
On 05/19/2010 04:49 PM, Thomas Graf wrote:
> On Wed, 2010-05-19 at 14:00 -0400, Neil Horman wrote: 
>> I'm currently testing this, unfortunately, and its not breaking anything, but it
>> doesn't allow cgroups to classify frames comming from tun interfaces.  I'm still
>> investigating, but I think the issue is that, because we call local_bh_disable
>> with this patch, we wind up raising the count at SOFTIRQ_OFFSET in preempt_count
>> for the task.  Since the cgroup classifier has this check:
>>
>> if (softirq_count() != SOFTIRQ_OFFSET))
>> 	return -1;
>>
>> We still fail to classify the frame.  the cgroup classifier is assuming that any
>> frame arriving with a softirq count of 1 means we came directly from the
>> dev_queue_xmit routine and is safe to check current().  Any less than that, and
>> something is wrong (as we at least need the local_bh_disable in dev_queue_xmit),
>> and any more implies that we have nested calls to local_bh_disable, meaning
>> we're really handling a softirq context.
> 
> It is a hack but the only method to check for softirq context I found. I
> would favor using a flag if there was one.

Eric probably has some thoughts on this -- his scheduler-batching patch RFC
from last year needed the same bit of info:
http://patchwork.ozlabs.org/patch/24536/
(see the changes to trace_softirq_context).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller May 20, 2010, 2:55 a.m. UTC | #16
Since it seems now inevitable to me that we'll need to store the
'classid' in struct sk_buff, I've prepared two patches to do that at
zero cost.

Someone please go with this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu May 20, 2010, 2:57 a.m. UTC | #17
On Wed, May 19, 2010 at 07:55:33PM -0700, David Miller wrote:
> 
> Since it seems now inevitable to me that we'll need to store the
> 'classid' in struct sk_buff, I've prepared two patches to do that at
> zero cost.

Actually I'm currently working on a patch to store this in struct
sock.  The reason I prefer struct sock is because we can then
centralise the place where we set the classid rather than having
it spread out through every place that creates an skb.

I'll post it soon.

Cheers,
David Miller May 20, 2010, 3:05 a.m. UTC | #18
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 20 May 2010 12:57:42 +1000

> On Wed, May 19, 2010 at 07:55:33PM -0700, David Miller wrote:
>> 
>> Since it seems now inevitable to me that we'll need to store the
>> 'classid' in struct sk_buff, I've prepared two patches to do that at
>> zero cost.
> 
> Actually I'm currently working on a patch to store this in struct
> sock.  The reason I prefer struct sock is because we can then
> centralise the place where we set the classid rather than having
> it spread out through every place that creates an skb.
> 
> I'll post it soon.

Ok, looking forward to it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4326520..0eed49f 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -667,7 +667,7 @@  static __inline__ ssize_t tun_get_user(struct tun_struct *tun,
 		skb_shinfo(skb)->gso_segs = 0;
 	}
 
-	netif_rx_ni(skb);
+	netif_receive_skb_ni(skb);
 
 	tun->dev->stats.rx_packets++;
 	tun->dev->stats.rx_bytes += len;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fa8b476..34bb405 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1562,6 +1562,18 @@  extern int		netif_rx(struct sk_buff *skb);
 extern int		netif_rx_ni(struct sk_buff *skb);
 #define HAVE_NETIF_RECEIVE_SKB 1
 extern int		netif_receive_skb(struct sk_buff *skb);
+
+static inline int netif_receive_skb_ni(struct sk_buff *skb)
+{
+	int err;
+
+	local_bh_disable();
+	err = netif_receive_skb(skb);
+	local_bh_enable();
+
+	return err;
+}
+
 extern gro_result_t	dev_gro_receive(struct napi_struct *napi,
 					struct sk_buff *skb);
 extern gro_result_t	napi_skb_finish(gro_result_t ret, struct sk_buff *skb);