diff mbox

net: Fix vlan_gro_frags vs netpoll and bonding paths

Message ID 20100827205042.GA13844@del.dom.local
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Jarek Poplawski Aug. 27, 2010, 8:50 p.m. UTC
After positive netpoll_rx_on() check in vlan_gro_receive() there is
skipped part of the "common" GRO_NORMAL path, especially "pull:" in
dev_gro_receive(), where at least eth header should be copied for
entirely paged skbs. So, eth_type_trans() can read zeroed header only.

Alas moving the netpoll_rx_on() check isn't enough here because a bit
later, in vlan_gro_common(), skb_bond_should_drop() is called, which
depends on skb->protocol and skb->pkt_type data, and can also change
eth_hdr h_dest address (which is overwritten later, btw).

To fix both problems this patch completes copying and verifying of eth
header from the fragment in vlan_gro_receive(). For that purpose the
"pull:" part of dev_gro_receive() was separated into an inline as
skb_gro_pull_in(), and there was added a "lighter" version of
napi_frags_finish(). No gro path except vlan_gro_frags() should be
affected by these changes.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 include/linux/netdevice.h |   33 ++++++++++++++++++++++++++++++
 net/8021q/vlan_core.c     |   16 ++++++++++----
 net/core/dev.c            |   48 +++++++++++++++++++++++++-------------------
 3 files changed, 71 insertions(+), 26 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Herbert Xu Aug. 28, 2010, 12:13 a.m. UTC | #1
On Fri, Aug 27, 2010 at 10:50:42PM +0200, Jarek Poplawski wrote:
> After positive netpoll_rx_on() check in vlan_gro_receive() there is
> skipped part of the "common" GRO_NORMAL path, especially "pull:" in
> dev_gro_receive(), where at least eth header should be copied for
> entirely paged skbs. So, eth_type_trans() can read zeroed header only.

Right, thanks for catching this.

> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
> index 01ddb04..58289fe 100644
> --- a/net/8021q/vlan_core.c
> +++ b/net/8021q/vlan_core.c
> @@ -139,13 +139,19 @@ gro_result_t vlan_gro_frags(struct napi_struct *napi, struct vlan_group *grp,
>  	if (!skb)
>  		return GRO_DROP;
>  
> -	if (netpoll_rx_on(skb)) {
> -		skb->protocol = eth_type_trans(skb, skb->dev);
> +	/*
> +	 * Complete the eth header here, mainly for skb_bond_should_drop(),
> +	 * and for netpoll_rx_on() btw.
> +	 */
> +	skb_gro_pull_in(skb);
> +	skb->protocol = eth_type_trans(skb, skb->dev);
> +	skb_gro_pull(skb, -ETH_HLEN);

But this code should go into the netpoll (i.e., slow-path) case
only so as not to impede performance.

Also, we need to fix this for the non-VLAN case as well.

Cheers,
Jarek Poplawski Aug. 28, 2010, 9:44 a.m. UTC | #2
On Sat, Aug 28, 2010 at 08:13:37AM +0800, Herbert Xu wrote:
> On Fri, Aug 27, 2010 at 10:50:42PM +0200, Jarek Poplawski wrote:
> > After positive netpoll_rx_on() check in vlan_gro_receive() there is
> > skipped part of the "common" GRO_NORMAL path, especially "pull:" in
> > dev_gro_receive(), where at least eth header should be copied for
> > entirely paged skbs. So, eth_type_trans() can read zeroed header only.
> 
> Right, thanks for catching this.
> 
> > diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
> > index 01ddb04..58289fe 100644
> > --- a/net/8021q/vlan_core.c
> > +++ b/net/8021q/vlan_core.c
> > @@ -139,13 +139,19 @@ gro_result_t vlan_gro_frags(struct napi_struct *napi, struct vlan_group *grp,
> >  	if (!skb)
> >  		return GRO_DROP;
> >  
> > -	if (netpoll_rx_on(skb)) {
> > -		skb->protocol = eth_type_trans(skb, skb->dev);
> > +	/*
> > +	 * Complete the eth header here, mainly for skb_bond_should_drop(),
> > +	 * and for netpoll_rx_on() btw.
> > +	 */
> > +	skb_gro_pull_in(skb);
> > +	skb->protocol = eth_type_trans(skb, skb->dev);
> > +	skb_gro_pull(skb, -ETH_HLEN);
> 
> But this code should go into the netpoll (i.e., slow-path) case
> only so as not to impede performance.

I'm not sure I can understand. What about skb_bond_should_drop()
mentioned in the comments? IMHO we have to impede performance a bit
(and treat it similarly to non-fragmented path) just to fix it, and
consider optimization of this bonding call later.

> 
> Also, we need to fix this for the non-VLAN case as well.

But the only other non-VLAN case I know was fixed already...

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 28, 2010, 10:54 a.m. UTC | #3
In commit f2bde7328633269ee935d9ed96535ade15cc348f
Author: Stephen Hemminger <shemminger@vyatta.com>

    net: allow multiple dev per napi with GRO
    
    GRO assumes that there is a one-to-one relationship between NAPI
    structure and network device. Some devices like sky2 share multiple
    devices on a single interrupt so only have one NAPI handler. Rather than
    split GRO from NAPI, just have GRO assume if device changes that
    it is a different flow.
   

It was assumed a napi could be shared by several devs, but I dont really
understand, since we have an unique ->dev pointer inside "napi_struct",
this one is set once, and never change.

This pointer is currently used from napi_get_frags() [but that could be
avoided], and from netpoll_poll_lock().

The netpoll_poll_lock() case is problematic.

static inline void *netpoll_poll_lock(struct napi_struct *napi)
{
        struct net_device *dev = napi->_dev;

        if (dev && dev->npinfo) {
                spin_lock(&napi->poll_lock);

Maybe we should remove 'dev' field from napi_struct and replace it by a
npinfo pointer ?




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Aug. 28, 2010, 2:31 p.m. UTC | #4
On Sat, Aug 28, 2010 at 12:54:06PM +0200, Eric Dumazet wrote:
> In commit f2bde7328633269ee935d9ed96535ade15cc348f
> Author: Stephen Hemminger <shemminger@vyatta.com>
> 
>     net: allow multiple dev per napi with GRO
>     
>     GRO assumes that there is a one-to-one relationship between NAPI
>     structure and network device. Some devices like sky2 share multiple
>     devices on a single interrupt so only have one NAPI handler. Rather than
>     split GRO from NAPI, just have GRO assume if device changes that
>     it is a different flow.
>    
> 
> It was assumed a napi could be shared by several devs, but I dont really
> understand, since we have an unique ->dev pointer inside "napi_struct",
> this one is set once, and never change.
> 
> This pointer is currently used from napi_get_frags() [but that could be
> avoided], and from netpoll_poll_lock().
> 
> The netpoll_poll_lock() case is problematic.
> 
> static inline void *netpoll_poll_lock(struct napi_struct *napi)
> {
>         struct net_device *dev = napi->_dev;
> 
>         if (dev && dev->npinfo) {
>                 spin_lock(&napi->poll_lock);
> 
> Maybe we should remove 'dev' field from napi_struct and replace it by a
> npinfo pointer ?

Sky2 seems to work like a single netdev (with an internal sub-netdev),
so I can't see your concern: what is the main aim of this replacement?

Jarek P. 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 28, 2010, 2:48 p.m. UTC | #5
Le samedi 28 août 2010 à 16:31 +0200, Jarek Poplawski a écrit :
> On Sat, Aug 28, 2010 at 12:54:06PM +0200, Eric Dumazet wrote:
> > In commit f2bde7328633269ee935d9ed96535ade15cc348f
> > Author: Stephen Hemminger <shemminger@vyatta.com>
> > 
> >     net: allow multiple dev per napi with GRO
> >     
> >     GRO assumes that there is a one-to-one relationship between NAPI
> >     structure and network device. Some devices like sky2 share multiple
> >     devices on a single interrupt so only have one NAPI handler. Rather than
> >     split GRO from NAPI, just have GRO assume if device changes that
> >     it is a different flow.
> >    
> > 
> > It was assumed a napi could be shared by several devs, but I dont really
> > understand, since we have an unique ->dev pointer inside "napi_struct",
> > this one is set once, and never change.
> > 
> > This pointer is currently used from napi_get_frags() [but that could be
> > avoided], and from netpoll_poll_lock().
> > 
> > The netpoll_poll_lock() case is problematic.
> > 
> > static inline void *netpoll_poll_lock(struct napi_struct *napi)
> > {
> >         struct net_device *dev = napi->_dev;
> > 
> >         if (dev && dev->npinfo) {
> >                 spin_lock(&napi->poll_lock);
> > 
> > Maybe we should remove 'dev' field from napi_struct and replace it by a
> > npinfo pointer ?
> 
> Sky2 seems to work like a single netdev (with an internal sub-netdev),
> so I can't see your concern: what is the main aim of this replacement?

I am trying to understand why this commit was needed then.

It adds an extra test in the main loop, testing skb->dev against p->dev,
it must me for something...

I am trying to say that the one to one relationship between NAPI
structure and a device is not only a GRO thing, but also a netpoll one.

So either we completely remove this one to one relationship, either
Stephen commit was not needed.

Some clarification is needed.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Aug. 28, 2010, 3:16 p.m. UTC | #6
On Sat, Aug 28, 2010 at 04:48:29PM +0200, Eric Dumazet wrote:
> Le samedi 28 ao??t 2010 ?? 16:31 +0200, Jarek Poplawski a écrit :
> > On Sat, Aug 28, 2010 at 12:54:06PM +0200, Eric Dumazet wrote:
> > > In commit f2bde7328633269ee935d9ed96535ade15cc348f
> > > Author: Stephen Hemminger <shemminger@vyatta.com>
> > > 
> > >     net: allow multiple dev per napi with GRO
> > >     
> > >     GRO assumes that there is a one-to-one relationship between NAPI
> > >     structure and network device. Some devices like sky2 share multiple
> > >     devices on a single interrupt so only have one NAPI handler. Rather than
> > >     split GRO from NAPI, just have GRO assume if device changes that
> > >     it is a different flow.
> > >    
> > > 
> > > It was assumed a napi could be shared by several devs, but I dont really
> > > understand, since we have an unique ->dev pointer inside "napi_struct",
> > > this one is set once, and never change.
> > > 
> > > This pointer is currently used from napi_get_frags() [but that could be
> > > avoided], and from netpoll_poll_lock().
> > > 
> > > The netpoll_poll_lock() case is problematic.
> > > 
> > > static inline void *netpoll_poll_lock(struct napi_struct *napi)
> > > {
> > >         struct net_device *dev = napi->_dev;
> > > 
> > >         if (dev && dev->npinfo) {
> > >                 spin_lock(&napi->poll_lock);
> > > 
> > > Maybe we should remove 'dev' field from napi_struct and replace it by a
> > > npinfo pointer ?
> > 
> > Sky2 seems to work like a single netdev (with an internal sub-netdev),
> > so I can't see your concern: what is the main aim of this replacement?
> 
> I am trying to understand why this commit was needed then.
> 
> It adds an extra test in the main loop, testing skb->dev against p->dev,
> it must me for something...
> 
> I am trying to say that the one to one relationship between NAPI
> structure and a device is not only a GRO thing, but also a netpoll one.

I guess it's not about the full one to one relationship, but kind of
overloading (pseudo vlan?). How much it's needed is another question,
but at least the order of these tests seems doubtful.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger Aug. 28, 2010, 5:14 p.m. UTC | #7
On Sat, 28 Aug 2010 16:48:29 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le samedi 28 août 2010 à 16:31 +0200, Jarek Poplawski a écrit :
> > On Sat, Aug 28, 2010 at 12:54:06PM +0200, Eric Dumazet wrote:
> > > In commit f2bde7328633269ee935d9ed96535ade15cc348f
> > > Author: Stephen Hemminger <shemminger@vyatta.com>
> > > 
> > >     net: allow multiple dev per napi with GRO
> > >     
> > >     GRO assumes that there is a one-to-one relationship between NAPI
> > >     structure and network device. Some devices like sky2 share multiple
> > >     devices on a single interrupt so only have one NAPI handler. Rather than
> > >     split GRO from NAPI, just have GRO assume if device changes that
> > >     it is a different flow.
> > >    
> > > 
> > > It was assumed a napi could be shared by several devs, but I dont really
> > > understand, since we have an unique ->dev pointer inside "napi_struct",
> > > this one is set once, and never change.
> > > 
> > > This pointer is currently used from napi_get_frags() [but that could be
> > > avoided], and from netpoll_poll_lock().
> > > 
> > > The netpoll_poll_lock() case is problematic.
> > > 
> > > static inline void *netpoll_poll_lock(struct napi_struct *napi)
> > > {
> > >         struct net_device *dev = napi->_dev;
> > > 
> > >         if (dev && dev->npinfo) {
> > >                 spin_lock(&napi->poll_lock);
> > > 
> > > Maybe we should remove 'dev' field from napi_struct and replace it by a
> > > npinfo pointer ?
> > 
> > Sky2 seems to work like a single netdev (with an internal sub-netdev),
> > so I can't see your concern: what is the main aim of this replacement?
> 
> I am trying to understand why this commit was needed then.
> 
> It adds an extra test in the main loop, testing skb->dev against p->dev,
> it must me for something...
> 
> I am trying to say that the one to one relationship between NAPI
> structure and a device is not only a GRO thing, but also a netpoll one.
> 
> So either we completely remove this one to one relationship, either
> Stephen commit was not needed.
> 
> Some clarification is needed.
> 
> 

The Marvell Yukon2 hardware supports two interfaces sharing a common interrupt.
Therfore the sky2 driver has up to two net devices and a single NAPI per board.
It is possible in a single invocation of the poll loop to process frames
for both ports.  GRO works by combining received packets from identical
flows over one NAPI interval. It is possible on sky2 that one packet
could be processed for the first port, and the next packet processed was for
second port and the two packets were related so that GRO would combine them.
The check for the same dev is required to prevent this. Yes it is an unlikely
corner case, but the purpose of GRO is to do aggregation but preserve the
flow characteristics of the incoming traffic.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Aug. 28, 2010, 9:41 p.m. UTC | #8
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Sat, 28 Aug 2010 10:14:24 -0700

> The Marvell Yukon2 hardware supports two interfaces sharing a common interrupt.
> Therfore the sky2 driver has up to two net devices and a single NAPI per board.
> It is possible in a single invocation of the poll loop to process frames
> for both ports.  GRO works by combining received packets from identical
> flows over one NAPI interval. It is possible on sky2 that one packet
> could be processed for the first port, and the next packet processed was for
> second port and the two packets were related so that GRO would combine them.
> The check for the same dev is required to prevent this. Yes it is an unlikely
> corner case, but the purpose of GRO is to do aggregation but preserve the
> flow characteristics of the incoming traffic.

If that is true then GRO is going to refuse to merge every single
frame that arrives on the second port of a SKY2 device. :-)

This is because for the two ports, you allocate and register one NAPI
instance which uses only the first port's netdev pointer.

Therefore, when GRO compares napi->dev to skb->dev it will always not
match for packets coming from the second port since the netdev pointer
in skb->dev will be different.

Since netpoll does similar things, this means both NAPI and netpoll
cannot function properly with SKY2's second port.  It will only work
right on the first port.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger Aug. 28, 2010, 10:31 p.m. UTC | #9
On Sat, 28 Aug 2010 14:41:30 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> Since netpoll does similar things, this means both NAPI and netpoll
> cannot function properly with SKY2's second port.  It will only work
> right on the first port.

I knew netpoll was broken on second port.
David Miller Aug. 28, 2010, 10:33 p.m. UTC | #10
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Sat, 28 Aug 2010 15:31:24 -0700

> On Sat, 28 Aug 2010 14:41:30 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
> 
>> Since netpoll does similar things, this means both NAPI and netpoll
>> cannot function properly with SKY2's second port.  It will only work
>> right on the first port.
> 
> I knew netpoll was broken on second port.

Well, now we know that GRO is too :-)

If we wish to keep the one-to-one mapping of NAPI contexts to
interrupt sources, what we can do is provide some kind of "dev list".

The other option is to register two NAPI contexts and schedule them
both on an interrupt.

But in both cases, detecting the end of polling, and thus when to turn
the interrupts back on, is non-trivial.

I really don't like either solution, therefore, so I'll try to brain
storm on this a bit more.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Aug. 29, 2010, 9:59 a.m. UTC | #11
On Sat, Aug 28, 2010 at 02:41:30PM -0700, David Miller wrote:
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Sat, 28 Aug 2010 10:14:24 -0700
> 
> > The Marvell Yukon2 hardware supports two interfaces sharing a common interrupt.
> > Therfore the sky2 driver has up to two net devices and a single NAPI per board.
> > It is possible in a single invocation of the poll loop to process frames
> > for both ports.  GRO works by combining received packets from identical
> > flows over one NAPI interval. It is possible on sky2 that one packet
> > could be processed for the first port, and the next packet processed was for
> > second port and the two packets were related so that GRO would combine them.
> > The check for the same dev is required to prevent this. Yes it is an unlikely
> > corner case, but the purpose of GRO is to do aggregation but preserve the
> > flow characteristics of the incoming traffic.
> 
> If that is true then GRO is going to refuse to merge every single
> frame that arrives on the second port of a SKY2 device. :-)
> 
> This is because for the two ports, you allocate and register one NAPI
> instance which uses only the first port's netdev pointer.
> 
> Therefore, when GRO compares napi->dev to skb->dev it will always not

Actually, when GRO compares napi->dev to skb->dev?

Jarek P.

> match for packets coming from the second port since the netdev pointer
> in skb->dev will be different.
> 
> Since netpoll does similar things, this means both NAPI and netpoll
> cannot function properly with SKY2's second port.  It will only work
> right on the first port.
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Aug. 29, 2010, 5:06 p.m. UTC | #12
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Sun, 29 Aug 2010 11:59:51 +0200

> Actually, when GRO compares napi->dev to skb->dev?

Hmmm, I thought the code made a skb->dev comparison with the
existing SKBs in the list when checking for same-flow matches.

It doesn't, probably based upon the assumption that a NAPI
instance maps to a unique device, the very topic we're
discussing right now :-/


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 29, 2010, 6:39 p.m. UTC | #13
Le dimanche 29 août 2010 à 10:06 -0700, David Miller a écrit :
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Sun, 29 Aug 2010 11:59:51 +0200
> 
> > Actually, when GRO compares napi->dev to skb->dev?
> 
> Hmmm, I thought the code made a skb->dev comparison with the
> existing SKBs in the list when checking for same-flow matches.
> 
> It doesn't, probably based upon the assumption that a NAPI
> instance maps to a unique device, the very topic we're
> discussing right now :-/
> 
> 

It does the check, Stephen added it in the commit I mentioned to start
this thread.

With net-next-2.6 this now reads :


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Aug. 30, 2010, 6:42 a.m. UTC | #14
On 2010-08-29 20:39, Eric Dumazet wrote:
> Le dimanche 29 aoĂťt 2010 Ă  10:06 -0700, David Miller a ĂŠcrit :
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Sun, 29 Aug 2010 11:59:51 +0200
>>
>>> Actually, when GRO compares napi->dev to skb->dev?
>>
>> Hmmm, I thought the code made a skb->dev comparison with the
>> existing SKBs in the list when checking for same-flow matches.
>>
>> It doesn't, probably based upon the assumption that a NAPI
>> instance maps to a unique device, the very topic we're
>> discussing right now :-/
>>
>>
> 
> It does the check, Stephen added it in the commit I mentioned to start
> this thread.
> 
> With net-next-2.6 this now reads :
> 

Since Stephen didn't seem to miss this too much it seems quite obvious
to me this check should be removed.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger Aug. 30, 2010, 3:57 p.m. UTC | #15
On Mon, 30 Aug 2010 06:42:31 +0000
Jarek Poplawski <jarkao2@gmail.com> wrote:

> On 2010-08-29 20:39, Eric Dumazet wrote:
> > Le dimanche 29 aoĂťt 2010 Ă  10:06 -0700, David Miller a ĂŠcrit :
> >> From: Jarek Poplawski <jarkao2@gmail.com>
> >> Date: Sun, 29 Aug 2010 11:59:51 +0200
> >>
> >>> Actually, when GRO compares napi->dev to skb->dev?
> >>
> >> Hmmm, I thought the code made a skb->dev comparison with the
> >> existing SKBs in the list when checking for same-flow matches.
> >>
> >> It doesn't, probably based upon the assumption that a NAPI
> >> instance maps to a unique device, the very topic we're
> >> discussing right now :-/
> >>
> >>
> > 
> > It does the check, Stephen added it in the commit I mentioned to start
> > this thread.
> > 
> > With net-next-2.6 this now reads :
> > 
> 
> Since Stephen didn't seem to miss this too much it seems quite obvious
> to me this check should be removed.

No. I just don't use that system much, breaking code for
sake of one comparison is ridiculous. If you need to do some
performance wanking, go figure out why just loading netfilter
drops forwarding performance by 20%.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Aug. 30, 2010, 4:50 p.m. UTC | #16
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 30 Aug 2010 08:57:21 -0700

> On Mon, 30 Aug 2010 06:42:31 +0000
> Jarek Poplawski <jarkao2@gmail.com> wrote:
> 
>> On 2010-08-29 20:39, Eric Dumazet wrote:
>> > Le dimanche 29 aoĂťt 2010 Ă  10:06 -0700, David Miller a ĂŠcrit :
>> >> From: Jarek Poplawski <jarkao2@gmail.com>
>> >> Date: Sun, 29 Aug 2010 11:59:51 +0200
>> >>
>> >>> Actually, when GRO compares napi->dev to skb->dev?
>> >>
>> >> Hmmm, I thought the code made a skb->dev comparison with the
>> >> existing SKBs in the list when checking for same-flow matches.
>> >>
>> >> It doesn't, probably based upon the assumption that a NAPI
>> >> instance maps to a unique device, the very topic we're
>> >> discussing right now :-/
>> >>
>> >>
>> > 
>> > It does the check, Stephen added it in the commit I mentioned to start
>> > this thread.
>> > 
>> > With net-next-2.6 this now reads :
>> > 
>> 
>> Since Stephen didn't seem to miss this too much it seems quite obvious
>> to me this check should be removed.
> 
> No. I just don't use that system much, breaking code for
> sake of one comparison is ridiculous.

It's not working to begin with.

I agree with Jarek that the check should be removed.  And GRO is one
of those places that, precisely, even one memory reference removal
can improve performance dramatically.

Herbert spent a lot of time doing micro-optimizations to make GRO
better and better, and the smallest things can turn out to make a huge
difference there.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Aug. 30, 2010, 6:36 p.m. UTC | #17
On Mon, Aug 30, 2010 at 09:50:12AM -0700, David Miller wrote:
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Mon, 30 Aug 2010 08:57:21 -0700
> 
> > On Mon, 30 Aug 2010 06:42:31 +0000
> > Jarek Poplawski <jarkao2@gmail.com> wrote:
> > 
> >> On 2010-08-29 20:39, Eric Dumazet wrote:
> >> > Le dimanche 29 aoĂťt 2010 Ă  10:06 -0700, David Miller a ĂŠcrit :
> >> >> From: Jarek Poplawski <jarkao2@gmail.com>
> >> >> Date: Sun, 29 Aug 2010 11:59:51 +0200
> >> >>
> >> >>> Actually, when GRO compares napi->dev to skb->dev?
> >> >>
> >> >> Hmmm, I thought the code made a skb->dev comparison with the
> >> >> existing SKBs in the list when checking for same-flow matches.
> >> >>
> >> >> It doesn't, probably based upon the assumption that a NAPI
> >> >> instance maps to a unique device, the very topic we're
> >> >> discussing right now :-/
> >> >>
> >> >>
> >> > 
> >> > It does the check, Stephen added it in the commit I mentioned to start
> >> > this thread.
> >> > 
> >> > With net-next-2.6 this now reads :
> >> > 
> >> 
> >> Since Stephen didn't seem to miss this too much it seems quite obvious
> >> to me this check should be removed.
> > 
> > No. I just don't use that system much, breaking code for
> > sake of one comparison is ridiculous.
> 
> It's not working to begin with.

IMHO it should work yet. On the other hand, after removing this test
GRO should still work OK for these NICs in most cases, so it should be
treated as an optimization only. And it seems very unusual to keep such
optimizations at this level for such rare cases.

> I agree with Jarek that the check should be removed.  And GRO is one
> of those places that, precisely, even one memory reference removal
> can improve performance dramatically.

Hmm... I proposed removal when Stephen didn't defend it. Since he
seems to change his line, I'd prefer to be convinced I'm wrong,
preferably with some use/test case; most preferably from some
desperated user...

> Herbert spent a lot of time doing micro-optimizations to make GRO
> better and better, and the smallest things can turn out to make a huge
> difference there.

Anyway, after the last Eric's optimization, it's almost invisible...

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 30, 2010, 7:59 p.m. UTC | #18
So everybody jumped on GRO, while my concern was more the napi->dev
thing.

I pointed to gro because the commit was about gro, but gro was fine
IMHO.

Are we sure netpoll is ok ?

If a napi is shared by several net_device, should'nt we change
netpoll_poll_lock() ?

static inline void *netpoll_poll_lock(struct napi_struct *napi)
{
        struct net_device *dev = napi->dev;

        if (dev && dev->npinfo) {
                spin_lock(&napi->poll_lock);
                napi->poll_owner = smp_processor_id();
                return napi;
        }
        return NULL;
}

---->

static inline void *netpoll_poll_lock(struct napi_struct *napi)
{
        if (atomic_read(&napi->poll_count)) {
                spin_lock(&napi->poll_lock);
                napi->poll_owner = smp_processor_id();
                return napi;
        }
        return NULL;
}



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger Aug. 30, 2010, 8:12 p.m. UTC | #19
On Mon, 30 Aug 2010 21:59:16 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> So everybody jumped on GRO, while my concern was more the napi->dev
> thing.
> 
> I pointed to gro because the commit was about gro, but gro was fine
> IMHO.
> 
> Are we sure netpoll is ok ?

Netpoll is dependent on napi->netdev but does not cause problems.
The sky2 driver doesn't expose netpoll on second port.
The second port has different net_device_ops without netpoll.
Eric Dumazet Aug. 30, 2010, 8:19 p.m. UTC | #20
Le lundi 30 août 2010 à 13:12 -0700, Stephen Hemminger a écrit :
> On Mon, 30 Aug 2010 21:59:16 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > So everybody jumped on GRO, while my concern was more the napi->dev
> > thing.
> > 
> > I pointed to gro because the commit was about gro, but gro was fine
> > IMHO.
> > 
> > Are we sure netpoll is ok ?
> 
> Netpoll is dependent on napi->netdev but does not cause problems.
> The sky2 driver doesn't expose netpoll on second port.
> The second port has different net_device_ops without netpoll.
> 
> 

Ah, great, thanks for clarification.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 46c36ff..ad59f76 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1339,6 +1339,36 @@  static inline void skb_gro_pull(struct sk_buff *skb, unsigned int len)
 	NAPI_GRO_CB(skb)->data_offset += len;
 }
 
+/*
+ * Complete pulling of the header(s) from the fragment
+ */
+static inline void skb_gro_pull_in(struct sk_buff *skb)
+{
+	int grow;
+
+	if (skb_headlen(skb) >= skb_gro_offset(skb))
+		return;
+
+	grow = skb_gro_offset(skb) - skb_headlen(skb);
+
+	BUG_ON(skb->end - skb->tail < grow);
+
+	memcpy(skb_tail_pointer(skb), NAPI_GRO_CB(skb)->frag0, grow);
+
+	skb->tail += grow;
+	skb->data_len -= grow;
+
+	skb_shinfo(skb)->frags[0].page_offset += grow;
+	skb_shinfo(skb)->frags[0].size -= grow;
+
+	if (unlikely(!skb_shinfo(skb)->frags[0].size)) {
+		put_page(skb_shinfo(skb)->frags[0].page);
+		memmove(skb_shinfo(skb)->frags,
+			skb_shinfo(skb)->frags + 1,
+			--skb_shinfo(skb)->nr_frags * sizeof(skb_frag_t));
+	}
+}
+
 static inline void *skb_gro_header_fast(struct sk_buff *skb,
 					unsigned int offset)
 {
@@ -1701,6 +1731,9 @@  extern struct sk_buff *	napi_get_frags(struct napi_struct *napi);
 extern gro_result_t	napi_frags_finish(struct napi_struct *napi,
 					  struct sk_buff *skb,
 					  gro_result_t ret);
+extern gro_result_t	__napi_frags_finish(struct napi_struct *napi,
+					    struct sk_buff *skb,
+					    gro_result_t ret);
 extern struct sk_buff *	napi_frags_skb(struct napi_struct *napi);
 extern gro_result_t	napi_gro_frags(struct napi_struct *napi);
 
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 01ddb04..58289fe 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -139,13 +139,19 @@  gro_result_t vlan_gro_frags(struct napi_struct *napi, struct vlan_group *grp,
 	if (!skb)
 		return GRO_DROP;
 
-	if (netpoll_rx_on(skb)) {
-		skb->protocol = eth_type_trans(skb, skb->dev);
+	/*
+	 * Complete the eth header here, mainly for skb_bond_should_drop(),
+	 * and for netpoll_rx_on() btw.
+	 */
+	skb_gro_pull_in(skb);
+	skb->protocol = eth_type_trans(skb, skb->dev);
+	skb_gro_pull(skb, -ETH_HLEN);
+
+	if (netpoll_rx_on(skb))
 		return vlan_hwaccel_receive_skb(skb, grp, vlan_tci)
 			? GRO_DROP : GRO_NORMAL;
-	}
 
-	return napi_frags_finish(napi, skb,
-				 vlan_gro_common(napi, grp, vlan_tci, skb));
+	return __napi_frags_finish(napi, skb,
+				   vlan_gro_common(napi, grp, vlan_tci, skb));
 }
 EXPORT_SYMBOL(vlan_gro_frags);
diff --git a/net/core/dev.c b/net/core/dev.c
index 3721fbb..cdc0be5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3126,27 +3126,7 @@  enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 	ret = GRO_HELD;
 
 pull:
-	if (skb_headlen(skb) < skb_gro_offset(skb)) {
-		int grow = skb_gro_offset(skb) - skb_headlen(skb);
-
-		BUG_ON(skb->end - skb->tail < grow);
-
-		memcpy(skb_tail_pointer(skb), NAPI_GRO_CB(skb)->frag0, grow);
-
-		skb->tail += grow;
-		skb->data_len -= grow;
-
-		skb_shinfo(skb)->frags[0].page_offset += grow;
-		skb_shinfo(skb)->frags[0].size -= grow;
-
-		if (unlikely(!skb_shinfo(skb)->frags[0].size)) {
-			put_page(skb_shinfo(skb)->frags[0].page);
-			memmove(skb_shinfo(skb)->frags,
-				skb_shinfo(skb)->frags + 1,
-				--skb_shinfo(skb)->nr_frags * sizeof(skb_frag_t));
-		}
-	}
-
+	skb_gro_pull_in(skb);
 ok:
 	return ret;
 
@@ -3267,6 +3247,32 @@  gro_result_t napi_frags_finish(struct napi_struct *napi, struct sk_buff *skb,
 }
 EXPORT_SYMBOL(napi_frags_finish);
 
+/*
+ * The lighter version of napi_frags_finish() without eth_type_trans() etc.
+ */
+gro_result_t __napi_frags_finish(struct napi_struct *napi, struct sk_buff *skb,
+				 gro_result_t ret)
+{
+	switch (ret) {
+	case GRO_NORMAL:
+		if (netif_receive_skb(skb))
+			ret = GRO_DROP;
+		break;
+
+	case GRO_DROP:
+	case GRO_MERGED_FREE:
+		napi_reuse_skb(napi, skb);
+		break;
+
+	case GRO_HELD:
+	case GRO_MERGED:
+		break;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(__napi_frags_finish);
+
 struct sk_buff *napi_frags_skb(struct napi_struct *napi)
 {
 	struct sk_buff *skb = napi->skb;