diff mbox

[net-next,2/2] bnx2x: add RSS capability for GRE traffic

Message ID 1363625464-21633-2-git-send-email-dmitry@broadcom.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Dmitry Kravkov March 18, 2013, 4:51 p.m. UTC
The patch drives FW to perform RSS for GRE traffic,
based on inner headers.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |    3 +++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c  |   23 ++++++++++++-----------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h  |    9 +++++++++
 3 files changed, 24 insertions(+), 11 deletions(-)

Comments

David Miller March 18, 2013, 5:11 p.m. UTC | #1
From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Mon, 18 Mar 2013 18:51:04 +0200

> The patch drives FW to perform RSS for GRE traffic,
> based on inner headers.
> 
> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet March 19, 2013, 12:07 a.m. UTC | #2
On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote:
> The patch drives FW to perform RSS for GRE traffic,
> based on inner headers.
> 
> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |    3 +++
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c  |   23 ++++++++++++-----------
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h  |    9 +++++++++
>  3 files changed, 24 insertions(+), 11 deletions(-)

This works very well.

Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()

(this was a patch from Tom Herbert, commit
693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping
when rx'ing over tunnel )

Meaning we hit a single cpu for the GRO stuff in ip_gre.

I have to think about it.


Another question is : 

Can bnx2x check the tcp checksum if GRE encapsulated ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Kravkov March 19, 2013, 6:30 a.m. UTC | #3
> -----Original Message-----

> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet

> Sent: Tuesday, March 19, 2013 2:08 AM

> To: Dmitry Kravkov

> Cc: davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert; Maciej Żenczykowski

> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic

> 

> On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote:

> > The patch drives FW to perform RSS for GRE traffic,

> > based on inner headers.

> >

> > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>

> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

> > ---

> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |    3 +++

> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c  |   23 ++++++++++++-----------

> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h  |    9 +++++++++

> >  3 files changed, 24 insertions(+), 11 deletions(-)

> 

> This works very well.

> 

> Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()

> 

> (this was a patch from Tom Herbert, commit

> 693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping

> when rx'ing over tunnel )

> 

> Meaning we hit a single cpu for the GRO stuff in ip_gre.

> 

> I have to think about it.

> 

> 

> Another question is :

> 

> Can bnx2x check the tcp checksum if GRE encapsulated ?

> 

Current HW can't provide this. Probably, it's possible to separate CSUM from GRO/TPA then stack will have to handle CSUM validation for huge packets. Is it worth?
Maciej Żenczykowski March 19, 2013, 9:18 a.m. UTC | #4
Can the HW calculate and return a 1s complement sum of the entire
packet (or a large portion there-of)?
Fixing that up to be only of the outer IPv4, inner IPv4 and inner TCP
relevant portions should still be simpler (well faster) than
calculating the TCP checksum.
I'm pretty sure that some relationship between 1s complement sum of
all bytes, outer IPv4 checksum, inner IPv4 checksum and TCP checksum
could be pulled out of a hat with some deeper thought.  (similarly for
IPv4/GRE/IPv6/TCP and other combinations)

What portions of the packet can the HW/FW [partially] checksum - and
return the value to the driver for further processing?
Can it return 1s complement sum of data portion of outer IPv4 (ie. in
IPv4/GRE/IPv4/TCP return a 1s complement sum of GRE/IPv4/TCP bytes)

Maciej Żenczykowski, Kernel Networking Developer @ Google

On Mon, Mar 18, 2013 at 11:30 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet
>> Sent: Tuesday, March 19, 2013 2:08 AM
>> To: Dmitry Kravkov
>> Cc: davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert; Maciej Żenczykowski
>> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic
>>
>> On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote:
>> > The patch drives FW to perform RSS for GRE traffic,
>> > based on inner headers.
>> >
>> > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
>> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
>> > ---
>> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |    3 +++
>> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c  |   23 ++++++++++++-----------
>> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h  |    9 +++++++++
>> >  3 files changed, 24 insertions(+), 11 deletions(-)
>>
>> This works very well.
>>
>> Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()
>>
>> (this was a patch from Tom Herbert, commit
>> 693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping
>> when rx'ing over tunnel )
>>
>> Meaning we hit a single cpu for the GRO stuff in ip_gre.
>>
>> I have to think about it.
>>
>>
>> Another question is :
>>
>> Can bnx2x check the tcp checksum if GRE encapsulated ?
>>
> Current HW can't provide this. Probably, it's possible to separate CSUM from GRO/TPA then stack will have to handle CSUM validation for huge packets. Is it worth?
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Kravkov March 19, 2013, 12:21 p.m. UTC | #5
> -----Original Message-----

> From: Maciej Żenczykowski [mailto:maze@google.com]

> Sent: Tuesday, March 19, 2013 11:18 AM

> To: Dmitry Kravkov

> Cc: Eric Dumazet; davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert

> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic

> 

> Can the HW calculate and return a 1s complement sum of the entire

> packet (or a large portion there-of)?

No, it can't :(

> Fixing that up to be only of the outer IPv4, inner IPv4 and inner TCP

> relevant portions should still be simpler (well faster) than

> calculating the TCP checksum.

> I'm pretty sure that some relationship between 1s complement sum of

> all bytes, outer IPv4 checksum, inner IPv4 checksum and TCP checksum

> could be pulled out of a hat with some deeper thought.  (similarly for

> IPv4/GRE/IPv6/TCP and other combinations)

> 

> What portions of the packet can the HW/FW [partially] checksum - and

> return the value to the driver for further processing?

> Can it return 1s complement sum of data portion of outer IPv4 (ie. in

> IPv4/GRE/IPv4/TCP return a 1s complement sum of GRE/IPv4/TCP bytes)

> 

> Maciej Żenczykowski, Kernel Networking Developer @ Google

> 

> On Mon, Mar 18, 2013 at 11:30 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:

> >> -----Original Message-----

> >> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet

> >> Sent: Tuesday, March 19, 2013 2:08 AM

> >> To: Dmitry Kravkov

> >> Cc: davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert; Maciej Żenczykowski

> >> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic

> >>

> >> On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote:

> >> > The patch drives FW to perform RSS for GRE traffic,

> >> > based on inner headers.

> >> >

> >> > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>

> >> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

> >> > ---

> >> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |    3 +++

> >> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c  |   23 ++++++++++++-----------

> >> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h  |    9 +++++++++

> >> >  3 files changed, 24 insertions(+), 11 deletions(-)

> >>

> >> This works very well.

> >>

> >> Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()

> >>

> >> (this was a patch from Tom Herbert, commit

> >> 693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping

> >> when rx'ing over tunnel )

> >>

> >> Meaning we hit a single cpu for the GRO stuff in ip_gre.

> >>

> >> I have to think about it.

> >>

> >>

> >> Another question is :

> >>

> >> Can bnx2x check the tcp checksum if GRE encapsulated ?

> >>

> > Current HW can't provide this. Probably, it's possible to separate CSUM from GRO/TPA then stack will have to handle CSUM

> validation for huge packets. Is it worth?

> >
Eric Dumazet March 19, 2013, 12:25 p.m. UTC | #6
Please Maciej do not top post on lkml or netdev mailing lists.

On Tue, 2013-03-19 at 02:18 -0700, Maciej Żenczykowski wrote:
> Can the HW calculate and return a 1s complement sum of the entire
> packet (or a large portion there-of)?
> Fixing that up to be only of the outer IPv4, inner IPv4 and inner TCP
> relevant portions should still be simpler (well faster) than
> calculating the TCP checksum.
> I'm pretty sure that some relationship between 1s complement sum of
> all bytes, outer IPv4 checksum, inner IPv4 checksum and TCP checksum
> could be pulled out of a hat with some deeper thought.  (similarly for
> IPv4/GRE/IPv6/TCP and other combinations)
> 
> What portions of the packet can the HW/FW [partially] checksum - and
> return the value to the driver for further processing?
> Can it return 1s complement sum of data portion of outer IPv4 (ie. in
> IPv4/GRE/IPv4/TCP return a 1s complement sum of GRE/IPv4/TCP bytes)
> 

I assume Dmitry was speaking of this possibility, and our stack should
handle this just fine.

NIC providing these kind of checksums set :

skb->ip_summed = CHECKSUM_COMPLETE;
skb->csum = csum;

before feeding the packet to the stack.

When we pull some header, we have to call skb_postpull_rcsum()
to adjust the skb->csum so that final check can be done.

About 20 drivers currently provide these kind of checksumming.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Kravkov March 19, 2013, 1:43 p.m. UTC | #7
> -----Original Message-----

> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]

> Sent: Tuesday, March 19, 2013 2:26 PM

> To: Maciej Żenczykowski

> Cc: Dmitry Kravkov; davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert

> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic

> 

> Please Maciej do not top post on lkml or netdev mailing lists.

> 

> On Tue, 2013-03-19 at 02:18 -0700, Maciej Żenczykowski wrote:

> > Can the HW calculate and return a 1s complement sum of the entire

> > packet (or a large portion there-of)?

> > Fixing that up to be only of the outer IPv4, inner IPv4 and inner TCP

> > relevant portions should still be simpler (well faster) than

> > calculating the TCP checksum.

> > I'm pretty sure that some relationship between 1s complement sum of

> > all bytes, outer IPv4 checksum, inner IPv4 checksum and TCP checksum

> > could be pulled out of a hat with some deeper thought.  (similarly for

> > IPv4/GRE/IPv6/TCP and other combinations)

> >

> > What portions of the packet can the HW/FW [partially] checksum - and

> > return the value to the driver for further processing?

> > Can it return 1s complement sum of data portion of outer IPv4 (ie. in

> > IPv4/GRE/IPv4/TCP return a 1s complement sum of GRE/IPv4/TCP bytes)

> >

> 

> I assume Dmitry was speaking of this possibility, and our stack should

> handle this just fine.


In case of tunneling bnx2x HW can not provide csum of any portion of the packet.
Flag for XSUM_NO_VALIDATION on cqe will be set for all gre packets.
As a result driver will leave:
skb->ip_summed = CHECKSUM_NONE;

> 

> NIC providing these kind of checksums set :

> 

> skb->ip_summed = CHECKSUM_COMPLETE;

> skb->csum = csum;

> 

> before feeding the packet to the stack.

> 

> When we pull some header, we have to call skb_postpull_rcsum()

> to adjust the skb->csum so that final check can be done.

> 

> About 20 drivers currently provide these kind of checksumming.

>
Jerry Chu Aug. 17, 2013, 5:53 p.m. UTC | #8
Dmitry,

On Mon, Mar 18, 2013 at 11:30 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet
>> Sent: Tuesday, March 19, 2013 2:08 AM
>> To: Dmitry Kravkov
>> Cc: davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert; Maciej Żenczykowski
>> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic
>>
>> On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote:
>> > The patch drives FW to perform RSS for GRE traffic,
>> > based on inner headers.
>> >
>> > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
>> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
>> > ---
>> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |    3 +++
>> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c  |   23 ++++++++++++-----------
>> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h  |    9 +++++++++
>> >  3 files changed, 24 insertions(+), 11 deletions(-)
>>
>> This works very well.
>>
>> Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()
>>
>> (this was a patch from Tom Herbert, commit
>> 693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping
>> when rx'ing over tunnel )
>>
>> Meaning we hit a single cpu for the GRO stuff in ip_gre.
>>
>> I have to think about it.
>>
>>
>> Another question is :
>>
>> Can bnx2x check the tcp checksum if GRE encapsulated ?
>>
> Current HW can't provide this. Probably, it's possible to separate CSUM from GRO/TPA then stack will have to handle CSUM validation for huge packets. Is it worth?

Could you elaborate on what you meant above? (I'm looking for any form
of h/w assist for
csum validation for GRO/TPA pkts since csum computation can be expensive and  as
you said below CHECKSUM_COMPLETE is out of the question.)

Thanks,

Jerry

>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Kravkov Aug. 17, 2013, 6:52 p.m. UTC | #9
On Sat, Aug 17, 2013 at 8:53 PM, Jerry Chu <hkchu@google.com> wrote:
> Dmitry,
>
> On Mon, Mar 18, 2013 at 11:30 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:
>>> -----Original Message-----
>>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet
>>> Sent: Tuesday, March 19, 2013 2:08 AM
>>> To: Dmitry Kravkov
>>> Cc: davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert; Maciej Żenczykowski
>>> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic
>>>
>>> On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote:
>>> > The patch drives FW to perform RSS for GRE traffic,
>>> > based on inner headers.
>>> >
>>> > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
>>> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
>>> > ---
>>> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |    3 +++
>>> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c  |   23 ++++++++++++-----------
>>> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h  |    9 +++++++++
>>> >  3 files changed, 24 insertions(+), 11 deletions(-)
>>>
>>> This works very well.
>>>
>>> Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()
>>>
>>> (this was a patch from Tom Herbert, commit
>>> 693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping
>>> when rx'ing over tunnel )
>>>
>>> Meaning we hit a single cpu for the GRO stuff in ip_gre.
>>>
>>> I have to think about it.
>>>
>>>
>>> Another question is :
>>>
>>> Can bnx2x check the tcp checksum if GRE encapsulated ?
>>>
>> Current HW can't provide this. Probably, it's possible to separate CSUM from GRO/TPA then stack will have to handle CSUM validation for huge packets. Is it worth?
>
> Could you elaborate on what you meant above? (I'm looking for any form
> of h/w assist for
> csum validation for GRO/TPA pkts since csum computation can be expensive and  as
> you said below CHECKSUM_COMPLETE is out of the question.)
>

Current bnx2x HW is not able to perform CSUM validation for
encapsulated packets, so in any case host needs to do that.
Today GRO/TPA feature depends on CSUM, but theoretically (i did not
investigate it) and probably HW can provide aggregated packets w/o
csum validation - this will save headers processing for host.


> Thanks,
>
> Jerry
>
>>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 17, 2013, 7:01 p.m. UTC | #10
On Sat, 2013-08-17 at 21:52 +0300, Dmitry Kravkov wrote:

> 
> Current bnx2x HW is not able to perform CSUM validation for
> encapsulated packets, so in any case host needs to do that.
> Today GRO/TPA feature depends on CSUM, but theoretically (i did not
> investigate it) and probably HW can provide aggregated packets w/o
> csum validation - this will save headers processing for host.

I am not sure I understand this.

Aggregation cannot be done if csums are not validated.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Kravkov Aug. 17, 2013, 7:04 p.m. UTC | #11
On Sat, Aug 17, 2013 at 10:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2013-08-17 at 21:52 +0300, Dmitry Kravkov wrote:
>
>>
>> Current bnx2x HW is not able to perform CSUM validation for
>> encapsulated packets, so in any case host needs to do that.
>> Today GRO/TPA feature depends on CSUM, but theoretically (i did not
>> investigate it) and probably HW can provide aggregated packets w/o
>> csum validation - this will save headers processing for host.
>
> I am not sure I understand this.
>
> Aggregation cannot be done if csums are not validated.
>
>

Thanks. Eric. So we can't do with current bnx2x HW.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jerry Chu Aug. 18, 2013, 11:55 a.m. UTC | #12
On Sat, Aug 17, 2013 at 12:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2013-08-17 at 21:52 +0300, Dmitry Kravkov wrote:
>
>>
>> Current bnx2x HW is not able to perform CSUM validation for
>> encapsulated packets, so in any case host needs to do that.
>> Today GRO/TPA feature depends on CSUM, but theoretically (i did not
>> investigate it) and probably HW can provide aggregated packets w/o
>> csum validation - this will save headers processing for host.
>
> I am not sure I understand this.
>
> Aggregation cannot be done if csums are not validated.

Unless all the csums from aggregated pkts are also aggregated into one
(so that the host computes s/w csum on the large pkt and validates against
the aggregated csum...) There may be some performance benefit for doing
this but it may arguably weaken already weak 1's complement csum.

Jerry

>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 18, 2013, 3:37 p.m. UTC | #13
On Sun, 2013-08-18 at 04:55 -0700, Jerry Chu wrote:
> On Sat, Aug 17, 2013 at 12:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Sat, 2013-08-17 at 21:52 +0300, Dmitry Kravkov wrote:
> >
> >>
> >> Current bnx2x HW is not able to perform CSUM validation for
> >> encapsulated packets, so in any case host needs to do that.
> >> Today GRO/TPA feature depends on CSUM, but theoretically (i did not
> >> investigate it) and probably HW can provide aggregated packets w/o
> >> csum validation - this will save headers processing for host.
> >
> > I am not sure I understand this.
> >
> > Aggregation cannot be done if csums are not validated.
> 
> Unless all the csums from aggregated pkts are also aggregated into one
> (so that the host computes s/w csum on the large pkt and validates against
> the aggregated csum...) There may be some performance benefit for doing
> this but it may arguably weaken already weak 1's complement csum.

To aggregate two packets, you first have to make sure the checksums of
each packet are ok. If the hardware does not validate checksum, then it
cannot aggregate packets.

Sure, CHECKSUM_COMPLETE support would be nice, so that linux can perform
aggregation without having to bring into cpu cache the whole frame.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jerry Chu Aug. 18, 2013, 9:11 p.m. UTC | #14
+mwdalton, uday

On Sun, Aug 18, 2013 at 8:37 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2013-08-18 at 04:55 -0700, Jerry Chu wrote:
>> On Sat, Aug 17, 2013 at 12:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Sat, 2013-08-17 at 21:52 +0300, Dmitry Kravkov wrote:
>> >
>> >>
>> >> Current bnx2x HW is not able to perform CSUM validation for
>> >> encapsulated packets, so in any case host needs to do that.
>> >> Today GRO/TPA feature depends on CSUM, but theoretically (i did not
>> >> investigate it) and probably HW can provide aggregated packets w/o
>> >> csum validation - this will save headers processing for host.
>> >
>> > I am not sure I understand this.
>> >
>> > Aggregation cannot be done if csums are not validated.
>>
>> Unless all the csums from aggregated pkts are also aggregated into one
>> (so that the host computes s/w csum on the large pkt and validates against
>> the aggregated csum...) There may be some performance benefit for doing
>> this but it may arguably weaken already weak 1's complement csum.
>
> To aggregate two packets, you first have to make sure the checksums of
> each packet are ok. If the hardware does not validate checksum, then it
> cannot aggregate packets.

It can if the csum validation is deferred. But this requires the stack code to
aggregate csum from individual pkt header, which can get ugly as one needs
to remove the header part from the individual csum first. It may also weaken
the already weak TCP csum as i mentioned previously. Also one corrupted
pkt will cause the whole GRO pkt to be thrown away. But I won't worry about
it for the data center environment where pkt corruption is rare.

>
> Sure, CHECKSUM_COMPLETE support would be nice, so that linux can perform
> aggregation without having to bring into cpu cache the whole frame.

I wish CHECKSUM_COMPLETE can be supported by the h/w - we are currently
running an older kernel w/o RSS for GRE support so all the GRE traffic between
two IPs hits the same NIC queue. With my GRO-GRE patch (to be upstreamed
later) I'm 20% below line rate (7.1Gbps) on 10GbE multi-TCP stream test because
RSS will saturate a single CPU that is handling softirq. The majority
of CPU cost
was from bcopy (~20%) followed by csum computation (4%). If I comment out the
csum validation in the GRO code then I hit line rate (9.1Gbps) right away on the
same test.

In order for me to saturate the line rate but without h/w CHECKSUM_COMPLETE
support, one other alternative is to back port the RSS for GRE support, which
seems a daunting task (if not please tell me). If I defer the csum
validation after
GRO then RPS will spread out traffic and the csum cost so we are no longer
bottlenecked on one CPU (and hopefully hit the line rate).

Jerry

>
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 18, 2013, 11:44 p.m. UTC | #15
On Sun, 2013-08-18 at 14:11 -0700, Jerry Chu wrote:

> It can if the csum validation is deferred. But this requires the stack code to
> aggregate csum from individual pkt header, which can get ugly as one needs
> to remove the header part from the individual csum first.

You can _not_ aggregate packets _before_ validating their checksum, or
else you could end up with a correct final checksum while two or more
segments had buggy checksums. Checksum are already quite weak, please
do not make them even weaker ;)

GRO should not throw out any segments, it is not its role.

If a router receives a bunch of packets, it should forward them
regardless of the (TCP) checksum being good or not.

The final receiver has full responsibility for ultimately validate the
checksums and update various SNMP counters.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jerry Chu Aug. 19, 2013, 2:57 p.m. UTC | #16
On Sun, Aug 18, 2013 at 4:44 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2013-08-18 at 14:11 -0700, Jerry Chu wrote:
>
>> It can if the csum validation is deferred. But this requires the stack code to
>> aggregate csum from individual pkt header, which can get ugly as one needs
>> to remove the header part from the individual csum first.
>
> You can _not_ aggregate packets _before_ validating their checksum, or
> else you could end up with a correct final checksum while two or more
> segments had buggy checksums. Checksum are already quite weak, please
> do not make them even weaker ;)

Yes I know how weak the 16bit 1's complement inet csum (see my previous email).
I've had the pleasure to debug a number of bugs in the past where 16bit words
were flipped by the buggy h/w hence went undetected by the weak TCP csum.

But I also think it's a tradeoff call. For WAN it's probably a bad
idea but for data
center traffic, pkt corruption is usually rare. If the performance
gain is large enough,
perhaps it justifies this hack (?)

>
> GRO should not throw out any segments, it is not its role.

I did not suggest GRO to throw away any pkt. GRO would just fix the csum
field in the header of the aggregated pkt and passes it on.

>
> If a router receives a bunch of packets, it should forward them
> regardless of the (TCP) checksum being good or not.

It won't affect the forwarding path (i think).

>
> The final receiver has full responsibility for ultimately validate the
> checksums and update various SNMP counters.

When the GRO pkt with the csum field fixed (aggregated) get to
the receiving TCP endpoint, it will be csum validated by the TCP stack
because skb->ip_summed == CHECKSUM_NONE, and the pkt will be
tossed if the csum check fails like non-GRO pkts.

To be clear, this hack is not something I'd be proud of and I doubt the
performance gain will be enough to justify it. This idea grew more out of
desperation for not having csum offload for encap'ed pkts, which I think
is a P0 bug!

Jerry

>
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 8f96372..f9098d8 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -973,6 +973,9 @@  static inline int bnx2x_func_start(struct bnx2x *bp)
 	else /* CHIP_IS_E1X */
 		start_params->network_cos_mode = FW_WRR;
 
+	start_params->gre_tunnel_mode = IPGRE_TUNNEL;
+	start_params->gre_tunnel_rss = GRE_INNER_HEADERS_RSS;
+
 	return bnx2x_func_state_change(bp, &func_params);
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
index 66ab259..5bdc1d6 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
@@ -5679,17 +5679,18 @@  static inline int bnx2x_func_send_start(struct bnx2x *bp,
 	memset(rdata, 0, sizeof(*rdata));
 
 	/* Fill the ramrod data with provided parameters */
-	rdata->function_mode    = (u8)start_params->mf_mode;
-	rdata->sd_vlan_tag      = cpu_to_le16(start_params->sd_vlan_tag);
-	rdata->path_id          = BP_PATH(bp);
-	rdata->network_cos_mode = start_params->network_cos_mode;
-
-	/*
-	 *  No need for an explicit memory barrier here as long we would
-	 *  need to ensure the ordering of writing to the SPQ element
-	 *  and updating of the SPQ producer which involves a memory
-	 *  read and we will have to put a full memory barrier there
-	 *  (inside bnx2x_sp_post()).
+	rdata->function_mode	= (u8)start_params->mf_mode;
+	rdata->sd_vlan_tag	= cpu_to_le16(start_params->sd_vlan_tag);
+	rdata->path_id		= BP_PATH(bp);
+	rdata->network_cos_mode	= start_params->network_cos_mode;
+	rdata->gre_tunnel_mode	= start_params->gre_tunnel_mode;
+	rdata->gre_tunnel_rss	= start_params->gre_tunnel_rss;
+
+	/* No need for an explicit memory barrier here as long we would
+	 * need to ensure the ordering of writing to the SPQ element
+	 * and updating of the SPQ producer which involves a memory
+	 * read and we will have to put a full memory barrier there
+	 * (inside bnx2x_sp_post()).
 	 */
 
 	return bnx2x_sp_post(bp, RAMROD_CMD_ID_COMMON_FUNCTION_START, 0,
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h
index 064dba2..35479da 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h
@@ -1123,6 +1123,15 @@  struct bnx2x_func_start_params {
 
 	/* Function cos mode */
 	u8 network_cos_mode;
+
+	/* NVGRE classification enablement */
+	u8 nvgre_clss_en;
+
+	/* NO_GRE_TUNNEL/NVGRE_TUNNEL/L2GRE_TUNNEL/IPGRE_TUNNEL */
+	u8 gre_tunnel_mode;
+
+	/* GRE_OUTER_HEADERS_RSS/GRE_INNER_HEADERS_RSS/NVGRE_KEY_ENTROPY_RSS */
+	u8 gre_tunnel_rss;
 };
 
 struct bnx2x_func_switch_update_params {