[net-next,6/9] e1000e: fix buffer overrun while the I219 is processing DMA transactions

Message ID 20171010172139.77914-7-jeffrey.t.kirsher@intel.com
State Accepted, archived
Delegated to: David Miller
Series
  • 1GbE Intel Wired LAN Driver Updates 2017-10-10

Commit Message

Jeff Kirsher Oct. 10, 2017, 5:21 p.m. UTC
From: Sasha Neftin <sasha.neftin@intel.com>

Intel® 100/200 Series Chipset platforms reduced the round-trip
latency for the LAN Controller DMA accesses, which in some
high-performance cases causes a buffer overrun while the I219 LAN
Connected Device is processing the DMA transactions. I219LM and I219V
devices can fall into an unrecoverable Tx hang under very stressful UDP
traffic and repeated reconnection of the Ethernet cable. The LAN
Controller recovers from this Tx hang only when the system is rebooted.
Slightly slow down DMA access by reducing the number of outstanding
requests. This workaround could have an impact on TCP traffic performance
on the platform. Disabling TSO eliminates the performance loss for TCP
traffic without a noticeable impact on CPU performance.

Please refer to the I218/I219 specification update:
https://www.intel.com/content/www/us/en/embedded/products/networking/ethernet-connection-i218-family-documentation.html

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

David Laight Oct. 11, 2017, 9:07 a.m. UTC | #1
From: Jeff Kirsher

> Sent: 10 October 2017 18:22
> Intel 100/200 Series Chipset platforms reduced the round-trip
> latency for the LAN Controller DMA accesses, causing in some high
> performance cases a buffer overrun while the I219 LAN Connected
> Device is processing the DMA transactions. I219LM and I219V devices
> can fall into unrecovered Tx hang under very stressfully UDP traffic
> and multiple reconnection of Ethernet cable. This Tx hang of the LAN
> Controller is only recovered if the system is rebooted. Slightly slow
> down DMA access by reducing the number of outstanding requests.
> This workaround could have an impact on TCP traffic performance
> on the platform. Disabling TSO eliminates performance loss for TCP
> traffic without a noticeable impact on CPU performance.
>
> Please, refer to I218/I219 specification update:
> https://www.intel.com/content/www/us/en/embedded/products/networking/
> ethernet-connection-i218-family-documentation.html
>
> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
> Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index ee9de3500331..14b096f3d1da 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -3021,8 +3021,8 @@ static void e1000_configure_tx(struct e1000_adapter *adapter)
>
>  	hw->mac.ops.config_collision_dist(hw);
>
> -	/* SPT and CNP Si errata workaround to avoid data corruption */
> -	if (hw->mac.type >= e1000_pch_spt) {
> +	/* SPT and KBL Si errata workaround to avoid data corruption */
> +	if (hw->mac.type == e1000_pch_spt) {
>  		u32 reg_val;
>
>  		reg_val = er32(IOSFPC);
> @@ -3030,7 +3030,9 @@ static void e1000_configure_tx(struct e1000_adapter *adapter)
>  		ew32(IOSFPC, reg_val);
>
>  		reg_val = er32(TARC(0));
> -		reg_val |= E1000_TARC0_CB_MULTIQ_3_REQ;
> +		/* SPT and KBL Si errata workaround to avoid Tx hang */
> +		reg_val &= ~BIT(28);
> +		reg_val |= BIT(29);


Shouldn't some more of the commit message about what this is doing
be in the comment?
And shouldn't the 28 and 28 be named constants?

>  		ew32(TARC(0), reg_val);


	David
Sasha Neftin Oct. 16, 2017, 10:24 a.m. UTC | #2
On 10/11/2017 12:07, David Laight wrote:
> From: Jeff Kirsher
>> Sent: 10 October 2017 18:22
>> Intel 100/200 Series Chipset platforms reduced the round-trip
>> latency for the LAN Controller DMA accesses, causing in some high
>> performance cases a buffer overrun while the I219 LAN Connected
>> Device is processing the DMA transactions. I219LM and I219V devices
>> can fall into unrecovered Tx hang under very stressfully UDP traffic
>> and multiple reconnection of Ethernet cable. This Tx hang of the LAN
>> Controller is only recovered if the system is rebooted. Slightly slow
>> down DMA access by reducing the number of outstanding requests.
>> This workaround could have an impact on TCP traffic performance
>> on the platform. Disabling TSO eliminates performance loss for TCP
>> traffic without a noticeable impact on CPU performance.
>>
>> Please, refer to I218/I219 specification update:
>> https://www.intel.com/content/www/us/en/embedded/products/networking/
>> ethernet-connection-i218-family-documentation.html
>>
>> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
>> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
>> Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
>> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
>> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> ---
>>   drivers/net/ethernet/intel/e1000e/netdev.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index ee9de3500331..14b096f3d1da 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -3021,8 +3021,8 @@ static void e1000_configure_tx(struct e1000_adapter *adapter)
>>
>>   	hw->mac.ops.config_collision_dist(hw);
>>
>> -	/* SPT and CNP Si errata workaround to avoid data corruption */
>> -	if (hw->mac.type >= e1000_pch_spt) {
>> +	/* SPT and KBL Si errata workaround to avoid data corruption */
>> +	if (hw->mac.type == e1000_pch_spt) {
>>   		u32 reg_val;
>>
>>   		reg_val = er32(IOSFPC);
>> @@ -3030,7 +3030,9 @@ static void e1000_configure_tx(struct e1000_adapter *adapter)
>>   		ew32(IOSFPC, reg_val);
>>
>>   		reg_val = er32(TARC(0));
>> -		reg_val |= E1000_TARC0_CB_MULTIQ_3_REQ;
>> +		/* SPT and KBL Si errata workaround to avoid Tx hang */
>> +		reg_val &= ~BIT(28);
>> +		reg_val |= BIT(29);
> Shouldn't some more of the commit message about what this is doing
> be in the comment?
A link to the specification update is provided:
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/i218-i219-ethernet-connection-spec-update.pdf?asset=9561
This is Intel's public edition.
> And shouldn't the 28 and 28 be named constants?
(28 and 29) You can easily see from the code that the value has been
changed from 3 to 2. I thought there was no point in adding flags here.
>
>>   		ew32(TARC(0), reg_val);
> 	David
>
Thanks,

Sasha
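
The "3 to 2" reading of the two bit operations can be checked with a small standalone sketch. It assumes, based only on the description in this thread, that the request count is a 2-bit field in bits 29:28 of TARC(0). The helper names are hypothetical and are not part of the e1000e driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the request-count field occupies bits 29:28 of TARC(0). */
#define TARC0_REQ_SHIFT		28u
#define TARC0_REQ_WIDTH_MASK	0x3u

/* Hypothetical helper: return the 2-bit value held in bits 29:28. */
static uint32_t tarc0_req_field(uint32_t reg_val)
{
	return (reg_val >> TARC0_REQ_SHIFT) & TARC0_REQ_WIDTH_MASK;
}

/* The two statements from the patch, applied to a copy of the register. */
static uint32_t apply_patch_bits(uint32_t reg_val)
{
	reg_val &= ~(UINT32_C(1) << 28);	/* reg_val &= ~BIT(28); */
	reg_val |= UINT32_C(1) << 29;		/* reg_val |= BIT(29);  */

	/* Whatever the field held before, it now reads binary 10, i.e. 2. */
	assert(tarc0_req_field(reg_val) == 2);
	return reg_val;
}
```

Under that assumption the field ends up as 2 regardless of its previous contents, and all other bits of the register are left untouched.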
Sasha Neftin Oct. 16, 2017, 10:39 a.m. UTC | #3
On 10/11/2017 12:07, David Laight wrote:
> From: Jeff Kirsher
>> Sent: 10 October 2017 18:22
>> Intel 100/200 Series Chipset platforms reduced the round-trip
>> latency for the LAN Controller DMA accesses, causing in some high
>> performance cases a buffer overrun while the I219 LAN Connected
>> Device is processing the DMA transactions. I219LM and I219V devices
>> can fall into unrecovered Tx hang under very stressfully UDP traffic
>> and multiple reconnection of Ethernet cable. This Tx hang of the LAN
>> Controller is only recovered if the system is rebooted. Slightly slow
>> down DMA access by reducing the number of outstanding requests.
>> This workaround could have an impact on TCP traffic performance
>> on the platform. Disabling TSO eliminates performance loss for TCP
>> traffic without a noticeable impact on CPU performance.
>>
>> Please, refer to I218/I219 specification update:
>> https://www.intel.com/content/www/us/en/embedded/products/networking/
>> ethernet-connection-i218-family-documentation.html
>>
>> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
>> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
>> Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
>> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
>> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> ---
>>   drivers/net/ethernet/intel/e1000e/netdev.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index ee9de3500331..14b096f3d1da 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -3021,8 +3021,8 @@ static void e1000_configure_tx(struct e1000_adapter *adapter)
>>
>>   	hw->mac.ops.config_collision_dist(hw);
>>
>> -	/* SPT and CNP Si errata workaround to avoid data corruption */
>> -	if (hw->mac.type >= e1000_pch_spt) {
>> +	/* SPT and KBL Si errata workaround to avoid data corruption */
>> +	if (hw->mac.type == e1000_pch_spt) {
>>   		u32 reg_val;
>>
>>   		reg_val = er32(IOSFPC);
>> @@ -3030,7 +3030,9 @@ static void e1000_configure_tx(struct e1000_adapter *adapter)
>>   		ew32(IOSFPC, reg_val);
>>
>>   		reg_val = er32(TARC(0));
>> -		reg_val |= E1000_TARC0_CB_MULTIQ_3_REQ;
>> +		/* SPT and KBL Si errata workaround to avoid Tx hang */
>> +		reg_val &= ~BIT(28);
>> +		reg_val |= BIT(29);
> Shouldn't some more of the commit message about what this is doing
> be in the comment?
A link to the specification update is provided:
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/i218-i219-ethernet-connection-spec-update.pdf?asset=9561
This is Intel's public release.
> And shouldn't the 28 and 28 be named constants?
(28 and 29) - You can easily see from the code that the same value has been
changed from 3 to 2. I thought there was no point in adding a flag here.
>
>>   		ew32(TARC(0), reg_val);
> 	David
>
Thanks,

Sasha
David Laight Oct. 16, 2017, 1:27 p.m. UTC | #4
From: Neftin, Sasha

> Sent: 16 October 2017 11:40
> On 10/11/2017 12:07, David Laight wrote:
> > From: Jeff Kirsher
> >> Sent: 10 October 2017 18:22
> >> Intel 100/200 Series Chipset platforms reduced the round-trip
> >> latency for the LAN Controller DMA accesses, causing in some high
> >> performance cases a buffer overrun while the I219 LAN Connected
> >> Device is processing the DMA transactions. I219LM and I219V devices
> >> can fall into unrecovered Tx hang under very stressfully UDP traffic
> >> and multiple reconnection of Ethernet cable. This Tx hang of the LAN
> >> Controller is only recovered if the system is rebooted. Slightly slow
> >> down DMA access by reducing the number of outstanding requests.
> >> This workaround could have an impact on TCP traffic performance
> >> on the platform. Disabling TSO eliminates performance loss for TCP
> >> traffic without a noticeable impact on CPU performance.
> >>
> >> Please, refer to I218/I219 specification update:
> >> https://www.intel.com/content/www/us/en/embedded/products/networking/
> >> ethernet-connection-i218-family-documentation.html
> >>
> >> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
> >> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
> >> Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
> >> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> >> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> >> ---
> >>   drivers/net/ethernet/intel/e1000e/netdev.c | 8 +++++---
> >>   1 file changed, 5 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> >> index ee9de3500331..14b096f3d1da 100644
> >> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> >> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> >> @@ -3021,8 +3021,8 @@ static void e1000_configure_tx(struct e1000_adapter *adapter)
> >>
> >>   	hw->mac.ops.config_collision_dist(hw);
> >>
> >> -	/* SPT and CNP Si errata workaround to avoid data corruption */
> >> -	if (hw->mac.type >= e1000_pch_spt) {
> >> +	/* SPT and KBL Si errata workaround to avoid data corruption */
> >> +	if (hw->mac.type == e1000_pch_spt) {
> >>   		u32 reg_val;
> >>
> >>   		reg_val = er32(IOSFPC);
> >> @@ -3030,7 +3030,9 @@ static void e1000_configure_tx(struct e1000_adapter *adapter)
> >>   		ew32(IOSFPC, reg_val);
> >>
> >>   		reg_val = er32(TARC(0));
> >> -		reg_val |= E1000_TARC0_CB_MULTIQ_3_REQ;
> >> +		/* SPT and KBL Si errata workaround to avoid Tx hang */
> >> +		reg_val &= ~BIT(28);
> >> +		reg_val |= BIT(29);
> > Shouldn't some more of the commit message about what this is doing
> > be in the comment?
> There is provided link on specification update:
> https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/i218-i219-ethernet-connection-spec-update.pdf?asset=9561.
> This is Intel's public release.

And sometime next week the marketing people will decide to reorganise the
web site and the link will become invalid.

> > And shouldn't the 28 and 28 be named constants?
> (28 and 29) - you can easy understand from code that same value has been
> changed from 3 to 2. There is no point add flag here I thought.


Oh, there is. The 'workaround' is:
  Slightly slow down DMA access by reducing the number of outstanding requests.
  This workaround could have an impact on TCP traffic performance and could
  reduce performance up to 5 to 15% (depending) on the platform.
  Disabling TSO eliminates performance loss for TCP traffic without a 
  noticeable impact on CPU performance.

I wonder what tests they did to show that TSO doesn't save cpu cycles!

So my guess is that you are changing the number of outstanding PCIe reads
(or reads for tx buffers, or ???) from 3 to 2.

Let's read between the lines a little further
(since you are at Intel you can probably check this):
Assuming that TSO is 'TCP Segmentation Offload' and that TSO packets
might be 64k, then reading 3 TSO packets might issue PCIe reads for 196k
bytes of data (under 4k for non-TSO).
If the internal buffer that this data is stored in isn't that big then
that internal buffer would overflow.
It might be that data is removed from this buffer as soon as the last
completion TLP arrives - but they can be interleaved with other
outstanding PCIe reads.
It all rather depends on the negotiated maximum TLP size and number
of tags.

Perhaps reducing the maximum TSO packet to 32k stops the overflow
as well...

	David
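
The arithmetic behind that guess can be written out as a short sketch. It assumes a simplified model in which each outstanding request fetches one maximum-size packet. The function names are hypothetical, and the size of the internal buffer is left as a parameter because it is not stated anywhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case bytes of Tx data requested at once, assuming (for
 * illustration only) one maximum-size packet per outstanding request.
 */
static uint32_t outstanding_read_bytes(uint32_t requests, uint32_t max_pkt_bytes)
{
	/* Guard against 32-bit overflow of the product. */
	assert(requests == 0 || max_pkt_bytes <= UINT32_MAX / requests);
	return requests * max_pkt_bytes;
}

/*
 * Return 1 if that worst case fits into a buffer of buf_bytes, else 0.
 * buf_bytes is a hypothetical parameter; the real size is unknown here.
 */
static int fits_buffer(uint32_t requests, uint32_t max_pkt_bytes, uint32_t buf_bytes)
{
	return outstanding_read_bytes(requests, max_pkt_bytes) <= buf_bytes;
}
```

With 64 KiB packets, three requests come to 3 * 65536 = 196608 bytes (the "196k" above) and two requests to 131072 bytes. Three requests of 32 KiB packets come to 98304 bytes, which is why capping the TSO size is raised as a possible alternative.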
Alexander Duyck Oct. 16, 2017, 4:11 p.m. UTC | #5
On Mon, Oct 16, 2017 at 3:24 AM, Neftin, Sasha <sasha.neftin@intel.com> wrote:
> On 10/11/2017 12:07, David Laight wrote:
>>
>> From: Jeff Kirsher
>>>
>>> Sent: 10 October 2017 18:22
>>> Intel 100/200 Series Chipset platforms reduced the round-trip
>>> latency for the LAN Controller DMA accesses, causing in some high
>>> performance cases a buffer overrun while the I219 LAN Connected
>>> Device is processing the DMA transactions. I219LM and I219V devices
>>> can fall into unrecovered Tx hang under very stressfully UDP traffic
>>> and multiple reconnection of Ethernet cable. This Tx hang of the LAN
>>> Controller is only recovered if the system is rebooted. Slightly slow
>>> down DMA access by reducing the number of outstanding requests.
>>> This workaround could have an impact on TCP traffic performance
>>> on the platform. Disabling TSO eliminates performance loss for TCP
>>> traffic without a noticeable impact on CPU performance.
>>>
>>> Please, refer to I218/I219 specification update:
>>> https://www.intel.com/content/www/us/en/embedded/products/networking/
>>> ethernet-connection-i218-family-documentation.html
>>>
>>> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
>>> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
>>> Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
>>> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
>>> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>>> ---
>>>   drivers/net/ethernet/intel/e1000e/netdev.c | 8 +++++---
>>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>>> index ee9de3500331..14b096f3d1da 100644
>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>> @@ -3021,8 +3021,8 @@ static void e1000_configure_tx(struct e1000_adapter
>>> *adapter)
>>>
>>>         hw->mac.ops.config_collision_dist(hw);
>>>
>>> -       /* SPT and CNP Si errata workaround to avoid data corruption */
>>> -       if (hw->mac.type >= e1000_pch_spt) {
>>> +       /* SPT and KBL Si errata workaround to avoid data corruption */
>>> +       if (hw->mac.type == e1000_pch_spt) {
>>>                 u32 reg_val;
>>>
>>>                 reg_val = er32(IOSFPC);
>>> @@ -3030,7 +3030,9 @@ static void e1000_configure_tx(struct e1000_adapter
>>> *adapter)
>>>                 ew32(IOSFPC, reg_val);
>>>
>>>                 reg_val = er32(TARC(0));
>>> -               reg_val |= E1000_TARC0_CB_MULTIQ_3_REQ;
>>> +               /* SPT and KBL Si errata workaround to avoid Tx hang */
>>> +               reg_val &= ~BIT(28);
>>> +               reg_val |= BIT(29);
>>
>> Shouldn't some more of the commit message about what this is doing
>> be in the comment?
>
> There is provided link on specification update:
> https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/i218-i219-ethernet-connection-spec-update.pdf?asset=9561.
> This is Intel's public edition.
>>
>> And shouldn't the 28 and 28 be named constants?
>
> (28 and 29) you can easy understand from the code that value has been
> changed from 3 to 2. There is no point add flags here I thought.

I have to agree with David. This isn't clear and this is going in the
opposite direction of being clear towards being very murky.

You already had the E1000_TARC0_CB_MULTIQ_3_REQ define. It shouldn't
be hard to come up with a bitmask that defines the full width of the
field you are updating so that you can use that mask to clear out the
value, and then also define a value for "MULTIQ_2_REQ" to replace
the value you were using before. Assuming we still want to go with
this route.

He also has a point about using netif_set_gso_max_size() to restrict
the GSO size. If that would work for something like this then that
might be the preferred way to go as you wouldn't be introducing the
same type of issues as you currently do in that you are requiring
disabling TSO in order to avoid "performance loss" which in this case
I assume you are only referring to throughput without taking CPU into
account.

- Alex
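
A minimal standalone sketch of the named-constant approach suggested here might look as follows. The mask name, the "2_REQ" name and all numeric values are assumptions inferred from the thread (bits 29:28 holding the request count); they are not taken from the driver headers, and the helper function is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Existing define mentioned in the thread; value assumed (bits 29:28 set). */
#define E1000_TARC0_CB_MULTIQ_3_REQ	0x30000000u
/* Hypothetical: mask covering the full width of the request-count field. */
#define E1000_TARC0_CB_MULTIQ_REQ_MASK	0x30000000u
/* Hypothetical: field value for two outstanding requests (bit 29 only). */
#define E1000_TARC0_CB_MULTIQ_2_REQ	0x20000000u

/* Clear the whole field with the mask, then write the new value into it. */
static uint32_t tarc0_set_multiq_2_req(uint32_t reg_val)
{
	reg_val &= ~E1000_TARC0_CB_MULTIQ_REQ_MASK;
	reg_val |= E1000_TARC0_CB_MULTIQ_2_REQ;

	/* The field now holds exactly the "2 requests" encoding. */
	assert((reg_val & E1000_TARC0_CB_MULTIQ_REQ_MASK) ==
	       E1000_TARC0_CB_MULTIQ_2_REQ);
	return reg_val;
}
```

In the driver the same pattern would sit between the er32(TARC(0)) read and the ew32(TARC(0), reg_val) write, replacing the two BIT() statements while producing the same register value.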

Patch

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ee9de3500331..14b096f3d1da 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3021,8 +3021,8 @@  static void e1000_configure_tx(struct e1000_adapter *adapter)
 
 	hw->mac.ops.config_collision_dist(hw);
 
-	/* SPT and CNP Si errata workaround to avoid data corruption */
-	if (hw->mac.type >= e1000_pch_spt) {
+	/* SPT and KBL Si errata workaround to avoid data corruption */
+	if (hw->mac.type == e1000_pch_spt) {
 		u32 reg_val;
 
 		reg_val = er32(IOSFPC);
@@ -3030,7 +3030,9 @@  static void e1000_configure_tx(struct e1000_adapter *adapter)
 		ew32(IOSFPC, reg_val);
 
 		reg_val = er32(TARC(0));
-		reg_val |= E1000_TARC0_CB_MULTIQ_3_REQ;
+		/* SPT and KBL Si errata workaround to avoid Tx hang */
+		reg_val &= ~BIT(28);
+		reg_val |= BIT(29);
 		ew32(TARC(0), reg_val);
 	}
 }