Failure to send fragmented IP packet in case of missing ARP entry

Message ID 1347270171.1234.1353.camel@edumazet-glaptop
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Sept. 10, 2012, 9:42 a.m. UTC
On Mon, 2012-09-10 at 12:59 +0400, Andrei Dolnikov wrote:
> Hello all,
> 
> The following issue is observed on most Linux distributions:
> transmission of fragmented IP packets fails when there is no ARP entry
> for the destination IP.
> The ARP request is sent, but once the ARP response is received, only a
> few of the queued fragments are transmitted. The remaining fragments are lost.
> It can be easily reproduced as follows:
>      # arp -d <dst IP>
>      # ping -s 65000 -c 1 <dst IP>
> Ping result is: "1 packets transmitted, 0 received, 100% packet loss, 
> time 0ms".
> 
> The latest kernel version I tried was 3.5.0-1 x86_64, but I was also
> able to reproduce it with 3.2.x, 3.0.x and 2.6.32.
> It doesn't depend on hardware: I was able to reproduce it with VMware Player,
> an Intel-based laptop, and Intel Atom- and ARM-based custom boards.
> As I'm not a networking standards expert, I'm not sure whether it's a real bug
> or acceptable behaviour, but I decided to raise the issue here because I can't
> reproduce this anomaly on a Windows 7 PC.
> 
> Thanks,
> Andrei.
> --

It's a bit better with linux-3.3, thanks to commit
8b5c171bb3dc0686b2647a84e990199c5faa9ef8
(neigh: new unresolved queue limits)

+neigh/default/unres_qlen_bytes - INTEGER
+       The maximum number of bytes which may be used by packets
+       queued for each unresolved address by other network layers.
+       (added in linux 3.3)
+
+neigh/default/unres_qlen - INTEGER
+       The maximum number of packets which may be queued for each
+       unresolved address by other network layers.
+       (deprecated in linux 3.3) : use unres_qlen_bytes instead.


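The problem is that the default unres_qlen_bytes value is 65536, which is a bit
too small once you take truesize overhead into account: each queued packet is
charged its skb->truesize (payload plus sk_buff and shared-info overhead), not
just its length.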

I guess the following patch would be needed:





--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Sept. 10, 2012, 9:53 a.m. UTC | #1
On Mon, 2012-09-10 at 11:42 +0200, Eric Dumazet wrote:
> diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
> index 4780045..3395bb6 100644
> --- a/net/ipv4/arp.c
> +++ b/net/ipv4/arp.c
> @@ -171,7 +171,7 @@ struct neigh_table arp_tbl = {
>  		.gc_staletime		= 60 * HZ,
>  		.reachable_time		= 30 * HZ,
>  		.delay_probe_time	= 5 * HZ,
> -		.queue_len_bytes	= 64*1024,
> +		.queue_len_bytes	= 64 * SKB_TRUESIZE(1024),
>  		.ucast_probes		= 3,
>  		.mcast_probes		= 3,
>  		.anycast_delay		= 1 * HZ,

In the meantime, you can also do

echo 50 >/proc/sys/net/ipv4/neigh/eth0/unres_qlen

(replace eth0 with the name of your interface)



Andrei Dolnikov Sept. 12, 2012, 3:54 p.m. UTC | #2
Works for me.

Thank you!
Andrei.

On 09/10/2012 01:53 PM, Eric Dumazet wrote:
> In the meantime, you can also do
>
> echo 50 >/proc/sys/net/ipv4/neigh/eth0/unres_qlen
>
> (replace eth0 with the name of your interface)


Patch

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 4780045..3395bb6 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -171,7 +171,7 @@  struct neigh_table arp_tbl = {
 		.gc_staletime		= 60 * HZ,
 		.reachable_time		= 30 * HZ,
 		.delay_probe_time	= 5 * HZ,
-		.queue_len_bytes	= 64*1024,
+		.queue_len_bytes	= 64 * SKB_TRUESIZE(1024),
 		.ucast_probes		= 3,
 		.mcast_probes		= 3,
 		.anycast_delay		= 1 * HZ,