Message ID | 1324465386.2728.29.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Eric Dumazet wrote: > Le mercredi 21 décembre 2011 à 11:45 +0100, Eric Dumazet a écrit : >> Le mercredi 21 décembre 2011 à 11:41 +0100, Michal Simek a écrit : >> >>> yes, page size is 4k. >>> >>> Here is the log for allocation - origin size for MTU 9000 is 0x2420(9248) and size after >>> overhead is 0x3f40(16192) >>> >>> kmalloc_node_track_caller size 00002420, SKB_WITH_OVERHEAD(ksize(data)) 00003f40 >>> skb->head c7870000 >>> skb->data c7870000 >>> skb->tail c7870000 >>> skb->truesize 000040c0 >>> skb->end c7873f40 >>> >> So basically truesize is bigger than previous kernel, and you hit a >> socket limit. (RCVBUF ?) >> >> truesize is more correct, so we uncover prior bugs. >> >> We should allow at least one packet, regardless of its size :( >> >> > > I would try the following patch : > > diff --git a/include/net/sock.h b/include/net/sock.h > index bf6b9fd..21bb3b5 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -662,12 +662,14 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb) > > /* > * Take into account size of receive queue and backlog queue > + * Do not take into account this skb truesize, > + * to allow even a single big packet to come. > */ > static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb) > { > unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc); > > - return qsize + skb->truesize > sk->sk_rcvbuf; > + return qsize > sk->sk_rcvbuf; > } > > /* The per-socket spinlock must be held here. */ > It doesn't fix the problem. Do you want to add some printk to code? Michal
Le mercredi 21 décembre 2011 à 12:10 +0100, Michal Simek a écrit : > Eric Dumazet wrote: > > Le mercredi 21 décembre 2011 à 11:45 +0100, Eric Dumazet a écrit : > >> Le mercredi 21 décembre 2011 à 11:41 +0100, Michal Simek a écrit : > >> > >>> yes, page size is 4k. > >>> > >>> Here is the log for allocation - origin size for MTU 9000 is 0x2420(9248) and size after > >>> overhead is 0x3f40(16192) > >>> > >>> kmalloc_node_track_caller size 00002420, SKB_WITH_OVERHEAD(ksize(data)) 00003f40 > >>> skb->head c7870000 > >>> skb->data c7870000 > >>> skb->tail c7870000 > >>> skb->truesize 000040c0 > >>> skb->end c7873f40 > >>> > >> So basically truesize is bigger than previous kernel, and you hit a > >> socket limit. (RCVBUF ?) > >> > >> truesize is more correct, so we uncover prior bugs. > >> > >> We should allow at least one packet, regardless of its size :( > >> > >> > > > > I would try the following patch : > > > > diff --git a/include/net/sock.h b/include/net/sock.h > > index bf6b9fd..21bb3b5 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -662,12 +662,14 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb) > > > > /* > > * Take into account size of receive queue and backlog queue > > + * Do not take into account this skb truesize, > > + * to allow even a single big packet to come. > > */ > > static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb) > > { > > unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc); > > > > - return qsize + skb->truesize > sk->sk_rcvbuf; > > + return qsize > sk->sk_rcvbuf; > > } > > > > /* The per-socket spinlock must be held here. */ > > > > It doesn't fix the problem. Do you want to add some printk to code? > > Michal > > > try to change SOCK_MIN_RCVBUF include/net/sock.h #define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff)) -> #define SOCK_MIN_RCVBUF (16384 + sizeof(struct sk_buff)) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote: > Le mercredi 21 décembre 2011 à 12:10 +0100, Michal Simek a écrit : >> Eric Dumazet wrote: >>> Le mercredi 21 décembre 2011 à 11:45 +0100, Eric Dumazet a écrit : >>>> Le mercredi 21 décembre 2011 à 11:41 +0100, Michal Simek a écrit : >>>> >>>>> yes, page size is 4k. >>>>> >>>>> Here is the log for allocation - origin size for MTU 9000 is 0x2420(9248) and size after >>>>> overhead is 0x3f40(16192) >>>>> >>>>> kmalloc_node_track_caller size 00002420, SKB_WITH_OVERHEAD(ksize(data)) 00003f40 >>>>> skb->head c7870000 >>>>> skb->data c7870000 >>>>> skb->tail c7870000 >>>>> skb->truesize 000040c0 >>>>> skb->end c7873f40 >>>>> >>>> So basically truesize is bigger than previous kernel, and you hit a >>>> socket limit. (RCVBUF ?) >>>> >>>> truesize is more correct, so we uncover prior bugs. >>>> >>>> We should allow at least one packet, regardless of its size :( >>>> >>>> >>> I would try the following patch : >>> >>> diff --git a/include/net/sock.h b/include/net/sock.h >>> index bf6b9fd..21bb3b5 100644 >>> --- a/include/net/sock.h >>> +++ b/include/net/sock.h >>> @@ -662,12 +662,14 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb) >>> >>> /* >>> * Take into account size of receive queue and backlog queue >>> + * Do not take into account this skb truesize, >>> + * to allow even a single big packet to come. >>> */ >>> static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb) >>> { >>> unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc); >>> >>> - return qsize + skb->truesize > sk->sk_rcvbuf; >>> + return qsize > sk->sk_rcvbuf; >>> } >>> >>> /* The per-socket spinlock must be held here. */ >>> >> It doesn't fix the problem. Do you want to add some printk to code? >> >> Michal >> >> >> > > try to change SOCK_MIN_RCVBUF > > include/net/sock.h > > #define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff)) > > -> > > #define SOCK_MIN_RCVBUF (16384 + sizeof(struct sk_buff)) > Doesn't work. :-( Michal
Le mercredi 21 décembre 2011 à 12:50 +0100, Michal Simek a écrit : > > #define SOCK_MIN_RCVBUF (16384 + sizeof(struct sk_buff)) > > > > Doesn't work. :-( > Sorry, I have no idea where packet is dropped, I dont know your application. You might send "netstat -s" output if you cant find it ... You spoke of ICMP message, I cant find a limit in this area ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote: > Le mercredi 21 décembre 2011 à 12:50 +0100, Michal Simek a écrit : > >>> #define SOCK_MIN_RCVBUF (16384 + sizeof(struct sk_buff)) >>> >> Doesn't work. :-( >> > > Sorry, I have no idea where packet is dropped, I dont know your > application. Application is simple ping from Busybox - nothing special. > You might send "netstat -s" output if you cant find it ... ~ # ./netstat -s Ip: 36 total packets received 2 with invalid addresses 0 forwarded 0 incoming packets discarded 34 incoming packets delivered 34 requests sent out Icmp: 34 ICMP messages received 0 input ICMP message failed. ICMP input histogram: echo replies: 34 34 ICMP messages sent 0 ICMP messages failed ICMP output histogram: echo request: 34 IcmpMsg: InType0: 34 OutType8: 34 Tcp: 0 active connections openings 0 passive connection openings 0 failed connection attempts 0 connection resets received 0 connections established 0 segments received 0 segments send out 0 segments retransmited 0 bad segments received. 0 resets sent Udp: 0 packets received 0 packets to unknown port received. 0 packet receive errors 0 packets sent RcvbufErrors: 0 SndbufErrors: 0 UdpLite: InDatagrams: 0 NoPorts: 0 InErrors: 0 OutDatagrams: 0 RcvbufErrors: 0 SndbufErrors: 0 TcpExt: ArpFilter: 0 0 packets header predicted TCPPureAcks: 0 TCPHPAcks: 0 TCPRenoRecovery: 0 TCPSackRecovery: 0 TCPSACKReneging: 0 TCPFACKReorder: 0 TCPSACKReorder: 0 TCPRenoReorder: 0 TCPTSReorder: 0 TCPFullUndo: 0 TCPPartialUndo: 0 TCPDSACKUndo: 0 TCPLossUndo: 0 TCPLoss: 0 TCPLostRetransmit: 0 TCPRenoFailures: 0 TCPSackFailures: 0 TCPLossFailures: 0 TCPFastRetrans: 0 TCPForwardRetrans: 0 TCPSlowStartRetrans: 0 TCPTimeouts: 0 TCPRenoRecoveryFail: 0 TCPSackRecoveryFail: 0 TCPSchedulerFailed: 0 TCPRcvCollapsed: 0 TCPDSACKOldSent: 0 TCPDSACKOfoSent: 0 TCPDSACKRecv: 0 TCPDSACKOfoRecv: 0 TCPAbortOnSyn: 0 TCPAbortOnData: 0 TCPAbortOnClose: 0 TCPAbortOnMemory: 0 TCPAbortOnTimeout: 0 TCPAbortOnLinger: 0 TCPAbortFailed: 0 TCPMemoryPressures: 0 TCPSACKDiscard: 0 TCPDSACKIgnoredOld: 0 TCPDSACKIgnoredNoUndo: 0 TCPSpuriousRTOs: 0 TCPMD5NotFound: 0 TCPMD5Unexpected: 0 TCPSackShifted: 0 TCPSackMerged: 0 TCPSackShiftFallback: 0 TCPBacklogDrop: 0 TCPMinTTLDrop: 0 TCPDeferAcceptDrop: 0 IPReversePathFilter: 0 TCPTimeWaitOverflow: 0 TCPReqQFullDoCookies: 0 TCPReqQFullDrop: 0 IpExt: InNoRoutes: 0 InTruncatedPkts: 0 InMcastPkts: 0 OutMcastPkts: 0 InBcastPkts: 0 OutBcastPkts: 0 InOctets: 4008 OutOctets: 2856 InMcastOctets: 0 OutMcastOctets: 0 InBcastOctets: 0 OutBcastOctets: 0 > You spoke of ICMP message, I cant find a limit in this area ? ok. Can you provide me any background why size should be setup by size = SKB_WITH_OVERHEAD(ksize(data)); and not to use size which is passed to kmalloc in __alloc_skb. In past I have seen some issues because of caches but this is not that case because on QEMU (where cache is not modeled) everything works the same as on HW. Below is the log what I see. Thanks, Michal ~ # ping -c 5 192.168.0.101 PING 192.168.0.101 (192.168.0.101): 56 data bytes --- 192.168.0.101 ping statistics --- 5 packets transmitted, 0 packets received, 100% packet loss ~ # mount -t nfs -o nolock -o tcp 192.168.0.101:/tftpboot/nfs /mnt ~ # ls /mnt/ | head 0001-sd.patch 0001-syscalls-getdtablesize01-fix-for-missing-etc-hosts.patch 1 1-xils 2 arp arp-mb busybox core exec ~ # umount /mnt/ ~ # ./netstat -s Ip: 36 total packets received 2 with invalid addresses 0 forwarded 0 incoming packets discarded 34 incoming packets delivered 43 requests sent out Icmp: 5 ICMP messages received 0 input ICMP message failed. ICMP input histogram: echo replies: 5 5 ICMP messages sent 0 ICMP messages failed ICMP output histogram: echo request: 5 IcmpMsg: InType0: 5 OutType8: 5 Tcp: 4 active connections openings 0 passive connection openings 0 failed connection attempts 0 connection resets received 0 connections established 29 segments received 38 segments send out 0 segments retransmited 0 bad segments received. 0 resets sent Udp: 0 packets received 0 packets to unknown port received. 0 packet receive errors 0 packets sent RcvbufErrors: 0 SndbufErrors: 0 UdpLite: InDatagrams: 0 NoPorts: 0 InErrors: 0 OutDatagrams: 0 RcvbufErrors: 0 SndbufErrors: 0 TcpExt: ArpFilter: 0 1 delayed acks sent 10 packets header predicted TCPPureAcks: 4 TCPHPAcks: 6 TCPRenoRecovery: 0 TCPSackRecovery: 0 TCPSACKReneging: 0 TCPFACKReorder: 0 TCPSACKReorder: 0 TCPRenoReorder: 0 TCPTSReorder: 0 TCPFullUndo: 0 TCPPartialUndo: 0 TCPDSACKUndo: 0 TCPLossUndo: 0 TCPLoss: 0 TCPLostRetransmit: 0 TCPRenoFailures: 0 TCPSackFailures: 0 TCPLossFailures: 0 TCPFastRetrans: 0 TCPForwardRetrans: 0 TCPSlowStartRetrans: 0 TCPTimeouts: 0 TCPRenoRecoveryFail: 0 TCPSackRecoveryFail: 0 TCPSchedulerFailed: 0 TCPRcvCollapsed: 0 TCPDSACKOldSent: 0 TCPDSACKOfoSent: 0 TCPDSACKRecv: 0 TCPDSACKOfoRecv: 0 TCPAbortOnSyn: 0 TCPAbortOnData: 0 TCPAbortOnClose: 0 TCPAbortOnMemory: 0 TCPAbortOnTimeout: 0 TCPAbortOnLinger: 0 TCPAbortFailed: 0 TCPMemoryPressures: 0 TCPSACKDiscard: 0 TCPDSACKIgnoredOld: 0 TCPDSACKIgnoredNoUndo: 0 TCPSpuriousRTOs: 0 TCPMD5NotFound: 0 TCPMD5Unexpected: 0 TCPSackShifted: 0 TCPSackMerged: 0 TCPSackShiftFallback: 0 TCPBacklogDrop: 0 TCPMinTTLDrop: 0 TCPDeferAcceptDrop: 0 IPReversePathFilter: 0 TCPTimeWaitOverflow: 0 TCPReqQFullDoCookies: 0 TCPReqQFullDrop: 0 IpExt: InNoRoutes: 0 InTruncatedPkts: 0 InMcastPkts: 0 OutMcastPkts: 0 InBcastPkts: 0 OutBcastPkts: 0 InOctets: 11348 OutOctets: 3752 InMcastOctets: 0 OutMcastOctets: 0 InBcastOctets: 0 OutBcastOctets: 0 ~ # ~ # ~ # tcpdump device eth0 entered promiscuous mode tcpdump: listening on eth0 00:01:31.421191 192.168.0.101 > 192.168.0.100: icmp: echo request (DF) 00:01:31.479686 192.168.0.100 > 192.168.0.101: icmp: echo reply 00:01:31.442325 arp who-has 192.168.0.1 tell 192.168.0.100 00:01:31.446052 arp reply 192.168.0.1 is-at f4:ec:38:b2:6a:9e 00:01:31.446256 192.168.0.100.47044 > 192.168.0.1.53: 45191+ (44) (DF) 00:01:31.461859 192.168.0.1.53 > 192.168.0.100.47044: 45191 NXDomain 0/1/0 (121) 00:01:31.464125 192.168.0.100.41833 > 192.168.0.1.53: 37485+ (44) (DF) 00:01:31.478026 192.168.0.1.53 > 192.168.0.100.41833: 37485 NXDomain 0/1/0 (121) 00:01:31.481794 192.168.0.100.35138 > 192.168.0.1.53: 56594+ (42) (DF) 00:01:31.497202 192.168.0.1.53 > 192.168.0.100.35138: 56594 NXDomain 0/1/0 (119) 00:01:32.419130 192.168.0.101 > 192.168.0.100: icmp: echo request (DF) 00:01:32.419482 192.168.0.100 > 192.168.0.101: icmp: echo reply 00:01:33.418342 192.168.0.101 > 192.168.0.100: icmp: echo request (DF) 00:01:33.418693 192.168.0.100 > 192.168.0.101: icmp: echo reply 00:01:34.419327 192.168.0.101 > 192.168.0.100: icmp: echo request (DF) 00:01:34.419676 192.168.0.100 > 192.168.0.101: icmp: echo reply 00:01:35.418042 192.168.0.101 > 192.168.0.100: icmp: echo request (DF) 00:01:35.418406 192.168.0.100 > 192.168.0.101: icmp: echo reply 00:01:36.418923 192.168.0.101 > 192.168.0.100: icmp: echo request (DF) 00:01:36.419275 192.168.0.100 > 192.168.0.101: icmp: echo reply 00:01:36.459936 arp who-has 192.168.0.100 tell 192.168.0.1 00:01:36.460216 arp reply 192.168.0.100 is-at 0:a:35:0:19:15 00:01:37.418800 192.168.0.101 > 192.168.0.100: icmp: echo request (DF) 00:01:37.419149 192.168.0.100 > 192.168.0.101: icmp: echo reply 00:01:38.419247 192.168.0.101 > 192.168.0.100: icmp: echo request (DF) 00:01:38.419617 192.168.0.100 > 192.168.0.101: icmp: echo reply 00:01:39.418820 192.168.0.101 > 192.168.0.100: icmp: echo request (DF) 00:01:39.419169 192.168.0.100 > 192.168.0.101: icmp: echo reply 56 packets received by filter 0 packets droppeddevice eth0 left promiscuous mode by kernel ~ # ~ # ping 192.168.0.101 PING 192.168.0.101 (192.168.0.101): 56 data bytes --- 192.168.0.101 ping statistics --- 2 packets transmitted, 0 packets received, 100% packet loss ~ # ping 192.168.0.1 PING 192.168.0.1 (192.168.0.1): 56 data bytes --- 192.168.0.1 ping statistics --- 1 packets transmitted, 0 packets received, 100% packet loss ~ # ping 192.168.0.101 -c 10 & ~ # PING 192.168.0.101 (192.168.0.101): 56 data bytes ~ # tcpdump device eth0 entered promiscuous mode tcpdump: listening on eth0 00:01:59.615918 192.168.0.100 > 192.168.0.101: icmp: echo request (DF) 00:01:59.619166 192.168.0.101 > 192.168.0.100: icmp: echo reply 00:01:59.636473 192.168.0.100.57212 > 192.168.0.1.53: 43026+ (44) (DF) 00:01:59.650838 192.168.0.1.53 > 192.168.0.100.57212: 43026 NXDomain 0/1/0 (121) 00:01:59.652915 192.168.0.100.41475 > 192.168.0.1.53: 28371+ (44) (DF) 00:01:59.666907 192.168.0.1.53 > 192.168.0.100.41475: 28371 NXDomain 0/1/0 (121) 00:01:59.670832 192.168.0.100.47770 > 192.168.0.1.53: 61908+ (42) (DF) 00:01:59.684921 192.168.0.1.53 > 192.168.0.100.47770: 61908 NXDomain 0/1/0 (119) 00:02:00.616493 192.168.0.100 > 192.168.0.101: icmp: echo request (DF) 00:02:00.620237 192.168.0.101 > 192.168.0.100: icmp: echo reply 00:02:00.624275 arp who-has 192.168.0.101 tell 192.168.0.100 00:02:00.627936 arp reply 192.168.0.101 is-at 0:22:15:fb:34:a1 00:02:01.617232 192.168.0.100 > 192.168.0.101: icmp: echo request (DF) 00:02:01.620980 192.168.0.101 > 192.168.0.100: icmp: echo reply 00:02:02.617962 192.168.0.100 > 192.168.0.101: icmp: echo request (DF) 00:02:02.621687 192.168.0.101 > 192.168.0.100: icmp: echo reply 00:02:03.618687 192.168.0.100 > 192.168.0.101: icmp: echo request (DF) 00:02:03.622389 192.168.0.101 > 192.168.0.100: icmp: echo reply 00:02:04.619414 192.168.0.100 > 192.168.0.101: icmp: echo request (DF) 00:02:04.623129 192.168.0.101 > 192.168.0.100: icmp: echo reply 42 packets received by filter 0 packets droppeddevice eth0 left promiscuous mode by kernel ~ # --- 192.168.0.101 ping statistics --- 10 packets transmitted, 0 packets received, 100% packet loss [1] + Done(1) ping 192.168.0.101 -c 10 ~ #
diff --git a/include/net/sock.h b/include/net/sock.h index bf6b9fd..21bb3b5 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -662,12 +662,14 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb) /* * Take into account size of receive queue and backlog queue + * Do not take into account this skb truesize, + * to allow even a single big packet to come. */ static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb) { unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc); - return qsize + skb->truesize > sk->sk_rcvbuf; + return qsize > sk->sk_rcvbuf; } /* The per-socket spinlock must be held here. */