Message ID | 1273085598.2367.233.camel@edumazet-laptop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, May 6, 2010 at 12:23 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le mercredi 05 mai 2010 à 23:33 +0530, Bhaskar Dutta a écrit : > >> Hi, >> >> TSO, GSO and SG are already turned off. >> rx/tx checksumming is on, but that shouldn't matter, right? >> >> # ethtool -k eth0 >> Offload parameters for eth0: >> rx-checksumming: on >> tx-checksumming: on >> scatter-gather: off >> tcp segmentation offload: off >> udp fragmentation offload: off >> generic segmentation offload: off >> >> The bad packets are very small in size, most have no data at all (<300 bytes). >> >> After adding some logs to kernel 2.6.31-12, it seems that >> tcp_v4_md5_hash_skb (function that calculates the md5 hash) is >> (might?) getting corrupt. >> >> The tcp4_pseudohdr (bp = &hp->md5_blk.ip4) structure's saddr, daddr >> and len fields get modified to different values towards the end of the >> tcp_v4_md5_hash_skb function whenever there is a checksum error. >> >> The tcp4_pseudohdr (bp) is within the tcp_md5sig_pool (hp), which is >> filled up by tcp_get_md5sig_pool (which calls per_cpu_ptr). >> >> Using a local copy of the tcp4_pseudohdr in the same function >> tcp_v4_md5_hash_skb (copied all fields from the original >> tcp4_pseudohdr within the tcp_md5sig_pool) and calculating the md5 >> checksum with the local tcp4_pseudohdr seems to solve the issue >> (don't see bad packets for a hours in load tests, and without the >> change I can see them instantaneously in the load tests). >> >> I am still unable to figure out how this is happening. Please let me >> know if you have any pointers. > > I am not familiar with this code, but I suspect same per_cpu data can be > used at both time by a sender (process context) and by a receiver > (softirq context). > > To trigger this, you need at least two active md5 sockets. > > tcp_get_md5sig_pool() should probably disable bh to make sure current > cpu wont be preempted by softirq processing > > > Something like : > > diff --git a/include/net/tcp.h b/include/net/tcp.h > index fb5c66b..e232123 100644 > --- a/include/net/tcp.h > +++ b/include/net/tcp.h > @@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool *tcp_get_md5sig_pool(void) > struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu); > if (!ret) > put_cpu(); > + else > + local_bh_disable(); > return ret; > } > > static inline void tcp_put_md5sig_pool(void) > { > __tcp_put_md5sig_pool(); > + local_bh_enable(); > put_cpu(); > } > > > I put in the above change and ran some load tests with around 50 active TCP connections doing MD5. I could see only 1 bad packet in 30 min (earlier the problem used to occur instantaneously and repeatedly). I think there is another possibility of being preempted when calling tcp_alloc_md5sig_pool() this function releases the spinlock when calling __tcp_alloc_md5sig_pool(). I will run some more tests after changing the tcp_alloc_md5sig_pool and see if the problem is completely resolved. Thanks a lot for your help! Bhaskar -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le jeudi 06 mai 2010 à 17:25 +0530, Bhaskar Dutta a écrit : > I put in the above change and ran some load tests with around 50 > active TCP connections doing MD5. > I could see only 1 bad packet in 30 min (earlier the problem used to > occur instantaneously and repeatedly). > > I think there is another possibility of being preempted when calling > tcp_alloc_md5sig_pool() > this function releases the spinlock when calling __tcp_alloc_md5sig_pool(). > > I will run some more tests after changing the tcp_alloc_md5sig_pool > and see if the problem is completely resolved. > This code should be completely rewritten for linux-2.6.35, its very ugly and over complex, yet it is not scalable. It could use true percpu data, with no central lock or refcount. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 06 May 2010 14:06:26 +0200 > Le jeudi 06 mai 2010 à 17:25 +0530, Bhaskar Dutta a écrit : > >> I put in the above change and ran some load tests with around 50 >> active TCP connections doing MD5. >> I could see only 1 bad packet in 30 min (earlier the problem used to >> occur instantaneously and repeatedly). >> >> I think there is another possibility of being preempted when calling >> tcp_alloc_md5sig_pool() >> this function releases the spinlock when calling __tcp_alloc_md5sig_pool(). >> >> I will run some more tests after changing the tcp_alloc_md5sig_pool >> and see if the problem is completely resolved. >> > > This code should be completely rewritten for linux-2.6.35, its very ugly > and over complex, yet it is not scalable. > > It could use true percpu data, with no central lock or refcount. Yes I've always disliked the way the TCP MD5 pool stuff is coded, it's frankly crap and this is like the 5th major bug that had to get fixed in it. :-) But let's fix this as simply as possible in net-2.6 and -stable, so Eric let me know when you have a tested patch for me to apply. Thanks a lot! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le jeudi 06 mai 2010 à 22:04 -0700, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > > This code should be completely rewritten for linux-2.6.35, its very ugly > > and over complex, yet it is not scalable. > > > > It could use true percpu data, with no central lock or refcount. > > Yes I've always disliked the way the TCP MD5 pool stuff is coded, it's > frankly crap and this is like the 5th major bug that had to get fixed > in it. :-) > > But let's fix this as simply as possible in net-2.6 and -stable, so Eric > let me know when you have a tested patch for me to apply. There are so many things ... I am comtemplating commit aa133076 __tcp_alloc_md5sig_pool() now do a : p = kzalloc(sizeof(*p), sk->sk_allocation); But later call : hash = crypto_alloc_hash("md5", 0, CRYPTO_ALG_ASYNC); (GFP_KERNEL allocations for sure) tcp_v4_parse_md5_keys() also use : p = kzalloc(sizeof(*p), sk->sk_allocation); right after a (possibly sleeping) copy_from_user() Oh well... I'll test the already suggested patch before official submission, thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le jeudi 06 mai 2010 à 17:25 +0530, Bhaskar Dutta a écrit : > On Thu, May 6, 2010 at 12:23 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > I am not familiar with this code, but I suspect same per_cpu data can be > > used at both time by a sender (process context) and by a receiver > > (softirq context). > > > > To trigger this, you need at least two active md5 sockets. > > > > tcp_get_md5sig_pool() should probably disable bh to make sure current > > cpu wont be preempted by softirq processing > > > > > > Something like : > > > > diff --git a/include/net/tcp.h b/include/net/tcp.h > > index fb5c66b..e232123 100644 > > --- a/include/net/tcp.h > > +++ b/include/net/tcp.h > > @@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool *tcp_get_md5sig_pool(void) > > struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu); > > if (!ret) > > put_cpu(); > > + else > > + local_bh_disable(); > > return ret; > > } > > > > static inline void tcp_put_md5sig_pool(void) > > { > > __tcp_put_md5sig_pool(); > > + local_bh_enable(); > > put_cpu(); > > } > > > > > > > > I put in the above change and ran some load tests with around 50 > active TCP connections doing MD5. > I could see only 1 bad packet in 30 min (earlier the problem used to > occur instantaneously and repeatedly). > > I think there is another possibility of being preempted when calling > tcp_alloc_md5sig_pool() > this function releases the spinlock when calling __tcp_alloc_md5sig_pool(). > > I will run some more tests after changing the tcp_alloc_md5sig_pool > and see if the problem is completely resolved. I cant see a race with spinlock in tcp_alloc_md5sig_pool/__tcp_alloc_md5sig_pool(). We allocate structures for all cpus, so preemption.migration should be OK Could you elaborate please ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2010-5-6, at 15:06, Eric Dumazet wrote: > This code should be completely rewritten for linux-2.6.35, its very ugly > and over complex, yet it is not scalable. Let me quickly point out that the IETF has recently replaced TCP-MD5 with TCP-AO (http://tools.ietf.org/html/draft-ietf-tcpm-tcp-auth-opt-11) which is days or weeks away from being published as an RFC. (The draft has been approved and it waiting for the RFC Editor to finish the final copy-editing.) You'll obviously still want TCP-MD5 support for talking to existing equipment, but significant cycles would IMO be better spent on TCP-AO. Lars
Le vendredi 07 mai 2010 à 11:46 +0300, Lars Eggert a écrit : > On 2010-5-6, at 15:06, Eric Dumazet wrote: > > This code should be completely rewritten for linux-2.6.35, its very ugly > > and over complex, yet it is not scalable. > > Let me quickly point out that the IETF has recently replaced TCP-MD5 with TCP-AO (http://tools.ietf.org/html/draft-ietf-tcpm-tcp-auth-opt-11) which is days or weeks away from being published as an RFC. (The draft has been approved and it waiting for the RFC Editor to finish the final copy-editing.) > > You'll obviously still want TCP-MD5 support for talking to existing equipment, but significant cycles would IMO be better spent on TCP-AO. > Thanks for this info ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Lars Eggert <lars.eggert@nokia.com> Date: Fri, 7 May 2010 11:46:40 +0300 > You'll obviously still want TCP-MD5 support for talking to existing > equipment, but significant cycles would IMO be better spent on > TCP-AO. Code we have and users use which is unstable and crashes is more important to work on and fix than code we don't have which user's therefore don't use. You're wrong from just about every possible angle. Whoever finds AO useful will work on it, just as was the case with MD5 support. And right now, that "whoever" is definitely not us. :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 07 May 2010 07:32:09 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le jeudi 06 mai 2010 à 22:04 -0700, David Miller a écrit : > > From: Eric Dumazet <eric.dumazet@gmail.com> > > > > This code should be completely rewritten for linux-2.6.35, its very ugly > > > and over complex, yet it is not scalable. > > > > > > It could use true percpu data, with no central lock or refcount. > > > > Yes I've always disliked the way the TCP MD5 pool stuff is coded, it's > > frankly crap and this is like the 5th major bug that had to get fixed > > in it. :-) > > > > But let's fix this as simply as possible in net-2.6 and -stable, so Eric > > let me know when you have a tested patch for me to apply. > > There are so many things ... > > I am comtemplating commit aa133076 > > __tcp_alloc_md5sig_pool() now do a : > > p = kzalloc(sizeof(*p), sk->sk_allocation); > > But later call : > > hash = crypto_alloc_hash("md5", 0, CRYPTO_ALG_ASYNC); > > (GFP_KERNEL allocations for sure) > > > tcp_v4_parse_md5_keys() also use : > > p = kzalloc(sizeof(*p), sk->sk_allocation); > > right after a (possibly sleeping) copy_from_user() > > Oh well... > > > I'll test the already suggested patch before official submission, > thanks. Forget the per cpu data; the pool should just be scrapped. The only reason the pool exists is that the crypto hash state which should just be moved into the md5_info (88 bytes). The pseudo header can just be generated on the stack before passing to the crypto code. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 07 mai 2010 à 10:14 -0700, Stephen Hemminger a écrit : > Forget the per cpu data; the pool should just be scrapped. > > The only reason the pool exists is that the crypto hash state which > should just be moved into the md5_info (88 bytes). The pseudo > header can just be generated on the stack before passing to the crypto > code. Sure, but I'm afraid there is no generic API do do that (if we want to reuse crypto/md5.c code) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 07 May 2010 19:21:33 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le vendredi 07 mai 2010 à 10:14 -0700, Stephen Hemminger a écrit : > > > Forget the per cpu data; the pool should just be scrapped. > > > > The only reason the pool exists is that the crypto hash state which > > should just be moved into the md5_info (88 bytes). The pseudo > > header can just be generated on the stack before passing to the crypto > > code. > > > Sure, but I'm afraid there is no generic API do do that (if we want to > reuse crypto/md5.c code). It looks like the pool is just an optimization to avoid opening too many crypto API connections. This should only be an issue if offloading MD5. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 07 mai 2010 à 10:36 -0700, Stephen Hemminger a écrit : > On Fri, 07 May 2010 19:21:33 +0200 > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > Le vendredi 07 mai 2010 à 10:14 -0700, Stephen Hemminger a écrit : > > > > > Forget the per cpu data; the pool should just be scrapped. > > > > > > The only reason the pool exists is that the crypto hash state which > > > should just be moved into the md5_info (88 bytes). The pseudo > > > header can just be generated on the stack before passing to the crypto > > > code. > > > > > > Sure, but I'm afraid there is no generic API do do that (if we want to > > reuse crypto/md5.c code). > > It looks like the pool is just an optimization to avoid opening too > many crypto API connections. This should only be an issue if offloading > MD5. You mean we could allocate two contexts per socket, one for tx path, one for rx path, but TCP-MD5 implementors chose to use percpu allocations to factorize them. They should have allocated two contexts per cpu (one for process context, preemption disabled, one for BH context) As you said, this could be allocated on stack, with some changes to crypto API I guess. Since TCP is not a module, md5 is also static, so there is no module loading involved. struct crypto_tfm *__crypto_alloc_tfm(struct crypto_alg *alg, u32 type,u32 mask) --> struct crypto_tfm *__crypto_alloc_tfm_onstack(struct crypto_alg *alg, u32 type, u32 mask, void *storage, size_t storage_max_length) Or a direct plug to crypto/md5.c functions and hand crafted context. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, I had noticed the corruption in the context and actually did what is mentioned. I allocated the context on the stack and plugged in the md5.c functions. I was able to temporarily solve the problem, all this before I got a response on this thread. But now I have seeing another problem, when i change the MTU on the interface from 1500 to 4470 none of the message from the peer get thru and I get hash failed message. I am wondering if this is another bug getting hit in this scenario. Regards, Bijay On 08-May-2010, at 3:10 AM, Eric Dumazet wrote: > Le vendredi 07 mai 2010 à 10:36 -0700, Stephen Hemminger a écrit : >> On Fri, 07 May 2010 19:21:33 +0200 >> Eric Dumazet <eric.dumazet@gmail.com> wrote: >> >>> Le vendredi 07 mai 2010 à 10:14 -0700, Stephen Hemminger a écrit : >>> >>>> Forget the per cpu data; the pool should just be scrapped. >>>> >>>> The only reason the pool exists is that the crypto hash state which >>>> should just be moved into the md5_info (88 bytes). The pseudo >>>> header can just be generated on the stack before passing to the crypto >>>> code. >>> >>> >>> Sure, but I'm afraid there is no generic API do do that (if we want to >>> reuse crypto/md5.c code). >> >> It looks like the pool is just an optimization to avoid opening too >> many crypto API connections. This should only be an issue if offloading >> MD5. > > You mean we could allocate two contexts per socket, one for tx path, one > for rx path, but TCP-MD5 implementors chose to use percpu allocations to > factorize them. They should have allocated two contexts per cpu (one for > process context, preemption disabled, one for BH context) > > As you said, this could be allocated on stack, with some changes to > crypto API I guess. Since TCP is not a module, md5 is also static, so > there is no module loading involved. > > struct crypto_tfm *__crypto_alloc_tfm(struct crypto_alg *alg, u32 > type,u32 mask) > > --> > > struct crypto_tfm *__crypto_alloc_tfm_onstack(struct crypto_alg *alg, > u32 type, u32 mask, void *storage, size_t storage_max_length) > > > Or a direct plug to crypto/md5.c functions and hand crafted context. > > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le lundi 10 mai 2010 à 14:55 +0000, Bijay Singh a écrit : > Hi, > I had noticed the corruption in the context and actually did what is mentioned. > > I allocated the context on the stack and plugged in the md5.c functions. I was able to temporarily solve the problem, all this before I got a response on this thread. > > But now I have seeing another problem, when i change the MTU on the interface from 1500 to 4470 none of the message from the peer get thru and I get hash failed message. I am wondering if this is another bug getting hit in this scenario. Thats very fine, but you mix very different problems. Step by step resolution is required, and clean patches too, because plugging md5.c functions is not an option for stable series :) Obviously, nobody seriously used TCP-MD5 on linux, but you... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Eric, Didn't intend to mix the issues. It was a hack intended to calm the nerves. I am going to apply the proper patches asap. About the latest problem of MD5 not working with MTU set to 4470. I noticed this when i needed to change the MTU for some other purpose. Since it was a production box, i have to first set up my box with the right NIC card to reproduce this and try debugging it. In the meantime any ques will help. Thanks, Bijay On 10-May-2010, at 8:48 PM, Eric Dumazet wrote: > Le lundi 10 mai 2010 à 14:55 +0000, Bijay Singh a écrit : >> Hi, >> I had noticed the corruption in the context and actually did what is mentioned. >> >> I allocated the context on the stack and plugged in the md5.c functions. I was able to temporarily solve the problem, all this before I got a response on this thread. >> >> But now I have seeing another problem, when i change the MTU on the interface from 1500 to 4470 none of the message from the peer get thru and I get hash failed message. I am wondering if this is another bug getting hit in this scenario. > > Thats very fine, but you mix very different problems. > > Step by step resolution is required, and clean patches too, because > plugging md5.c functions is not an option for stable series :) > > Obviously, nobody seriously used TCP-MD5 on linux, but you... > > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Eric, I guess that makes me the enviable one. So I am keen to test out this feature completely, as long as I know what to do as a next step, directions, patches. Thanks, Bijay On 10-May-2010, at 8:48 PM, Eric Dumazet wrote: > Le lundi 10 mai 2010 à 14:55 +0000, Bijay Singh a écrit : >> Hi, >> I had noticed the corruption in the context and actually did what is mentioned. >> >> I allocated the context on the stack and plugged in the md5.c functions. I was able to temporarily solve the problem, all this before I got a response on this thread. >> >> But now I have seeing another problem, when i change the MTU on the interface from 1500 to 4470 none of the message from the peer get thru and I get hash failed message. I am wondering if this is another bug getting hit in this scenario. > > Thats very fine, but you mix very different problems. > > Step by step resolution is required, and clean patches too, because > plugging md5.c functions is not an option for stable series :) > > Obviously, nobody seriously used TCP-MD5 on linux, but you... > > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit : > Hi Eric, > > I guess that makes me the enviable one. So I am keen to test out this feature > completely, as long as I know what to do as a next step, directions, patches. > MTU > 4000 is not reliable because of high order allocations on typical NICS. I am afraid you need NIC able to deliver page fragments. Its working here (32bit kernel) with a tg3 NIC, but I got following message : ifconfig eth3 mtu 9000 ... [51492.936500] 167731 total pagecache pages [51492.936500] 0 pages in swap cache [51492.936500] Swap cache stats: add 0, delete 0, find 0/0 [51492.936500] Free swap = 4192928kB [51492.936500] Total swap = 4192928kB [51492.936500] 1114110 pages RAM [51492.936500] 885761 pages HighMem [51492.936500] 77073 pages reserved [51492.936500] 134483 pages shared [51492.936500] 159131 pages non-shared [51492.953027] tg3 0000:14:04.1: eth3: Using a smaller RX standard ring. Only 183 out of 511 buffers were allocated successfully $ ethtool -g eth3 Ring parameters for eth3: Pre-set maximums: RX: 511 RX Mini: 0 RX Jumbo: 0 TX: 511 Current hardware settings: RX: 183 RX Mini: 0 RX Jumbo: 0 TX: 511 $ cat /proc/buddyinfo Node 0, zone DMA 5 2 1 1 2 2 2 1 0 1 0 Node 0, zone Normal 4285 1823 248 73 9 5 0 0 0 0 0 Node 0, zone HighMem 97 199 921 583 383 261 155 117 69 41 649 I know that if I try to stress RX path, I'll get failures. Could you explain me why you need both big MTUS and TCP-MD5 ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I need MD5 for my BGP sessions and need the jumbo packets for the IS-IS peering. MTU of 1500 results in LSPs higher that 1500 getting dropped at the peering router. On 11-May-2010, at 11:57 AM, Eric Dumazet wrote: > Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit : >> Hi Eric, >> >> I guess that makes me the enviable one. So I am keen to test out this feature > >> completely, as long as I know what to do as a next step, directions, patches. >> > > MTU > 4000 is not reliable because of high order allocations on typical > NICS. I am afraid you need NIC able to deliver page fragments. > > Its working here (32bit kernel) with a tg3 NIC, but I got following > message : > > ifconfig eth3 mtu 9000 > ... > [51492.936500] 167731 total pagecache pages > [51492.936500] 0 pages in swap cache > [51492.936500] Swap cache stats: add 0, delete 0, find 0/0 > [51492.936500] Free swap = 4192928kB > [51492.936500] Total swap = 4192928kB > [51492.936500] 1114110 pages RAM > [51492.936500] 885761 pages HighMem > [51492.936500] 77073 pages reserved > [51492.936500] 134483 pages shared > [51492.936500] 159131 pages non-shared > [51492.953027] tg3 0000:14:04.1: eth3: Using a smaller RX standard ring. > Only 183 out of 511 buffers were allocated successfully > > $ ethtool -g eth3 > Ring parameters for eth3: > Pre-set maximums: > RX: 511 > RX Mini: 0 > RX Jumbo: 0 > TX: 511 > Current hardware settings: > RX: 183 > RX Mini: 0 > RX Jumbo: 0 > TX: 511 > > $ cat /proc/buddyinfo > Node 0, zone DMA 5 2 1 1 2 2 2 1 0 1 0 > Node 0, zone Normal 4285 1823 248 73 9 5 0 0 0 0 0 > Node 0, zone HighMem 97 199 921 583 383 261 155 117 69 41 649 > > I know that if I try to stress RX path, I'll get failures. > > Could you explain me why you need both big MTUS and TCP-MD5 ? > > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Stephen Hemminger <shemminger@vyatta.com> Date: Fri, 7 May 2010 10:36:39 -0700 > On Fri, 07 May 2010 19:21:33 +0200 > Eric Dumazet <eric.dumazet@gmail.com> wrote: > >> Le vendredi 07 mai 2010 à 10:14 -0700, Stephen Hemminger a écrit : >> >> > Forget the per cpu data; the pool should just be scrapped. >> > >> > The only reason the pool exists is that the crypto hash state which >> > should just be moved into the md5_info (88 bytes). The pseudo >> > header can just be generated on the stack before passing to the crypto >> > code. >> >> >> Sure, but I'm afraid there is no generic API do do that (if we want to >> reuse crypto/md5.c code). > > It looks like the pool is just an optimization to avoid opening too > many crypto API connections. This should only be an issue if offloading > MD5. It's an issue because creating a crypto API context is expensive, so this influences our connection rates with MD5. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/tcp.h b/include/net/tcp.h index fb5c66b..e232123 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool *tcp_get_md5sig_pool(void) struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu); if (!ret) put_cpu(); + else + local_bh_disable(); return ret; } static inline void tcp_put_md5sig_pool(void) { __tcp_put_md5sig_pool(); + local_bh_enable(); put_cpu(); }