| Message ID | 371a1925e68e68f873e30381e0fa60ea.squirrel@webmail.greenhost.nl |
|---|---|
| State | Rejected, archived |
| Delegated to | David Miller |
On Tue, 2012-03-20 at 13:24 +1100, Indan Zupancic wrote:
> If it does then perhaps the fast path should be made faster by inlining
> the code instead of calling a function which may not be cached.

Inlining 400 times a sequence of code is a waste of icache; you probably
missed this.

I spent a lot of time working on this implementation, and tried many
different strategies before choosing the one in place.

Listen, I am tired of this thread. It seems you want to push changes that
have almost no value but still need a lot of review.

Unless you make benchmarks and can show at least a 5% improvement in
speed, or improve the maintainability of this code, I am not interested.

We certainly _can_ one day have sizeof(struct sk_buff) > 256, and the
current code is ready for this. You want to break this for absolutely no
valid reason.

We _can_ change field order anytime in struct sk_buff, even if you state
"its very unlikely that those fields are ever moved to the end of
sk_buff".
On Tue, March 20, 2012 13:59, Eric Dumazet wrote:
> On Tue, 2012-03-20 at 13:24 +1100, Indan Zupancic wrote:
>
>> If it does then perhaps the fast path should be made faster by inlining
>> the code instead of calling a function which may not be cached.
>
> Inlining 400 times a sequence of code is a waste of icache; you probably
> missed this.

Well, according to you most filters were small; inlining 26 bytes a few
times should be faster than calling an external function. Not all calls
need to be inlined either.

> I spent a lot of time working on this implementation, and tried many
> different strategies before choosing the one in place.
>
> Listen, I am tired of this thread. It seems you want to push changes that
> have almost no value but still need a lot of review.

The latest patch didn't change the generated code except for a few
ancillary instructions. The one before that just added documentation. The
first patch was indeed bad.

> Unless you make benchmarks and can show at least a 5% improvement in
> speed, or improve the maintainability of this code, I am not interested.

My next patch would have changed the compiler to always compile in two
passes instead of looping until the result is stable. But never mind.

> We certainly _can_ one day have sizeof(struct sk_buff) > 256, and the
> current code is ready for this. You want to break this for absolutely no
> valid reason.

I have the feeling you didn't read the latest patch: it doesn't assume
sizeof(struct sk_buff) < 256, nor that fields aren't reordered.

> We _can_ change field order anytime in struct sk_buff, even if you state
> "its very unlikely that those fields are ever moved to the end of
> sk_buff".

And if the dev, len or data_len fields are really moved past the first
127 bytes, the JIT code can be changed too. The JIT code already depends
on some of struct sk_buff's field properties anyway.

Greetings,

Indan
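A note on the imm8 question being argued here: an x86-64 memory operand can encode its displacement either as a signed byte (disp8) or as four bytes (disp32), so a load such as mov off(%rdi),%eax costs 3 instruction bytes when the field offset fits in 127 and 6 bytes otherwise. Below is a minimal user-space sketch of the check; is_imm8() mirrors the helper in arch/x86/net/bpf_jit_comp.c, while the main() harness and the example offsets are illustrative only:

```c
#include <stdbool.h>
#include <stdio.h>

/* Same range check as the JIT's is_imm8() helper: a displacement gets
 * the short disp8 encoding only if it fits in a signed byte. */
static inline bool is_imm8(int value)
{
	return value <= 127 && value >= -128;
}

int main(void)
{
	/* mov 0x68(%rdi),%eax  -> 8b 47 68          (3 bytes, disp8)
	 * mov 0x168(%rdi),%eax -> 8b 87 68 01 00 00 (6 bytes, disp32) */
	printf("offset 0x68:  %s\n", is_imm8(0x68) ? "disp8" : "disp32");
	printf("offset 0x168: %s\n", is_imm8(0x168) ? "disp8" : "disp32");
	return 0;
}
```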
> On Tue, March 20, 2012 13:59, Eric Dumazet wrote:
> > On Tue, 2012-03-20 at 13:24 +1100, Indan Zupancic wrote:
> >
> >> If it does then perhaps the fast path should be made faster by inlining
> >> the code instead of calling a function which may not be cached.
> >
> > Inlining 400 times a sequence of code is a waste of icache; you probably
> > missed this.
>
> Well, according to you most filters were small; inlining 26 bytes a few
> times should be faster than calling an external function. Not all calls
> need to be inlined either.

I wouldn't bet on it.

If the number of arguments is small enough to fit in the registers, the
called function doesn't need to save any registers, and the call doesn't
make the calling code run out of registers, then the actual cost of the
call will be minimal.

OTOH the benefit of only having to fetch the code once, and the higher
likelihood that it will be in the i-cache from some other use, will make
the version with the calls faster.

You need to do real benchmarks on a real system running a real workload
to find out which is better.

Oh, and beware that changes in which code shares cache lines can have a
measurable effect (typified by unrelated changes affecting measured
performance).

	David
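David's register point can be made concrete. In the sketch below, the function names, the 64-byte alignment, and the byte-swapping load itself are assumptions for illustration, not anything taken from the JIT: the out-of-line variant receives both arguments in %rdi/%esi under the x86-64 SysV ABI, so the callee spills nothing and the call costs little beyond call/ret, while the noinline/aligned attributes pin each variant to its own cache line so layout accidents don't skew a benchmark:

```c
#include <stdint.h>
#include <stdio.h>

/* Out-of-line variant: arguments arrive in %rdi/%esi, the callee saves
 * no registers, and only one copy of the code competes for i-cache. */
__attribute__((noinline, aligned(64)))
static uint32_t load_be32_called(const uint8_t *data, int off)
{
	const uint8_t *p = data + off;

	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

/* Inlined variant: no call overhead, but every use site pays the full
 * instruction-byte cost -- the i-cache argument made in this thread. */
static inline uint32_t load_be32_inline(const uint8_t *data, int off)
{
	const uint8_t *p = data + off;

	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

int main(void)
{
	const uint8_t pkt[4] = { 0x12, 0x34, 0x56, 0x78 };

	printf("%#x %#x\n", load_be32_called(pkt, 0),
	       load_be32_inline(pkt, 0));
	return 0;
}
```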
On Tue, 2012-03-20 at 22:33 +1100, Indan Zupancic wrote:
> And if the dev, len or data_len fields are really moved past the first
> 127 bytes, the JIT code can be changed too. The JIT code already depends
> on some of struct sk_buff's field properties anyway.

It seems you didn't understand, but I said NO to your patch.

This time I am really tired.
```diff
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 7c1b765..5ddb82b 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -102,6 +102,12 @@ do {							\
 		f_op = FOP;					\
 		goto cond_branch
 
+#define SKB_OFF8(field) ({					\
+	int _off = offsetof(struct sk_buff, field);		\
+	BUILD_BUG_ON(_off > 127);				\
+	_off;							\
+	})
+
 #define SEEN_DATAREF 1 /* might call external helpers */
 #define SEEN_XREG    2 /* ebx is used */
 
@@ -172,30 +178,13 @@ void bpf_jit_compile(struct sk_filter *fp)
 			 * r8 = skb->data
 			 */
 			if (seen_or_pass0 & SEEN_DATAREF) {
-				if (offsetof(struct sk_buff, len) <= 127)
-					/* mov    off8(%rdi),%r9d */
-					EMIT4(0x44, 0x8b, 0x4f, offsetof(struct sk_buff, len));
-				else {
-					/* mov    off32(%rdi),%r9d */
-					EMIT3(0x44, 0x8b, 0x8f);
-					EMIT(offsetof(struct sk_buff, len), 4);
-				}
-				if (is_imm8(offsetof(struct sk_buff, data_len)))
-					/* sub    off8(%rdi),%r9d */
-					EMIT4(0x44, 0x2b, 0x4f, offsetof(struct sk_buff, data_len));
-				else {
-					EMIT3(0x44, 0x2b, 0x8f);
-					EMIT(offsetof(struct sk_buff, data_len), 4);
-				}
-
-				if (is_imm8(offsetof(struct sk_buff, data)))
-					/* mov    off8(%rdi),%r8 */
-					EMIT4(0x4c, 0x8b, 0x47, offsetof(struct sk_buff, data));
-				else {
-					/* mov    off32(%rdi),%r8 */
-					EMIT3(0x4c, 0x8b, 0x87);
-					EMIT(offsetof(struct sk_buff, data), 4);
-				}
+				/* mov    off8(%rdi),%r9d */
+				EMIT4(0x44, 0x8b, 0x4f, SKB_OFF8(len));
+				/* sub    off8(%rdi),%r9d */
+				EMIT4(0x44, 0x2b, 0x4f, SKB_OFF8(data_len));
+				/* mov    off32(%rdi),%r8 */
+				EMIT3(0x4c, 0x8b, 0x87);
+				EMIT(offsetof(struct sk_buff, data), 4);
 			}
 		}
 
@@ -391,43 +380,24 @@ void bpf_jit_compile(struct sk_filter *fp)
 				break;
 			case BPF_S_LD_W_LEN: /* A = skb->len; */
 				BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, len) != 4);
-				if (is_imm8(offsetof(struct sk_buff, len)))
-					/* mov    off8(%rdi),%eax */
-					EMIT3(0x8b, 0x47, offsetof(struct sk_buff, len));
-				else {
-					EMIT2(0x8b, 0x87);
-					EMIT(offsetof(struct sk_buff, len), 4);
-				}
+				/* mov    off8(%rdi),%eax */
+				EMIT3(0x8b, 0x47, SKB_OFF8(len));
 				break;
 			case BPF_S_LDX_W_LEN: /* X = skb->len; */
 				seen |= SEEN_XREG;
-				if (is_imm8(offsetof(struct sk_buff, len)))
-					/* mov    off8(%rdi),%ebx */
-					EMIT3(0x8b, 0x5f, offsetof(struct sk_buff, len));
-				else {
-					EMIT2(0x8b, 0x9f);
-					EMIT(offsetof(struct sk_buff, len), 4);
-				}
+				/* mov    off8(%rdi),%ebx */
+				EMIT3(0x8b, 0x5f, SKB_OFF8(len));
 				break;
 			case BPF_S_ANC_PROTOCOL: /* A = ntohs(skb->protocol); */
 				BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, protocol) != 2);
-				if (is_imm8(offsetof(struct sk_buff, protocol))) {
-					/* movzwl off8(%rdi),%eax */
-					EMIT4(0x0f, 0xb7, 0x47, offsetof(struct sk_buff, protocol));
-				} else {
-					EMIT3(0x0f, 0xb7, 0x87); /* movzwl off32(%rdi),%eax */
-					EMIT(offsetof(struct sk_buff, protocol), 4);
-				}
+				/* movzwl off32(%rdi),%eax */
+				EMIT3(0x0f, 0xb7, 0x87);
+				EMIT(offsetof(struct sk_buff, protocol), 4);
 				EMIT2(0x86, 0xc4); /* ntohs() : xchg %al,%ah */
 				break;
 			case BPF_S_ANC_IFINDEX:
-				if (is_imm8(offsetof(struct sk_buff, dev))) {
-					/* movq   off8(%rdi),%rax */
-					EMIT4(0x48, 0x8b, 0x47, offsetof(struct sk_buff, dev));
-				} else {
-					EMIT3(0x48, 0x8b, 0x87); /* movq off32(%rdi),%rax */
-					EMIT(offsetof(struct sk_buff, dev), 4);
-				}
+				/* movq   off8(%rdi),%rax */
+				EMIT4(0x48, 0x8b, 0x47, SKB_OFF8(dev));
 				EMIT3(0x48, 0x85, 0xc0); /* test %rax,%rax */
 				EMIT_COND_JMP(X86_JE, cleanup_addr - (addrs[i] - 6));
 				BUILD_BUG_ON(FIELD_SIZEOF(struct net_device, ifindex) != 4);
@@ -436,33 +406,21 @@ void bpf_jit_compile(struct sk_filter *fp)
 				break;
 			case BPF_S_ANC_MARK:
 				BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
-				if (is_imm8(offsetof(struct sk_buff, mark))) {
-					/* mov    off8(%rdi),%eax */
-					EMIT3(0x8b, 0x47, offsetof(struct sk_buff, mark));
-				} else {
-					EMIT2(0x8b, 0x87);
-					EMIT(offsetof(struct sk_buff, mark), 4);
-				}
+				/* mov    off32(%rdi),%eax */
+				EMIT2(0x8b, 0x87);
+				EMIT(offsetof(struct sk_buff, mark), 4);
 				break;
 			case BPF_S_ANC_RXHASH:
 				BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, rxhash) != 4);
-				if (is_imm8(offsetof(struct sk_buff, rxhash))) {
-					/* mov    off8(%rdi),%eax */
-					EMIT3(0x8b, 0x47, offsetof(struct sk_buff, rxhash));
-				} else {
-					EMIT2(0x8b, 0x87);
-					EMIT(offsetof(struct sk_buff, rxhash), 4);
-				}
+				/* mov    off32(%rdi),%eax */
+				EMIT2(0x8b, 0x87);
+				EMIT(offsetof(struct sk_buff, rxhash), 4);
 				break;
 			case BPF_S_ANC_QUEUE:
 				BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, queue_mapping) != 2);
-				if (is_imm8(offsetof(struct sk_buff, queue_mapping))) {
-					/* movzwl off8(%rdi),%eax */
-					EMIT4(0x0f, 0xb7, 0x47, offsetof(struct sk_buff, queue_mapping));
-				} else {
-					EMIT3(0x0f, 0xb7, 0x87); /* movzwl off32(%rdi),%eax */
-					EMIT(offsetof(struct sk_buff, queue_mapping), 4);
-				}
+				/* movzwl off32(%rdi),%eax */
+				EMIT3(0x0f, 0xb7, 0x87);
+				EMIT(offsetof(struct sk_buff, queue_mapping), 4);
 				break;
 			case BPF_S_ANC_CPU:
 #ifdef CONFIG_SMP
```
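The SKB_OFF8() macro at the top of the patch is what answers the sizeof(struct sk_buff) > 256 objection: instead of assuming field offsets stay small, it turns that assumption into a compile-time assertion. Below is a minimal user-space sketch of the same guard, where struct sk_buff_like and the simplified BUILD_BUG_ON() are stand-ins for the kernel definitions (the kernel's BUILD_BUG_ON also accepts the _off temporary used in the patch; this simplified one needs a bare constant expression):

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's BUILD_BUG_ON(): a negative
 * array size turns a true, compile-time-constant condition into a
 * build error instead of a silently truncated disp8 at run time. */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Toy layout standing in for struct sk_buff. */
struct sk_buff_like {
	char pad[0x60];
	unsigned int len;	/* offset 0x60: still within disp8 range */
};

/* Same shape as the patch's macro: yield the offset, but fail the
 * build if the field ever moves past byte 127. */
#define SKB_OFF8(field) ({						\
	BUILD_BUG_ON(offsetof(struct sk_buff_like, field) > 127);	\
	(int)offsetof(struct sk_buff_like, field);			\
})

int main(void)
{
	return SKB_OFF8(len) == 0x60 ? 0 : 1;
}
```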