
net: filter: Convert the BPF VM to threaded code

Message ID 1312005899.2873.70.camel@edumazet-laptop
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet July 30, 2011, 6:04 a.m. UTC
On Friday, July 29, 2011 at 22:09 -0700, Rui Ueyama wrote:
> The benchmark results look good. A simple benchmark that sends 10M UDP
> packets to lo took 76.24 seconds on average on a Core 2 Duo L7500 @ 1.6GHz
> while tcpdump is running. With this patch it took 75.41 seconds, which means
> we save about 80ns per packet on that processor.
> 
> I think converting the VM to threaded code is low-hanging fruit, even if we
> have JIT compilers for popular architectures. Most of the lines in my patch
> are indentation changes, so the actual change is not big.
> 
...
> The tcpdump command I used is: tcpdump -p -n -s -i lo net 192.168.2.0/24
> 

Thanks for providing numbers. Was it on a 32-bit or 64-bit kernel?

Have you done a test with a cold instruction cache?

Your patch adds 540 bytes of code, so it's a potential latency increase.

# size net/core/filter.o net/core/filter.o.old
   text	   data	    bss	    dec	    hex	filename
   4243	      0	      0	   4243	   1093	net/core/filter.o
   3703	     24	      0	   3727	    e8f	net/core/filter.o.old

Each 'NEXT' translates to:

 4db:	83 c3 08             	add    $0x8,%ebx
 4de:	0f b7 03             	movzwl (%ebx),%eax
 4e1:	8b 04 85 00 02 00 00 	mov    0x200(,%eax,4),%eax
 4e8:	ff e0                	jmp    *%eax


And this is on i386; expect more on CPUs with fixed 32-bit
instructions...
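
To make the 'NEXT' discussion concrete, here is a minimal sketch of switch dispatch versus threaded dispatch. It is not the code from Rui's patch: the toy opcodes, struct toy_insn, demo_prog and both function names are hypothetical, and the threaded variant relies on the GCC/Clang "labels as values" extension. It also assumes opcodes were validated beforehand, since the threaded version does no bounds check on the jump table.

```c
#include <stdint.h>

/* Hypothetical opcodes for a toy accumulator machine (not the BPF_S_* set). */
enum { TOY_LOAD_K, TOY_ADD_K, TOY_MUL_K, TOY_RET_A, TOY_COUNT };

struct toy_insn {
	uint16_t code;
	uint32_t k;
};

/* Switch dispatch: every instruction goes through one shared indirect branch. */
uint32_t run_switch(const struct toy_insn *pc)
{
	uint32_t A = 0;

	for (;; pc++) {
		switch (pc->code) {
		case TOY_LOAD_K: A = pc->k;  continue;
		case TOY_ADD_K:  A += pc->k; continue;
		case TOY_MUL_K:  A *= pc->k; continue;
		case TOY_RET_A:  return A;
		default:         return 0;
		}
	}
}

/* Threaded dispatch: each handler ends with its own copy of the indirect jump. */
uint32_t run_threaded(const struct toy_insn *pc)
{
	/* Labels as values: a GCC/Clang extension, not ISO C. */
	static const void *const jump_table[TOY_COUNT] = {
		[TOY_LOAD_K] = &&do_load_k,
		[TOY_ADD_K]  = &&do_add_k,
		[TOY_MUL_K]  = &&do_mul_k,
		[TOY_RET_A]  = &&do_ret_a,
	};
	uint32_t A = 0;

/* Advance, fetch the opcode, load the handler address, jump. */
#define NEXT do { pc++; goto *jump_table[pc->code]; } while (0)

	goto *jump_table[pc->code];

do_load_k:
	A = pc->k;
	NEXT;
do_add_k:
	A += pc->k;
	NEXT;
do_mul_k:
	A *= pc->k;
	NEXT;
do_ret_a:
	return A;
#undef NEXT
}

/* Demo program: A = 5; A += 3; A *= 4; return A. */
const struct toy_insn demo_prog[] = {
	{ TOY_LOAD_K, 5 }, { TOY_ADD_K, 3 }, { TOY_MUL_K, 4 }, { TOY_RET_A, 0 },
};
```

Both run_switch(demo_prog) and run_threaded(demo_prog) return 32. The NEXT macro is expanded once per handler, which is where the extra code size comes from: the advance/fetch/load/jump sequence is duplicated instead of shared.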

We can remove one branch per BPF instruction with the following patch:





Patch

diff --git a/net/core/filter.c b/net/core/filter.c
index 36f975f..377f3ca 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -119,16 +119,14 @@  unsigned int sk_run_filter(const struct sk_buff *skb,
 	u32 tmp;
 	int k;
 
+	fentry--;
 	/*
 	 * Process array of filter instructions.
 	 */
-	for (;; fentry++) {
-#if defined(CONFIG_X86_32)
+	for (;;) {
 #define	K (fentry->k)
-#else
-		const u32 K = fentry->k;
-#endif
 
+		fentry++;
 		switch (fentry->code) {
 		case BPF_S_ALU_ADD_X:
 			A += X;
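
The loop change in the hunk above can be tried in isolation with a small sketch. Everything here is hypothetical (struct mini_insn, the two opcodes, the function names and the demo array); it only shows that moving the increment from the for header to the top of the loop body, after a pre-decrement, visits the same instructions. Whether a jump is actually saved depends on the compiler and target. The demo array carries a padding element in front so that the pre-decrement stays inside the array in this stand-alone version.

```c
#include <stdint.h>

struct mini_insn {
	uint16_t code;
	uint32_t k;
};

/* Hypothetical opcodes for illustration only. */
enum { MINI_ADD_K, MINI_RET_A };

/* Before: the increment sits in the for header, so each `continue`
 * reaches it before the next opcode is fetched. */
uint32_t run_for_increment(const struct mini_insn *fentry)
{
	uint32_t A = 0;

	for (;; fentry++) {
		switch (fentry->code) {
		case MINI_ADD_K:
			A += fentry->k;
			continue;
		default:
			return A;
		}
	}
}

/* After: step back once, then increment at the top of the body, so
 * `continue` can go straight to the increment-and-fetch sequence. */
uint32_t run_top_increment(const struct mini_insn *fentry)
{
	uint32_t A = 0;

	fentry--;	/* caller must guarantee an element before fentry */
	for (;;) {
		fentry++;
		switch (fentry->code) {
		case MINI_ADD_K:
			A += fentry->k;
			continue;
		default:
			return A;
		}
	}
}

/* mini_demo[0] is padding; the program proper starts at mini_demo[1]. */
const struct mini_insn mini_demo[] = {
	{ MINI_RET_A, 0 },
	{ MINI_ADD_K, 7 }, { MINI_ADD_K, 3 }, { MINI_RET_A, 0 },
};
```

Calling either function with &mini_demo[1] returns 10, so the two loop shapes are interchangeable as far as results go; any difference would show up only in the generated code.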