
multi bpf filter will impact performance?

Message ID AANLkTimynbhgo2dqFwKOc8NT9TXpkCtCJMvwkySEBORz@mail.gmail.com
State Superseded, archived
Delegated to: David Miller

Commit Message

Changli Gao Dec. 1, 2010, 7:36 a.m. UTC
On Wed, Dec 1, 2010 at 11:48 AM, Rui <wirelesser@gmail.com> wrote:
> one more question is
>
> if  RPS can spread the load into 4 separate cpus, how about the
> "packet_rcv(or tpacket_rcv)" ? will they run in parallel?
>

You mentioned RPS, but BPF currently has no instruction to get the
current CPU number. You can try the attached patch.

Maybe we can leverage the bpf and SO_REUSEPORT to direct the traffic
to the socket instance on the local CPU.
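
As a rough illustration of that idea, here is a minimal sketch of a classic BPF program that accepts a packet only when the filter runs on one chosen CPU. It assumes headers in which SKF_AD_CPU is defined; if it is not, the sketch falls back to the value proposed in the patch below, which is an assumption and may not match what is finally merged. The function name build_cpu_filter is hypothetical.

```c
#include <linux/filter.h>
#include <stdint.h>

#ifndef SKF_AD_CPU
#define SKF_AD_CPU 32	/* assumption: value proposed in this patch */
#endif

/*
 * Fill prog[0..3] with a filter that accepts the whole packet only when
 * it is processed on CPU `cpu`, and drops it otherwise.
 * Returns the number of instructions written.
 */
static unsigned short build_cpu_filter(struct sock_filter *prog, uint32_t cpu)
{
	/* A = current CPU number (ancillary load at a negative offset) */
	prog[0] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
					       (uint32_t)(SKF_AD_OFF + SKF_AD_CPU));
	/* if (A == cpu) fall through to accept, else skip one insn to drop */
	prog[1] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
					       cpu, 0, 1);
	/* accept: return the maximum snapshot length */
	prog[2] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, 0xffffffff);
	/* drop: return 0 */
	prog[3] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, 0);
	return 4;
}
```

The resulting array would be wrapped in a struct sock_fprog and attached with setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, ...), one socket per CPU, each with its own CPU number.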

Comments

Eric Dumazet Dec. 1, 2010, 7:47 a.m. UTC | #1
On Wednesday, 1 December 2010 at 15:36 +0800, Changli Gao wrote:
> On Wed, Dec 1, 2010 at 11:48 AM, Rui <wirelesser@gmail.com> wrote:
> > one more question is
> >
> > if  RPS can spread the load into 4 separate cpus, how about the
> > "packet_rcv(or tpacket_rcv)" ? will they run in parallel?
> >
> 
> You mentioned RPS, but BPF currently has no instruction to get the
> current CPU number. You can try the attached patch.
> 
> Maybe we can leverage the bpf and SO_REUSEPORT to direct the traffic
> to the socket instance on the local CPU.
> 

Oh well, it seems you were reading over my shoulder: I was preparing a
patch with SKF_AD_RXHASH and SKF_AD_CPU.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Changli Gao Dec. 1, 2010, 7:59 a.m. UTC | #2
On Wed, Dec 1, 2010 at 3:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wednesday, 1 December 2010 at 15:36 +0800, Changli Gao wrote:
>
> Oh well, it seems you were reading over my shoulder: I was preparing a
> patch with SKF_AD_RXHASH and SKF_AD_CPU.
>
>

Nice to hear it. :)

There are too many filters: bpf, iptables and tc. Maybe a unified one
such as nft is needed; then the duplicated code would be reduced. Maybe
it is just a nice dream.
Eric Dumazet Dec. 1, 2010, 8:09 a.m. UTC | #3
On Wednesday, 1 December 2010 at 15:59 +0800, Changli Gao wrote:
> On Wed, Dec 1, 2010 at 3:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Wednesday, 1 December 2010 at 15:36 +0800, Changli Gao wrote:
> >
> > Oh well, it seems you were reading over my shoulder: I was preparing a
> > patch with SKF_AD_RXHASH and SKF_AD_CPU.
> >
> >
> 
> Nice to hear it. :)
> 
> There are too many filters: bpf, iptables and tc. Maybe a unified one
> such as nft is needed; then the duplicated code would be reduced. Maybe
> it is just a nice dream.
> 

You forgot the rxhash function as well, it would be nice to augment it
with custom code if necessary.

A dream would be to have a native compiler, and not interpret pseudo
code...



Changli Gao Dec. 1, 2010, 8:15 a.m. UTC | #4
On Wed, Dec 1, 2010 at 4:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> A dream would be to have a native compiler, and not interpret pseudo
> code...
>

FYI: FreeBSD's BPF implementation has in-kernel JIT compilers for both
i386 and amd64.
Eric Dumazet Dec. 1, 2010, 8:42 a.m. UTC | #5
On Wednesday, 1 December 2010 at 16:15 +0800, Changli Gao wrote:
> On Wed, Dec 1, 2010 at 4:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > A dream would be to have a native compiler, and not interpret pseudo
> > code...
> >
> 
> FYI: FreeBSD's BPF implementation has in-kernel JIT compilers for both
> i386 and amd64.
> 


IMHO, a better pcap optimizer would be the first step.

If you take a look at their generated code, it's not a real win over the
code we currently have.


Really.



Hagen Paul Pfeifer Dec. 1, 2010, 5:22 p.m. UTC | #6
On Wed, 01 Dec 2010 09:42:47 +0100, Eric Dumazet wrote:

> IMHO, a better pcap optimizer would be the first step.

+1

Optimizing complex filter rules is step one in the process of optimizing
the packet processing. A JIT compiler like FreeBSD provides cannot polish a
(pcap)turd. I thought Patrick was working on a generic filter mechanism for
netfilter!? ... ;)

Hagen

David Miller Dec. 1, 2010, 6:18 p.m. UTC | #7
From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Wed, 01 Dec 2010 18:22:48 +0100

> On Wed, 01 Dec 2010 09:42:47 +0100, Eric Dumazet wrote:
> 
>> IMHO, a better pcap optimizer would be the first step.
> 
> +1
> 
> Optimizing complex filter rules is step one in the process of optimizing
> the packet processing. A JIT compiler like FreeBSD provides cannot polish a
> (pcap)turd. I thought Patrick was working on a generic filter mechanism for
> netfilter!? ... ;)

Yes, and we spoke at the netfilter workshop about making that interpreter
available to socket filters and the packet classifier layer.

However, I think it's still valuable to write a few JIT compilers for
the existing BPF stuff.  I considered working on a sparc64 JIT just to
see what it would look like.

If people work on the BPF optimizer and BPF JITs in parallel, we'll have
both ready at the same time.  win++
David Miller Dec. 1, 2010, 6:24 p.m. UTC | #8
From: David Miller <davem@davemloft.net>
Date: Wed, 01 Dec 2010 10:18:09 -0800 (PST)

> If people work on the BPF optimizer and BPF JITs in parallel, we'll have
> both ready at the same time.  win++

BTW, the JIT is non-trivial work for us because of non-linear SKBs.
We'd need some kind of helper stub or similar, with a straight line
fast path for the linear case.
Eric Dumazet Dec. 1, 2010, 6:24 p.m. UTC | #9
On Wednesday, 1 December 2010 at 10:18 -0800, David Miller wrote:
> From: Hagen Paul Pfeifer <hagen@jauu.net>
> Date: Wed, 01 Dec 2010 18:22:48 +0100
> 
> > On Wed, 01 Dec 2010 09:42:47 +0100, Eric Dumazet wrote:
> > 
> >> IMHO, a better pcap optimizer would be the first step.
> > 
> > +1
> > 
> > Optimizing complex filter rules is step one in the process of optimizing
> > the packet processing. A JIT compiler like FreeBSD provides cannot polish a
> > (pcap)turd. I thought Patrick was working on a generic filter mechanism for
> > netfilter!? ... ;)
> 
> Yes, and we spoke at the netfilter workshop about making that interpreter
> available to socket filters and the packet classifier layer.
> 
> However, I think it's still valuable to write a few JIT compilers for
> the existing BPF stuff.  I considered working on a sparc64 JIT just to
> see what it would look like.
> 
> If people work on the BPF optimizer and BPF JITs in parallel, we'll have
> both ready at the same time.  win++

A third work in progress (on my side) is to add a check in
sk_chk_filter() so that we can remove the memvalid test we recently
added to protect LOAD M(K).

It is needed anyway for arches without a BPF JIT :)
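
One way such an attach-time check could work is sketched below: since classic BPF jumps only go forward, a single forward pass can track, per instruction, which M[] slots have been written on every path reaching it, and reject any load from a slot that might be uninitialized. The instruction encoding here (enum op, struct insn) is a hypothetical miniature for illustration, not the kernel's, and the function is not claimed to be the check that will land in sk_chk_filter().

```c
#include <stdint.h>
#include <string.h>

#define MEMWORDS 16	/* number of scratch slots M[0..15] */
#define MAXINSNS 64	/* arbitrary limit for this sketch */

enum op { OP_ST, OP_LD_MEM, OP_JA, OP_JCOND, OP_RET, OP_OTHER };

struct insn {
	enum op  op;
	uint32_t k;	 /* M[] index for OP_ST/OP_LD_MEM, offset for OP_JA */
	uint8_t  jt, jf; /* forward jump offsets for OP_JCOND */
};

/*
 * Returns 0 if every load from M[k] is preceded by a store to M[k] on
 * every path reaching it, -1 otherwise. Relies on jumps being
 * forward-only, so one pass in program order is enough.
 */
static int check_load_and_stores(const struct insn *f, int len)
{
	uint16_t masks[MAXINSNS]; /* bit k: M[k] written on all incoming jumps */
	uint16_t valid = 0;	  /* bit k: M[k] written on all paths so far */
	int pc;

	if (len <= 0 || len > MAXINSNS)
		return -1;
	memset(masks, 0xff, sizeof(masks)); /* no incoming jump seen yet */

	for (pc = 0; pc < len; pc++) {
		valid &= masks[pc];	/* merge with paths that jump here */
		switch (f[pc].op) {
		case OP_ST:
			if (f[pc].k >= MEMWORDS)
				return -1;
			valid |= (uint16_t)(1u << f[pc].k);
			break;
		case OP_LD_MEM:
			if (f[pc].k >= MEMWORDS || !(valid & (1u << f[pc].k)))
				return -1;
			break;
		case OP_JA:
			if (f[pc].k >= (uint32_t)(len - pc - 1))
				return -1;
			masks[pc + 1 + f[pc].k] &= valid;
			valid = 0xffff;	/* no fall-through after a jump */
			break;
		case OP_JCOND:
			if (f[pc].jt >= len - pc - 1 || f[pc].jf >= len - pc - 1)
				return -1;
			masks[pc + 1 + f[pc].jt] &= valid;
			masks[pc + 1 + f[pc].jf] &= valid;
			valid = 0xffff;	/* both successors use their masks */
			break;
		default:
			break;
		}
	}
	return 0;
}
```

With a check of this kind run once at attach time, the interpreter would no longer need to test a validity bitmask on every LOAD M(K).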



David Miller Dec. 1, 2010, 6:44 p.m. UTC | #10
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 01 Dec 2010 19:24:53 +0100

> A third work in progress (on my side) is to add a check in
> sk_chk_filter() so that we can remove the memvalid test we recently
> added to protect LOAD M(K).

I understand your idea, but the static checkers are still going to
complain.  So better add a huge comment in sk_run_filter() explaining
why the checker's complaint should be ignored :-)
Eric Dumazet Dec. 3, 2010, 6:32 a.m. UTC | #11
On Wednesday, 1 December 2010 at 10:18 -0800, David Miller wrote:

> However, I think it's still valuable to write a few JIT compilers for
> the existing BPF stuff.  I considered working on a sparc64 JIT just to
> see what it would look like.
> 
> If people work on the BPF optimizer and BPF JITs in parallel, we'll have
> both ready at the same time.  win++

I have begun work on a BPF JIT for x86_64.

My plan is to use external helpers to load skb data/metadata, to keep
the BPF program very short and free of dependencies on struct layouts.

These helpers would be the three loads: load_word, load_half and
load_byte.

When the requested bits are in the skb head, these helpers should be
fast.

For practical reasons, their fast path would be in ASM and their slow
path in C. They are in ASM so that they can take a shortcut on error
(unwinding the stack to perform the "return 0;"), so that we don't have
to test their return value in the JIT program.
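
The fast-path/slow-path split can be illustrated in plain userspace C. This is only a sketch under stated assumptions: struct pkt is a hypothetical stand-in with one linear head and one extra fragment (it is not struct sk_buff), and errors are reported through a return code rather than by the stack-unwinding shortcut described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet buffer: a linear head followed by one fragment. */
struct pkt {
	const uint8_t *head;
	size_t head_len;
	const uint8_t *frag;
	size_t frag_len;
};

/* Slow path: gather the four bytes one at a time across head and fragment. */
static int load_word_slow(const struct pkt *p, size_t off, uint32_t *out)
{
	size_t total = p->head_len + p->frag_len;
	uint32_t v = 0;
	size_t i;

	if (off > total || total - off < 4)
		return -1;	/* not enough data in the packet */
	for (i = off; i < off + 4; i++) {
		uint8_t b = i < p->head_len ? p->head[i]
					    : p->frag[i - p->head_len];
		v = (v << 8) | b;	/* packet data is big-endian */
	}
	*out = v;
	return 0;
}

/* Fast path: all four bytes sit in the linear head, so no gathering. */
static int load_word(const struct pkt *p, size_t off, uint32_t *out)
{
	if (p->head_len >= 4 && off <= p->head_len - 4) {
		const uint8_t *b = p->head + off;

		*out = (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
		       (uint32_t)b[2] << 8 | (uint32_t)b[3];
		return 0;
	}
	return load_word_slow(p, off, out);
}
```

load_half and load_byte would follow the same shape with two and one bytes respectively.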




Patch

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 447a775..35db44a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -124,7 +124,8 @@  struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
 #define SKF_AD_MARK 	20
 #define SKF_AD_QUEUE	24
 #define SKF_AD_HATYPE	28
-#define SKF_AD_MAX	32
+#define SKF_AD_CPU	32
+#define SKF_AD_MAX	36
 #define SKF_NET_OFF   (-0x100000)
 #define SKF_LL_OFF    (-0x200000)
 
diff --git a/net/core/filter.c b/net/core/filter.c
index a44d27f..3baa3f7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -410,6 +410,9 @@  load_b:
 				A = 0;
 			continue;
 		}
+		case SKF_AD_CPU:
+			A = raw_smp_processor_id();
+			continue;
 		default:
 			return 0;
 		}