diff mbox series

[net-next,1/1] net/sched: Introduce skb hash classifier

Message ID 20200807222816.18026-1-jhs@emojatatu.com
State Deferred
Delegated to: David Miller
Headers show
Series [net-next,1/1] net/sched: Introduce skb hash classifier | expand

Commit Message

Jamal Hadi Salim Aug. 7, 2020, 10:28 p.m. UTC
From: Jamal Hadi Salim <jhs@mojatatu.com>

his classifier, in the same spirit as the tc skb mark classifier,
provides a generic (fast lookup) approach to filter on the hash value
and optional mask.

like skb->mark, skb->hash could be set by multiple entities in the
datapath including but not limited to hardware offloaded classification
policies, hardware RSS (which already sets it today), XDP, ebpf programs
and even by other tc actions like skbedit and IFE.

Example use:

$TC qdisc add  dev $DEV1 ingress
... offloaded to hardware using flower ...
$TC filter add dev $DEV1 ingress protocol ip prio 1 flower hash 0x1f/0xff skip_sw flowid 1:1 \
action ok
... and when it shows up in s/w tagged from hardware...
$TC filter add dev $DEV1 parent ffff: protocol ip prio 2 handle 0x11/0xff skbhash flowid 1:11 \
action mirred egress redirect dev $DEV2
$TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle 0x12 skbhash flowid 1:12 \
action mirred egress redirect dev $DEV3

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 include/uapi/linux/pkt_cls.h |  13 ++
 net/sched/Kconfig            |  10 +
 net/sched/Makefile           |   1 +
 net/sched/cls_skbhash.c      | 441 +++++++++++++++++++++++++++++++++++
 4 files changed, 465 insertions(+)
 create mode 100644 net/sched/cls_skbhash.c

Comments

Cong Wang Aug. 9, 2020, 6:15 p.m. UTC | #1
On Fri, Aug 7, 2020 at 3:28 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> From: Jamal Hadi Salim <jhs@mojatatu.com>
>
> his classifier, in the same spirit as the tc skb mark classifier,
> provides a generic (fast lookup) approach to filter on the hash value
> and optional mask.
>
> like skb->mark, skb->hash could be set by multiple entities in the
> datapath including but not limited to hardware offloaded classification
> policies, hardware RSS (which already sets it today), XDP, ebpf programs
> and even by other tc actions like skbedit and IFE.

Looks like a lot of code duplication with cls_fw, is there any way to
reuse the code?

And I wonder if it is time to introduce a filter to match multiple skb
fields, as adding a filter for each skb field clearly does not scale.
Perhaps something like:

$TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle X skb \
hash Y mark Z flowid 1:12

Thanks.
Jamal Hadi Salim Aug. 9, 2020, 11:41 p.m. UTC | #2
On 2020-08-09 2:15 p.m., Cong Wang wrote:
> On Fri, Aug 7, 2020 at 3:28 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>
>> From: Jamal Hadi Salim <jhs@mojatatu.com>
>>
>> his classifier, in the same spirit as the tc skb mark classifier,
>> provides a generic (fast lookup) approach to filter on the hash value
>> and optional mask.
>>
>> like skb->mark, skb->hash could be set by multiple entities in the
>> datapath including but not limited to hardware offloaded classification
>> policies, hardware RSS (which already sets it today), XDP, ebpf programs
>> and even by other tc actions like skbedit and IFE.
> 
> Looks like a lot of code duplication with cls_fw, is there any way to
> reuse the code?
> 

Yeah, it was the simplest way to code this.
They are very similar since they both use 32 bit keys.
I am not exactly thrilled by the current 255 buckets(limited by rcu
structure) - it limits performance in presence of large number
of entries (since they can only be spread across 255 buckets).
If we have a million entries, there will be a lot of hash collisions.

> And I wonder if it is time to introduce a filter to match multiple skb
> fields, as adding a filter for each skb field clearly does not scale.
> Perhaps something like:
> 
> $TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle X skb \
> hash Y mark Z flowid 1:12
> 

Interesting idea. Note: my experience is that typical setup is
to have only one of those (from offload perspective). Ariel,
are your use cases requiring say both fields?

 From policy perspective, i think above will get more complex
mostly because you have to deal with either mark or hash
being optional. It also opens doors for more complex matching
requirements. Example "match mark X AND hash Y" and
"match mark X OR hash Y".
The new classifier will have to deal with that semantic.

With fw and hash being the complex/optional semantics are easy:

"match mark X AND hash Y":
$TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle X 
skbhash flowid 1:12 action continue
$TC filter add dev $DEV1 parent ffff: protocol ip prio 4 handle Y fw 
flowid 1:12 action ok

"match mark X OR hash Y":
$TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle X 
skbhash flowid 1:12 action ok
$TC filter add dev $DEV1 parent ffff: protocol ip prio 4 handle Y fw 
flowid 1:12 action ok

Then the question is how to implement? is it one hash table for
both or two(one for mark and one for hash), etc.

cheers,
jamal
Cong Wang Aug. 11, 2020, 11:25 p.m. UTC | #3
On Sun, Aug 9, 2020 at 4:41 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> Interesting idea. Note: my experience is that typical setup is
> to have only one of those (from offload perspective). Ariel,
> are your use cases requiring say both fields?
>
>  From policy perspective, i think above will get more complex
> mostly because you have to deal with either mark or hash
> being optional. It also opens doors for more complex matching
> requirements. Example "match mark X AND hash Y" and
> "match mark X OR hash Y".
> The new classifier will have to deal with that semantic.
>
> With fw and hash being the complex/optional semantics are easy:
>
> "match mark X AND hash Y":
> $TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle X
> skbhash flowid 1:12 action continue
> $TC filter add dev $DEV1 parent ffff: protocol ip prio 4 handle Y fw
> flowid 1:12 action ok
>
> "match mark X OR hash Y":
> $TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle X
> skbhash flowid 1:12 action ok
> $TC filter add dev $DEV1 parent ffff: protocol ip prio 4 handle Y fw
> flowid 1:12 action ok

Not sure if I get you correctly, but with a combined implementation
you can do above too, right? Something like:

(AND case)
$TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle 1
skb hash Y mark X flowid 1:12 action ok

(OR case)
$TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle 1
skb hash Y flowid 1:12 action ok
$TC filter add dev $DEV1 parent ffff: protocol ip prio 4 handle 2
skb mark X flowid 1:12 action ok

Side note: you don't have to use handle as the value of hash/mark,
which gives people freedom to choose different handles.


>
> Then the question is how to implement? is it one hash table for
> both or two(one for mark and one for hash), etc.
>

Good question. I am not sure, maybe no hash table at all?
Unless there are a lot of filters, we do not have to organize
them in a hash table, do we?

Thanks.
Marcelo Ricardo Leitner Aug. 12, 2020, 9:07 p.m. UTC | #4
On Tue, Aug 11, 2020 at 04:25:43PM -0700, Cong Wang wrote:
> On Sun, Aug 9, 2020 at 4:41 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > Interesting idea. Note: my experience is that typical setup is
> > to have only one of those (from offload perspective). Ariel,
> > are your use cases requiring say both fields?
> >
> >  From policy perspective, i think above will get more complex
> > mostly because you have to deal with either mark or hash
> > being optional. It also opens doors for more complex matching
> > requirements. Example "match mark X AND hash Y" and
> > "match mark X OR hash Y".
> > The new classifier will have to deal with that semantic.
> >
> > With fw and hash being the complex/optional semantics are easy:
> >
> > "match mark X AND hash Y":
> > $TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle X
> > skbhash flowid 1:12 action continue
> > $TC filter add dev $DEV1 parent ffff: protocol ip prio 4 handle Y fw
> > flowid 1:12 action ok
> >
> > "match mark X OR hash Y":
> > $TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle X
> > skbhash flowid 1:12 action ok
> > $TC filter add dev $DEV1 parent ffff: protocol ip prio 4 handle Y fw
> > flowid 1:12 action ok
> 
> Not sure if I get you correctly, but with a combined implementation
> you can do above too, right? Something like:
> 
> (AND case)
> $TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle 1
> skb hash Y mark X flowid 1:12 action ok

I probably missed something, but this kind of matching is pretty much
what flower does today. Is it just to avoid key extraction/flow
dissector or did I miss something?  I know there was a thread on how
to match on this hash before, but other than what I just said, I can't
recall other arguments.
Jamal Hadi Salim Aug. 13, 2020, 12:52 p.m. UTC | #5
On 2020-08-11 7:25 p.m., Cong Wang wrote:
> On Sun, Aug 9, 2020 at 4:41 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:

[..]
> 
> Not sure if I get you correctly, but with a combined implementation
> you can do above too, right? Something like:
> 
> (AND case)
> $TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle 1
> skb hash Y mark X flowid 1:12 action ok
> 
> (OR case)
> $TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle 1
> skb hash Y flowid 1:12 action ok
> $TC filter add dev $DEV1 parent ffff: protocol ip prio 4 handle 2
> skb mark X flowid 1:12 action ok
> 

It will work but what i was tring to say is it is tricky to implement.
More below


> Side note: you don't have to use handle as the value of hash/mark,
> which gives people freedom to choose different handles.
>

Same comment here as above. More below.

> 
>>
>> Then the question is how to implement? is it one hash table for
>> both or two(one for mark and one for hash), etc.
>>
> 
> Good question. I am not sure, maybe no hash table at all?
> Unless there are a lot of filters, we do not have to organize
> them in a hash table, do we?
>

The _main_ requirement is to scale to a large number of filters
(a million is a good handwave number). Scale means
1) fast datapath lookup time + 2) fast insertion/deletion/get/dump
from control/user space.
fwmark is good at all these goals today for #2. It is good for #1 for
maybe 1K rules (limitation is the 256 buckets, constrained by rcu
trickery). Then you start having collisions in a bucket and your
lookup requires long linked list walks.

Generally something like a hash table with sufficient number of buckets
will work out ok.
There maybe other approaches (IDR in the kernel looks interesting,
but i didnt look closely).

So to the implementation issue:
Major issue is removing ambiguity while at the same time trying
to get good performance.

Lets say we decided to classify skbmark and skbhash at this point.
For a hash table, one simple approach is to set
lookupkey = hash<<32|mark

the key is used as input to the hash algo to find the bucket.

There are two outstanding challenges in my mind:

1)  To use the policy like you describe above
as an example:

$TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle 1
skb hash Y flowid 1:12 action ok

and say you receive a packet with both skb->hash and skb->mark set
Then there is ambiguity

How do you know whether to use hash or mark or both
for that specific key?
You can probably do some trick but I cant think of a cheap way to 
achieve this goal. Of course this issue doesnt exist if you have
separate classifiers.

2) If you decide tomorrow to add tcindex/prio etc, you will have to
rework this as well.

#2 is not as a big deal as #1.

cheers,
jamal
Cong Wang Aug. 16, 2020, 6:59 p.m. UTC | #6
On Thu, Aug 13, 2020 at 5:52 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> The _main_ requirement is to scale to a large number of filters
> (a million is a good handwave number). Scale means
> 1) fast datapath lookup time + 2) fast insertion/deletion/get/dump
> from control/user space.
> fwmark is good at all these goals today for #2. It is good for #1 for
> maybe 1K rules (limitation is the 256 buckets, constrained by rcu
> trickery). Then you start having collisions in a bucket and your
> lookup requires long linked list walks.
>
> Generally something like a hash table with sufficient number of buckets
> will work out ok.
> There maybe other approaches (IDR in the kernel looks interesting,
> but i didnt look closely).
>
> So to the implementation issue:
> Major issue is removing ambiguity while at the same time trying
> to get good performance.
>
> Lets say we decided to classify skbmark and skbhash at this point.
> For a hash table, one simple approach is to set
> lookupkey = hash<<32|mark
>
> the key is used as input to the hash algo to find the bucket.
>
> There are two outstanding challenges in my mind:
>
> 1)  To use the policy like you describe above
> as an example:
>
> $TC filter add dev $DEV1 parent ffff: protocol ip prio 3 handle 1
> skb hash Y flowid 1:12 action ok
>
> and say you receive a packet with both skb->hash and skb->mark set
> Then there is ambiguity
>
> How do you know whether to use hash or mark or both
> for that specific key?

Hmm, you can just unconditionally pass skb->hash and skb->mark,
no? Something like:

if (filter_parameter_has_hash) {
    match skb->hash with cls->param_hash
}

if (filter_parameter_has_mark) {
    match skb->mark with cls->param_mark
}

fw_classify() uses skb->mark unconditionally anyway, without checking
whether it is set or not first.

But if filters were put in a global hashtable, the above would be
much harder to implement.


> You can probably do some trick but I cant think of a cheap way to
> achieve this goal. Of course this issue doesnt exist if you have
> separate classifiers.
>
> 2) If you decide tomorrow to add tcindex/prio etc, you will have to
> rework this as well.
>
> #2 is not as a big deal as #1.

Well, I think #2 is more serious than #1, if we have to use a hashtable.
(If we don't have to, then it would be much easier to extend, of course.)

THanks.
Jamal Hadi Salim Aug. 17, 2020, 11:19 a.m. UTC | #7
On 2020-08-16 2:59 p.m., Cong Wang wrote:
> On Thu, Aug 13, 2020 at 5:52 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:


[..]
>> How do you know whether to use hash or mark or both
>> for that specific key?
> 
> Hmm, you can just unconditionally pass skb->hash and skb->mark,
> no? Something like:
> 
> if (filter_parameter_has_hash) {
>      match skb->hash with cls->param_hash
> }
> 
> if (filter_parameter_has_mark) {
>      match skb->mark with cls->param_mark
> }
> 
 >
> fw_classify() uses skb->mark unconditionally anyway, without checking
> whether it is set or not first.
> 

There is no ambiguity of intent in the fw case, there is only one field.
In the case of having multiple fields it is ambigious if you 
unconditionally look.

Example: policy says to match skb mark of 5 and hash of 3.
If packet arrives with skb->mark is 5 and skb->hash is 3
very clearly matched the intent of the policy.
If packet arrives withj skb->mark 7 and hash 3 it clearly
did not match the intent. etc.

> But if filters were put in a global hashtable, the above would be
> much harder to implement.
> 

Ok, yes. My assumption has been you will have some global shared
structure where all filters will be installed on.

I think i may have misunderstood all along what you were saying
which is:

a) add the rules so they are each _independent with different
    priorities_ in a chain.

b)  when i do lookup for packet arrival, i will only see a filter
  that matches "match mark 5 and hash 3" (meaning there is no
  ambiguity on intent). If packet data doesnt match policy then
  i will iterate to another filter on the chain list with lower
  priority.

Am i correct in my understanding?

If i am - then we still have a problem with lookup scale in presence
of a large number of filters since essentially this approach
is linear lookup (similar problem iptables has). I am afraid
a hash table or something with similar principle goals is needed.

> 
>> You can probably do some trick but I cant think of a cheap way to
>> achieve this goal. Of course this issue doesnt exist if you have
>> separate classifiers.
>>
>> 2) If you decide tomorrow to add tcindex/prio etc, you will have to
>> rework this as well.
>>
>> #2 is not as a big deal as #1.
> 
> Well, I think #2 is more serious than #1, if we have to use a hashtable.
> (If we don't have to, then it would be much easier to extend, of course.)
>

In both cases youd have to extend the existing code.

cheers,
jamal
Cong Wang Aug. 17, 2020, 7:47 p.m. UTC | #8
On Mon, Aug 17, 2020 at 4:19 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On 2020-08-16 2:59 p.m., Cong Wang wrote:
> > On Thu, Aug 13, 2020 at 5:52 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
>
> [..]
> >> How do you know whether to use hash or mark or both
> >> for that specific key?
> >
> > Hmm, you can just unconditionally pass skb->hash and skb->mark,
> > no? Something like:
> >
> > if (filter_parameter_has_hash) {
> >      match skb->hash with cls->param_hash
> > }
> >
> > if (filter_parameter_has_mark) {
> >      match skb->mark with cls->param_mark
> > }
> >
>  >
> > fw_classify() uses skb->mark unconditionally anyway, without checking
> > whether it is set or not first.
> >
>
> There is no ambiguity of intent in the fw case, there is only one field.
> In the case of having multiple fields it is ambigious if you
> unconditionally look.
>
> Example: policy says to match skb mark of 5 and hash of 3.
> If packet arrives with skb->mark is 5 and skb->hash is 3
> very clearly matched the intent of the policy.
> If packet arrives withj skb->mark 7 and hash 3 it clearly
> did not match the intent. etc.

This example clearly shows no ambiguous, right? ;)


>
> > But if filters were put in a global hashtable, the above would be
> > much harder to implement.
> >
>
> Ok, yes. My assumption has been you will have some global shared
> structure where all filters will be installed on.

Sure, if not hashtable, we could simply put them in a list:

list_for_each_filter {
  if (filter_parameter_has_hash) {
    match skb->hash with cls->param_hash
  }
  if (filter_parameter_has_mark) {
    match skb->mark with cls->param_mark
  }
}


>
> I think i may have misunderstood all along what you were saying
> which is:
>
> a) add the rules so they are each _independent with different
>     priorities_ in a chain.

Yes, because this gives users freedom to pick a different prio
from its value (hash or mark).


>
> b)  when i do lookup for packet arrival, i will only see a filter
>   that matches "match mark 5 and hash 3" (meaning there is no
>   ambiguity on intent). If packet data doesnt match policy then
>   i will iterate to another filter on the chain list with lower
>   priority.

Right. Multiple values mean AND, not OR, so if you specify
mark 5 and hash 3, it will match skb->mark==5 && skb->hash==3.
If not matched, it will continue the iteration until the end.

>
> Am i correct in my understanding?
>
> If i am - then we still have a problem with lookup scale in presence
> of a large number of filters since essentially this approach
> is linear lookup (similar problem iptables has). I am afraid
> a hash table or something with similar principle goals is needed.

Yeah, this is why I asked you whether we have to put them in a
hashtable in previous emails, as hashtable organizes them with
a key, it is hard to combine multiple fields in one key and allow
to extend easily in the future. But other people smarter than me
may have better ideas here.

Thanks.
Jamal Hadi Salim Aug. 19, 2020, 9:48 a.m. UTC | #9
On 2020-08-17 3:47 p.m., Cong Wang wrote:
> On Mon, Aug 17, 2020 at 4:19 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>

[..]
>> There is no ambiguity of intent in the fw case, there is only one field.
>> In the case of having multiple fields it is ambigious if you
>> unconditionally look.
>>
>> Example: policy says to match skb mark of 5 and hash of 3.
>> If packet arrives with skb->mark is 5 and skb->hash is 3
>> very clearly matched the intent of the policy.
>> If packet arrives withj skb->mark 7 and hash 3 it clearly
>> did not match the intent. etc.
> 
> This example clearly shows no ambiguous, right? ;)
> 

Ambigious only from the perspective of relational AND vs OR
(your original pseudo code had it in OR relation).

> 
>>
>>> But if filters were put in a global hashtable, the above would be
>>> much harder to implement.
>>>
>>
>> Ok, yes. My assumption has been you will have some global shared
>> structure where all filters will be installed on.
> 
> Sure, if not hashtable, we could simply put them in a list:
> 
> list_for_each_filter {
>    if (filter_parameter_has_hash) {
>      match skb->hash with cls->param_hash
>    }
>    if (filter_parameter_has_mark) {
>      match skb->mark with cls->param_mark
>    }
> }
> 

Yes, that would work - but iteration is linear.

> 
>>
>> I think i may have misunderstood all along what you were saying
>> which is:
>>
>> a) add the rules so they are each _independent with different
>>      priorities_ in a chain.
> 
> Yes, because this gives users freedom to pick a different prio
> from its value (hash or mark).
>

ok.

> 
>>
>> b)  when i do lookup for packet arrival, i will only see a filter
>>    that matches "match mark 5 and hash 3" (meaning there is no
>>    ambiguity on intent). If packet data doesnt match policy then
>>    i will iterate to another filter on the chain list with lower
>>    priority.
> 
> Right. Multiple values mean AND, not OR, so if you specify
> mark 5 and hash 3, it will match skb->mark==5 && skb->hash==3.
> If not matched, it will continue the iteration until the end.
>

That would remove the ambiguity (assuming iteration with "continue"
to create the AND effect).

>>
>> Am i correct in my understanding?
>>
>> If i am - then we still have a problem with lookup scale in presence
>> of a large number of filters since essentially this approach
>> is linear lookup (similar problem iptables has). I am afraid
>> a hash table or something with similar principle goals is needed.
> 
> Yeah, this is why I asked you whether we have to put them in a
> hashtable in previous emails, as hashtable organizes them with
> a key, it is hard to combine multiple fields in one key and allow
> to extend easily in the future. But other people smarter than me
> may have better ideas here.

To achieve reasonable performance (with many filters) I dont think
there is escape from having something that is centralized
(per priority) - sort of what fw, u32 or flower do. A hash table is
the most common approach; i was hoping that IDR maybe useful since the
skb->hash maps nicely to "32 bit key" but Vlad was saying at the
tc workshop that he was seeing bottlenecks with IDR.

I will test a few hash algorithms (including common one like jenkins)
for this use case.
Problem right now is we have that rcu-inspired upper bucket limit
(which exists) in fw as well: If i have 1M entries which can only be
spread to 256 buckets perfectly then i have ~4K entries per bucket
which are a linked list. So we are still not doing so well. Do you have
time to make that better for fw (since you are an rcu maestro).

cheers,
jamal
diff mbox series

Patch

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index ee95f42fb0ec..804cc326c2ce 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -770,4 +770,17 @@  enum {
 	TCF_EM_OPND_LT
 };
 
+/* skbhash filter */
+
+enum {
+	TCA_SKBHASH_UNSPEC,
+	TCA_SKBHASH_CLASSID,
+	TCA_SKBHASH_POLICE,
+	TCA_SKBHASH_ACT,
+	TCA_SKBHASH_MASK,
+	__TCA_SKBHASH_MAX
+};
+
+#define TCA_SKBHASH_MAX (__TCA_SKBHASH_MAX - 1)
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a3b37d88800e..aed2c74466e6 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -640,6 +640,16 @@  config NET_CLS_MATCHALL
 	  To compile this code as a module, choose M here: the module will
 	  be called cls_matchall.
 
+config NET_CLS_SKBHASH
+	tristate "skb->hash classifier"
+	select NET_CLS
+	---help---
+	  If you say Y here, you will be able to classify packets
+	  according to the skb->hash value
+
+	  To compile this code as a module, choose M here: the
+	  module will be called cls_skbhash.
+
 config NET_EMATCH
 	bool "Extended Matches"
 	select NET_CLS
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 66bbf9a98f9e..cbffba3b1fa1 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -68,6 +68,7 @@  obj-$(CONFIG_NET_SCH_TAPRIO)	+= sch_taprio.o
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
 obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
+obj-$(CONFIG_NET_CLS_SKBHASH)	+= cls_skbhash.o
 obj-$(CONFIG_NET_CLS_RSVP)	+= cls_rsvp.o
 obj-$(CONFIG_NET_CLS_TCINDEX)	+= cls_tcindex.o
 obj-$(CONFIG_NET_CLS_RSVP6)	+= cls_rsvp6.o
diff --git a/net/sched/cls_skbhash.c b/net/sched/cls_skbhash.c
new file mode 100644
index 000000000000..e523a1ce2fc5
--- /dev/null
+++ b/net/sched/cls_skbhash.c
@@ -0,0 +1,441 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * net/sched/cls_skbhash.c	TC Classifier for skb->hash
+ *
+ * Authors:	J Hadi Salim <jhs@mojatatu.com>
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <net/netlink.h>
+#include <net/act_api.h>
+#include <net/pkt_cls.h>
+#include <net/sch_generic.h>
+
+#define SKBHASH_HTSIZE 256
+
+struct skbhash_head {
+	struct rcu_head		rcu;
+	u32			mask;
+	struct skbhash_filter __rcu	*ht[SKBHASH_HTSIZE];
+};
+
+struct skbhash_filter {
+	struct skbhash_filter __rcu	*next;
+	u32			id;
+	struct tcf_result	res;
+	struct tcf_exts		exts;
+	struct tcf_proto	*tp;
+	struct rcu_work		rwork;
+};
+
+static u32 skbhash_hash(u32 handle)
+{
+	handle ^= (handle >> 16);
+	handle ^= (handle >> 8);
+	return handle % SKBHASH_HTSIZE;
+}
+
+static int skbhash_classify(struct sk_buff *skb, const struct tcf_proto *tp,
+			    struct tcf_result *res)
+{
+	struct skbhash_head *head = rcu_dereference_bh(tp->root);
+	struct skbhash_filter *f;
+	int r;
+	u32 id = skb->hash;
+
+	if (head != NULL) {
+		id &= head->mask;
+
+		for (f = rcu_dereference_bh(head->ht[skbhash_hash(id)]); f;
+		     f = rcu_dereference_bh(f->next)) {
+			if (f->id == id) {
+				*res = f->res;
+				r = tcf_exts_exec(skb, &f->exts, res);
+				if (r < 0)
+					continue;
+
+				return r;
+			}
+		}
+	} else {
+		struct Qdisc *q = tcf_block_q(tp->chain->block);
+
+		/* Legacy method: classify the packet using its skb hash. */
+		if (id && (TC_H_MAJ(id) == 0 ||
+			   !(TC_H_MAJ(id ^ q->handle)))) {
+			res->classid = id;
+			res->class = 0;
+			return 0;
+		}
+	}
+
+	return -1;
+}
+
+static void *skbhash_get(struct tcf_proto *tp, u32 handle)
+{
+	struct skbhash_head *head = rtnl_dereference(tp->root);
+	struct skbhash_filter *f;
+
+	if (head == NULL)
+		return NULL;
+
+	f = rtnl_dereference(head->ht[skbhash_hash(handle)]);
+	for (; f; f = rtnl_dereference(f->next)) {
+		if (f->id == handle)
+			return f;
+	}
+	return NULL;
+}
+
+static int skbhash_init(struct tcf_proto *tp)
+{
+	/* We don't allocate skbhash_head here, because in the old method
+	 * we don't need it at all.
+	 */
+	return 0;
+}
+
+static void __skbhash_delete_filter(struct skbhash_filter *f)
+{
+	tcf_exts_destroy(&f->exts);
+	tcf_exts_put_net(&f->exts);
+	kfree(f);
+}
+
+static void skbhash_delete_filter_work(struct work_struct *work)
+{
+	struct skbhash_filter *f = container_of(to_rcu_work(work),
+					   struct skbhash_filter,
+					   rwork);
+	rtnl_lock();
+	__skbhash_delete_filter(f);
+	rtnl_unlock();
+}
+
+static void skbhash_destroy(struct tcf_proto *tp, bool rtnl_held,
+			    struct netlink_ext_ack *extack)
+{
+	struct skbhash_head *head = rtnl_dereference(tp->root);
+	struct skbhash_filter *f;
+	int h;
+
+	if (head == NULL)
+		return;
+
+	for (h = 0; h < SKBHASH_HTSIZE; h++) {
+		while ((f = rtnl_dereference(head->ht[h])) != NULL) {
+			RCU_INIT_POINTER(head->ht[h],
+					 rtnl_dereference(f->next));
+			tcf_unbind_filter(tp, &f->res);
+			if (tcf_exts_get_net(&f->exts))
+				tcf_queue_work(&f->rwork,
+					       skbhash_delete_filter_work);
+			else
+				__skbhash_delete_filter(f);
+		}
+	}
+	kfree_rcu(head, rcu);
+}
+
+static int skbhash_delete(struct tcf_proto *tp, void *arg, bool *last,
+			  bool rtnl_held, struct netlink_ext_ack *extack)
+{
+	struct skbhash_head *head = rtnl_dereference(tp->root);
+	struct skbhash_filter *f = arg;
+	struct skbhash_filter __rcu **fp;
+	struct skbhash_filter *pfp;
+	int ret = -EINVAL;
+	int h;
+
+	if (head == NULL || f == NULL)
+		goto out;
+
+	fp = &head->ht[skbhash_hash(f->id)];
+
+	for (pfp = rtnl_dereference(*fp); pfp;
+	     fp = &pfp->next, pfp = rtnl_dereference(*fp)) {
+		if (pfp == f) {
+			RCU_INIT_POINTER(*fp, rtnl_dereference(f->next));
+			tcf_unbind_filter(tp, &f->res);
+			tcf_exts_get_net(&f->exts);
+			tcf_queue_work(&f->rwork, skbhash_delete_filter_work);
+			ret = 0;
+			break;
+		}
+	}
+
+	*last = true;
+	for (h = 0; h < SKBHASH_HTSIZE; h++) {
+		if (rcu_access_pointer(head->ht[h])) {
+			*last = false;
+			break;
+		}
+	}
+
+out:
+	return ret;
+}
+
+static const struct nla_policy skbhash_policy[TCA_SKBHASH_MAX + 1] = {
+	[TCA_SKBHASH_CLASSID]	= { .type = NLA_U32 },
+	[TCA_SKBHASH_MASK]	= { .type = NLA_U32 },
+};
+
+static int skbhash_set_parms(struct net *net, struct tcf_proto *tp,
+			     struct skbhash_filter *f, struct nlattr **tb,
+			     struct nlattr **tca, unsigned long base,
+			     bool ovr, struct netlink_ext_ack *extack)
+{
+	struct skbhash_head *head = rtnl_dereference(tp->root);
+	u32 mask;
+	int err;
+
+	err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &f->exts, ovr,
+				true, extack);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_SKBHASH_CLASSID]) {
+		f->res.classid = nla_get_u32(tb[TCA_SKBHASH_CLASSID]);
+		tcf_bind_filter(tp, &f->res, base);
+	}
+
+	err = -EINVAL;
+	if (tb[TCA_SKBHASH_MASK]) {
+		mask = nla_get_u32(tb[TCA_SKBHASH_MASK]);
+		if (mask != head->mask)
+			return err;
+	} else if (head->mask != 0xFFFFFFFF)
+		return err;
+
+	return 0;
+}
+
+static int skbhash_change(struct net *net, struct sk_buff *in_skb,
+			  struct tcf_proto *tp, unsigned long base, u32 handle,
+			  struct nlattr **tca, void **arg, bool ovr,
+			  bool rtnl_held, struct netlink_ext_ack *extack)
+{
+	struct skbhash_head *head = rtnl_dereference(tp->root);
+	struct skbhash_filter *f = *arg;
+	struct nlattr *opt = tca[TCA_OPTIONS];
+	struct nlattr *tb[TCA_SKBHASH_MAX + 1];
+	int err;
+
+	if (!opt)
+		return handle ? -EINVAL : 0; /* Succeed if it is old method. */
+
+	err = nla_parse_nested_deprecated(tb, TCA_SKBHASH_MAX, opt,
+					  skbhash_policy, NULL);
+	if (err < 0)
+		return err;
+
+	if (f) {
+		struct skbhash_filter *pfp, *fnew;
+		struct skbhash_filter __rcu **fp;
+
+		if (f->id != handle && handle)
+			return -EINVAL;
+
+		fnew = kzalloc(sizeof(struct skbhash_filter), GFP_KERNEL);
+		if (!fnew)
+			return -ENOBUFS;
+
+		fnew->id = f->id;
+		fnew->res = f->res;
+		fnew->tp = f->tp;
+
+		err = tcf_exts_init(&fnew->exts, net, TCA_SKBHASH_ACT,
+				    TCA_SKBHASH_POLICE);
+		if (err < 0) {
+			kfree(fnew);
+			return err;
+		}
+
+		err = skbhash_set_parms(net, tp, fnew, tb, tca, base, ovr,
+					extack);
+		if (err < 0) {
+			tcf_exts_destroy(&fnew->exts);
+			kfree(fnew);
+			return err;
+		}
+
+		fp = &head->ht[skbhash_hash(fnew->id)];
+		for (pfp = rtnl_dereference(*fp); pfp;
+		     fp = &pfp->next, pfp = rtnl_dereference(*fp))
+			if (pfp == f)
+				break;
+
+		RCU_INIT_POINTER(fnew->next, rtnl_dereference(pfp->next));
+		rcu_assign_pointer(*fp, fnew);
+		tcf_unbind_filter(tp, &f->res);
+		tcf_exts_get_net(&f->exts);
+		tcf_queue_work(&f->rwork, skbhash_delete_filter_work);
+
+		*arg = fnew;
+		return err;
+	}
+
+	if (!handle)
+		return -EINVAL;
+
+	if (!head) {
+		u32 mask = 0xFFFFFFFF;
+
+		if (tb[TCA_SKBHASH_MASK])
+			mask = nla_get_u32(tb[TCA_SKBHASH_MASK]);
+
+		head = kzalloc(sizeof(*head), GFP_KERNEL);
+		if (!head)
+			return -ENOBUFS;
+		head->mask = mask;
+
+		rcu_assign_pointer(tp->root, head);
+	}
+
+	f = kzalloc(sizeof(struct skbhash_filter), GFP_KERNEL);
+	if (f == NULL)
+		return -ENOBUFS;
+
+	err = tcf_exts_init(&f->exts, net, TCA_SKBHASH_ACT, TCA_SKBHASH_POLICE);
+	if (err < 0)
+		goto errout;
+	f->id = handle;
+	f->tp = tp;
+
+	err = skbhash_set_parms(net, tp, f, tb, tca, base, ovr, extack);
+	if (err < 0)
+		goto errout;
+
+	RCU_INIT_POINTER(f->next, head->ht[skbhash_hash(handle)]);
+	rcu_assign_pointer(head->ht[skbhash_hash(handle)], f);
+
+	*arg = f;
+	return 0;
+
+errout:
+	tcf_exts_destroy(&f->exts);
+	kfree(f);
+	return err;
+}
+
+static void skbhash_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+			 bool rtnl_held)
+{
+	struct skbhash_head *head = rtnl_dereference(tp->root);
+	int h;
+
+	if (head == NULL)
+		arg->stop = 1;
+
+	if (arg->stop)
+		return;
+
+	for (h = 0; h < SKBHASH_HTSIZE; h++) {
+		struct skbhash_filter *f;
+
+		for (f = rtnl_dereference(head->ht[h]); f;
+		     f = rtnl_dereference(f->next)) {
+			if (arg->count < arg->skip) {
+				arg->count++;
+				continue;
+			}
+			if (arg->fn(tp, f, arg) < 0) {
+				arg->stop = 1;
+				return;
+			}
+			arg->count++;
+		}
+	}
+}
+
+static int skbhash_dump(struct net *net, struct tcf_proto *tp, void *fh,
+			struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
+{
+	struct skbhash_head *head = rtnl_dereference(tp->root);
+	struct skbhash_filter *f = fh;
+	struct nlattr *nest;
+
+	if (f == NULL)
+		return skb->len;
+
+	t->tcm_handle = f->id;
+
+	if (!f->res.classid && !tcf_exts_has_actions(&f->exts))
+		return skb->len;
+
+	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
+	if (nest == NULL)
+		goto nla_put_failure;
+
+	if (f->res.classid &&
+	    nla_put_u32(skb, TCA_SKBHASH_CLASSID, f->res.classid))
+		goto nla_put_failure;
+
+	if (head->mask != 0xFFFFFFFF &&
+	    nla_put_u32(skb, TCA_SKBHASH_MASK, head->mask))
+		goto nla_put_failure;
+
+	if (tcf_exts_dump(skb, &f->exts) < 0)
+		goto nla_put_failure;
+
+	nla_nest_end(skb, nest);
+
+	if (tcf_exts_dump_stats(skb, &f->exts) < 0)
+		goto nla_put_failure;
+
+	return skb->len;
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
+static void skbhash_bind_class(void *fh, u32 classid, unsigned long cl, void *q,
+			       unsigned long base)
+{
+	struct skbhash_filter *f = fh;
+
+	if (f && f->res.classid == classid) {
+		if (cl)
+			__tcf_bind_filter(q, &f->res, base);
+		else
+			__tcf_unbind_filter(q, &f->res);
+	}
+}
+
+static struct tcf_proto_ops cls_skbhash_ops __read_mostly = {
+	.kind		=	"skbhash",
+	.classify	=	skbhash_classify,
+	.init		=	skbhash_init,
+	.destroy	=	skbhash_destroy,
+	.get		=	skbhash_get,
+	.change		=	skbhash_change,
+	.delete		=	skbhash_delete,
+	.walk		=	skbhash_walk,
+	.dump		=	skbhash_dump,
+	.bind_class	=	skbhash_bind_class,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init init_skbhash(void)
+{
+	return register_tcf_proto_ops(&cls_skbhash_ops);
+}
+
+static void __exit exit_skbhash(void)
+{
+	unregister_tcf_proto_ops(&cls_skbhash_ops);
+}
+
+module_init(init_skbhash)
+module_exit(exit_skbhash)
+MODULE_LICENSE("GPL");