diff mbox

[v1,7/14] netfilter: Use rhashtable_lookup instead of lookup_compare

Message ID E1YX624-0005ns-CB@gondolin.me.apana.org.au
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Herbert Xu March 15, 2015, 10:44 a.m. UTC
The use of rhashtable_lookup_compare in nft_hash is gratuitous
since the comparison function is just doing memcmp.  Furthermore,
there is cruft embedded in the comparison function that should
instead be moved into the caller of the lookup.

This patch moves that cruft over and replaces the call to
rhashtable_lookup_compare with rhashtable_lookup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/netfilter/nft_hash.c |   40 ++++++++++------------------------------
 1 file changed, 10 insertions(+), 30 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Thomas Graf March 16, 2015, 8:28 a.m. UTC | #1
On 03/15/15 at 09:44pm, Herbert Xu wrote:
> The use of rhashtable_lookup_compare in nft_hash is gratuitous
> since the comparison function is just doing memcmp.  Furthermore,
> there is cruft embedded in the comparison function that should
> instead be moved into the caller of the lookup.
> 
> This patch moves that cruft over and replaces the call to
> rhashtable_lookup_compare with rhashtable_lookup.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

The reason for the indirection was to not bypass the abstraction
nft_data_cmp() provides.

No objection to the change but maybe leave a comment in
nft_data_cmp() that if one changes nft_data_cmp() one needs to
look at nft_hash and see if the direct use of rhashtable_lookup()
is still valid.

Copying Patrick as well.
Herbert Xu March 16, 2015, 9:14 a.m. UTC | #2
On Mon, Mar 16, 2015 at 08:28:42AM +0000, Thomas Graf wrote:
> On 03/15/15 at 09:44pm, Herbert Xu wrote:
> > The use of rhashtable_lookup_compare in nft_hash is gratuitous
> > since the comparison function is just doing memcmp.  Furthermore,
> > there is cruft embedded in the comparison function that should
> > instead be moved into the caller of the lookup.
> > 
> > This patch moves that cruft over and replaces the call to
> > rhashtable_lookup_compare with rhashtable_lookup.
> > 
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> The reason for the indirection was to not bypass the abstraction
> nft_data_cmp() provides.
> 
> No objection to the change but maybe leave a comment in
> nft_data_cmp() that if one changes nft_data_cmp() one needs to
> look at nft_hash and see if the direct use of rhashtable_lookup()
> is still valid.

Well we could preserve nft_data_cmp if necessary.  It'll just
mean an extra indirect call to do the compare when needed.

Patrick?

Cheers,
Thomas Graf March 16, 2015, 9:28 a.m. UTC | #3
On 03/16/15 at 08:14pm, Herbert Xu wrote:
> On Mon, Mar 16, 2015 at 08:28:42AM +0000, Thomas Graf wrote:
> > On 03/15/15 at 09:44pm, Herbert Xu wrote:
> > > The use of rhashtable_lookup_compare in nft_hash is gratuitous
> > > since the comparison function is just doing memcmp.  Furthermore,
> > > there is cruft embedded in the comparison function that should
> > > instead be moved into the caller of the lookup.
> > > 
> > > This patch moves that cruft over and replaces the call to
> > > rhashtable_lookup_compare with rhashtable_lookup.
> > > 
> > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> > 
> > The reason for the indirection was to not bypass the abstraction
> > nft_data_cmp() provides.
> > 
> > No objection to the change but maybe leave a comment in
> > nft_data_cmp() that if one changes nft_data_cmp() one needs to
> > look at nft_hash and see if the direct use of rhashtable_lookup()
> > is still valid.
> 
> Well we could preserve nft_data_cmp if necessary.  It'll just
> mean an extra indirect call to do the compare when needed.

I like Dave's idea of implementing the lookup as a macro, so that both
rhashtable itself and API users that require their own compare function
can inline the compare and hash functions:

#define rhashtable_lookup_compare(ht, obj, compare_fn, obj_hash_fn, arg)

Leaving indirect hash calls to the slow path only. This was also
the initial design of the compare lookup variation until it
"evolved" away from that due to lack of users of that API ;-)
Patrick McHardy March 16, 2015, 11:13 a.m. UTC | #4
On 16.03, Herbert Xu wrote:
> On Mon, Mar 16, 2015 at 08:28:42AM +0000, Thomas Graf wrote:
> > On 03/15/15 at 09:44pm, Herbert Xu wrote:
> > > The use of rhashtable_lookup_compare in nft_hash is gratuitous
> > > since the comparison function is just doing memcmp.  Furthermore,
> > > there is cruft embedded in the comparson function that should
> > > instead be moved into the caller of the lookup.
> > > 
> > > This patch moves that cruft over and replacces the call to
> > > rhashtable_lookup_compare with rhashtable_lookup.
> > > 
> > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> > 
> > The reason for the indirection was to not bypass the abstraction
> > nft_data_cmp() provides.
> > 
> > No objection to the change but maybe leave a comment in
> > nft_data_cmp() that if one changes nft_data_cmp() one needs to
> > look at nft_hash and see if the direct use of rhashtable_lookup()
> > is still valid.
> 
> Well we could preserve nft_data_cmp if necessary.  It'll just
> mean an extra indirect call to do the compare when needed.
> 
> Patrick?

An upcoming patchset will add transaction support to sets, which needs
the callback. So please leave it for now since it will only cause
unnecessary churn.
Herbert Xu March 20, 2015, 8:55 a.m. UTC | #5
On Mon, Mar 16, 2015 at 11:13:46AM +0000, Patrick McHardy wrote:
>
> An upcoming patchset will add transaction support to sets, which needs
> the callback. So please leave it for now since it will only cause
> unnecessary churn.

Patrick, can you explain what you mean by the callback?

Thanks,
Patrick McHardy March 20, 2015, 9:22 a.m. UTC | #6
On 20.03, Herbert Xu wrote:
> On Mon, Mar 16, 2015 at 11:13:46AM +0000, Patrick McHardy wrote:
> >
> > An upcoming patchset will add transaction support to sets, which needs
> > the callback. So please leave it for now since it will only cause
> > unnecessary churn.
> 
> Patrick, can you explain what you mean by the callback?

I need the compare functions for transaction support to decide whether
an element is already activated or has been deactivated and, with a
further patch, for timeouts. Inactive and timed-out elements are treated
as non-existent, but in the case of transactions they remain present in
the hash until the transaction is committed, and in the case of timeouts
until GC.
Herbert Xu March 20, 2015, 9:27 a.m. UTC | #7
On Fri, Mar 20, 2015 at 09:22:17AM +0000, Patrick McHardy wrote:
>
> I need the compare functions for transaction support to decide whether
> an element is already activated or has been deactivated and, with a
> further patch, for timeouts. Inactive and timed-out elements are treated
> as non-existent, but in the case of transactions they remain present in
> the hash until the transaction is committed, and in the case of timeouts
> until GC.

I still don't understand what is in the compare callback.  Can
you provide a code example?

More importantly, why is it that you can't do a lookup and then do
whatever you need to do in the caller of the lookup? If you're planning
on having multiple objects in the hash table with the same key then
I'm afraid I'll have to say no because we want to use the chain
length to determine whether we're being attacked and having multiple
objects with the same key defeats that mechanism.

Of course many hash table users need to be able to keep multiple
objects under the same key.  My suggestion would be to allocate
your own linked list and have the linked list be the object that
is inserted into the hash table.

Cheers,
Herbert Xu March 20, 2015, 9:36 a.m. UTC | #8
On Mon, Mar 16, 2015 at 08:14:15PM +1100, Herbert Xu wrote:
> On Mon, Mar 16, 2015 at 08:28:42AM +0000, Thomas Graf wrote:
>
> > The reason for the indirection was to not bypass the abstraction
> > nft_data_cmp() provides.
> > 
> > No objection to the change but maybe leave a comment in
> > nft_data_cmp() that if one changes nft_data_cmp() one needs to
> > look at nft_hash and see if the direct use of rhashtable_lookup()
> > is still valid.
> 
> Well we could preserve nft_data_cmp if necessary.  It'll just
> mean an extra indirect call to do the compare when needed.

FWIW I've decided to ditch nft_data_cmp unless someone can show
me that it's really necessary.  The reason is that nft_hash_lookup
already uses memcmp instead of nft_data_cmp.

Cheers,
Patrick McHardy March 20, 2015, 9:59 a.m. UTC | #9
On 20.03, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 09:22:17AM +0000, Patrick McHardy wrote:
> >
> > I need the compare functions for transaction support to decide whether
> > an element is already activated or has been deactivated and, with a
> > further patch, for timeouts. Inactive and timed-out elements are treated
> > as non-existent, but in the case of transactions they remain present in
> > the hash until the transaction is committed, and in the case of timeouts
> > until GC.
> 
> I still don't understand what is in the compare callback.  Can
> you provide some code example?

Sure, for lookup / insert:

static bool nft_hash_cmp(void *ptr, void *arg)
{
        struct nft_hash_cmp_arg *x = arg;
        struct nft_hash_elem *he = ptr;

        if (nft_data_cmp(nft_set_ext_key(&he->ext), x->key, x->set->klen))
                return false;
        if (nft_hash_elem_expired(he))
                return false;
        if (!nft_hash_is_active(he, x->genmask))
                return false;
        return true;
}

static bool nft_hash_lookup(const struct nft_set *set,
                            const struct nft_data *key,
                            const struct nft_set_ext **ext)
{
        struct nft_hash *priv = nft_set_priv(set);
        const struct nft_hash_elem *he;
        struct nft_hash_cmp_arg arg = {
                .genmask = nft_genmask_cur(set->net) << NFT_HASH_GEN_SHIFT,
                .set     = set,
                .key     = key,
        };

        he = rhashtable_lookup_compare(&priv->ht, key, &nft_hash_cmp, &arg);
        if (he != NULL)                                                         
                *ext = &he->ext;

        return !!he;
}

static int nft_hash_insert(const struct nft_set *set,                           
                           const struct nft_set_elem *elem)
{
        struct nft_hash *priv = nft_set_priv(set);
        struct nft_hash_elem *he = elem->priv;
        struct nft_hash_cmp_arg arg = {
                .genmask = nft_genmask_next(set->net) << NFT_HASH_GEN_SHIFT,
                .set     = set,
                .key     = &elem->key,
        };

        nft_hash_set_inactive(set, he);
        if (!rhashtable_lookup_compare_insert(&priv->ht, &he->node,
                                              nft_hash_cmp, &arg))
                return -EEXIST;
        return 0;
}

> More importantly, why is it that you can't do a lookup and then do
> whatever you need to do in the caller of the lookup? If you're planning
> on having multiple objects in the hash table with the same key then
> I'm afraid I'll have to say no because we want to use the chain
> length to determine whether we're being attacked and having multiple
> objects with the same key defeats that mechanism.

It's exactly that, there are multiple similar keys in the hash. Timed-out
and inactive elements are treated as non-existent. Timeouts could
be handled differently, but for transactions there is no other way
to implement this in order to provide atomic transaction semantics.

Consider, for instance, the sequence:

- delete element X
- add element X

If we fail, we want the first X to be still active, otherwise the second
X becomes active and the first one inactive atomically. For this to work,
both need to be present in the hash already. Transactions can cover
an arbitrary number of elements, so even if the case of a single similar
key could be handled otherwise, it's not possible for multiple keys.

Regarding the chain length as a trigger - I'm sorry, but this doesn't work
for us. I don't see why you would have to look at chain length. That
implies that you don't trust your hash function - why not fix that
instead?

> Of course many hash table users need to be able to keep multiple
> objects under the same key.  My suggestion would be to allocate
> your own linked list and have the linked list be the object that
> is inserted into the hash table.

That would require a huge amount of extra memory per element and having
millions of them is not unrealistic for our use case.
Patrick McHardy March 20, 2015, 10:02 a.m. UTC | #10
On 20.03, Herbert Xu wrote:
> On Mon, Mar 16, 2015 at 08:14:15PM +1100, Herbert Xu wrote:
> > On Mon, Mar 16, 2015 at 08:28:42AM +0000, Thomas Graf wrote:
> >
> > > The reason for the indirection was to not bypass the abstraction
> > > nft_data_cmp() provides.
> > > 
> > > No objection to the change but maybe leave a comment in
> > > nft_data_cmp() that if one changes nft_data_cmp() one needs to
> > > look at nft_hash and see if the direct use of rhashtable_lookup()
> > > is still valid.
> > 
> > Well we could preserve nft_data_cmp if necessary.  It'll just
> > mean an extra indirect call to do the compare when needed.
> 
> FWIW I've decided to ditch nft_data_cmp unless someone can show
> me that it's really necessary.  The reason is that nft_hash_lookup
> already uses memcmp instead of nft_data_cmp.

I don't care about nft_data_cmp, but I do care about being able to
override the compare function.
Herbert Xu March 20, 2015, 10:16 a.m. UTC | #11
On Fri, Mar 20, 2015 at 09:59:09AM +0000, Patrick McHardy wrote:
>
> Regarding the chain length as trigger - I'm sorry, but this doesn't work
> for us. I don't see why you would have to look at chain length. That
> implies that you don't trust your hash function - why not fix that
> instead?

Any hash function can be attacked.  That's why we need to be able
to rehash it.  And the best way to decide when to rehash is based
on chain length (otherwise you'd waste time rehashing periodically
like we used to do).  With name spaces these days anyone could be
an adversary.

Besides, putting multiple objects with the same key into a hash
table defeats the whole point of hashing.

> > Of course many hash table users need to be able to keep multiple
> > objects under the same key.  My suggestion would be to allocate
> > your own linked list and have the linked list be the object that
> > is inserted into the hash table.
> 
> That would require a huge amount of extra memory per element and having
> millions of them is not unrealistic for our use case.

You should be able to do it with just 8 extra bytes per unique
hash table key.

Cheers,
Patrick McHardy March 20, 2015, 10:27 a.m. UTC | #12
On 20.03, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 09:59:09AM +0000, Patrick McHardy wrote:
> >
> > Regarding the chain length as trigger - I'm sorry, but this doesn't work
> > for us. I don't see why you would have to look at chain length. That
> > implies that you don't trust your hash function - why not fix that
> > instead?
> 
> Any hash function can be attacked.  That's why we need to be able
> to rehash it.  And the best way to decide when to rehash is based
> on chain length (otherwise you'd waste time rehashing periodically
> like we used to do).  With name spaces these days anyone could be
> an adversary.

We already had this discussion. I strongly do not believe this is
the right way to fix namespace problems. There are millions of ways
of creating CPU intensive workloads. You need to be able to put
bounds on the entire namespace. Fixing individual spots will not
solve that problem.

> Besides, putting multiple objects with the same key into a hash
> table defeats the whole point of hashing.

They exist only for (very) short periods of time. It's simply not a
problem in our case. We could even put hard bounds on them, meaning
an element will at most exist twice during that period.

> > > Of course many hash table users need to be able to keep multiple
> > > objects under the same key.  My suggestion would be to allocate
> > > your own linked list and have the linked list be the object that
> > > is inserted into the hash table.
> > 
> > That would require a huge amount of extra memory per element and having
> > millions of them is not unrealistic for our use case.
> 
> You should be able to do it with just 8 extra bytes per unique
> hash table key.

That's something like 25% more memory usage for us in common cases. We try
very hard to keep the active memory size small. I don't want to waste
that amount of memory just for the very short periods while transactions
are unconfirmed.
Herbert Xu March 20, 2015, 9:47 p.m. UTC | #13
On Fri, Mar 20, 2015 at 10:27:01AM +0000, Patrick McHardy wrote:
> On 20.03, Herbert Xu wrote:
>
> > Any hash function can be attacked.  That's why we need to be able
> > to rehash it.  And the best way to decide when to rehash is based
> > on chain length (otherwise you'd waste time rehashing periodically
> > like we used to do).  With name spaces these days anyone could be
> > an adversary.
> 
> We already had this discussion. I strongly do not believe this is
> the right way to fix namespace problems. There are millions of ways
> of creating CPU intensive workloads. You need to be able to put
> bounds on the entire namespace. Fixing individual spots will not
> solve that problem.

A CPU intensive workload that can be rescheduled is completely
different from one that is running under spin lock with BH disabled.

Cheers,
Thomas Graf March 20, 2015, 9:56 p.m. UTC | #14
On 03/21/15 at 08:47am, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 10:27:01AM +0000, Patrick McHardy wrote:
> > On 20.03, Herbert Xu wrote:
> >
> > > Any hash function can be attacked.  That's why we need to be able
> > > to rehash it.  And the best way to decide when to rehash is based
> > > on chain length (otherwise you'd waste time rehashing periodically
> > > like we used to do).  With name spaces these days anyone could be
> > > an adversary.
> > 
> > We already had this discussion. I strongly do not believe this is
> > the right way to fix namespace problems. There are millions of ways
> > of creating CPU intensive workloads. You need to be able to put
> > bounds on the entire namespace. Fixing individual spots will not
> > solve that problem.
> 
> A CPU intensive workload that can be rescheduled is completely
> different from one that is running under spin lock with BH disabled.

Just make the chain length based growth function configurable
and nft_hash can disable it. nft_hash entries are not created by
unprivileged users so attacking the table is out of the question.
Herbert Xu March 20, 2015, 9:57 p.m. UTC | #15
On Fri, Mar 20, 2015 at 09:56:12PM +0000, Thomas Graf wrote:
> On 03/21/15 at 08:47am, Herbert Xu wrote:
> > On Fri, Mar 20, 2015 at 10:27:01AM +0000, Patrick McHardy wrote:
> > > On 20.03, Herbert Xu wrote:
> > >
> > > > Any hash function can be attacked.  That's why we need to be able
> > > > to rehash it.  And the best way to decide when to rehash is based
> > > > on chain length (otherwise you'd waste time rehashing periodically
> > > > like we used to do).  With name spaces these days anyone could be
> > > > an adversary.
> > > 
> > > We already had this discussion. I strongly do not believe this is
> > > the right way to fix namespace problems. There are millions of ways
> > > of creating CPU intensive workloads. You need to be able to put
> > > bounds on the entire namespace. Fixing individual spots will not
> > > solve that problem.
> > 
> > A CPU intensive workload that can be rescheduled is completely
> > different from one that is running under spin lock with BH disabled.
> 
> Just make the chain length based growth function configurable
> and nft_hash can disable it. nft_hash entries are not created by
> unprivileged users so attacking the table is out of the question.

Please read the quoted text, we're talking about potential attacks
on nft_hash.

Cheers,
Thomas Graf March 20, 2015, 10:07 p.m. UTC | #16
On 03/21/15 at 08:57am, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 09:56:12PM +0000, Thomas Graf wrote:
> > On 03/21/15 at 08:47am, Herbert Xu wrote:
> > > On Fri, Mar 20, 2015 at 10:27:01AM +0000, Patrick McHardy wrote:
> > > > We already had this discussion. I strongly do not believe this is
> > > > the right way to fix namespace problems. There are millions of ways
> > > > of creating CPU intensive workloads. You need to be able to put
> > > > bounds on the entire namespace. Fixing individual spots will not
> > > > solve that problem.
> > > 
> > > A CPU intensive workload that can be rescheduled is completely
> > > different from one that is running under spin lock with BH disabled.
> > 
> > Just make the chain length based growth function configurable
> > and nft_hash can disable it. nft_hash entries are not created by
> > unprivileged users so attacking the table is out of the question.
> 
> Please read the quoted text, we're talking about potential attacks
> on nft_hash.

Attack by whom? If I read the nft_set code correctly then the only
way to add to an nft_set is via nfnetlink which requires
CAP_NET_ADMIN. My understanding was that the chain length based
growth is to counter hash seed attacks.
Herbert Xu March 20, 2015, 10:10 p.m. UTC | #17
On Fri, Mar 20, 2015 at 10:07:51PM +0000, Thomas Graf wrote:
>
> Attack by whom? If I read the nft_set code correctly then the only
> way to add to an nft_set is via nfnetlink which requires
> CAP_NET_ADMIN. My understanding was that the chain length based
> growth is to counter hash seed attacks.

You cannot trust root in a namespace.

Cheers,
Thomas Graf March 20, 2015, 10:23 p.m. UTC | #18
On 03/21/15 at 09:10am, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 10:07:51PM +0000, Thomas Graf wrote:
> >
> > Attack by whom? If I read the nft_set code correctly then the only
> > way to add to an nft_set is via nfnetlink which requires
> > CAP_NET_ADMIN. My understanding was that the chain length based
> > growth is to counter hash seed attacks.
> 
> You cannot trust root in a namespace.

He might as well just run a for (;;) loop to burn cycles in his namespace.
If you give away virtualized local privileges you better be ready
to restrict the resources consumed. 
Herbert Xu March 20, 2015, 10:25 p.m. UTC | #19
On Fri, Mar 20, 2015 at 10:23:11PM +0000, Thomas Graf wrote:
> 
> He might as well just run for (;;) to burn cycles in his namespace.
> If you give away virtualized local privileges you better be ready
> to restrict the resources consumed. 

Please reread the first email that you replied to, let me quote:

	A CPU intensive workload that can be rescheduled is
	completely different from one that is running under spin
	lock with BH disabled.

Cheers,
Thomas Graf March 20, 2015, 10:36 p.m. UTC | #20
On 03/21/15 at 09:25am, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 10:23:11PM +0000, Thomas Graf wrote:
> > 
> > He might as well just run for (;;) to burn cycles in his namespace.
> > If you give away virtualized local privileges you better be ready
> > to restrict the resources consumed. 
> 
> Please reread the first email that you replied to, let me quote:
> 
> 	A CPU intensive workload that can be rescheduled is
> 	completely different from one that is running under spin
> 	lock with BH disabled.

We have countless ways to create linear lists of things like classifiers,
qdiscs, multicast memberships, net_devices, fib_rules, etc. All taking
spin locks or write locks, most of them with BH disabled. At least some
use hash tables, most of them fixed size.

I don't want to downplay this but do you *really* want to run
untrusted workloads with CAP_NET_ADMIN privileges?
Patrick McHardy March 21, 2015, 5:23 a.m. UTC | #21
On 21.03, Herbert Xu wrote:
> On Fri, Mar 20, 2015 at 10:27:01AM +0000, Patrick McHardy wrote:
> > On 20.03, Herbert Xu wrote:
> >
> > > Any hash function can be attacked.  That's why we need to be able
> > > to rehash it.  And the best way to decide when to rehash is based
> > > on chain length (otherwise you'd waste time rehashing periodically
> > > like we used to do).  With name spaces these days anyone could be
> > > an adversary.
> > 
> > We already had this discussion. I strongly do not believe this is
> > the right way to fix namespace problems. There are millions of ways
> > of creating CPU intensive workloads. You need to be able to put
> > bounds on the entire namespace. Fixing individual spots will not
> > solve that problem.
> 
> A CPU intensive workload that can be rescheduled is completely
> different from one that is running under spin lock with BH disabled.

Sure, that's what I actually meant of course.
Patrick McHardy March 21, 2015, 5:25 a.m. UTC | #22
On 20.03, Thomas Graf wrote:
> On 03/21/15 at 09:25am, Herbert Xu wrote:
> > On Fri, Mar 20, 2015 at 10:23:11PM +0000, Thomas Graf wrote:
> > > 
> > > He might as well just run for (;;) to burn cycles in his namespace.
> > > If you give away virtualized local privileges you better be ready
> > > to restrict the resources consumed. 
> > 
> > Please reread the first email that you replied to, let me quote:
> > 
> > 	A CPU intensive workload that can be rescheduled is
> > 	completely different from one that is running under spin
> > 	lock with BH disabled.
> 
> We have countless ways to create linear lists of things like classifiers,
> qdiscs, multicast memberships, net_devices, fib_rules, etc. All taking
> spin locks or write locks. Most of them with BH disabled. Some at
> least use hashtables with most of them fixed size.

That's my point. It's impossible to fix this by restricting data
structures; this just removes a valid use case.

> I don't want to downplay this but do you *really* want to run
> untrusted workloads with CAP_NET_ADMIN privileges?

Patch

diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index c82df0a..1fcae5e 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -89,41 +89,21 @@  static void nft_hash_remove(const struct nft_set *set,
 	kfree(elem->cookie);
 }
 
-struct nft_compare_arg {
-	const struct nft_set *set;
-	struct nft_set_elem *elem;
-};
-
-static bool nft_hash_compare(void *ptr, void *arg)
-{
-	struct nft_hash_elem *he = ptr;
-	struct nft_compare_arg *x = arg;
-
-	if (!nft_data_cmp(&he->key, &x->elem->key, x->set->klen)) {
-		x->elem->cookie = he;
-		x->elem->flags = 0;
-		if (x->set->flags & NFT_SET_MAP)
-			nft_data_copy(&x->elem->data, he->data);
-
-		return true;
-	}
-
-	return false;
-}
-
 static int nft_hash_get(const struct nft_set *set, struct nft_set_elem *elem)
 {
 	struct rhashtable *priv = nft_set_priv(set);
-	struct nft_compare_arg arg = {
-		.set = set,
-		.elem = elem,
-	};
+	struct nft_hash_elem *he;
 
-	if (rhashtable_lookup_compare(priv, &elem->key,
-				      &nft_hash_compare, &arg))
-		return 0;
+	he = rhashtable_lookup(priv, &elem->key);
+	if (!he)
+		return -ENOENT;
 
-	return -ENOENT;
+	elem->cookie = he;
+	elem->flags = 0;
+	if (set->flags & NFT_SET_MAP)
+		nft_data_copy(&elem->data, he->data);
+
+	return 0;
 }
 
 static void nft_hash_walk(const struct nft_ctx *ctx, const struct nft_set *set,