diff mbox series

[net] fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied

Message ID 20190507091118.24324-1-liuhangbin@gmail.com
State Accepted
Delegated to: David Miller
Headers show
Series [net] fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied | expand

Commit Message

Hangbin Liu May 7, 2019, 9:11 a.m. UTC
With commit 153380ec4b9 ("fib_rules: Added NLM_F_EXCL support to
fib_nl_newrule") we now able to check if a rule already exists. But this
only works with iproute2. For other tools like libnl, NetworkManager,
it still could add duplicate rules with only NLM_F_CREATE flag, like

[localhost ~ ]# ip rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
100000: from 192.168.7.5 lookup 5
100000: from 192.168.7.5 lookup 5

As it doesn't make sense to create two duplicate rules, let's just return
0 if the rule exists.

Fixes: 153380ec4b9 ("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule")
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/core/fib_rules.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

David Miller May 8, 2019, 4:35 p.m. UTC | #1
From: Hangbin Liu <liuhangbin@gmail.com>
Date: Tue,  7 May 2019 17:11:18 +0800

> With commit 153380ec4b9 ("fib_rules: Added NLM_F_EXCL support to
> fib_nl_newrule") we now able to check if a rule already exists. But this
> only works with iproute2. For other tools like libnl, NetworkManager,
> it still could add duplicate rules with only NLM_F_CREATE flag, like
> 
> [localhost ~ ]# ip rule
> 0:      from all lookup local
> 32766:  from all lookup main
> 32767:  from all lookup default
> 100000: from 192.168.7.5 lookup 5
> 100000: from 192.168.7.5 lookup 5
> 
> As it doesn't make sense to create two duplicate rules, let's just return
> 0 if the rule exists.
> 
> Fixes: 153380ec4b9 ("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule")
> Reported-by: Thomas Haller <thaller@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

Applied and queued up for -stable, thanks.
Maciej Żenczykowski June 1, 2019, 1:43 a.m. UTC | #2
FYI, this userspace visible change in behaviour breaks Android.

We rely on being able to add a rule and either have a dup be created
(in which case we'll remove it later) or have it fail with EEXIST (in
which case we won't remove it later).

Returning 0 makes atomically changing a rule difficult.

Please revert.

On Wed, May 8, 2019 at 9:39 AM David Miller <davem@davemloft.net> wrote:
>
> From: Hangbin Liu <liuhangbin@gmail.com>
> Date: Tue,  7 May 2019 17:11:18 +0800
>
> > With commit 153380ec4b9 ("fib_rules: Added NLM_F_EXCL support to
> > fib_nl_newrule") we now able to check if a rule already exists. But this
> > only works with iproute2. For other tools like libnl, NetworkManager,
> > it still could add duplicate rules with only NLM_F_CREATE flag, like
> >
> > [localhost ~ ]# ip rule
> > 0:      from all lookup local
> > 32766:  from all lookup main
> > 32767:  from all lookup default
> > 100000: from 192.168.7.5 lookup 5
> > 100000: from 192.168.7.5 lookup 5
> >
> > As it doesn't make sense to create two duplicate rules, let's just return
> > 0 if the rule exists.
> >
> > Fixes: 153380ec4b9 ("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule")
> > Reported-by: Thomas Haller <thaller@redhat.com>
> > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
>
> Applied and queued up for -stable, thanks.
Greg Kroah-Hartman June 1, 2019, 9:32 a.m. UTC | #3
On Fri, May 31, 2019 at 06:43:42PM -0700, Maciej Żenczykowski wrote:
> FYI, this userspace visible change in behaviour breaks Android.
> 
> We rely on being able to add a rule and either have a dup be created
> (in which case we'll remove it later) or have it fail with EEXIST (in
> which case we won't remove it later).
> 
> Returning 0 makes atomically changing a rule difficult.
> 
> Please revert.

That's crazy, but makes sense in an odd way :)

And it explains why my "fix up of the patch" also breaks things, both
patches need to be reverted in the stable trees.  Do you need me to make
a patch to revert this in Linus's tree now, or can you do that?  Just
asking for an existing commit to be reverted is usually a bit harder as
that's not most maintainer's workflow (I know it's not mine.)

thanks,

gre gk-h
Hangbin Liu June 5, 2019, 1:43 a.m. UTC | #4
Hi David Ahern,

On Fri, May 31, 2019 at 06:43:42PM -0700, Maciej Żenczykowski wrote:
> FYI, this userspace visible change in behaviour breaks Android.
> 
> We rely on being able to add a rule and either have a dup be created
> (in which case we'll remove it later) or have it fail with EEXIST (in
> which case we won't remove it later).
> 
> Returning 0 makes atomically changing a rule difficult.
> 
> Please revert.

What do you think? Should I rever this commit?

Although I'm still not clear what's the difference between

a) adding a dup rule and remove it later
and
b) return 0 directly if the rule exactally the same.

Thanks
Hangbin

> 
> On Wed, May 8, 2019 at 9:39 AM David Miller <davem@davemloft.net> wrote:
> >
> > From: Hangbin Liu <liuhangbin@gmail.com>
> > Date: Tue,  7 May 2019 17:11:18 +0800
> >
> > > With commit 153380ec4b9 ("fib_rules: Added NLM_F_EXCL support to
> > > fib_nl_newrule") we now able to check if a rule already exists. But this
> > > only works with iproute2. For other tools like libnl, NetworkManager,
> > > it still could add duplicate rules with only NLM_F_CREATE flag, like
> > >
> > > [localhost ~ ]# ip rule
> > > 0:      from all lookup local
> > > 32766:  from all lookup main
> > > 32767:  from all lookup default
> > > 100000: from 192.168.7.5 lookup 5
> > > 100000: from 192.168.7.5 lookup 5
> > >
> > > As it doesn't make sense to create two duplicate rules, let's just return
> > > 0 if the rule exists.
> > >
> > > Fixes: 153380ec4b9 ("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule")
> > > Reported-by: Thomas Haller <thaller@redhat.com>
> > > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> >
> > Applied and queued up for -stable, thanks.
Lorenzo Colitti June 5, 2019, 1:47 a.m. UTC | #5
On Wed, Jun 5, 2019 at 10:43 AM Hangbin Liu <liuhangbin@gmail.com> wrote:
> Although I'm still not clear what's the difference between
>
> a) adding a dup rule and remove it later
> and
> b) return 0 directly if the rule exactally the same.

The Android code updates ip rules by adding the new rule and then
deleting the old rule. Before this patch, the result of the operation
is that the old rule is deleted and the new rule exists. After this
patch, if the new rule is the same as the old rule, then the add does
nothing and the delete deletes the old rule. The result of the
operation is that the old rule is deleted and the new rule is no
longer there, and the rules are broken.
Hangbin Liu June 5, 2019, 2:15 a.m. UTC | #6
On Wed, Jun 05, 2019 at 10:47:58AM +0900, Lorenzo Colitti wrote:
> On Wed, Jun 5, 2019 at 10:43 AM Hangbin Liu <liuhangbin@gmail.com> wrote:
> > Although I'm still not clear what's the difference between
> >
> > a) adding a dup rule and remove it later
> > and
> > b) return 0 directly if the rule exactally the same.
> 
> The Android code updates ip rules by adding the new rule and then
> deleting the old rule. Before this patch, the result of the operation
> is that the old rule is deleted and the new rule exists. After this
> patch, if the new rule is the same as the old rule, then the add does
> nothing and the delete deletes the old rule. The result of the
> operation is that the old rule is deleted and the new rule is no
> longer there, and the rules are broken.

Hi Lorenzo,

How do you add the rules? with ip cmd it should has NLM_F_EXCL flag and
you will get -EEXIST error out.

Thanks
Hangbin
Lorenzo Colitti June 5, 2019, 2:25 a.m. UTC | #7
On Wed, Jun 5, 2019 at 11:15 AM Hangbin Liu <liuhangbin@gmail.com> wrote:
> How do you add the rules? with ip cmd it should has NLM_F_EXCL flag and
> you will get -EEXIST error out.

The fact that the code worked before this commit implies that it was
*not* using NLM_F_EXCL. :-)
The code is here if you want to take a look.

https://android.googlesource.com/platform/system/netd/+/master/server/RouteController.cpp#576
https://android.googlesource.com/platform/system/netd/+/master/server/NetlinkCommands.h#33
Hangbin Liu June 5, 2019, 3:29 a.m. UTC | #8
On Wed, Jun 05, 2019 at 11:25:11AM +0900, Lorenzo Colitti wrote:
> On Wed, Jun 5, 2019 at 11:15 AM Hangbin Liu <liuhangbin@gmail.com> wrote:
> > How do you add the rules? with ip cmd it should has NLM_F_EXCL flag and
> > you will get -EEXIST error out.
> 
> The fact that the code worked before this commit implies that it was
> *not* using NLM_F_EXCL. :-)

Yes, that's why you got the issue.

> We rely on being able to add a rule and either have a dup be created
> (in which case we'll remove it later) or have it fail with EEXIST (in
> which case we won't remove it later).

With Maciej said, how about add NLM_F_EXCL flag when you add a new rule.
If it returned EEXIST, which means there is an dup rule, you just do not
remove it later.

Would that fix your issue?

Thanks
Hangbin
Lorenzo Colitti June 5, 2019, 3:43 a.m. UTC | #9
On Wed, Jun 5, 2019 at 12:29 PM Hangbin Liu <liuhangbin@gmail.com> wrote:
> > We rely on being able to add a rule and either have a dup be created
> > (in which case we'll remove it later) or have it fail with EEXIST (in
> > which case we won't remove it later).
>
> With Maciej said, how about add NLM_F_EXCL flag when you add a new rule.
> If it returned EEXIST, which means there is an dup rule, you just do not
> remove it later.
>
> Would that fix your issue?

We can't do that without rewriting our code and making it more
complex. The way the code is structured is that an update is "add all
new rules; delete all old rules". To do what you suggest we would need
to either change that to "for rule in rules; add newrule; delete
oldrule" or we'd need to keep state on which rules already existed.

The previous behaviour provided semantics that are useful to
userspace, and this commit broke those semantics. Please revert.
David Ahern June 5, 2019, 3:57 a.m. UTC | #10
On 6/4/19 7:43 PM, Hangbin Liu wrote:
> Hi David Ahern,
> 
> On Fri, May 31, 2019 at 06:43:42PM -0700, Maciej Żenczykowski wrote:
>> FYI, this userspace visible change in behaviour breaks Android.
>>
>> We rely on being able to add a rule and either have a dup be created
>> (in which case we'll remove it later) or have it fail with EEXIST (in
>> which case we won't remove it later).
>>
>> Returning 0 makes atomically changing a rule difficult.
>>
>> Please revert.
> What do you think? Should I rever this commit?

I think it is crazy to add multiple identical rules given the linear
effect on performance. But, since it breaks Android, it has to be reverted.
Hangbin Liu June 5, 2019, 4:05 a.m. UTC | #11
On Wed, Jun 05, 2019 at 12:43:44PM +0900, Lorenzo Colitti wrote:
> On Wed, Jun 5, 2019 at 12:29 PM Hangbin Liu <liuhangbin@gmail.com> wrote:
> > > We rely on being able to add a rule and either have a dup be created
> > > (in which case we'll remove it later) or have it fail with EEXIST (in
> > > which case we won't remove it later).
> >
> > With Maciej said, how about add NLM_F_EXCL flag when you add a new rule.
> > If it returned EEXIST, which means there is an dup rule, you just do not
> > remove it later.
> >
> > Would that fix your issue?
> 
> We can't do that without rewriting our code and making it more
> complex. The way the code is structured is that an update is "add all
> new rules; delete all old rules". To do what you suggest we would need
> to either change that to "for rule in rules; add newrule; delete
> oldrule" or we'd need to keep state on which rules already existed.

Hmm...Generally speaking we need to check the cmd's return value, or why it
exists.

> 
> The previous behaviour provided semantics that are useful to
> userspace, and this commit broke those semantics. Please revert.

Keep two exactally same rules in kernel looks strange and doesn't make
sense to me. But let's see what does David Ahern think, he is more
experienced in this part :)

Thanks
Hangbin
Hangbin Liu June 5, 2019, 4:08 a.m. UTC | #12
On Tue, Jun 04, 2019 at 09:57:56PM -0600, David Ahern wrote:
> On 6/4/19 7:43 PM, Hangbin Liu wrote:
> > Hi David Ahern,
> > 
> > On Fri, May 31, 2019 at 06:43:42PM -0700, Maciej Żenczykowski wrote:
> >> FYI, this userspace visible change in behaviour breaks Android.
> >>
> >> We rely on being able to add a rule and either have a dup be created
> >> (in which case we'll remove it later) or have it fail with EEXIST (in
> >> which case we won't remove it later).
> >>
> >> Returning 0 makes atomically changing a rule difficult.
> >>
> >> Please revert.
> > What do you think? Should I rever this commit?
> 
> I think it is crazy to add multiple identical rules given the linear
> effect on performance. But, since it breaks Android, it has to be reverted.

OK.. Since you also agree to revert it.

Thanks
Hangbin
Lorenzo Colitti June 5, 2019, 4:58 a.m. UTC | #13
On Wed, Jun 5, 2019 at 12:58 PM David Ahern <dsa@cumulusnetworks.com> wrote:
> I think it is crazy to add multiple identical rules given the linear
> effect on performance.

Not sure if this is what you were implying or not, but our code
doesn't maintain multiple identical rules in steady state. It only
uses them for make-before-break when something changes.

> But, since it breaks Android, it has to be reverted.

Well... the immediate problem on Android is that we cannot live with
this going to LTS, since it is going to break devices in the field.

As for making this change in 5.3: we might be able to structure the
code differently in a future Android release, assuming the same
userspace code can work on kernels back to 4.4 (not sure it can, since
the semantics changed in 4.8). But even if we can fix this in Android,
this change is still breaking compatibility with existing other
userspace code. Are there concrete performance optimizations that
you'd like to make that can't be made unless you change the semantics
here? Are those optimizations worth breaking the backwards
compatibility guarantees for?
David Ahern June 5, 2019, 3:33 p.m. UTC | #14
On 6/4/19 10:58 PM, Lorenzo Colitti wrote:
> As for making this change in 5.3: we might be able to structure the
> code differently in a future Android release, assuming the same
> userspace code can work on kernels back to 4.4 (not sure it can, since
> the semantics changed in 4.8). But even if we can fix this in Android,
> this change is still breaking compatibility with existing other
> userspace code. Are there concrete performance optimizations that
> you'd like to make that can't be made unless you change the semantics
> here? Are those optimizations worth breaking the backwards
> compatibility guarantees for?

The list of fib rules is walked looking for a match. more rules = more
overhead. Given the flexibility of the rules, I have not come up with
any changes that have a real improvement in that overhead. VRF, which
uses policy routing, was changed to have a single l3mdev rule for all
VRFs for this reason.
Maciej Żenczykowski June 6, 2019, 11:01 p.m. UTC | #15
(side note: change was reverted in net stable)

On Wed, Jun 5, 2019 at 8:33 AM David Ahern <dsahern@gmail.com> wrote:
> On 6/4/19 10:58 PM, Lorenzo Colitti wrote:
> > As for making this change in 5.3: we might be able to structure the
> > code differently in a future Android release, assuming the same
> > userspace code can work on kernels back to 4.4 (not sure it can, since
> > the semantics changed in 4.8). But even if we can fix this in Android,
> > this change is still breaking compatibility with existing other
> > userspace code. Are there concrete performance optimizations that
> > you'd like to make that can't be made unless you change the semantics
> > here? Are those optimizations worth breaking the backwards
> > compatibility guarantees for?
>
> The list of fib rules is walked looking for a match. more rules = more
> overhead. Given the flexibility of the rules, I have not come up with
> any changes that have a real improvement in that overhead. VRF, which
> uses policy routing, was changed to have a single l3mdev rule for all
> VRFs for this reason.

Instead of keeping duplicates there could be a counter on the singleton rule.
And then adding/removing is just inc/dec on the counter (and only
actually delete when counter drops to 0).
Would require some extra effort to make it look the same when dumping
I guess (to expand it out).

---

I'm not sure this is worth optimizing for.  In Android these states
with dupes are temporary.
And really, if you complain about performance it is perfectly
reasonable to say "then don't create duplicate rules".

---

Note though that from a multithreaded perspective, you'd never want
the 'ignore it and return 0' behaviour.
It's fundamentally a bad api.  It's better to return an error and
userspace can decide that EEXIST is acceptable and treat it as 0.

Imagine two threads doing 'add ip rule foo; blah blah; remove ip rule foo'

You either want the create dupes, or the 'return EEXIST' behaviour (so
the second thread can fail, or wait until it succeeds or whatever).

This way if two threads both run the same operation, either both of
them succeed,
or one succeeds and the other gets notified of dup.

Otherwise you get a spurious failure in one of the threads and bad
behaviour in the other (since rule vanishes before it should).
diff mbox series

Patch

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index ffbb827723a2..c49b752ea7eb 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -756,9 +756,9 @@  int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (err)
 		goto errout;
 
-	if ((nlh->nlmsg_flags & NLM_F_EXCL) &&
-	    rule_exists(ops, frh, tb, rule)) {
-		err = -EEXIST;
+	if (rule_exists(ops, frh, tb, rule)) {
+		if (nlh->nlmsg_flags & NLM_F_EXCL)
+			err = -EEXIST;
 		goto errout_free;
 	}