diff mbox

[ovs-dev] netdev: Fix netdev_open() to adhere to class type if given

Message ID 1b3d58132b9504be6f6273b9a0f07e9d18bd2fad.1496320636.git.echaudro@redhat.com
State Accepted
Headers show

Commit Message

Eelco Chaudron June 1, 2017, 12:38 p.m. UTC
When trying to configure a system port as type=internal it could start
an infinite port creation loop. When this happens you will see the
following log messages:

2017-06-01T09:00:17.900Z|02813|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
2017-06-01T09:00:17.900Z|02814|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
2017-06-01T09:00:17.907Z|02815|bridge|INFO|bridge bzb: added interface ve01_1 on port 2
2017-06-01T09:00:17.909Z|02816|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 2
2017-06-01T09:00:17.914Z|02817|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
2017-06-01T09:00:17.914Z|02818|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
2017-06-01T09:00:17.921Z|02819|bridge|INFO|bridge bzb: added interface ve01_1 on port 3
2017-06-01T09:00:17.923Z|02820|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 3
2017-06-01T09:00:17.929Z|02821|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
2017-06-01T09:00:17.929Z|02822|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
2017-06-01T09:00:17.936Z|02823|bridge|INFO|bridge bzb: added interface ve01_1 on port 4
...
...

This is how to replicate it:

  ip link add name ve01_1 type veth peer name ve01_2
  ovs-vsctl add-br bzb
  ovs-vsctl add-port bzb ve01_1
  ovs-vsctl set interface ve01_1 type=internal
  ip link set dev ve01_1 up
  ip link set dev ve01_2 up

When changing the type to internal, the async configuration logic get
triggered and because the type has changed it will delete the
interface and the ofproto port. Next it will call iface_do_create() to
re-create the interface as internal. Because we just deleted the
interface netdev_open() will try to recreate it as internal.

However this will fail with EEXIST as a system interface already
exists withe the name.

Up till here all is fine...

Now some ipv6 route change comes along for the ve01_1 interface, and
the route infrastructure will call netdev_open(). This will create the
interface of type system.

Next the configuration verify process gets triggered due to
if_notifier_changed() being true. We now retry the above, but because
the interface exists (although in the system class) it will use it,
and create the interface successfully.

This triggers another if notification, causing yet another config
update, and because the system != internal reconfiguration happens and
it start from the top...

So the fix as presented below is causing netdev_open() only to return
the existing device for the class type requested (if the type is
specified).

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 lib/netdev.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Gregory Rose June 1, 2017, 6:13 p.m. UTC | #1
On Thu, 2017-06-01 at 14:38 +0200, Eelco Chaudron wrote:
> When trying to configure a system port as type=internal it could start
> an infinite port creation loop. When this happens you will see the
> following log messages:
> 
> 2017-06-01T09:00:17.900Z|02813|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
> 2017-06-01T09:00:17.900Z|02814|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
> 2017-06-01T09:00:17.907Z|02815|bridge|INFO|bridge bzb: added interface ve01_1 on port 2
> 2017-06-01T09:00:17.909Z|02816|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 2
> 2017-06-01T09:00:17.914Z|02817|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
> 2017-06-01T09:00:17.914Z|02818|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
> 2017-06-01T09:00:17.921Z|02819|bridge|INFO|bridge bzb: added interface ve01_1 on port 3
> 2017-06-01T09:00:17.923Z|02820|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 3
> 2017-06-01T09:00:17.929Z|02821|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
> 2017-06-01T09:00:17.929Z|02822|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
> 2017-06-01T09:00:17.936Z|02823|bridge|INFO|bridge bzb: added interface ve01_1 on port 4
> ...
> ...
> 
> This is how to replicate it:
> 
>   ip link add name ve01_1 type veth peer name ve01_2
>   ovs-vsctl add-br bzb
>   ovs-vsctl add-port bzb ve01_1
>   ovs-vsctl set interface ve01_1 type=internal
>   ip link set dev ve01_1 up
>   ip link set dev ve01_2 up
> 
> When changing the type to internal, the async configuration logic get
> triggered and because the type has changed it will delete the
> interface and the ofproto port. Next it will call iface_do_create() to
> re-create the interface as internal. Because we just deleted the
> interface netdev_open() will try to recreate it as internal.
> 
> However this will fail with EEXIST as a system interface already
> exists withe the name.
> 
> Up till here all is fine...
> 
> Now some ipv6 route change comes along for the ve01_1 interface, and
> the route infrastructure will call netdev_open(). This will create the
> interface of type system.
> 
> Next the configuration verify process gets triggered due to
> if_notifier_changed() being true. We now retry the above, but because
> the interface exists (although in the system class) it will use it,
> and create the interface successfully.
> 
> This triggers another if notification, causing yet another config
> update, and because the system != internal reconfiguration happens and
> it start from the top...
> 
> So the fix as presented below is causing netdev_open() only to return
> the existing device for the class type requested (if the type is
> specified).
> 
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> ---
>  lib/netdev.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/netdev.c b/lib/netdev.c
> index 26c4136..ca3192a 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -412,7 +412,11 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp)
>              error = EAFNOSUPPORT;
>          }
>      } else {
> -        error = 0;
> +        if (type && type[0] && strcmp(type, netdev->netdev_class->type)) {
> +            error = EEXIST;
> +        } else {
> +            error = 0;
> +        }
>      }
>  
>      if (!error) {

I tested this patch and it does fix the problem but how about the patch
that I've attached to this reply instead?  It seems a bit cleaner.

Thanks,

- Greg
Ben Pfaff June 1, 2017, 6:16 p.m. UTC | #2
On Thu, Jun 01, 2017 at 11:13:48AM -0700, Greg Rose wrote:
> On Thu, 2017-06-01 at 14:38 +0200, Eelco Chaudron wrote:
> > When trying to configure a system port as type=internal it could start
> > an infinite port creation loop. When this happens you will see the
> > following log messages:
> > 
> > 2017-06-01T09:00:17.900Z|02813|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
> > 2017-06-01T09:00:17.900Z|02814|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
> > 2017-06-01T09:00:17.907Z|02815|bridge|INFO|bridge bzb: added interface ve01_1 on port 2
> > 2017-06-01T09:00:17.909Z|02816|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 2
> > 2017-06-01T09:00:17.914Z|02817|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
> > 2017-06-01T09:00:17.914Z|02818|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
> > 2017-06-01T09:00:17.921Z|02819|bridge|INFO|bridge bzb: added interface ve01_1 on port 3
> > 2017-06-01T09:00:17.923Z|02820|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 3
> > 2017-06-01T09:00:17.929Z|02821|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
> > 2017-06-01T09:00:17.929Z|02822|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
> > 2017-06-01T09:00:17.936Z|02823|bridge|INFO|bridge bzb: added interface ve01_1 on port 4
> > ...
> > ...
> > 
> > This is how to replicate it:
> > 
> >   ip link add name ve01_1 type veth peer name ve01_2
> >   ovs-vsctl add-br bzb
> >   ovs-vsctl add-port bzb ve01_1
> >   ovs-vsctl set interface ve01_1 type=internal
> >   ip link set dev ve01_1 up
> >   ip link set dev ve01_2 up
> > 
> > When changing the type to internal, the async configuration logic get
> > triggered and because the type has changed it will delete the
> > interface and the ofproto port. Next it will call iface_do_create() to
> > re-create the interface as internal. Because we just deleted the
> > interface netdev_open() will try to recreate it as internal.
> > 
> > However this will fail with EEXIST as a system interface already
> > exists withe the name.
> > 
> > Up till here all is fine...
> > 
> > Now some ipv6 route change comes along for the ve01_1 interface, and
> > the route infrastructure will call netdev_open(). This will create the
> > interface of type system.
> > 
> > Next the configuration verify process gets triggered due to
> > if_notifier_changed() being true. We now retry the above, but because
> > the interface exists (although in the system class) it will use it,
> > and create the interface successfully.
> > 
> > This triggers another if notification, causing yet another config
> > update, and because the system != internal reconfiguration happens and
> > it start from the top...
> > 
> > So the fix as presented below is causing netdev_open() only to return
> > the existing device for the class type requested (if the type is
> > specified).
> > 
> > Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> > ---
> >  lib/netdev.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/lib/netdev.c b/lib/netdev.c
> > index 26c4136..ca3192a 100644
> > --- a/lib/netdev.c
> > +++ b/lib/netdev.c
> > @@ -412,7 +412,11 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp)
> >              error = EAFNOSUPPORT;
> >          }
> >      } else {
> > -        error = 0;
> > +        if (type && type[0] && strcmp(type, netdev->netdev_class->type)) {
> > +            error = EEXIST;
> > +        } else {
> > +            error = 0;
> > +        }
> >      }
> >  
> >      if (!error) {
> 
> I tested this patch and it does fix the problem but how about the patch
> that I've attached to this reply instead?  It seems a bit cleaner.

You might have omitted the attachment.
Gregory Rose June 1, 2017, 6:31 p.m. UTC | #3
On Thu, Jun 1, 2017 at 11:16 AM, Ben Pfaff <blp@ovn.org> wrote:
> On Thu, Jun 01, 2017 at 11:13:48AM -0700, Greg Rose wrote:
>> On Thu, 2017-06-01 at 14:38 +0200, Eelco Chaudron wrote:
>> > When trying to configure a system port as type=internal it could start
>> > an infinite port creation loop. When this happens you will see the
>> > following log messages:
>> >
>> > 2017-06-01T09:00:17.900Z|02813|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
>> > 2017-06-01T09:00:17.900Z|02814|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
>> > 2017-06-01T09:00:17.907Z|02815|bridge|INFO|bridge bzb: added interface ve01_1 on port 2
>> > 2017-06-01T09:00:17.909Z|02816|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 2
>> > 2017-06-01T09:00:17.914Z|02817|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
>> > 2017-06-01T09:00:17.914Z|02818|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
>> > 2017-06-01T09:00:17.921Z|02819|bridge|INFO|bridge bzb: added interface ve01_1 on port 3
>> > 2017-06-01T09:00:17.923Z|02820|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 3
>> > 2017-06-01T09:00:17.929Z|02821|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
>> > 2017-06-01T09:00:17.929Z|02822|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
>> > 2017-06-01T09:00:17.936Z|02823|bridge|INFO|bridge bzb: added interface ve01_1 on port 4
>> > ...
>> > ...
>> >
>> > This is how to replicate it:
>> >
>> >   ip link add name ve01_1 type veth peer name ve01_2
>> >   ovs-vsctl add-br bzb
>> >   ovs-vsctl add-port bzb ve01_1
>> >   ovs-vsctl set interface ve01_1 type=internal
>> >   ip link set dev ve01_1 up
>> >   ip link set dev ve01_2 up
>> >
>> > When changing the type to internal, the async configuration logic get
>> > triggered and because the type has changed it will delete the
>> > interface and the ofproto port. Next it will call iface_do_create() to
>> > re-create the interface as internal. Because we just deleted the
>> > interface netdev_open() will try to recreate it as internal.
>> >
>> > However this will fail with EEXIST as a system interface already
>> > exists withe the name.
>> >
>> > Up till here all is fine...
>> >
>> > Now some ipv6 route change comes along for the ve01_1 interface, and
>> > the route infrastructure will call netdev_open(). This will create the
>> > interface of type system.
>> >
>> > Next the configuration verify process gets triggered due to
>> > if_notifier_changed() being true. We now retry the above, but because
>> > the interface exists (although in the system class) it will use it,
>> > and create the interface successfully.
>> >
>> > This triggers another if notification, causing yet another config
>> > update, and because the system != internal reconfiguration happens and
>> > it start from the top...
>> >
>> > So the fix as presented below is causing netdev_open() only to return
>> > the existing device for the class type requested (if the type is
>> > specified).
>> >
>> > Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>> > ---
>> >  lib/netdev.c | 6 +++++-
>> >  1 file changed, 5 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/lib/netdev.c b/lib/netdev.c
>> > index 26c4136..ca3192a 100644
>> > --- a/lib/netdev.c
>> > +++ b/lib/netdev.c
>> > @@ -412,7 +412,11 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp)
>> >              error = EAFNOSUPPORT;
>> >          }
>> >      } else {
>> > -        error = 0;
>> > +        if (type && type[0] && strcmp(type, netdev->netdev_class->type)) {
>> > +            error = EEXIST;
>> > +        } else {
>> > +            error = 0;
>> > +        }
>> >      }
>> >
>> >      if (!error) {
>>
>> I tested this patch and it does fix the problem but how about the patch
>> that I've attached to this reply instead?  It seems a bit cleaner.
>
> You might have omitted the attachment.

Oops - I resent with the attachment but forgot to add back ovs-dev.
I'll get this right eventually...

Thanks,

- Greg
Ben Pfaff June 1, 2017, 10:25 p.m. UTC | #4
On Thu, Jun 01, 2017 at 02:38:09PM +0200, Eelco Chaudron wrote:
> When trying to configure a system port as type=internal it could start
> an infinite port creation loop. When this happens you will see the
> following log messages:
...
> So the fix as presented below is causing netdev_open() only to return
> the existing device for the class type requested (if the type is
> specified).

I suspect that this will cause other, subtle, issues, that we will have
to track down later, related to devices that currently can be opened
under multiple different types.  However, I've applied it for now, and
if/when those issues come up, we can figure them out.

Thanks a lot!

Ben
Eelco Chaudron June 14, 2017, 4:21 p.m. UTC | #5
On 02/06/17 00:25, Ben Pfaff wrote:
> On Thu, Jun 01, 2017 at 02:38:09PM +0200, Eelco Chaudron wrote:
>> When trying to configure a system port as type=internal it could start
>> an infinite port creation loop. When this happens you will see the
>> following log messages:
> ...
>> So the fix as presented below is causing netdev_open() only to return
>> the existing device for the class type requested (if the type is
>> specified).
> I suspect that this will cause other, subtle, issues, that we will have
> to track down later, related to devices that currently can be opened
> under multiple different types.  However, I've applied it for now, and
> if/when those issues come up, we can figure them out.
>
> Thanks a lot!
>
> Ben
Can we get this also on the 2.7 branch?

Thanks,

Eelco
Ben Pfaff June 14, 2017, 4:27 p.m. UTC | #6
On Wed, Jun 14, 2017 at 06:21:50PM +0200, Eelco Chaudron wrote:
> On 02/06/17 00:25, Ben Pfaff wrote:
> >On Thu, Jun 01, 2017 at 02:38:09PM +0200, Eelco Chaudron wrote:
> >>When trying to configure a system port as type=internal it could start
> >>an infinite port creation loop. When this happens you will see the
> >>following log messages:
> >...
> >>So the fix as presented below is causing netdev_open() only to return
> >>the existing device for the class type requested (if the type is
> >>specified).
> >I suspect that this will cause other, subtle, issues, that we will have
> >to track down later, related to devices that currently can be opened
> >under multiple different types.  However, I've applied it for now, and
> >if/when those issues come up, we can figure them out.
> >
> >Thanks a lot!
> >
> >Ben
> Can we get this also on the 2.7 branch?

OK.  If it causes problems, we'll need to fix them or revert it, of
course.
Eelco Chaudron June 15, 2017, 12:34 p.m. UTC | #7
On 14/06/17 18:27, Ben Pfaff wrote:
> On Wed, Jun 14, 2017 at 06:21:50PM +0200, Eelco Chaudron wrote:
>> ...
>> Can we get this also on the 2.7 branch?
> OK.  If it causes problems, we'll need to fix them or revert it, of
> course.
If something does come up I'll try my best the help out/fix it.

Thanks,

Eelco
diff mbox

Patch

diff --git a/lib/netdev.c b/lib/netdev.c
index 26c4136..ca3192a 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -412,7 +412,11 @@  netdev_open(const char *name, const char *type, struct netdev **netdevp)
             error = EAFNOSUPPORT;
         }
     } else {
-        error = 0;
+        if (type && type[0] && strcmp(type, netdev->netdev_class->type)) {
+            error = EEXIST;
+        } else {
+            error = 0;
+        }
     }
 
     if (!error) {