Message ID | 1b3d58132b9504be6f6273b9a0f07e9d18bd2fad.1496320636.git.echaudro@redhat.com |
---|---|
State | Accepted |
Headers | show |
On Thu, 2017-06-01 at 14:38 +0200, Eelco Chaudron wrote: > When trying to configure a system port as type=internal it could start > an infinite port creation loop. When this happens you will see the > following log messages: > > 2017-06-01T09:00:17.900Z|02813|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists > 2017-06-01T09:00:17.900Z|02814|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) > 2017-06-01T09:00:17.907Z|02815|bridge|INFO|bridge bzb: added interface ve01_1 on port 2 > 2017-06-01T09:00:17.909Z|02816|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 2 > 2017-06-01T09:00:17.914Z|02817|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists > 2017-06-01T09:00:17.914Z|02818|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) > 2017-06-01T09:00:17.921Z|02819|bridge|INFO|bridge bzb: added interface ve01_1 on port 3 > 2017-06-01T09:00:17.923Z|02820|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 3 > 2017-06-01T09:00:17.929Z|02821|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists > 2017-06-01T09:00:17.929Z|02822|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) > 2017-06-01T09:00:17.936Z|02823|bridge|INFO|bridge bzb: added interface ve01_1 on port 4 > ... > ... > > This is how to replicate it: > > ip link add name ve01_1 type veth peer name ve01_2 > ovs-vsctl add-br bzb > ovs-vsctl add-port bzb ve01_1 > ovs-vsctl set interface ve01_1 type=internal > ip link set dev ve01_1 up > ip link set dev ve01_2 up > > When changing the type to internal, the async configuration logic get > triggered and because the type has changed it will delete the > interface and the ofproto port. Next it will call iface_do_create() to > re-create the interface as internal. Because we just deleted the > interface netdev_open() will try to recreate it as internal. > > However this will fail with EEXIST as a system interface already > exists withe the name. > > Up till here all is fine... > > Now some ipv6 route change comes along for the ve01_1 interface, and > the route infrastructure will call netdev_open(). This will create the > interface of type system. > > Next the configuration verify process gets triggered due to > if_notifier_changed() being true. We now retry the above, but because > the interface exists (although in the system class) it will use it, > and create the interface successfully. > > This triggers another if notification, causing yet another config > update, and because the system != internal reconfiguration happens and > it start from the top... > > So the fix as presented below is causing netdev_open() only to return > the existing device for the class type requested (if the type is > specified). > > Signed-off-by: Eelco Chaudron <echaudro@redhat.com> > --- > lib/netdev.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/lib/netdev.c b/lib/netdev.c > index 26c4136..ca3192a 100644 > --- a/lib/netdev.c > +++ b/lib/netdev.c > @@ -412,7 +412,11 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp) > error = EAFNOSUPPORT; > } > } else { > - error = 0; > + if (type && type[0] && strcmp(type, netdev->netdev_class->type)) { > + error = EEXIST; > + } else { > + error = 0; > + } > } > > if (!error) { I tested this patch and it does fix the problem but how about the patch that I've attached to this reply instead? It seems a bit cleaner. Thanks, - Greg
On Thu, Jun 01, 2017 at 11:13:48AM -0700, Greg Rose wrote: > On Thu, 2017-06-01 at 14:38 +0200, Eelco Chaudron wrote: > > When trying to configure a system port as type=internal it could start > > an infinite port creation loop. When this happens you will see the > > following log messages: > > > > 2017-06-01T09:00:17.900Z|02813|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists > > 2017-06-01T09:00:17.900Z|02814|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) > > 2017-06-01T09:00:17.907Z|02815|bridge|INFO|bridge bzb: added interface ve01_1 on port 2 > > 2017-06-01T09:00:17.909Z|02816|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 2 > > 2017-06-01T09:00:17.914Z|02817|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists > > 2017-06-01T09:00:17.914Z|02818|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) > > 2017-06-01T09:00:17.921Z|02819|bridge|INFO|bridge bzb: added interface ve01_1 on port 3 > > 2017-06-01T09:00:17.923Z|02820|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 3 > > 2017-06-01T09:00:17.929Z|02821|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists > > 2017-06-01T09:00:17.929Z|02822|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) > > 2017-06-01T09:00:17.936Z|02823|bridge|INFO|bridge bzb: added interface ve01_1 on port 4 > > ... > > ... > > > > This is how to replicate it: > > > > ip link add name ve01_1 type veth peer name ve01_2 > > ovs-vsctl add-br bzb > > ovs-vsctl add-port bzb ve01_1 > > ovs-vsctl set interface ve01_1 type=internal > > ip link set dev ve01_1 up > > ip link set dev ve01_2 up > > > > When changing the type to internal, the async configuration logic get > > triggered and because the type has changed it will delete the > > interface and the ofproto port. Next it will call iface_do_create() to > > re-create the interface as internal. Because we just deleted the > > interface netdev_open() will try to recreate it as internal. > > > > However this will fail with EEXIST as a system interface already > > exists withe the name. > > > > Up till here all is fine... > > > > Now some ipv6 route change comes along for the ve01_1 interface, and > > the route infrastructure will call netdev_open(). This will create the > > interface of type system. > > > > Next the configuration verify process gets triggered due to > > if_notifier_changed() being true. We now retry the above, but because > > the interface exists (although in the system class) it will use it, > > and create the interface successfully. > > > > This triggers another if notification, causing yet another config > > update, and because the system != internal reconfiguration happens and > > it start from the top... > > > > So the fix as presented below is causing netdev_open() only to return > > the existing device for the class type requested (if the type is > > specified). > > > > Signed-off-by: Eelco Chaudron <echaudro@redhat.com> > > --- > > lib/netdev.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/lib/netdev.c b/lib/netdev.c > > index 26c4136..ca3192a 100644 > > --- a/lib/netdev.c > > +++ b/lib/netdev.c > > @@ -412,7 +412,11 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp) > > error = EAFNOSUPPORT; > > } > > } else { > > - error = 0; > > + if (type && type[0] && strcmp(type, netdev->netdev_class->type)) { > > + error = EEXIST; > > + } else { > > + error = 0; > > + } > > } > > > > if (!error) { > > I tested this patch and it does fix the problem but how about the patch > that I've attached to this reply instead? It seems a bit cleaner. You might have omitted the attachment.
On Thu, Jun 1, 2017 at 11:16 AM, Ben Pfaff <blp@ovn.org> wrote: > On Thu, Jun 01, 2017 at 11:13:48AM -0700, Greg Rose wrote: >> On Thu, 2017-06-01 at 14:38 +0200, Eelco Chaudron wrote: >> > When trying to configure a system port as type=internal it could start >> > an infinite port creation loop. When this happens you will see the >> > following log messages: >> > >> > 2017-06-01T09:00:17.900Z|02813|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists >> > 2017-06-01T09:00:17.900Z|02814|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) >> > 2017-06-01T09:00:17.907Z|02815|bridge|INFO|bridge bzb: added interface ve01_1 on port 2 >> > 2017-06-01T09:00:17.909Z|02816|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 2 >> > 2017-06-01T09:00:17.914Z|02817|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists >> > 2017-06-01T09:00:17.914Z|02818|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) >> > 2017-06-01T09:00:17.921Z|02819|bridge|INFO|bridge bzb: added interface ve01_1 on port 3 >> > 2017-06-01T09:00:17.923Z|02820|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 3 >> > 2017-06-01T09:00:17.929Z|02821|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists >> > 2017-06-01T09:00:17.929Z|02822|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) >> > 2017-06-01T09:00:17.936Z|02823|bridge|INFO|bridge bzb: added interface ve01_1 on port 4 >> > ... >> > ... >> > >> > This is how to replicate it: >> > >> > ip link add name ve01_1 type veth peer name ve01_2 >> > ovs-vsctl add-br bzb >> > ovs-vsctl add-port bzb ve01_1 >> > ovs-vsctl set interface ve01_1 type=internal >> > ip link set dev ve01_1 up >> > ip link set dev ve01_2 up >> > >> > When changing the type to internal, the async configuration logic get >> > triggered and because the type has changed it will delete the >> > interface and the ofproto port. Next it will call iface_do_create() to >> > re-create the interface as internal. Because we just deleted the >> > interface netdev_open() will try to recreate it as internal. >> > >> > However this will fail with EEXIST as a system interface already >> > exists withe the name. >> > >> > Up till here all is fine... >> > >> > Now some ipv6 route change comes along for the ve01_1 interface, and >> > the route infrastructure will call netdev_open(). This will create the >> > interface of type system. >> > >> > Next the configuration verify process gets triggered due to >> > if_notifier_changed() being true. We now retry the above, but because >> > the interface exists (although in the system class) it will use it, >> > and create the interface successfully. >> > >> > This triggers another if notification, causing yet another config >> > update, and because the system != internal reconfiguration happens and >> > it start from the top... >> > >> > So the fix as presented below is causing netdev_open() only to return >> > the existing device for the class type requested (if the type is >> > specified). >> > >> > Signed-off-by: Eelco Chaudron <echaudro@redhat.com> >> > --- >> > lib/netdev.c | 6 +++++- >> > 1 file changed, 5 insertions(+), 1 deletion(-) >> > >> > diff --git a/lib/netdev.c b/lib/netdev.c >> > index 26c4136..ca3192a 100644 >> > --- a/lib/netdev.c >> > +++ b/lib/netdev.c >> > @@ -412,7 +412,11 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp) >> > error = EAFNOSUPPORT; >> > } >> > } else { >> > - error = 0; >> > + if (type && type[0] && strcmp(type, netdev->netdev_class->type)) { >> > + error = EEXIST; >> > + } else { >> > + error = 0; >> > + } >> > } >> > >> > if (!error) { >> >> I tested this patch and it does fix the problem but how about the patch >> that I've attached to this reply instead? It seems a bit cleaner. > > You might have omitted the attachment. Oops - I resent with the attachment but forgot to add back ovs-dev. I'll get this right eventually... Thanks, - Greg
On Thu, Jun 01, 2017 at 02:38:09PM +0200, Eelco Chaudron wrote: > When trying to configure a system port as type=internal it could start > an infinite port creation loop. When this happens you will see the > following log messages: ... > So the fix as presented below is causing netdev_open() only to return > the existing device for the class type requested (if the type is > specified). I suspect that this will cause other, subtle, issues, that we will have to track down later, related to devices that currently can be opened under multiple different types. However, I've applied it for now, and if/when those issues come up, we can figure them out. Thanks a lot! Ben
On 02/06/17 00:25, Ben Pfaff wrote: > On Thu, Jun 01, 2017 at 02:38:09PM +0200, Eelco Chaudron wrote: >> When trying to configure a system port as type=internal it could start >> an infinite port creation loop. When this happens you will see the >> following log messages: > ... >> So the fix as presented below is causing netdev_open() only to return >> the existing device for the class type requested (if the type is >> specified). > I suspect that this will cause other, subtle, issues, that we will have > to track down later, related to devices that currently can be opened > under multiple different types. However, I've applied it for now, and > if/when those issues come up, we can figure them out. > > Thanks a lot! > > Ben Can we get this also on the 2.7 branch? Thanks, Eelco
On Wed, Jun 14, 2017 at 06:21:50PM +0200, Eelco Chaudron wrote: > On 02/06/17 00:25, Ben Pfaff wrote: > >On Thu, Jun 01, 2017 at 02:38:09PM +0200, Eelco Chaudron wrote: > >>When trying to configure a system port as type=internal it could start > >>an infinite port creation loop. When this happens you will see the > >>following log messages: > >... > >>So the fix as presented below is causing netdev_open() only to return > >>the existing device for the class type requested (if the type is > >>specified). > >I suspect that this will cause other, subtle, issues, that we will have > >to track down later, related to devices that currently can be opened > >under multiple different types. However, I've applied it for now, and > >if/when those issues come up, we can figure them out. > > > >Thanks a lot! > > > >Ben > Can we get this also on the 2.7 branch? OK. If it causes problems, we'll need to fix them or revert it, of course.
On 14/06/17 18:27, Ben Pfaff wrote: > On Wed, Jun 14, 2017 at 06:21:50PM +0200, Eelco Chaudron wrote: >> ... >> Can we get this also on the 2.7 branch? > OK. If it causes problems, we'll need to fix them or revert it, of > course. If something does come up I'll try my best the help out/fix it. Thanks, Eelco
diff --git a/lib/netdev.c b/lib/netdev.c index 26c4136..ca3192a 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -412,7 +412,11 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp) error = EAFNOSUPPORT; } } else { - error = 0; + if (type && type[0] && strcmp(type, netdev->netdev_class->type)) { + error = EEXIST; + } else { + error = 0; + } } if (!error) {
When trying to configure a system port as type=internal it could start an infinite port creation loop. When this happens you will see the following log messages: 2017-06-01T09:00:17.900Z|02813|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists 2017-06-01T09:00:17.900Z|02814|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) 2017-06-01T09:00:17.907Z|02815|bridge|INFO|bridge bzb: added interface ve01_1 on port 2 2017-06-01T09:00:17.909Z|02816|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 2 2017-06-01T09:00:17.914Z|02817|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists 2017-06-01T09:00:17.914Z|02818|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) 2017-06-01T09:00:17.921Z|02819|bridge|INFO|bridge bzb: added interface ve01_1 on port 3 2017-06-01T09:00:17.923Z|02820|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 3 2017-06-01T09:00:17.929Z|02821|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists 2017-06-01T09:00:17.929Z|02822|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) 2017-06-01T09:00:17.936Z|02823|bridge|INFO|bridge bzb: added interface ve01_1 on port 4 ... ... This is how to replicate it: ip link add name ve01_1 type veth peer name ve01_2 ovs-vsctl add-br bzb ovs-vsctl add-port bzb ve01_1 ovs-vsctl set interface ve01_1 type=internal ip link set dev ve01_1 up ip link set dev ve01_2 up When changing the type to internal, the async configuration logic get triggered and because the type has changed it will delete the interface and the ofproto port. Next it will call iface_do_create() to re-create the interface as internal. Because we just deleted the interface netdev_open() will try to recreate it as internal. However this will fail with EEXIST as a system interface already exists withe the name. Up till here all is fine... Now some ipv6 route change comes along for the ve01_1 interface, and the route infrastructure will call netdev_open(). This will create the interface of type system. Next the configuration verify process gets triggered due to if_notifier_changed() being true. We now retry the above, but because the interface exists (although in the system class) it will use it, and create the interface successfully. This triggers another if notification, causing yet another config update, and because the system != internal reconfiguration happens and it start from the top... So the fix as presented below is causing netdev_open() only to return the existing device for the class type requested (if the type is specified). Signed-off-by: Eelco Chaudron <echaudro@redhat.com> --- lib/netdev.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)