diff mbox

[ovs-dev] Revert "netdev: Fix netdev_open() to adhere to class type if given"

Message ID 1500014691-21497-1-git-send-email-zhouhan@gmail.com
State Deferred
Headers show

Commit Message

Han Zhou July 14, 2017, 6:44 a.m. UTC
This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131.

The commit introduced a problem that "File exists" will be reported
when trying to open br0.

The operation that adds eth0 to br0 while moving IP address from
eth0 to bridge internal interface br0 reproduces this issue.

$ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl add-port br0 eth0
$ ovs-dpctl show
...
port 1: br0 (internal: open failed (File exists))
...

At this point restarting OVS will result in connection lost for the
node.

Reverting the change fixes the problem. Since adding physical interface
to OVS bridge is quite normal operation, the problem is more severe
than the original problem fixed by commit 67ac844, so revert this
before a better fix is found for the original problem.

Signed-off-by: Han Zhou <zhouhan@gmail.com>
---
 lib/netdev.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Numan Siddique July 14, 2017, 7:04 a.m. UTC | #1
On Fri, Jul 14, 2017 at 12:14 PM, Han Zhou <zhouhan@gmail.com> wrote:

> This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131.
>
> The commit introduced a problem that "File exists" will be reported
> when trying to open br0.
>
> The operation that adds eth0 to br0 while moving IP address from
> eth0 to bridge internal interface br0 reproduces this issue.
>
> $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl add-port br0
> eth0
> $ ovs-dpctl show
> ...
> port 1: br0 (internal: open failed (File exists))
> ...
>
> At this point restarting OVS will result in connection lost for the
> node.
>
> Reverting the change fixes the problem. Since adding physical interface
> to OVS bridge is quite normal operation, the problem is more severe
> than the original problem fixed by commit 67ac844, so revert this
> before a better fix is found for the original problem.
>


Hi Han,

I think I am seeing a similar issue here [1]. Can you please check and see
if it's the same issue hit here which this patch fixes.

[1]  uses OVS 2.7.1 for tripleo and the upgrage job updates OVS 2.6.1 to
OVS 2.7.1. The restart of openvswitch service fails and there is a crash in
ovs-vswitchd. Please see the end of the log file (it's not too big).

[1] -
http://logs.openstack.org/91/473191/12/check/gate-tripleo-ci-centos-7-multinode-upgrades-nv/414af3c/logs/subnode-2/openvswitch/ovs-vswitchd.txt.gz
     https://review.openstack.org/#/c/473191/

2017-07-14T04:14:04.229Z|00038|in_band|ERR|br-ex: failed to initialize
in-band control: cannot open datapath local port br-ex (File exists)
2017-07-14T04:14:04.285Z|00002|daemon_unix|ERR|fork child died before
signaling startup (killed (Segmentation fault), core dumped)
2017-07-14T04:14:04.285Z|00003|daemon_unix|EMER|could not detach from
foreground session

This segfault is consistent. I did 2 rechecks and I see the same issue.

Thanks
Numan


Signed-off-by: Han Zhou <zhouhan@gmail.com>
> ---
>  lib/netdev.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/lib/netdev.c b/lib/netdev.c
> index 0d5fad5..e8843a8 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -416,8 +416,6 @@ netdev_open(const char *name, const char *type, struct
> netdev **netdevp)
>                        name, type);
>              error = EAFNOSUPPORT;
>          }
> -    } else if (type && type[0] && strcmp(type,
> netdev->netdev_class->type)) {
> -        error = EEXIST;
>      } else {
>          error = 0;
>      }
> --
> 2.1.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Eelco Chaudron July 14, 2017, 8:35 a.m. UTC | #2
On 14/07/17 08:44, Han Zhou wrote:
> This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131.
>
> The commit introduced a problem that "File exists" will be reported
> when trying to open br0.
>
> The operation that adds eth0 to br0 while moving IP address from
> eth0 to bridge internal interface br0 reproduces this issue.
>
> $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl add-port br0 eth0
> $ ovs-dpctl show
> ...
> port 1: br0 (internal: open failed (File exists))
> ...
>
> At this point restarting OVS will result in connection lost for the
> node.
>
> Reverting the change fixes the problem. Since adding physical interface
> to OVS bridge is quite normal operation, the problem is more severe
> than the original problem fixed by commit 67ac844, so revert this
> before a better fix is found for the original problem.
See also this thread:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335428.html

Will be sending out the patch later today.

//Eelco*

*
Han Zhou July 14, 2017, 4:59 p.m. UTC | #3
On Fri, Jul 14, 2017 at 1:35 AM, Eelco Chaudron <echaudro@redhat.com> wrote:
>
> On 14/07/17 08:44, Han Zhou wrote:
>>
>> This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131.
>>
>> The commit introduced a problem that "File exists" will be reported
>> when trying to open br0.
>>
>> The operation that adds eth0 to br0 while moving IP address from
>> eth0 to bridge internal interface br0 reproduces this issue.
>>
>> $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl add-port br0
eth0
>> $ ovs-dpctl show
>> ...
>> port 1: br0 (internal: open failed (File exists))
>> ...
>>
>> At this point restarting OVS will result in connection lost for the
>> node.
>>
>> Reverting the change fixes the problem. Since adding physical interface
>> to OVS bridge is quite normal operation, the problem is more severe
>> than the original problem fixed by commit 67ac844, so revert this
>> before a better fix is found for the original problem.
>
> See also this thread:
>
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335428.html
>
> Will be sending out the patch later today.
>
> //Eelco*
>
Thanks Eelco! Too bad I didn't notice the thread yesterday, otherwise I
would not spend hours chasing the problem :(
Since the problem is critical (e.g. installing release v2.7.1 on any
existing hypervisors would break connection), do you think it is better to
revert the commit until we are confident with the new patch?
Eelco Chaudron July 17, 2017, 7:40 a.m. UTC | #4
On 14/07/17 18:59, Han Zhou wrote:
>
>
> On Fri, Jul 14, 2017 at 1:35 AM, Eelco Chaudron <echaudro@redhat.com 
> <mailto:echaudro@redhat.com>> wrote:
> >
> > On 14/07/17 08:44, Han Zhou wrote:
> >>
> >> This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131.
> >>
> >> The commit introduced a problem that "File exists" will be reported
> >> when trying to open br0.
> >>
> >> The operation that adds eth0 to br0 while moving IP address from
> >> eth0 to bridge internal interface br0 reproduces this issue.
> >>
> >> $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl add-port 
> br0 eth0
> >> $ ovs-dpctl show
> >> ...
> >> port 1: br0 (internal: open failed (File exists))
> >> ...
> >>
> >> At this point restarting OVS will result in connection lost for the
> >> node.
> >>
> >> Reverting the change fixes the problem. Since adding physical interface
> >> to OVS bridge is quite normal operation, the problem is more severe
> >> than the original problem fixed by commit 67ac844, so revert this
> >> before a better fix is found for the original problem.
> >
> > See also this thread:
> >
> > https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335428.html
> >
> > Will be sending out the patch later today.
> >
> > //Eelco*
> >
> Thanks Eelco! Too bad I didn't notice the thread yesterday, otherwise 
> I would not spend hours chasing the problem :(
> Since the problem is critical (e.g. installing release v2.7.1 on any 
> existing hypervisors would break connection), do you think it is 
> better to revert the commit until we are confident with the new patch?
Assuming we can get the patch in before the next release, there should 
be no need to revert back. But if its common practice for OVS to do it 
right away I have no issue with it.
Numan Siddique July 17, 2017, 9:10 a.m. UTC | #5
On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com> wrote:

> On 14/07/17 18:59, Han Zhou wrote:
>
>>
>>
>> On Fri, Jul 14, 2017 at 1:35 AM, Eelco Chaudron <echaudro@redhat.com
>> <mailto:echaudro@redhat.com>> wrote:
>> >
>> > On 14/07/17 08:44, Han Zhou wrote:
>> >>
>> >> This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131.
>> >>
>> >> The commit introduced a problem that "File exists" will be reported
>> >> when trying to open br0.
>> >>
>> >> The operation that adds eth0 to br0 while moving IP address from
>> >> eth0 to bridge internal interface br0 reproduces this issue.
>> >>
>> >> $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl add-port
>> br0 eth0
>> >> $ ovs-dpctl show
>> >> ...
>> >> port 1: br0 (internal: open failed (File exists))
>> >> ...
>> >>
>> >> At this point restarting OVS will result in connection lost for the
>> >> node.
>> >>
>> >> Reverting the change fixes the problem. Since adding physical interface
>> >> to OVS bridge is quite normal operation, the problem is more severe
>> >> than the original problem fixed by commit 67ac844, so revert this
>> >> before a better fix is found for the original problem.
>> >
>> > See also this thread:
>> >
>> > https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335428.html
>> >
>> > Will be sending out the patch later today.
>> >
>> > //Eelco*
>> >
>> Thanks Eelco! Too bad I didn't notice the thread yesterday, otherwise I
>> would not spend hours chasing the problem :(
>> Since the problem is critical (e.g. installing release v2.7.1 on any
>> existing hypervisors would break connection), do you think it is better to
>> revert the commit until we are confident with the new patch?
>>
> Assuming we can get the patch in before the next release, there should be
> no need to revert back. But if its common practice for OVS to do it right
> away I have no issue with it.
>
>
For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take
v2.7.1 for this issue. I think we should atleast revert the patch for 2.7
branch  and have a new version 2.7.2.

Thanks
Numan


> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Eelco Chaudron July 17, 2017, 2:49 p.m. UTC | #6
On 17/07/17 11:10, Numan Siddique wrote:
>
>
> On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com 
> <mailto:echaudro@redhat.com>> wrote:
>
>     On 14/07/17 18:59, Han Zhou wrote:
>
>
>
>         On Fri, Jul 14, 2017 at 1:35 AM, Eelco Chaudron
>         <echaudro@redhat.com <mailto:echaudro@redhat.com>
>         <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>>> wrote:
>         >
>         > On 14/07/17 08:44, Han Zhou wrote:
>         >>
>         >> This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131.
>         >>
>         >> The commit introduced a problem that "File exists" will be
>         reported
>         >> when trying to open br0.
>         >>
>         >> The operation that adds eth0 to br0 while moving IP address
>         from
>         >> eth0 to bridge internal interface br0 reproduces this issue.
>         >>
>         >> $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl
>         add-port br0 eth0
>         >> $ ovs-dpctl show
>         >> ...
>         >> port 1: br0 (internal: open failed (File exists))
>         >> ...
>         >>
>         >> At this point restarting OVS will result in connection lost
>         for the
>         >> node.
>         >>
>         >> Reverting the change fixes the problem. Since adding
>         physical interface
>         >> to OVS bridge is quite normal operation, the problem is
>         more severe
>         >> than the original problem fixed by commit 67ac844, so
>         revert this
>         >> before a better fix is found for the original problem.
>         >
>         > See also this thread:
>         >
>         >
>         https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335428.html
>         <https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335428.html>
>         >
>         > Will be sending out the patch later today.
>         >
>         > //Eelco*
>         >
>         Thanks Eelco! Too bad I didn't notice the thread yesterday,
>         otherwise I would not spend hours chasing the problem :(
>         Since the problem is critical (e.g. installing release v2.7.1
>         on any existing hypervisors would break connection), do you
>         think it is better to revert the commit until we are confident
>         with the new patch?
>
>     Assuming we can get the patch in before the next release, there
>     should be no need to revert back. But if its common practice for
>     OVS to do it right away I have no issue with it.
>
>
> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't 
> take v2.7.1 for this issue. I think we should atleast revert the patch 
> for 2.7 branch  and have a new version 2.7.2.
>
If we are having a 2.7.2 release specially for this it makes sense to 
revert it.

In addition can you test/apply the updated patch, 
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as 
I was hitting the issue with the persistant ports setup, it would be 
good to see your case is also fixed.

Thanks,

Eelco
Ben Pfaff July 17, 2017, 4:56 p.m. UTC | #7
On Fri, Jul 14, 2017 at 12:34:44PM +0530, Numan Siddique wrote:
> Hi Han,
> 
> I think I am seeing a similar issue here [1]. Can you please check and see
> if it's the same issue hit here which this patch fixes.
> 
> [1]  uses OVS 2.7.1 for tripleo and the upgrage job updates OVS 2.6.1 to
> OVS 2.7.1. The restart of openvswitch service fails and there is a crash in
> ovs-vswitchd. Please see the end of the log file (it's not too big).
> 
> [1] -
> http://logs.openstack.org/91/473191/12/check/gate-tripleo-ci-centos-7-multinode-upgrades-nv/414af3c/logs/subnode-2/openvswitch/ovs-vswitchd.txt.gz
>      https://review.openstack.org/#/c/473191/
> 
> 2017-07-14T04:14:04.229Z|00038|in_band|ERR|br-ex: failed to initialize
> in-band control: cannot open datapath local port br-ex (File exists)
> 2017-07-14T04:14:04.285Z|00002|daemon_unix|ERR|fork child died before
> signaling startup (killed (Segmentation fault), core dumped)
> 2017-07-14T04:14:04.285Z|00003|daemon_unix|EMER|could not detach from
> foreground session
> 
> This segfault is consistent. I did 2 rechecks and I see the same issue.

The ultimate cause of the segfault is the new netdev_open() behavior,
but the proximate cause of this segfault is a separate bug that we can
fix easily:
        https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335782.html

Once the proximate cause is fixed, we probably need to reassess what
you're reporting.
Justin Pettit July 17, 2017, 7:51 p.m. UTC | #8
> On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> wrote:
> 
> On 17/07/17 11:10, Numan Siddique wrote:
>> 
>> 
>> On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com <mailto:echaudro@redhat.com>> wrote:
>> 
>>    Assuming we can get the patch in before the next release, there
>>    should be no need to revert back. But if its common practice for
>>    OVS to do it right away I have no issue with it.
>> 
>> 
>> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take v2.7.1 for this issue. I think we should atleast revert the patch for 2.7 branch  and have a new version 2.7.2.
>> 
> If we are having a 2.7.2 release specially for this it makes sense to revert it.
> 
> In addition can you test/apply the updated patch, https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I was hitting the issue with the persistant ports setup, it would be good to see your case is also fixed.

Han and Numan, it sounds like Eelco's original patch fixed an issue, but it's introducing problems for more common workloads.  Can you try it on the tip of "branch-2.7" and see if it resolves the issues you're seeing?  I'd suggest if it doesn't fix the problems, we revert Eelco's orignal patch and release 2.7.2.  If it does, and people think it's reasonably safe, we can release 2.7.2 with the patches.

Let me know if anyone has a different view on how we should proceed.

Thanks,

--Justin
Han Zhou July 17, 2017, 9:38 p.m. UTC | #9
On Mon, Jul 17, 2017 at 12:51 PM, Justin Pettit <jpettit@ovn.org> wrote:
>
>
> > On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> wrote:
> >
> > On 17/07/17 11:10, Numan Siddique wrote:
> >>
> >>
> >> On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com
<mailto:echaudro@redhat.com>> wrote:
> >>
> >>    Assuming we can get the patch in before the next release, there
> >>    should be no need to revert back. But if its common practice for
> >>    OVS to do it right away I have no issue with it.
> >>
> >>
> >> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't
take v2.7.1 for this issue. I think we should atleast revert the patch for
2.7 branch  and have a new version 2.7.2.
> >>
> > If we are having a 2.7.2 release specially for this it makes sense to
revert it.
> >
> > In addition can you test/apply the updated patch,
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I
was hitting the issue with the persistant ports setup, it would be good to
see your case is also fixed.
>
> Han and Numan, it sounds like Eelco's original patch fixed an issue, but
it's introducing problems for more common workloads.  Can you try it on the
tip of "branch-2.7" and see if it resolves the issues you're seeing?  I'd
suggest if it doesn't fix the problems, we revert Eelco's orignal patch and
release 2.7.2.  If it does, and people think it's reasonably safe, we can
release 2.7.2 with the patches.
>
> Let me know if anyone has a different view on how we should proceed.
>
> Thanks,
>
> --Justin
>
>

I tested on branch-2.7 with patch
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html. I
didn't see the "File exists" problem any more :)
Gurucharan Shetty July 18, 2017, 3:51 a.m. UTC | #10
On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote:

>
> > On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> wrote:
> >
> > On 17/07/17 11:10, Numan Siddique wrote:
> >>
> >>
> >> On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com
> <mailto:echaudro@redhat.com>> wrote:
> >>
> >>    Assuming we can get the patch in before the next release, there
> >>    should be no need to revert back. But if its common practice for
> >>    OVS to do it right away I have no issue with it.
> >>
> >>
> >> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take
> v2.7.1 for this issue. I think we should atleast revert the patch for 2.7
> branch  and have a new version 2.7.2.
> >>
> > If we are having a 2.7.2 release specially for this it makes sense to
> revert it.
> >
> > In addition can you test/apply the updated patch,
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I
> was hitting the issue with the persistant ports setup, it would be good to
> see your case is also fixed.
>
> Han and Numan, it sounds like Eelco's original patch fixed an issue, but
> it's introducing problems for more common workloads.  Can you try it on the
> tip of "branch-2.7" and see if it resolves the issues you're seeing?  I'd
> suggest if it doesn't fix the problems, we revert Eelco's orignal patch and
> release 2.7.2.  If it does, and people think it's reasonably safe, we can
> release 2.7.2 with the patches.
>
> Let me know if anyone has a different view on how we should proceed.
>

IMO, we should just revert this patch for 2.7 branch till we have a proper
fix. If we apply another patch in haste to fix this issue without letting
it bake for more testing and immediately release 2.7.2, we may just end up
with a different bug.

I tested the revert on tip of branch-2.7 and it atleast fixes the issue
where geneve tunnels across 2 hosts start to work again and I also do not
see the "open failed (File exists))" error too.


>
> Thanks,
>
> --Justin
>
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Justin Pettit July 18, 2017, 4:12 a.m. UTC | #11
> On Jul 17, 2017, at 8:51 PM, Guru Shetty <guru@ovn.org> wrote:
> 
> 
>> 
>> On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote:
>> 
>> > On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> wrote:
>> >
>> > On 17/07/17 11:10, Numan Siddique wrote:
>> >>
>> >>
>> >> On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com <mailto:echaudro@redhat.com>> wrote:
>> >>
>> >>    Assuming we can get the patch in before the next release, there
>> >>    should be no need to revert back. But if its common practice for
>> >>    OVS to do it right away I have no issue with it.
>> >>
>> >>
>> >> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take v2.7.1 for this issue. I think we should atleast revert the patch for 2.7 branch  and have a new version 2.7.2.
>> >>
>> > If we are having a 2.7.2 release specially for this it makes sense to revert it.
>> >
>> > In addition can you test/apply the updated patch, https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I was hitting the issue with the persistant ports setup, it would be good to see your case is also fixed.
>> 
>> Han and Numan, it sounds like Eelco's original patch fixed an issue, but it's introducing problems for more common workloads.  Can you try it on the tip of "branch-2.7" and see if it resolves the issues you're seeing?  I'd suggest if it doesn't fix the problems, we revert Eelco's orignal patch and release 2.7.2.  If it does, and people think it's reasonably safe, we can release 2.7.2 with the patches.
> 
> Let me know if anyone has a different view on how we should proceed.
> 
> IMO, we should just revert this patch for 2.7 branch till we have a proper fix. If we apply another patch in haste to fix this issue without letting it bake for more testing and immediately release 2.7.2, we may just end up with a different bug. 
> 
> I tested the revert on tip of branch-2.7 and it atleast fixes the issue where geneve tunnels across 2 hosts start to work again and I also do not see the "open failed (File exists))" error too.

Sounds reasonable to me.  Does anyone else have any opinions?  If I don't hear anything by late tomorrow morning (PDT), I'll revert the patch and release 2.7.2.

Thanks,

--Justin
Numan Siddique July 18, 2017, 5:06 a.m. UTC | #12
On Tue, Jul 18, 2017 at 9:42 AM, Justin Pettit <jpettit@ovn.org> wrote:

>
> > On Jul 17, 2017, at 8:51 PM, Guru Shetty <guru@ovn.org> wrote:
> >
> >
> >>
> >> On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote:
> >>
> >> > On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com>
> wrote:
> >> >
> >> > On 17/07/17 11:10, Numan Siddique wrote:
> >> >>
> >> >>
> >> >> On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com
> <mailto:echaudro@redhat.com>> wrote:
> >> >>
> >> >>    Assuming we can get the patch in before the next release, there
> >> >>    should be no need to revert back. But if its common practice for
> >> >>    OVS to do it right away I have no issue with it.
> >> >>
> >> >>
> >> >> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't
> take v2.7.1 for this issue. I think we should atleast revert the patch for
> 2.7 branch  and have a new version 2.7.2.
> >> >>
> >> > If we are having a 2.7.2 release specially for this it makes sense to
> revert it.
> >> >
> >> > In addition can you test/apply the updated patch,
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I
> was hitting the issue with the persistant ports setup, it would be good to
> see your case is also fixed.
> >>
> >> Han and Numan, it sounds like Eelco's original patch fixed an issue,
> but it's introducing problems for more common workloads.  Can you try it on
> the tip of "branch-2.7" and see if it resolves the issues you're seeing?
> I'd suggest if it doesn't fix the problems, we revert Eelco's orignal patch
> and release 2.7.2.  If it does, and people think it's reasonably safe, we
> can release 2.7.2 with the patches.
> >
> > Let me know if anyone has a different view on how we should proceed.
> >
> > IMO, we should just revert this patch for 2.7 branch till we have a
> proper fix. If we apply another patch in haste to fix this issue without
> letting it bake for more testing and immediately release 2.7.2, we may just
> end up with a different bug.
> >
> > I tested the revert on tip of branch-2.7 and it atleast fixes the issue
> where geneve tunnels across 2 hosts start to work again and I also do not
> see the "open failed (File exists))" error too.
>
> Sounds reasonable to me.  Does anyone else have any opinions?  If I don't
> hear anything by late tomorrow morning (PDT), I'll revert the patch and
> release 2.7.2.
>
>
Reverting the patch and a release 2.7.2 sounds fine to me.

Thanks
Numan

Thanks,
>
> --Justin
>
>
>
>
Justin Pettit July 18, 2017, 6:28 a.m. UTC | #13
> On Jul 17, 2017, at 10:06 PM, Numan Siddique <nusiddiq@redhat.com> wrote:
> 
> On Tue, Jul 18, 2017 at 9:42 AM, Justin Pettit <jpettit@ovn.org> wrote:
> 
> > On Jul 17, 2017, at 8:51 PM, Guru Shetty <guru@ovn.org> wrote:
> >
> >
> >>
> >> On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote:
> >>
> >> > On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> wrote:
> >> >
> >> > On 17/07/17 11:10, Numan Siddique wrote:
> >> >>
> >> >>
> >> >> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take v2.7.1 for this issue. I think we should atleast revert the patch for 2.7 branch  and have a new version 2.7.2.
> >> >>
> >> > If we are having a 2.7.2 release specially for this it makes sense to revert it.
> >> >
> >> > In addition can you test/apply the updated patch, https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I was hitting the issue with the persistant ports setup, it would be good to see your case is also fixed.
> >>
> >> Han and Numan, it sounds like Eelco's original patch fixed an issue, but it's introducing problems for more common workloads.  Can you try it on the tip of "branch-2.7" and see if it resolves the issues you're seeing?  I'd suggest if it doesn't fix the problems, we revert Eelco's orignal patch and release 2.7.2.  If it does, and people think it's reasonably safe, we can release 2.7.2 with the patches.
> >
> > Let me know if anyone has a different view on how we should proceed.
> >
> > IMO, we should just revert this patch for 2.7 branch till we have a proper fix. If we apply another patch in haste to fix this issue without letting it bake for more testing and immediately release 2.7.2, we may just end up with a different bug.
> >
> > I tested the revert on tip of branch-2.7 and it atleast fixes the issue where geneve tunnels across 2 hosts start to work again and I also do not see the "open failed (File exists))" error too.
> 
> Sounds reasonable to me.  Does anyone else have any opinions?  If I don't hear anything by late tomorrow morning (PDT), I'll revert the patch and release 2.7.2.
> 
> 
> Reverting the patch and a release 2.7.2 sounds fine to me. 

Are you okay with that, Eelco?  Unless anyone objects, we'll leave the commit on master and look at applying your new patch.

--Justin
Eelco Chaudron July 18, 2017, 11:15 a.m. UTC | #14
On 18/07/17 08:28, Justin Pettit wrote:
>> On Jul 17, 2017, at 10:06 PM, Numan Siddique <nusiddiq@redhat.com> wrote:
>>
>> On Tue, Jul 18, 2017 at 9:42 AM, Justin Pettit <jpettit@ovn.org> wrote:
>>
>>> On Jul 17, 2017, at 8:51 PM, Guru Shetty <guru@ovn.org> wrote:
>>>
>>>
>>>> On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote:
>>>>
>>>>> On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> wrote:
>>>>>
>>>>> On 17/07/17 11:10, Numan Siddique wrote:
>>>>>>
>>>>>> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take v2.7.1 for this issue. I think we should atleast revert the patch for 2.7 branch  and have a new version 2.7.2.
>>>>>>
>>>>> If we are having a 2.7.2 release specially for this it makes sense to revert it.
>>>>>
>>>>> In addition can you test/apply the updated patch, https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I was hitting the issue with the persistant ports setup, it would be good to see your case is also fixed.
>>>> Han and Numan, it sounds like Eelco's original patch fixed an issue, but it's introducing problems for more common workloads.  Can you try it on the tip of "branch-2.7" and see if it resolves the issues you're seeing?  I'd suggest if it doesn't fix the problems, we revert Eelco's orignal patch and release 2.7.2.  If it does, and people think it's reasonably safe, we can release 2.7.2 with the patches.
>>> Let me know if anyone has a different view on how we should proceed.
>>>
>>> IMO, we should just revert this patch for 2.7 branch till we have a proper fix. If we apply another patch in haste to fix this issue without letting it bake for more testing and immediately release 2.7.2, we may just end up with a different bug.
>>>
>>> I tested the revert on tip of branch-2.7 and it atleast fixes the issue where geneve tunnels across 2 hosts start to work again and I also do not see the "open failed (File exists))" error too.
>> Sounds reasonable to me.  Does anyone else have any opinions?  If I don't hear anything by late tomorrow morning (PDT), I'll revert the patch and release 2.7.2.
>>
>>
>> Reverting the patch and a release 2.7.2 sounds fine to me.
> Are you okay with that, Eelco?  Unless anyone objects, we'll leave the commit on master and look at applying your new patch.
>
> --Justin
>
>
Sound fine to me.

//Eelco
Justin Pettit July 18, 2017, 11:32 p.m. UTC | #15
> On Jul 18, 2017, at 4:15 AM, Eelco Chaudron <echaudro@redhat.com> wrote:
> 
> On 18/07/17 08:28, Justin Pettit wrote:
>>> On Jul 17, 2017, at 10:06 PM, Numan Siddique <nusiddiq@redhat.com> wrote:
>>> 
>>> On Tue, Jul 18, 2017 at 9:42 AM, Justin Pettit <jpettit@ovn.org> wrote:
>>> 
>>>> On Jul 17, 2017, at 8:51 PM, Guru Shetty <guru@ovn.org> wrote:
>>>> 
>>>> 
>>>>> On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote:
>>>>> 
>>> Sounds reasonable to me.  Does anyone else have any opinions?  If I don't hear anything by late tomorrow morning (PDT), I'll revert the patch and release 2.7.2.
>>> 
>>> 
>>> Reverting the patch and a release 2.7.2 sounds fine to me.
>> Are you okay with that, Eelco?  Unless anyone objects, we'll leave the commit on master and look at applying your new patch.
>> 
> Sound fine to me.

Okay, I reverted the patch and pushed to branch-2.7.  I've sent out a series to release 2.7.2:

	https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335886.html

If someone can review that, I'll get the release out tonight.

--Justin
nickcooper-zhangtonghao July 19, 2017, 3:27 a.m. UTC | #16
> On Jul 19, 2017, at 7:32 AM, Justin Pettit <jpettit@ovn.org> wrote:
> 
>> 
>> On Jul 18, 2017, at 4:15 AM, Eelco Chaudron <echaudro@redhat.com> wrote:
>> 
>> On 18/07/17 08:28, Justin Pettit wrote:
>>>> On Jul 17, 2017, at 10:06 PM, Numan Siddique <nusiddiq@redhat.com> wrote:
>>>> 
>>>> On Tue, Jul 18, 2017 at 9:42 AM, Justin Pettit <jpettit@ovn.org> wrote:
>>>> 
>>>>> On Jul 17, 2017, at 8:51 PM, Guru Shetty <guru@ovn.org> wrote:
>>>>> 
>>>>> 
>>>>>> On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote:
>>>>>> 
>>>> Sounds reasonable to me.  Does anyone else have any opinions?  If I don't hear anything by late tomorrow morning (PDT), I'll revert the patch and release 2.7.2.
>>>> 
>>>> 
>>>> Reverting the patch and a release 2.7.2 sounds fine to me.
>>> Are you okay with that, Eelco?  Unless anyone objects, we'll leave the commit on master and look at applying your new patch.
>>> 
>> Sound fine to me.
> 
> Okay, I reverted the patch and pushed to branch-2.7.  I've sent out a series to release 2.7.2:
> 
> 	https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335886.html
> 
> If someone can review that, I'll get the release out tonight.
> 
> --Justin

Hi all,
I sent patches, I hope it can help us. Thanks.

http://patchwork.ozlabs.org/patch/790729/
http://patchwork.ozlabs.org/patch/790730/
Ben Pfaff Aug. 2, 2017, 10:57 p.m. UTC | #17
On Thu, Jul 13, 2017 at 11:44:51PM -0700, Han Zhou wrote:
> This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131.
> 
> The commit introduced a problem that "File exists" will be reported
> when trying to open br0.

Earlier today I applied Eelco's additional fix (appended below).  I hope
that it resolves all the issues.  If it doesn't, I think that we will
have to revert both this new commit and the one that you propose to
revert as well.

It seems like this area should not be so tricky.  I am a bit frustrated.

commit 8c2c225e481d6560faf73bb49c33f026bc58a664
Author: Eelco Chaudron <echaudro@redhat.com>
Date:   Fri Jul 14 14:33:27 2017 +0200

    netdev: Fix netdev_open() to track and recreate classless interfaces
    
    Due to commit 67ac844 an existing issue with OVS persisten ports
    surfaced. If we revert the commit we no longer get the error, and
    basic traffic will flow. However the wrong netdev class is used, hence
    the wrong callbacks get called.
    
    The main issue is with netdev_open() being called with type = NULL
    before the interface is actually configured in the system. This patch
    tracks these "auto" generated interfaces, and once netdev_open() gets
    called with a valid type, re-configures (re-create) it.
    
    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
diff mbox

Patch

diff --git a/lib/netdev.c b/lib/netdev.c
index 0d5fad5..e8843a8 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -416,8 +416,6 @@  netdev_open(const char *name, const char *type, struct netdev **netdevp)
                       name, type);
             error = EAFNOSUPPORT;
         }
-    } else if (type && type[0] && strcmp(type, netdev->netdev_class->type)) {
-        error = EEXIST;
     } else {
         error = 0;
     }