Message ID | 1500014691-21497-1-git-send-email-zhouhan@gmail.com |
---|---|
State | Deferred |
Headers | show |
On Fri, Jul 14, 2017 at 12:14 PM, Han Zhou <zhouhan@gmail.com> wrote: > This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131. > > The commit introduced a problem that "File exists" will be reported > when trying to open br0. > > The operation that adds eth0 to br0 while moving IP address from > eth0 to bridge internal interface br0 reproduces this issue. > > $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl add-port br0 > eth0 > $ ovs-dpctl show > ... > port 1: br0 (internal: open failed (File exists)) > ... > > At this point restarting OVS will result in connection lost for the > node. > > Reverting the change fixes the problem. Since adding physical interface > to OVS bridge is quite normal operation, the problem is more severe > than the original problem fixed by commit 67ac844, so revert this > before a better fix is found for the original problem. > Hi Han, I think I am seeing a similar issue here [1]. Can you please check and see if it's the same issue hit here which this patch fixes. [1] uses OVS 2.7.1 for tripleo and the upgrage job updates OVS 2.6.1 to OVS 2.7.1. The restart of openvswitch service fails and there is a crash in ovs-vswitchd. Please see the end of the log file (it's not too big). [1] - http://logs.openstack.org/91/473191/12/check/gate-tripleo-ci-centos-7-multinode-upgrades-nv/414af3c/logs/subnode-2/openvswitch/ovs-vswitchd.txt.gz https://review.openstack.org/#/c/473191/ 2017-07-14T04:14:04.229Z|00038|in_band|ERR|br-ex: failed to initialize in-band control: cannot open datapath local port br-ex (File exists) 2017-07-14T04:14:04.285Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Segmentation fault), core dumped) 2017-07-14T04:14:04.285Z|00003|daemon_unix|EMER|could not detach from foreground session This segfault is consistent. I did 2 rechecks and I see the same issue. Thanks Numan Signed-off-by: Han Zhou <zhouhan@gmail.com> > --- > lib/netdev.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/lib/netdev.c b/lib/netdev.c > index 0d5fad5..e8843a8 100644 > --- a/lib/netdev.c > +++ b/lib/netdev.c > @@ -416,8 +416,6 @@ netdev_open(const char *name, const char *type, struct > netdev **netdevp) > name, type); > error = EAFNOSUPPORT; > } > - } else if (type && type[0] && strcmp(type, > netdev->netdev_class->type)) { > - error = EEXIST; > } else { > error = 0; > } > -- > 2.1.0 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On 14/07/17 08:44, Han Zhou wrote: > This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131. > > The commit introduced a problem that "File exists" will be reported > when trying to open br0. > > The operation that adds eth0 to br0 while moving IP address from > eth0 to bridge internal interface br0 reproduces this issue. > > $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl add-port br0 eth0 > $ ovs-dpctl show > ... > port 1: br0 (internal: open failed (File exists)) > ... > > At this point restarting OVS will result in connection lost for the > node. > > Reverting the change fixes the problem. Since adding physical interface > to OVS bridge is quite normal operation, the problem is more severe > than the original problem fixed by commit 67ac844, so revert this > before a better fix is found for the original problem. See also this thread: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335428.html Will be sending out the patch later today. //Eelco* *
On Fri, Jul 14, 2017 at 1:35 AM, Eelco Chaudron <echaudro@redhat.com> wrote: > > On 14/07/17 08:44, Han Zhou wrote: >> >> This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131. >> >> The commit introduced a problem that "File exists" will be reported >> when trying to open br0. >> >> The operation that adds eth0 to br0 while moving IP address from >> eth0 to bridge internal interface br0 reproduces this issue. >> >> $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl add-port br0 eth0 >> $ ovs-dpctl show >> ... >> port 1: br0 (internal: open failed (File exists)) >> ... >> >> At this point restarting OVS will result in connection lost for the >> node. >> >> Reverting the change fixes the problem. Since adding physical interface >> to OVS bridge is quite normal operation, the problem is more severe >> than the original problem fixed by commit 67ac844, so revert this >> before a better fix is found for the original problem. > > See also this thread: > > https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335428.html > > Will be sending out the patch later today. > > //Eelco* > Thanks Eelco! Too bad I didn't notice the thread yesterday, otherwise I would not spend hours chasing the problem :( Since the problem is critical (e.g. installing release v2.7.1 on any existing hypervisors would break connection), do you think it is better to revert the commit until we are confident with the new patch?
On 14/07/17 18:59, Han Zhou wrote: > > > On Fri, Jul 14, 2017 at 1:35 AM, Eelco Chaudron <echaudro@redhat.com > <mailto:echaudro@redhat.com>> wrote: > > > > On 14/07/17 08:44, Han Zhou wrote: > >> > >> This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131. > >> > >> The commit introduced a problem that "File exists" will be reported > >> when trying to open br0. > >> > >> The operation that adds eth0 to br0 while moving IP address from > >> eth0 to bridge internal interface br0 reproduces this issue. > >> > >> $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl add-port > br0 eth0 > >> $ ovs-dpctl show > >> ... > >> port 1: br0 (internal: open failed (File exists)) > >> ... > >> > >> At this point restarting OVS will result in connection lost for the > >> node. > >> > >> Reverting the change fixes the problem. Since adding physical interface > >> to OVS bridge is quite normal operation, the problem is more severe > >> than the original problem fixed by commit 67ac844, so revert this > >> before a better fix is found for the original problem. > > > > See also this thread: > > > > https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335428.html > > > > Will be sending out the patch later today. > > > > //Eelco* > > > Thanks Eelco! Too bad I didn't notice the thread yesterday, otherwise > I would not spend hours chasing the problem :( > Since the problem is critical (e.g. installing release v2.7.1 on any > existing hypervisors would break connection), do you think it is > better to revert the commit until we are confident with the new patch? Assuming we can get the patch in before the next release, there should be no need to revert back. But if its common practice for OVS to do it right away I have no issue with it.
On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com> wrote: > On 14/07/17 18:59, Han Zhou wrote: > >> >> >> On Fri, Jul 14, 2017 at 1:35 AM, Eelco Chaudron <echaudro@redhat.com >> <mailto:echaudro@redhat.com>> wrote: >> > >> > On 14/07/17 08:44, Han Zhou wrote: >> >> >> >> This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131. >> >> >> >> The commit introduced a problem that "File exists" will be reported >> >> when trying to open br0. >> >> >> >> The operation that adds eth0 to br0 while moving IP address from >> >> eth0 to bridge internal interface br0 reproduces this issue. >> >> >> >> $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl add-port >> br0 eth0 >> >> $ ovs-dpctl show >> >> ... >> >> port 1: br0 (internal: open failed (File exists)) >> >> ... >> >> >> >> At this point restarting OVS will result in connection lost for the >> >> node. >> >> >> >> Reverting the change fixes the problem. Since adding physical interface >> >> to OVS bridge is quite normal operation, the problem is more severe >> >> than the original problem fixed by commit 67ac844, so revert this >> >> before a better fix is found for the original problem. >> > >> > See also this thread: >> > >> > https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335428.html >> > >> > Will be sending out the patch later today. >> > >> > //Eelco* >> > >> Thanks Eelco! Too bad I didn't notice the thread yesterday, otherwise I >> would not spend hours chasing the problem :( >> Since the problem is critical (e.g. installing release v2.7.1 on any >> existing hypervisors would break connection), do you think it is better to >> revert the commit until we are confident with the new patch? >> > Assuming we can get the patch in before the next release, there should be > no need to revert back. But if its common practice for OVS to do it right > away I have no issue with it. > > For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take v2.7.1 for this issue. I think we should atleast revert the patch for 2.7 branch and have a new version 2.7.2. Thanks Numan > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On 17/07/17 11:10, Numan Siddique wrote: > > > On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com > <mailto:echaudro@redhat.com>> wrote: > > On 14/07/17 18:59, Han Zhou wrote: > > > > On Fri, Jul 14, 2017 at 1:35 AM, Eelco Chaudron > <echaudro@redhat.com <mailto:echaudro@redhat.com> > <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>>> wrote: > > > > On 14/07/17 08:44, Han Zhou wrote: > >> > >> This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131. > >> > >> The commit introduced a problem that "File exists" will be > reported > >> when trying to open br0. > >> > >> The operation that adds eth0 to br0 while moving IP address > from > >> eth0 to bridge internal interface br0 reproduces this issue. > >> > >> $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl > add-port br0 eth0 > >> $ ovs-dpctl show > >> ... > >> port 1: br0 (internal: open failed (File exists)) > >> ... > >> > >> At this point restarting OVS will result in connection lost > for the > >> node. > >> > >> Reverting the change fixes the problem. Since adding > physical interface > >> to OVS bridge is quite normal operation, the problem is > more severe > >> than the original problem fixed by commit 67ac844, so > revert this > >> before a better fix is found for the original problem. > > > > See also this thread: > > > > > https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335428.html > <https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335428.html> > > > > Will be sending out the patch later today. > > > > //Eelco* > > > Thanks Eelco! Too bad I didn't notice the thread yesterday, > otherwise I would not spend hours chasing the problem :( > Since the problem is critical (e.g. installing release v2.7.1 > on any existing hypervisors would break connection), do you > think it is better to revert the commit until we are confident > with the new patch? > > Assuming we can get the patch in before the next release, there > should be no need to revert back. But if its common practice for > OVS to do it right away I have no issue with it. > > > For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't > take v2.7.1 for this issue. I think we should atleast revert the patch > for 2.7 branch and have a new version 2.7.2. > If we are having a 2.7.2 release specially for this it makes sense to revert it. In addition can you test/apply the updated patch, https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I was hitting the issue with the persistant ports setup, it would be good to see your case is also fixed. Thanks, Eelco
On Fri, Jul 14, 2017 at 12:34:44PM +0530, Numan Siddique wrote: > Hi Han, > > I think I am seeing a similar issue here [1]. Can you please check and see > if it's the same issue hit here which this patch fixes. > > [1] uses OVS 2.7.1 for tripleo and the upgrage job updates OVS 2.6.1 to > OVS 2.7.1. The restart of openvswitch service fails and there is a crash in > ovs-vswitchd. Please see the end of the log file (it's not too big). > > [1] - > http://logs.openstack.org/91/473191/12/check/gate-tripleo-ci-centos-7-multinode-upgrades-nv/414af3c/logs/subnode-2/openvswitch/ovs-vswitchd.txt.gz > https://review.openstack.org/#/c/473191/ > > 2017-07-14T04:14:04.229Z|00038|in_band|ERR|br-ex: failed to initialize > in-band control: cannot open datapath local port br-ex (File exists) > 2017-07-14T04:14:04.285Z|00002|daemon_unix|ERR|fork child died before > signaling startup (killed (Segmentation fault), core dumped) > 2017-07-14T04:14:04.285Z|00003|daemon_unix|EMER|could not detach from > foreground session > > This segfault is consistent. I did 2 rechecks and I see the same issue. The ultimate cause of the segfault is the new netdev_open() behavior, but the proximate cause of this segfault is a separate bug that we can fix easily: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335782.html Once the proximate cause is fixed, we probably need to reassess what you're reporting.
> On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> wrote: > > On 17/07/17 11:10, Numan Siddique wrote: >> >> >> On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com <mailto:echaudro@redhat.com>> wrote: >> >> Assuming we can get the patch in before the next release, there >> should be no need to revert back. But if its common practice for >> OVS to do it right away I have no issue with it. >> >> >> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take v2.7.1 for this issue. I think we should atleast revert the patch for 2.7 branch and have a new version 2.7.2. >> > If we are having a 2.7.2 release specially for this it makes sense to revert it. > > In addition can you test/apply the updated patch, https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I was hitting the issue with the persistant ports setup, it would be good to see your case is also fixed. Han and Numan, it sounds like Eelco's original patch fixed an issue, but it's introducing problems for more common workloads. Can you try it on the tip of "branch-2.7" and see if it resolves the issues you're seeing? I'd suggest if it doesn't fix the problems, we revert Eelco's orignal patch and release 2.7.2. If it does, and people think it's reasonably safe, we can release 2.7.2 with the patches. Let me know if anyone has a different view on how we should proceed. Thanks, --Justin
On Mon, Jul 17, 2017 at 12:51 PM, Justin Pettit <jpettit@ovn.org> wrote: > > > > On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> wrote: > > > > On 17/07/17 11:10, Numan Siddique wrote: > >> > >> > >> On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com <mailto:echaudro@redhat.com>> wrote: > >> > >> Assuming we can get the patch in before the next release, there > >> should be no need to revert back. But if its common practice for > >> OVS to do it right away I have no issue with it. > >> > >> > >> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take v2.7.1 for this issue. I think we should atleast revert the patch for 2.7 branch and have a new version 2.7.2. > >> > > If we are having a 2.7.2 release specially for this it makes sense to revert it. > > > > In addition can you test/apply the updated patch, https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I was hitting the issue with the persistant ports setup, it would be good to see your case is also fixed. > > Han and Numan, it sounds like Eelco's original patch fixed an issue, but it's introducing problems for more common workloads. Can you try it on the tip of "branch-2.7" and see if it resolves the issues you're seeing? I'd suggest if it doesn't fix the problems, we revert Eelco's orignal patch and release 2.7.2. If it does, and people think it's reasonably safe, we can release 2.7.2 with the patches. > > Let me know if anyone has a different view on how we should proceed. > > Thanks, > > --Justin > > I tested on branch-2.7 with patch https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html. I didn't see the "File exists" problem any more :)
On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote: > > > On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> wrote: > > > > On 17/07/17 11:10, Numan Siddique wrote: > >> > >> > >> On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com > <mailto:echaudro@redhat.com>> wrote: > >> > >> Assuming we can get the patch in before the next release, there > >> should be no need to revert back. But if its common practice for > >> OVS to do it right away I have no issue with it. > >> > >> > >> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take > v2.7.1 for this issue. I think we should atleast revert the patch for 2.7 > branch and have a new version 2.7.2. > >> > > If we are having a 2.7.2 release specially for this it makes sense to > revert it. > > > > In addition can you test/apply the updated patch, > https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I > was hitting the issue with the persistant ports setup, it would be good to > see your case is also fixed. > > Han and Numan, it sounds like Eelco's original patch fixed an issue, but > it's introducing problems for more common workloads. Can you try it on the > tip of "branch-2.7" and see if it resolves the issues you're seeing? I'd > suggest if it doesn't fix the problems, we revert Eelco's orignal patch and > release 2.7.2. If it does, and people think it's reasonably safe, we can > release 2.7.2 with the patches. > > Let me know if anyone has a different view on how we should proceed. > IMO, we should just revert this patch for 2.7 branch till we have a proper fix. If we apply another patch in haste to fix this issue without letting it bake for more testing and immediately release 2.7.2, we may just end up with a different bug. I tested the revert on tip of branch-2.7 and it atleast fixes the issue where geneve tunnels across 2 hosts start to work again and I also do not see the "open failed (File exists))" error too. > > Thanks, > > --Justin > > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
> On Jul 17, 2017, at 8:51 PM, Guru Shetty <guru@ovn.org> wrote: > > >> >> On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote: >> >> > On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> wrote: >> > >> > On 17/07/17 11:10, Numan Siddique wrote: >> >> >> >> >> >> On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com <mailto:echaudro@redhat.com>> wrote: >> >> >> >> Assuming we can get the patch in before the next release, there >> >> should be no need to revert back. But if its common practice for >> >> OVS to do it right away I have no issue with it. >> >> >> >> >> >> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take v2.7.1 for this issue. I think we should atleast revert the patch for 2.7 branch and have a new version 2.7.2. >> >> >> > If we are having a 2.7.2 release specially for this it makes sense to revert it. >> > >> > In addition can you test/apply the updated patch, https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I was hitting the issue with the persistant ports setup, it would be good to see your case is also fixed. >> >> Han and Numan, it sounds like Eelco's original patch fixed an issue, but it's introducing problems for more common workloads. Can you try it on the tip of "branch-2.7" and see if it resolves the issues you're seeing? I'd suggest if it doesn't fix the problems, we revert Eelco's orignal patch and release 2.7.2. If it does, and people think it's reasonably safe, we can release 2.7.2 with the patches. > > Let me know if anyone has a different view on how we should proceed. > > IMO, we should just revert this patch for 2.7 branch till we have a proper fix. If we apply another patch in haste to fix this issue without letting it bake for more testing and immediately release 2.7.2, we may just end up with a different bug. > > I tested the revert on tip of branch-2.7 and it atleast fixes the issue where geneve tunnels across 2 hosts start to work again and I also do not see the "open failed (File exists))" error too. Sounds reasonable to me. Does anyone else have any opinions? If I don't hear anything by late tomorrow morning (PDT), I'll revert the patch and release 2.7.2. Thanks, --Justin
On Tue, Jul 18, 2017 at 9:42 AM, Justin Pettit <jpettit@ovn.org> wrote: > > > On Jul 17, 2017, at 8:51 PM, Guru Shetty <guru@ovn.org> wrote: > > > > > >> > >> On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote: > >> > >> > On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> > wrote: > >> > > >> > On 17/07/17 11:10, Numan Siddique wrote: > >> >> > >> >> > >> >> On Mon, Jul 17, 2017 at 1:10 PM, Eelco Chaudron <echaudro@redhat.com > <mailto:echaudro@redhat.com>> wrote: > >> >> > >> >> Assuming we can get the patch in before the next release, there > >> >> should be no need to revert back. But if its common practice for > >> >> OVS to do it right away I have no issue with it. > >> >> > >> >> > >> >> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't > take v2.7.1 for this issue. I think we should atleast revert the patch for > 2.7 branch and have a new version 2.7.2. > >> >> > >> > If we are having a 2.7.2 release specially for this it makes sense to > revert it. > >> > > >> > In addition can you test/apply the updated patch, > https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I > was hitting the issue with the persistant ports setup, it would be good to > see your case is also fixed. > >> > >> Han and Numan, it sounds like Eelco's original patch fixed an issue, > but it's introducing problems for more common workloads. Can you try it on > the tip of "branch-2.7" and see if it resolves the issues you're seeing? > I'd suggest if it doesn't fix the problems, we revert Eelco's orignal patch > and release 2.7.2. If it does, and people think it's reasonably safe, we > can release 2.7.2 with the patches. > > > > Let me know if anyone has a different view on how we should proceed. > > > > IMO, we should just revert this patch for 2.7 branch till we have a > proper fix. If we apply another patch in haste to fix this issue without > letting it bake for more testing and immediately release 2.7.2, we may just > end up with a different bug. > > > > I tested the revert on tip of branch-2.7 and it atleast fixes the issue > where geneve tunnels across 2 hosts start to work again and I also do not > see the "open failed (File exists))" error too. > > Sounds reasonable to me. Does anyone else have any opinions? If I don't > hear anything by late tomorrow morning (PDT), I'll revert the patch and > release 2.7.2. > > Reverting the patch and a release 2.7.2 sounds fine to me. Thanks Numan Thanks, > > --Justin > > > >
> On Jul 17, 2017, at 10:06 PM, Numan Siddique <nusiddiq@redhat.com> wrote: > > On Tue, Jul 18, 2017 at 9:42 AM, Justin Pettit <jpettit@ovn.org> wrote: > > > On Jul 17, 2017, at 8:51 PM, Guru Shetty <guru@ovn.org> wrote: > > > > > >> > >> On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote: > >> > >> > On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> wrote: > >> > > >> > On 17/07/17 11:10, Numan Siddique wrote: > >> >> > >> >> > >> >> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take v2.7.1 for this issue. I think we should atleast revert the patch for 2.7 branch and have a new version 2.7.2. > >> >> > >> > If we are having a 2.7.2 release specially for this it makes sense to revert it. > >> > > >> > In addition can you test/apply the updated patch, https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I was hitting the issue with the persistant ports setup, it would be good to see your case is also fixed. > >> > >> Han and Numan, it sounds like Eelco's original patch fixed an issue, but it's introducing problems for more common workloads. Can you try it on the tip of "branch-2.7" and see if it resolves the issues you're seeing? I'd suggest if it doesn't fix the problems, we revert Eelco's orignal patch and release 2.7.2. If it does, and people think it's reasonably safe, we can release 2.7.2 with the patches. > > > > Let me know if anyone has a different view on how we should proceed. > > > > IMO, we should just revert this patch for 2.7 branch till we have a proper fix. If we apply another patch in haste to fix this issue without letting it bake for more testing and immediately release 2.7.2, we may just end up with a different bug. > > > > I tested the revert on tip of branch-2.7 and it atleast fixes the issue where geneve tunnels across 2 hosts start to work again and I also do not see the "open failed (File exists))" error too. > > Sounds reasonable to me. Does anyone else have any opinions? If I don't hear anything by late tomorrow morning (PDT), I'll revert the patch and release 2.7.2. > > > Reverting the patch and a release 2.7.2 sounds fine to me. Are you okay with that, Eelco? Unless anyone objects, we'll leave the commit on master and look at applying your new patch. --Justin
On 18/07/17 08:28, Justin Pettit wrote: >> On Jul 17, 2017, at 10:06 PM, Numan Siddique <nusiddiq@redhat.com> wrote: >> >> On Tue, Jul 18, 2017 at 9:42 AM, Justin Pettit <jpettit@ovn.org> wrote: >> >>> On Jul 17, 2017, at 8:51 PM, Guru Shetty <guru@ovn.org> wrote: >>> >>> >>>> On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote: >>>> >>>>> On Jul 17, 2017, at 7:49 AM, Eelco Chaudron <echaudro@redhat.com> wrote: >>>>> >>>>> On 17/07/17 11:10, Numan Siddique wrote: >>>>>> >>>>>> For tripleo, we need OVS 2.7.1 for OVN DB HA support, but we can't take v2.7.1 for this issue. I think we should atleast revert the patch for 2.7 branch and have a new version 2.7.2. >>>>>> >>>>> If we are having a 2.7.2 release specially for this it makes sense to revert it. >>>>> >>>>> In addition can you test/apply the updated patch, https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html as I was hitting the issue with the persistant ports setup, it would be good to see your case is also fixed. >>>> Han and Numan, it sounds like Eelco's original patch fixed an issue, but it's introducing problems for more common workloads. Can you try it on the tip of "branch-2.7" and see if it resolves the issues you're seeing? I'd suggest if it doesn't fix the problems, we revert Eelco's orignal patch and release 2.7.2. If it does, and people think it's reasonably safe, we can release 2.7.2 with the patches. >>> Let me know if anyone has a different view on how we should proceed. >>> >>> IMO, we should just revert this patch for 2.7 branch till we have a proper fix. If we apply another patch in haste to fix this issue without letting it bake for more testing and immediately release 2.7.2, we may just end up with a different bug. >>> >>> I tested the revert on tip of branch-2.7 and it atleast fixes the issue where geneve tunnels across 2 hosts start to work again and I also do not see the "open failed (File exists))" error too. >> Sounds reasonable to me. Does anyone else have any opinions? If I don't hear anything by late tomorrow morning (PDT), I'll revert the patch and release 2.7.2. >> >> >> Reverting the patch and a release 2.7.2 sounds fine to me. > Are you okay with that, Eelco? Unless anyone objects, we'll leave the commit on master and look at applying your new patch. > > --Justin > > Sound fine to me. //Eelco
> On Jul 18, 2017, at 4:15 AM, Eelco Chaudron <echaudro@redhat.com> wrote: > > On 18/07/17 08:28, Justin Pettit wrote: >>> On Jul 17, 2017, at 10:06 PM, Numan Siddique <nusiddiq@redhat.com> wrote: >>> >>> On Tue, Jul 18, 2017 at 9:42 AM, Justin Pettit <jpettit@ovn.org> wrote: >>> >>>> On Jul 17, 2017, at 8:51 PM, Guru Shetty <guru@ovn.org> wrote: >>>> >>>> >>>>> On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote: >>>>> >>> Sounds reasonable to me. Does anyone else have any opinions? If I don't hear anything by late tomorrow morning (PDT), I'll revert the patch and release 2.7.2. >>> >>> >>> Reverting the patch and a release 2.7.2 sounds fine to me. >> Are you okay with that, Eelco? Unless anyone objects, we'll leave the commit on master and look at applying your new patch. >> > Sound fine to me. Okay, I reverted the patch and pushed to branch-2.7. I've sent out a series to release 2.7.2: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335886.html If someone can review that, I'll get the release out tonight. --Justin
> On Jul 19, 2017, at 7:32 AM, Justin Pettit <jpettit@ovn.org> wrote: > >> >> On Jul 18, 2017, at 4:15 AM, Eelco Chaudron <echaudro@redhat.com> wrote: >> >> On 18/07/17 08:28, Justin Pettit wrote: >>>> On Jul 17, 2017, at 10:06 PM, Numan Siddique <nusiddiq@redhat.com> wrote: >>>> >>>> On Tue, Jul 18, 2017 at 9:42 AM, Justin Pettit <jpettit@ovn.org> wrote: >>>> >>>>> On Jul 17, 2017, at 8:51 PM, Guru Shetty <guru@ovn.org> wrote: >>>>> >>>>> >>>>>> On 17 July 2017 at 12:51, Justin Pettit <jpettit@ovn.org> wrote: >>>>>> >>>> Sounds reasonable to me. Does anyone else have any opinions? If I don't hear anything by late tomorrow morning (PDT), I'll revert the patch and release 2.7.2. >>>> >>>> >>>> Reverting the patch and a release 2.7.2 sounds fine to me. >>> Are you okay with that, Eelco? Unless anyone objects, we'll leave the commit on master and look at applying your new patch. >>> >> Sound fine to me. > > Okay, I reverted the patch and pushed to branch-2.7. I've sent out a series to release 2.7.2: > > https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335886.html > > If someone can review that, I'll get the release out tonight. > > --Justin Hi all, I sent patches, I hope it can help us. Thanks. http://patchwork.ozlabs.org/patch/790729/ http://patchwork.ozlabs.org/patch/790730/
On Thu, Jul 13, 2017 at 11:44:51PM -0700, Han Zhou wrote: > This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131. > > The commit introduced a problem that "File exists" will be reported > when trying to open br0. Earlier today I applied Eelco's additional fix (appended below). I hope that it resolves all the issues. If it doesn't, I think that we will have to revert both this new commit and the one that you propose to revert as well. It seems like this area should not be so tricky. I am a bit frustrated. commit 8c2c225e481d6560faf73bb49c33f026bc58a664 Author: Eelco Chaudron <echaudro@redhat.com> Date: Fri Jul 14 14:33:27 2017 +0200 netdev: Fix netdev_open() to track and recreate classless interfaces Due to commit 67ac844 an existing issue with OVS persisten ports surfaced. If we revert the commit we no longer get the error, and basic traffic will flow. However the wrong netdev class is used, hence the wrong callbacks get called. The main issue is with netdev_open() being called with type = NULL before the interface is actually configured in the system. This patch tracks these "auto" generated interfaces, and once netdev_open() gets called with a valid type, re-configures (re-create) it. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
diff --git a/lib/netdev.c b/lib/netdev.c index 0d5fad5..e8843a8 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -416,8 +416,6 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp) name, type); error = EAFNOSUPPORT; } - } else if (type && type[0] && strcmp(type, netdev->netdev_class->type)) { - error = EEXIST; } else { error = 0; }
This reverts commit 67ac844b55d4c5f6bbfa01773c82b3d6d8b62131. The commit introduced a problem that "File exists" will be reported when trying to open br0. The operation that adds eth0 to br0 while moving IP address from eth0 to bridge internal interface br0 reproduces this issue. $ ip a del <ip> dev eth0; ip a add <ip> dev br0; ovs-vsctl add-port br0 eth0 $ ovs-dpctl show ... port 1: br0 (internal: open failed (File exists)) ... At this point restarting OVS will result in connection lost for the node. Reverting the change fixes the problem. Since adding physical interface to OVS bridge is quite normal operation, the problem is more severe than the original problem fixed by commit 67ac844, so revert this before a better fix is found for the original problem. Signed-off-by: Han Zhou <zhouhan@gmail.com> --- lib/netdev.c | 2 -- 1 file changed, 2 deletions(-)