diff mbox

[ovs-dev,v3] tnl-ports: Remove netdevs in netdev_hash when deleted

Message ID 1497491937-9312-1-git-send-email-fukaige@huawei.com
State Superseded
Headers show

Commit Message

fukaige June 15, 2017, 1:58 a.m. UTC
tart a virtual machine with its backend tap device attached to a brought up linux bridge.
If we delete the linux bridge when vm is still running, we'll get the following error when
trying to create a ovs bridge with the same name.

The reason is that ovs-router subsystem add the linux bridge into netdev_shash, but does
not remove it when the bridge is deleted in the situation. When the bridge is deleted, ovs
will receive a RTM_DELLINK msg, take this chance to remove the bridge in netdev_shash.

ovs-vsctl: Error detected while setting up 'br-eth'.  See ovs-vswitchd log for details.

ovs-vswitchd log:
2017-05-11T03:45:25.293Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports recirculation
2017-05-11T03:45:25.293Z|00027|ofproto_dpif|INFO|system@ovs-system: MPLS label stack length probed as 1
2017-05-11T03:45:25.293Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports unique flow ids
2017-05-11T03:45:25.293Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state
2017-05-11T03:45:25.293Z|00030|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_zone
2017-05-11T03:45:25.293Z|00031|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_mark
2017-05-11T03:45:25.293Z|00032|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2017-05-11T03:45:25.364Z|00001|ofproto_dpif_upcall(handler226)|INFO|received packet on unassociated datapath port 0
2017-05-11T03:45:25.368Z|00033|netdev_linux|WARN|ethtool command ETHTOOL_GFLAGS on network device br-eth failed: No such device
2017-05-11T03:45:25.368Z|00034|dpif|WARN|system@ovs-system: failed to add br-eth as port: No such device
2017-05-11T03:45:25.368Z|00035|bridge|INFO|bridge br-eth: using datapath ID 00002a51cf9f2841
2017-05-11T03:45:25.368Z|00036|connmgr|INFO|br-eth: added service controller "punix:/var/run/openvswitch/br-eth.mgmt"

Signed-off-by: fukaige <fukaige@huawei.com>
---
 lib/tnl-ports.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Ben Pfaff July 7, 2017, 9:54 p.m. UTC | #1
On Thu, Jun 15, 2017 at 09:58:57AM +0800, fukaige wrote:
> tart a virtual machine with its backend tap device attached to a brought up linux bridge.
> If we delete the linux bridge when vm is still running, we'll get the following error when
> trying to create a ovs bridge with the same name.
> 
> The reason is that ovs-router subsystem add the linux bridge into netdev_shash, but does
> not remove it when the bridge is deleted in the situation. When the bridge is deleted, ovs
> will receive a RTM_DELLINK msg, take this chance to remove the bridge in netdev_shash.
> 
> ovs-vsctl: Error detected while setting up 'br-eth'.  See ovs-vswitchd log for details.
> 
> ovs-vswitchd log:
> 2017-05-11T03:45:25.293Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports recirculation
> 2017-05-11T03:45:25.293Z|00027|ofproto_dpif|INFO|system@ovs-system: MPLS label stack length probed as 1
> 2017-05-11T03:45:25.293Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports unique flow ids
> 2017-05-11T03:45:25.293Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state
> 2017-05-11T03:45:25.293Z|00030|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_zone
> 2017-05-11T03:45:25.293Z|00031|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_mark
> 2017-05-11T03:45:25.293Z|00032|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
> 2017-05-11T03:45:25.364Z|00001|ofproto_dpif_upcall(handler226)|INFO|received packet on unassociated datapath port 0
> 2017-05-11T03:45:25.368Z|00033|netdev_linux|WARN|ethtool command ETHTOOL_GFLAGS on network device br-eth failed: No such device
> 2017-05-11T03:45:25.368Z|00034|dpif|WARN|system@ovs-system: failed to add br-eth as port: No such device
> 2017-05-11T03:45:25.368Z|00035|bridge|INFO|bridge br-eth: using datapath ID 00002a51cf9f2841
> 2017-05-11T03:45:25.368Z|00036|connmgr|INFO|br-eth: added service controller "punix:/var/run/openvswitch/br-eth.mgmt"
> 
> Signed-off-by: fukaige <fukaige@huawei.com>

Thank you for the updated patch.

This patch raises some difficulties.  First, it uses rtnetlink, which is
Linux specific.  We do not want tnl-ports to be Linux-specific.  The
more generic alternative is ifnotifier, but it provides no information
about the change that took place, so it is harder to deal with.  Second,
it will dereference a null pointer if the 'change' passed in is null,
which can happen if the system is busy or devices are changing quickly,
as described in rtnetlink.h:

    /* Function called to report that a netdev has changed.  'change' describes the
     * specific change.  It may be null if the buffer of change information
     * overflowed, in which case the function must assume that every device may
     * have changed.  'aux' is as specified in the call to
     * rtnetlink_notifier_register().  */
    typedef
    void rtnetlink_notify_func(const struct rtnetlink_change *change,
                               void *aux);

I am not sure the best way to proceed.  One way would be to revamp
ifnotifier so that it provides information on the device that changed
and the kind of change, and then adapt each of the implementations to
pass that along as well.  Other approaches along these lines are also
possible.

Another approach might be to notify the ovs-router subsystem when a
bridge is deleted, so that it can drop all of the references it has for
that bridge.  I haven't looked into how much work this would be.

What do you think?

Thanks,

Ben.
fukaige July 11, 2017, 7:34 a.m. UTC | #2
> -----Original Message-----
> From: Ben Pfaff [mailto:blp@ovn.org]
> Sent: Saturday, July 08, 2017 5:54 AM
> To: fukaige
> Cc: dev@openvswitch.org; Zhaoshenglong
> Subject: Re: [PATCH v3] tnl-ports: Remove netdevs in netdev_hash when
> deleted
> 
> On Thu, Jun 15, 2017 at 09:58:57AM +0800, fukaige wrote:
> > tart a virtual machine with its backend tap device attached to a brought up
> linux bridge.
> > If we delete the linux bridge when vm is still running, we'll get the
> > following error when trying to create a ovs bridge with the same name.
> >
> > The reason is that ovs-router subsystem add the linux bridge into
> > netdev_shash, but does not remove it when the bridge is deleted in the
> > situation. When the bridge is deleted, ovs will receive a RTM_DELLINK msg,
> take this chance to remove the bridge in netdev_shash.
> >
> > ovs-vsctl: Error detected while setting up 'br-eth'.  See ovs-vswitchd log for
> details.
> >
> > ovs-vswitchd log:
> > 2017-05-11T03:45:25.293Z|00026|ofproto_dpif|INFO|system@ovs-system:
> > Datapath supports recirculation
> > 2017-05-11T03:45:25.293Z|00027|ofproto_dpif|INFO|system@ovs-system:
> > MPLS label stack length probed as 1
> > 2017-05-11T03:45:25.293Z|00028|ofproto_dpif|INFO|system@ovs-system:
> > Datapath supports unique flow ids
> > 2017-05-11T03:45:25.293Z|00029|ofproto_dpif|INFO|system@ovs-system:
> > Datapath supports ct_state
> > 2017-05-11T03:45:25.293Z|00030|ofproto_dpif|INFO|system@ovs-system:
> > Datapath supports ct_zone
> > 2017-05-11T03:45:25.293Z|00031|ofproto_dpif|INFO|system@ovs-system:
> > Datapath supports ct_mark
> > 2017-05-11T03:45:25.293Z|00032|ofproto_dpif|INFO|system@ovs-system:
> > Datapath supports ct_label
> > 2017-05-11T03:45:25.364Z|00001|ofproto_dpif_upcall(handler226)|INFO|re
> > ceived packet on unassociated datapath port 0
> > 2017-05-11T03:45:25.368Z|00033|netdev_linux|WARN|ethtool command
> > ETHTOOL_GFLAGS on network device br-eth failed: No such device
> > 2017-05-11T03:45:25.368Z|00034|dpif|WARN|system@ovs-system: failed
> to
> > add br-eth as port: No such device
> > 2017-05-11T03:45:25.368Z|00035|bridge|INFO|bridge br-eth: using
> > datapath ID 00002a51cf9f2841
> > 2017-05-11T03:45:25.368Z|00036|connmgr|INFO|br-eth: added service
> controller "punix:/var/run/openvswitch/br-eth.mgmt"
> >
> > Signed-off-by: fukaige <fukaige@huawei.com>
> 
> Thank you for the updated patch.
> 
> This patch raises some difficulties.  First, it uses rtnetlink, which is Linux
> specific.  We do not want tnl-ports to be Linux-specific.  The more generic
> alternative is ifnotifier, but it provides no information about the change that
> took place, so it is harder to deal with.  Second, it will dereference a null
> pointer if the 'change' passed in is null, which can happen if the system is busy
> or devices are changing quickly, as described in rtnetlink.h:
> 
>     /* Function called to report that a netdev has changed.  'change'
> describes the
>      * specific change.  It may be null if the buffer of change information
>      * overflowed, in which case the function must assume that every device
> may
>      * have changed.  'aux' is as specified in the call to
>      * rtnetlink_notifier_register().  */
>     typedef
>     void rtnetlink_notify_func(const struct rtnetlink_change *change,
>                                void *aux);
> 
> I am not sure the best way to proceed.  One way would be to revamp ifnotifier
> so that it provides information on the device that changed and the kind of
> change, and then adapt each of the implementations to pass that along as well.
> Other approaches along these lines are also possible.
> 
> Another approach might be to notify the ovs-router subsystem when a bridge is
> deleted, so that it can drop all of the references it has for that bridge.  I
> haven't looked into how much work this would be.
> 
> What do you think?
> 
> Thanks,
> 
> Ben.

Thanks for your review.

I prefer the second approach you mentioned. May be we can register a callback in ovs-router level, like route_table_init.
When a bridge is deleted, ovs-router subsystem will get a msg, like RTM_DELLINK on linux. So it can drop all of the references
it has for that bridge. 

I don't know if there is same bug in other OSes, like bsd, and which type of msg will be sent in the situation.
fukaige July 13, 2017, 2:04 a.m. UTC | #3
> -----Original Message-----
> From: fukaige
> Sent: Tuesday, July 11, 2017 3:34 PM
> To: 'Ben Pfaff'
> Cc: dev@openvswitch.org; Zhaoshenglong
> Subject: RE: [PATCH v3] tnl-ports: Remove netdevs in netdev_hash when
> deleted
> 
> 
> > -----Original Message-----
> > From: Ben Pfaff [mailto:blp@ovn.org]
> > Sent: Saturday, July 08, 2017 5:54 AM
> > To: fukaige
> > Cc: dev@openvswitch.org; Zhaoshenglong
> > Subject: Re: [PATCH v3] tnl-ports: Remove netdevs in netdev_hash when
> > deleted
> >
> > On Thu, Jun 15, 2017 at 09:58:57AM +0800, fukaige wrote:
> > > tart a virtual machine with its backend tap device attached to a
> > > brought up
> > linux bridge.
> > > If we delete the linux bridge when vm is still running, we'll get
> > > the following error when trying to create a ovs bridge with the same name.
> > >
> > > The reason is that ovs-router subsystem add the linux bridge into
> > > netdev_shash, but does not remove it when the bridge is deleted in
> > > the situation. When the bridge is deleted, ovs will receive a
> > > RTM_DELLINK msg,
> > take this chance to remove the bridge in netdev_shash.
> > >
> > > ovs-vsctl: Error detected while setting up 'br-eth'.  See
> > > ovs-vswitchd log for
> > details.
> > >
> > > ovs-vswitchd log:
> > >
> 2017-05-11T03:45:25.293Z|00026|ofproto_dpif|INFO|system@ovs-system:
> > > Datapath supports recirculation
> > >
> 2017-05-11T03:45:25.293Z|00027|ofproto_dpif|INFO|system@ovs-system:
> > > MPLS label stack length probed as 1
> > >
> 2017-05-11T03:45:25.293Z|00028|ofproto_dpif|INFO|system@ovs-system:
> > > Datapath supports unique flow ids
> > >
> 2017-05-11T03:45:25.293Z|00029|ofproto_dpif|INFO|system@ovs-system:
> > > Datapath supports ct_state
> > >
> 2017-05-11T03:45:25.293Z|00030|ofproto_dpif|INFO|system@ovs-system:
> > > Datapath supports ct_zone
> > >
> 2017-05-11T03:45:25.293Z|00031|ofproto_dpif|INFO|system@ovs-system:
> > > Datapath supports ct_mark
> > >
> 2017-05-11T03:45:25.293Z|00032|ofproto_dpif|INFO|system@ovs-system:
> > > Datapath supports ct_label
> > > 2017-05-11T03:45:25.364Z|00001|ofproto_dpif_upcall(handler226)|INFO|
> > > re ceived packet on unassociated datapath port 0
> > > 2017-05-11T03:45:25.368Z|00033|netdev_linux|WARN|ethtool command
> > > ETHTOOL_GFLAGS on network device br-eth failed: No such device
> > > 2017-05-11T03:45:25.368Z|00034|dpif|WARN|system@ovs-system: failed
> > to
> > > add br-eth as port: No such device
> > > 2017-05-11T03:45:25.368Z|00035|bridge|INFO|bridge br-eth: using
> > > datapath ID 00002a51cf9f2841
> > > 2017-05-11T03:45:25.368Z|00036|connmgr|INFO|br-eth: added service
> > controller "punix:/var/run/openvswitch/br-eth.mgmt"
> > >
> > > Signed-off-by: fukaige <fukaige@huawei.com>
> >
> > Thank you for the updated patch.
> >
> > This patch raises some difficulties.  First, it uses rtnetlink, which
> > is Linux specific.  We do not want tnl-ports to be Linux-specific.
> > The more generic alternative is ifnotifier, but it provides no
> > information about the change that took place, so it is harder to deal
> > with.  Second, it will dereference a null pointer if the 'change'
> > passed in is null, which can happen if the system is busy or devices are
> changing quickly, as described in rtnetlink.h:
> >
> >     /* Function called to report that a netdev has changed.  'change'
> > describes the
> >      * specific change.  It may be null if the buffer of change information
> >      * overflowed, in which case the function must assume that every
> > device may
> >      * have changed.  'aux' is as specified in the call to
> >      * rtnetlink_notifier_register().  */
> >     typedef
> >     void rtnetlink_notify_func(const struct rtnetlink_change *change,
> >                                void *aux);
> >
> > I am not sure the best way to proceed.  One way would be to revamp
> > ifnotifier so that it provides information on the device that changed
> > and the kind of change, and then adapt each of the implementations to pass
> that along as well.
> > Other approaches along these lines are also possible.
> >
> > Another approach might be to notify the ovs-router subsystem when a
> > bridge is deleted, so that it can drop all of the references it has
> > for that bridge.  I haven't looked into how much work this would be.
> >
> > What do you think?
> >
> > Thanks,
> >
> > Ben.
> 
> Thanks for your review.
> 
> I prefer the second approach you mentioned. May be we can register a callback
> in ovs-router level, like route_table_init.
> When a bridge is deleted, ovs-router subsystem will get a msg, like
> RTM_DELLINK on linux. So it can drop all of the references it has for that
> bridge.
> 
> I don't know if there is same bug in other OSes, like bsd, and which type of msg
> will be sent in the situation.

I think we can fix the bug through the following way:
	1. register a callback using rtnetlink_notifier_create in route_table_init(route-table.c);
	2.when receive the msg RTM_DELLINK, drop all of the references it has for that bridge.

The reason I think the above method can work is that ovs-router subsystem in bsd(route-table-bsd.c) does not hold the references of that bridge.
So, there is no this bug in bsd. I just hit it in linux.

What do you think?
Ben Pfaff Aug. 8, 2017, 8:06 p.m. UTC | #4
On Thu, Jul 13, 2017 at 02:04:53AM +0000, fukaige wrote:
> 
> 
> > -----Original Message-----
> > From: fukaige
> > Sent: Tuesday, July 11, 2017 3:34 PM
> > To: 'Ben Pfaff'
> > Cc: dev@openvswitch.org; Zhaoshenglong
> > Subject: RE: [PATCH v3] tnl-ports: Remove netdevs in netdev_hash when
> > deleted
> > 
> > 
> > > -----Original Message-----
> > > From: Ben Pfaff [mailto:blp@ovn.org]
> > > Sent: Saturday, July 08, 2017 5:54 AM
> > > To: fukaige
> > > Cc: dev@openvswitch.org; Zhaoshenglong
> > > Subject: Re: [PATCH v3] tnl-ports: Remove netdevs in netdev_hash when
> > > deleted
> > >
> > > On Thu, Jun 15, 2017 at 09:58:57AM +0800, fukaige wrote:
> > > > tart a virtual machine with its backend tap device attached to a
> > > > brought up
> > > linux bridge.
> > > > If we delete the linux bridge when vm is still running, we'll get
> > > > the following error when trying to create a ovs bridge with the same name.
> > > >
> > > > The reason is that ovs-router subsystem add the linux bridge into
> > > > netdev_shash, but does not remove it when the bridge is deleted in
> > > > the situation. When the bridge is deleted, ovs will receive a
> > > > RTM_DELLINK msg,
> > > take this chance to remove the bridge in netdev_shash.
> > > >
> > > > ovs-vsctl: Error detected while setting up 'br-eth'.  See
> > > > ovs-vswitchd log for
> > > details.
> > > >
> > > > ovs-vswitchd log:
> > > >
> > 2017-05-11T03:45:25.293Z|00026|ofproto_dpif|INFO|system@ovs-system:
> > > > Datapath supports recirculation
> > > >
> > 2017-05-11T03:45:25.293Z|00027|ofproto_dpif|INFO|system@ovs-system:
> > > > MPLS label stack length probed as 1
> > > >
> > 2017-05-11T03:45:25.293Z|00028|ofproto_dpif|INFO|system@ovs-system:
> > > > Datapath supports unique flow ids
> > > >
> > 2017-05-11T03:45:25.293Z|00029|ofproto_dpif|INFO|system@ovs-system:
> > > > Datapath supports ct_state
> > > >
> > 2017-05-11T03:45:25.293Z|00030|ofproto_dpif|INFO|system@ovs-system:
> > > > Datapath supports ct_zone
> > > >
> > 2017-05-11T03:45:25.293Z|00031|ofproto_dpif|INFO|system@ovs-system:
> > > > Datapath supports ct_mark
> > > >
> > 2017-05-11T03:45:25.293Z|00032|ofproto_dpif|INFO|system@ovs-system:
> > > > Datapath supports ct_label
> > > > 2017-05-11T03:45:25.364Z|00001|ofproto_dpif_upcall(handler226)|INFO|
> > > > re ceived packet on unassociated datapath port 0
> > > > 2017-05-11T03:45:25.368Z|00033|netdev_linux|WARN|ethtool command
> > > > ETHTOOL_GFLAGS on network device br-eth failed: No such device
> > > > 2017-05-11T03:45:25.368Z|00034|dpif|WARN|system@ovs-system: failed
> > > to
> > > > add br-eth as port: No such device
> > > > 2017-05-11T03:45:25.368Z|00035|bridge|INFO|bridge br-eth: using
> > > > datapath ID 00002a51cf9f2841
> > > > 2017-05-11T03:45:25.368Z|00036|connmgr|INFO|br-eth: added service
> > > controller "punix:/var/run/openvswitch/br-eth.mgmt"
> > > >
> > > > Signed-off-by: fukaige <fukaige@huawei.com>
> > >
> > > Thank you for the updated patch.
> > >
> > > This patch raises some difficulties.  First, it uses rtnetlink, which
> > > is Linux specific.  We do not want tnl-ports to be Linux-specific.
> > > The more generic alternative is ifnotifier, but it provides no
> > > information about the change that took place, so it is harder to deal
> > > with.  Second, it will dereference a null pointer if the 'change'
> > > passed in is null, which can happen if the system is busy or devices are
> > changing quickly, as described in rtnetlink.h:
> > >
> > >     /* Function called to report that a netdev has changed.  'change'
> > > describes the
> > >      * specific change.  It may be null if the buffer of change information
> > >      * overflowed, in which case the function must assume that every
> > > device may
> > >      * have changed.  'aux' is as specified in the call to
> > >      * rtnetlink_notifier_register().  */
> > >     typedef
> > >     void rtnetlink_notify_func(const struct rtnetlink_change *change,
> > >                                void *aux);
> > >
> > > I am not sure the best way to proceed.  One way would be to revamp
> > > ifnotifier so that it provides information on the device that changed
> > > and the kind of change, and then adapt each of the implementations to pass
> > that along as well.
> > > Other approaches along these lines are also possible.
> > >
> > > Another approach might be to notify the ovs-router subsystem when a
> > > bridge is deleted, so that it can drop all of the references it has
> > > for that bridge.  I haven't looked into how much work this would be.
> > >
> > > What do you think?
> > >
> > > Thanks,
> > >
> > > Ben.
> > 
> > Thanks for your review.
> > 
> > I prefer the second approach you mentioned. May be we can register a callback
> > in ovs-router level, like route_table_init.
> > When a bridge is deleted, ovs-router subsystem will get a msg, like
> > RTM_DELLINK on linux. So it can drop all of the references it has for that
> > bridge.
> > 
> > I don't know if there is same bug in other OSes, like bsd, and which type of msg
> > will be sent in the situation.
> 
> I think we can fix the bug through the following way:
> 	1. register a callback using rtnetlink_notifier_create in route_table_init(route-table.c);
> 	2.when receive the msg RTM_DELLINK, drop all of the references it has for that bridge.
> 
> The reason I think the above method can work is that ovs-router subsystem in bsd(route-table-bsd.c) does not hold the references of that bridge.
> So, there is no this bug in bsd. I just hit it in linux.
> 
> What do you think?

It's promising enough that I'd like to see a patch that implements it.
Would you mind writing one?

Thanks,

Ben.
fukaige Aug. 9, 2017, 12:47 a.m. UTC | #5
> -----Original Message-----
> From: Ben Pfaff [mailto:blp@ovn.org]
> Sent: Wednesday, August 09, 2017 4:06 AM
> To: fukaige
> Cc: dev@openvswitch.org; Zhaoshenglong
> Subject: Re: [PATCH v3] tnl-ports: Remove netdevs in netdev_hash when
> deleted
> 
> On Thu, Jul 13, 2017 at 02:04:53AM +0000, fukaige wrote:
> >
> >
> > > -----Original Message-----
> > > From: fukaige
> > > Sent: Tuesday, July 11, 2017 3:34 PM
> > > To: 'Ben Pfaff'
> > > Cc: dev@openvswitch.org; Zhaoshenglong
> > > Subject: RE: [PATCH v3] tnl-ports: Remove netdevs in netdev_hash
> > > when deleted
> > >
> > >
> > > > -----Original Message-----
> > > > From: Ben Pfaff [mailto:blp@ovn.org]
> > > > Sent: Saturday, July 08, 2017 5:54 AM
> > > > To: fukaige
> > > > Cc: dev@openvswitch.org; Zhaoshenglong
> > > > Subject: Re: [PATCH v3] tnl-ports: Remove netdevs in netdev_hash
> > > > when deleted
> > > >
> > > > On Thu, Jun 15, 2017 at 09:58:57AM +0800, fukaige wrote:
> > > > > tart a virtual machine with its backend tap device attached to a
> > > > > brought up
> > > > linux bridge.
> > > > > If we delete the linux bridge when vm is still running, we'll
> > > > > get the following error when trying to create a ovs bridge with the same
> name.
> > > > >
> > > > > The reason is that ovs-router subsystem add the linux bridge
> > > > > into netdev_shash, but does not remove it when the bridge is
> > > > > deleted in the situation. When the bridge is deleted, ovs will
> > > > > receive a RTM_DELLINK msg,
> > > > take this chance to remove the bridge in netdev_shash.
> > > > >
> > > > > ovs-vsctl: Error detected while setting up 'br-eth'.  See
> > > > > ovs-vswitchd log for
> > > > details.
> > > > >
> > > > > ovs-vswitchd log:
> > > > >
> > >
> 2017-05-11T03:45:25.293Z|00026|ofproto_dpif|INFO|system@ovs-system:
> > > > > Datapath supports recirculation
> > > > >
> > >
> 2017-05-11T03:45:25.293Z|00027|ofproto_dpif|INFO|system@ovs-system:
> > > > > MPLS label stack length probed as 1
> > > > >
> > >
> 2017-05-11T03:45:25.293Z|00028|ofproto_dpif|INFO|system@ovs-system:
> > > > > Datapath supports unique flow ids
> > > > >
> > >
> 2017-05-11T03:45:25.293Z|00029|ofproto_dpif|INFO|system@ovs-system:
> > > > > Datapath supports ct_state
> > > > >
> > >
> 2017-05-11T03:45:25.293Z|00030|ofproto_dpif|INFO|system@ovs-system:
> > > > > Datapath supports ct_zone
> > > > >
> > >
> 2017-05-11T03:45:25.293Z|00031|ofproto_dpif|INFO|system@ovs-system:
> > > > > Datapath supports ct_mark
> > > > >
> > >
> 2017-05-11T03:45:25.293Z|00032|ofproto_dpif|INFO|system@ovs-system:
> > > > > Datapath supports ct_label
> > > > > 2017-05-11T03:45:25.364Z|00001|ofproto_dpif_upcall(handler226)|I
> > > > > NFO| re ceived packet on unassociated datapath port 0
> > > > > 2017-05-11T03:45:25.368Z|00033|netdev_linux|WARN|ethtool
> command
> > > > > ETHTOOL_GFLAGS on network device br-eth failed: No such device
> > > > > 2017-05-11T03:45:25.368Z|00034|dpif|WARN|system@ovs-system:
> > > > > failed
> > > > to
> > > > > add br-eth as port: No such device
> > > > > 2017-05-11T03:45:25.368Z|00035|bridge|INFO|bridge br-eth: using
> > > > > datapath ID 00002a51cf9f2841
> > > > > 2017-05-11T03:45:25.368Z|00036|connmgr|INFO|br-eth: added
> > > > > service
> > > > controller "punix:/var/run/openvswitch/br-eth.mgmt"
> > > > >
> > > > > Signed-off-by: fukaige <fukaige@huawei.com>
> > > >
> > > > Thank you for the updated patch.
> > > >
> > > > This patch raises some difficulties.  First, it uses rtnetlink,
> > > > which is Linux specific.  We do not want tnl-ports to be Linux-specific.
> > > > The more generic alternative is ifnotifier, but it provides no
> > > > information about the change that took place, so it is harder to
> > > > deal with.  Second, it will dereference a null pointer if the 'change'
> > > > passed in is null, which can happen if the system is busy or
> > > > devices are
> > > changing quickly, as described in rtnetlink.h:
> > > >
> > > >     /* Function called to report that a netdev has changed.  'change'
> > > > describes the
> > > >      * specific change.  It may be null if the buffer of change
> information
> > > >      * overflowed, in which case the function must assume that
> > > > every device may
> > > >      * have changed.  'aux' is as specified in the call to
> > > >      * rtnetlink_notifier_register().  */
> > > >     typedef
> > > >     void rtnetlink_notify_func(const struct rtnetlink_change *change,
> > > >                                void *aux);
> > > >
> > > > I am not sure the best way to proceed.  One way would be to revamp
> > > > ifnotifier so that it provides information on the device that
> > > > changed and the kind of change, and then adapt each of the
> > > > implementations to pass
> > > that along as well.
> > > > Other approaches along these lines are also possible.
> > > >
> > > > Another approach might be to notify the ovs-router subsystem when
> > > > a bridge is deleted, so that it can drop all of the references it
> > > > has for that bridge.  I haven't looked into how much work this would be.
> > > >
> > > > What do you think?
> > > >
> > > > Thanks,
> > > >
> > > > Ben.
> > >
> > > Thanks for your review.
> > >
> > > I prefer the second approach you mentioned. May be we can register a
> > > callback in ovs-router level, like route_table_init.
> > > When a bridge is deleted, ovs-router subsystem will get a msg, like
> > > RTM_DELLINK on linux. So it can drop all of the references it has
> > > for that bridge.
> > >
> > > I don't know if there is same bug in other OSes, like bsd, and which
> > > type of msg will be sent in the situation.
> >
> > I think we can fix the bug through the following way:
> > 	1. register a callback using rtnetlink_notifier_create in
> route_table_init(route-table.c);
> > 	2.when receive the msg RTM_DELLINK, drop all of the references it has
> for that bridge.
> >
> > The reason I think the above method can work is that ovs-router subsystem in
> bsd(route-table-bsd.c) does not hold the references of that bridge.
> > So, there is no this bug in bsd. I just hit it in linux.
> >
> > What do you think?
> 
> It's promising enough that I'd like to see a patch that implements it.
> Would you mind writing one?
> 
Thank you for your reply.
I'll wirting a patch as soon as possible.

> Thanks,
> 
> Ben.
diff mbox

Patch

diff --git a/lib/tnl-ports.c b/lib/tnl-ports.c
index 4a07dcb..5f5791c 100644
--- a/lib/tnl-ports.c
+++ b/lib/tnl-ports.c
@@ -21,6 +21,7 @@ 
 #include <stddef.h>
 #include <stdint.h>
 #include <string.h>
+#include <linux/rtnetlink.h>
 
 #include "classifier.h"
 #include "openvswitch/dynamic-string.h"
@@ -33,6 +34,7 @@ 
 #include "ovs-thread.h"
 #include "unixctl.h"
 #include "util.h"
+#include "rtnetlink.h"
 
 static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
 static struct classifier cls;   /* Tunnel ports. */
@@ -499,11 +501,19 @@  tnl_port_map_run(void)
     ovs_mutex_unlock(&mutex);
 }
 
+static void
+rtnetlink_del_cb(const struct rtnetlink_change *change, void *aux OVS_UNUSED)
+{
+    if(change->nlmsg_type == RTM_DELLINK)
+        tnl_port_map_delete_ipdev(change->ifname);
+}
+
 void
 tnl_port_map_init(void)
 {
     classifier_init(&cls, flow_segment_u64s);
     ovs_list_init(&addr_list);
     ovs_list_init(&port_list);
+    rtnetlink_notifier_create(rtnetlink_del_cb, NULL);
     unixctl_command_register("tnl/ports/show", "-v", 0, 1, tnl_port_show, NULL);
 }