diff mbox

[net-next] bridge: Let bridge not age 'externally' learnt FDB entries, they are removed when 'external' entity notifies the aging

Message ID 1422897714-5956-1-git-send-email-siva.mannem.lnx@gmail.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Siva Mannem Feb. 2, 2015, 5:21 p.m. UTC
When 'learned_sync' flag is turned on, the offloaded switch
 port syncs learned MAC addresses to bridge's FDB via switchdev notifier
 (NETDEV_SWITCH_FDB_ADD). Currently, FDB entries learnt via this mechanism are
 wrongly being deleted by bridge aging logic. This patch ensures that FDB
 entries synced from offloaded switch ports are not deleted by bridging logic.
 Such entries can only be deleted via switchdev notifier
 (NETDEV_SWITCH_FDB_DEL).

Signed-off-by: Siva Mannem <siva.mannem.lnx@gmail.com>
---
 net/bridge/br_fdb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Roopa Prabhu Feb. 3, 2015, 3:11 p.m. UTC | #1
On 2/2/15, 9:21 AM, Siva Mannem wrote:
>   When 'learned_sync' flag is turned on, the offloaded switch
>   port syncs learned MAC addresses to bridge's FDB via switchdev notifier
>   (NETDEV_SWITCH_FDB_ADD). Currently, FDB entries learnt via this mechanism are
>   wrongly being deleted by bridge aging logic. This patch ensures that FDB
>   entries synced from offloaded switch ports are not deleted by bridging logic.
>   Such entries can only be deleted via switchdev notifier
>   (NETDEV_SWITCH_FDB_DEL).

Your patch seems right and maintains symmetry for fdb add/del of 
externally learnt entries.
However, this could be made configurable. I think some drivers may rely 
on bridge driver aging these entries (The default setting needs more 
thought).
I am not sure what rocker does (CC'ed rocker maintainers). But, our 
driver does rely on the bridge driver aging these entries by default.
>
> Signed-off-by: Siva Mannem <siva.mannem.lnx@gmail.com>
> ---
>   net/bridge/br_fdb.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index 08bf04b..6eb94b5 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -280,7 +280,7 @@ void br_fdb_cleanup(unsigned long _data)
>   
>   		hlist_for_each_entry_safe(f, n, &br->hash[i], hlist) {
>   			unsigned long this_timer;
> -			if (f->is_static)
> +			if (f->is_static || f->added_by_external_learn)
>   				continue;
>   			this_timer = f->updated + delay;
>   			if (time_before_eq(this_timer, jiffies))

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Siva Mannem Feb. 4, 2015, 8:02 a.m. UTC | #2
On Tue, Feb 3, 2015 at 8:41 PM, roopa <roopa@cumulusnetworks.com> wrote:
> On 2/2/15, 9:21 AM, Siva Mannem wrote:
>>
>>   When 'learned_sync' flag is turned on, the offloaded switch
>>   port syncs learned MAC addresses to bridge's FDB via switchdev notifier
>>   (NETDEV_SWITCH_FDB_ADD). Currently, FDB entries learnt via this
>> mechanism are
>>   wrongly being deleted by bridge aging logic. This patch ensures that FDB
>>   entries synced from offloaded switch ports are not deleted by bridging
>> logic.
>>   Such entries can only be deleted via switchdev notifier
>>   (NETDEV_SWITCH_FDB_DEL).
>
>
> Your patch seems right and maintains symmetry for fdb add/del of externally
> learnt entries.
> However, this could be made configurable. I think some drivers may rely on
> bridge driver aging these entries (The default setting needs more thought).
> I am not sure what rocker does (CC'ed rocker maintainers). But, our driver
> does rely on the bridge driver aging these entries by default.

added_by_external_learn flag is only set for entries learned via
switchdev notifier
(NETDEV_SWITCH_FDB_ADD) and rocker is the only driver using these notifiers.
 I see that rocker is deleting the entries via switchdev notifier
(NETDEV_SWITCH_FDB_DEL).
This mechanism is only used by drivers when learned_sync is turned on by user.

$ sudo bridge link set dev swp1 learning_sync on self

Am I missing something here?

>
>>
>> Signed-off-by: Siva Mannem <siva.mannem.lnx@gmail.com>
>> ---
>>   net/bridge/br_fdb.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>> index 08bf04b..6eb94b5 100644
>> --- a/net/bridge/br_fdb.c
>> +++ b/net/bridge/br_fdb.c
>> @@ -280,7 +280,7 @@ void br_fdb_cleanup(unsigned long _data)
>>                 hlist_for_each_entry_safe(f, n, &br->hash[i], hlist) {
>>                         unsigned long this_timer;
>> -                       if (f->is_static)
>> +                       if (f->is_static || f->added_by_external_learn)
>>                                 continue;
>>                         this_timer = f->updated + delay;
>>                         if (time_before_eq(this_timer, jiffies))
>
>
Roopa Prabhu Feb. 4, 2015, 4:19 p.m. UTC | #3
On 2/4/15, 12:02 AM, Siva Mannem wrote:
> On Tue, Feb 3, 2015 at 8:41 PM, roopa <roopa@cumulusnetworks.com> wrote:
>> On 2/2/15, 9:21 AM, Siva Mannem wrote:
>>>    When 'learned_sync' flag is turned on, the offloaded switch
>>>    port syncs learned MAC addresses to bridge's FDB via switchdev notifier
>>>    (NETDEV_SWITCH_FDB_ADD). Currently, FDB entries learnt via this
>>> mechanism are
>>>    wrongly being deleted by bridge aging logic. This patch ensures that FDB
>>>    entries synced from offloaded switch ports are not deleted by bridging
>>> logic.
>>>    Such entries can only be deleted via switchdev notifier
>>>    (NETDEV_SWITCH_FDB_DEL).
>>
>> Your patch seems right and maintains symmetry for fdb add/del of externally
>> learnt entries.
>> However, this could be made configurable. I think some drivers may rely on
>> bridge driver aging these entries (The default setting needs more thought).
>> I am not sure what rocker does (CC'ed rocker maintainers). But, our driver
>> does rely on the bridge driver aging these entries by default.
> added_by_external_learn flag is only set for entries learned via
> switchdev notifier
> (NETDEV_SWITCH_FDB_ADD) and rocker is the only driver using these notifiers.
>   I see that rocker is deleting the entries via switchdev notifier
> (NETDEV_SWITCH_FDB_DEL).
> This mechanism is only used by drivers when learned_sync is turned on by user.
>
> $ sudo bridge link set dev swp1 learning_sync on self
>
> Am I missing something here?
I know that its enabled by an external flag. I wasn't sure rocker was 
doing a del
or was relying on the bridge driver to age those entries (hence the CC 
to rocker maintainers).
And, my only point was its valid in some cases for the switch driver to 
rely on bridge driver ageing those entries.
For symmetry, your patch seems right.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 4, 2015, 9:51 p.m. UTC | #4
From: Siva Mannem <siva.mannem.lnx@gmail.com>
Date: Mon,  2 Feb 2015 22:51:54 +0530

>  When 'learned_sync' flag is turned on, the offloaded switch
>  port syncs learned MAC addresses to bridge's FDB via switchdev notifier
>  (NETDEV_SWITCH_FDB_ADD). Currently, FDB entries learnt via this mechanism are
>  wrongly being deleted by bridge aging logic. This patch ensures that FDB
>  entries synced from offloaded switch ports are not deleted by bridging logic.
>  Such entries can only be deleted via switchdev notifier
>  (NETDEV_SWITCH_FDB_DEL).
> 
> Signed-off-by: Siva Mannem <siva.mannem.lnx@gmail.com>

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Feldman Feb. 5, 2015, 7:13 a.m. UTC | #5
On Wed, Feb 4, 2015 at 8:19 AM, roopa <roopa@cumulusnetworks.com> wrote:
> On 2/4/15, 12:02 AM, Siva Mannem wrote:
>>
>> On Tue, Feb 3, 2015 at 8:41 PM, roopa <roopa@cumulusnetworks.com> wrote:
>>>
>>> On 2/2/15, 9:21 AM, Siva Mannem wrote:
>>>>
>>>>    When 'learned_sync' flag is turned on, the offloaded switch
>>>>    port syncs learned MAC addresses to bridge's FDB via switchdev
>>>> notifier
>>>>    (NETDEV_SWITCH_FDB_ADD). Currently, FDB entries learnt via this
>>>> mechanism are
>>>>    wrongly being deleted by bridge aging logic. This patch ensures that
>>>> FDB
>>>>    entries synced from offloaded switch ports are not deleted by
>>>> bridging
>>>> logic.
>>>>    Such entries can only be deleted via switchdev notifier
>>>>    (NETDEV_SWITCH_FDB_DEL).
>>>
>>>
>>> Your patch seems right and maintains symmetry for fdb add/del of
>>> externally
>>> learnt entries.
>>> However, this could be made configurable. I think some drivers may rely
>>> on
>>> bridge driver aging these entries (The default setting needs more
>>> thought).
>>> I am not sure what rocker does (CC'ed rocker maintainers). But, our
>>> driver
>>> does rely on the bridge driver aging these entries by default.
>>
>> added_by_external_learn flag is only set for entries learned via
>> switchdev notifier
>> (NETDEV_SWITCH_FDB_ADD) and rocker is the only driver using these
>> notifiers.
>>   I see that rocker is deleting the entries via switchdev notifier
>> (NETDEV_SWITCH_FDB_DEL).
>> This mechanism is only used by drivers when learned_sync is turned on by
>> user.
>>
>> $ sudo bridge link set dev swp1 learning_sync on self
>>
>> Am I missing something here?
>
> I know that its enabled by an external flag. I wasn't sure rocker was doing
> a del
> or was relying on the bridge driver to age those entries (hence the CC to
> rocker maintainers).
> And, my only point was its valid in some cases for the switch driver to rely
> on bridge driver ageing those entries.
> For symmetry, your patch seems right.

No, not right.  Drats, email was sent to the wrong address for me.
Thanks Roopa for trying to keep me in the loop.

We want the bridge's aging logic to age out these externally learned
entries, just like it would age out internally learned entries.

I'd like to see this patch reverted so we can have a more
comprehensive discussion/solution.  With this patch applied, the only
user (rocker) of NETDEV_SWITCH_FDB_ADD is broken.  So please undo this
patch so rocker isn't broken and let's work on a knob to suit both
modes: 1) let bridge manage aging, 2) let device manage aging.

-scott
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 5, 2015, 7:53 a.m. UTC | #6
From: Scott Feldman <sfeldma@gmail.com>
Date: Wed, 4 Feb 2015 23:13:07 -0800

> I'd like to see this patch reverted so we can have a more
> comprehensive discussion/solution.  With this patch applied, the only
> user (rocker) of NETDEV_SWITCH_FDB_ADD is broken.  So please undo this
> patch so rocker isn't broken and let's work on a knob to suit both
> modes: 1) let bridge manage aging, 2) let device manage aging.

Patch reverted.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Siva Mannem Feb. 5, 2015, 10:26 a.m. UTC | #7
On Thu, Feb 5, 2015 at 1:23 PM, David Miller <davem@davemloft.net> wrote:
> From: Scott Feldman <sfeldma@gmail.com>
> Date: Wed, 4 Feb 2015 23:13:07 -0800
>
>> I'd like to see this patch reverted so we can have a more
>> comprehensive discussion/solution.  With this patch applied, the only
>> user (rocker) of NETDEV_SWITCH_FDB_ADD is broken.  So please undo this
>> patch so rocker isn't broken and let's work on a knob to suit both
>> modes: 1) let bridge manage aging, 2) let device manage aging.
>
> Patch reverted.

I now see why rocker is broken. Sorry for the churn.
As Scott/Roopa suggested, a new flag IFLA_BRPORT_AGING_SYNC (similar
to IFLA_BRPORT_LEARNING_SYNC) can be defined and stored in
net_bridge_port->flags when user configures it. Some thing like below.

$ sudo bridge link set dev swp1 ageing_sync on self

And bridge ageing logic does not age externally learnt entry *only* if
IFLA_BRPORT_AGING_SYNC flag on fdb entry's net_bridge_port is set.
This ensures that existing behavior continues to be default behavior
and is of no harm to rocker.

Please let me know your comments on above approach.
B Viswanath Feb. 5, 2015, 11:05 a.m. UTC | #8
On 5 February 2015 at 15:56, Siva Mannem <siva.mannem.lnx@gmail.com> wrote:
> On Thu, Feb 5, 2015 at 1:23 PM, David Miller <davem@davemloft.net> wrote:
>> From: Scott Feldman <sfeldma@gmail.com>
>> Date: Wed, 4 Feb 2015 23:13:07 -0800
>>
>>> I'd like to see this patch reverted so we can have a more
>>> comprehensive discussion/solution.  With this patch applied, the only
>>> user (rocker) of NETDEV_SWITCH_FDB_ADD is broken.  So please undo this
>>> patch so rocker isn't broken and let's work on a knob to suit both
>>> modes: 1) let bridge manage aging, 2) let device manage aging.
>>
>> Patch reverted.
>
> I now see why rocker is broken. Sorry for the churn.
> As Scott/Roopa suggested, a new flag IFLA_BRPORT_AGING_SYNC (similar
> to IFLA_BRPORT_LEARNING_SYNC) can be defined and stored in
> net_bridge_port->flags when user configures it. Some thing like below.
>
> $ sudo bridge link set dev swp1 ageing_sync on self
>
> And bridge ageing logic does not age externally learnt entry *only* if
> IFLA_BRPORT_AGING_SYNC flag on fdb entry's net_bridge_port is set.
> This ensures that existing behavior continues to be default behavior
> and is of no harm to rocker.
>
> Please let me know your comments on above approach.

I am not sure you really want to pass on this burden of deciding who
should age to the end user.  I think user should  not be forced to be
aware of all the device and driver properties.

May be a different way would be that -

1. driver specifies whether it can supporting ageing by itself (the
chip) during registration.
2. driver also allows that kernel turn off the driver/device ageing.
3. when a bridge is created with all ports from same device which
supports ageing (and which is currently enabled), then bridge defers
ageing to driver. Otherwise bridge disables ageing on all
participating devices and takes care of ageing by itself.
4. When ageing is disabled on a device, all other bridges that use any
ports from that device start ageing themselves.

I guess this sounds complicated, but it can ensure that user gets the
best default behaviour based on the device. Any alternate suggestions
?

Thanks
vissu

> --
> Regards,
> Siva Mannem.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Feldman Feb. 5, 2015, 5:10 p.m. UTC | #9
On Thu, Feb 5, 2015 at 3:05 AM, B Viswanath <marichika4@gmail.com> wrote:
> On 5 February 2015 at 15:56, Siva Mannem <siva.mannem.lnx@gmail.com> wrote:
>> On Thu, Feb 5, 2015 at 1:23 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Scott Feldman <sfeldma@gmail.com>
>>> Date: Wed, 4 Feb 2015 23:13:07 -0800
>>>
>>>> I'd like to see this patch reverted so we can have a more
>>>> comprehensive discussion/solution.  With this patch applied, the only
>>>> user (rocker) of NETDEV_SWITCH_FDB_ADD is broken.  So please undo this
>>>> patch so rocker isn't broken and let's work on a knob to suit both
>>>> modes: 1) let bridge manage aging, 2) let device manage aging.
>>>
>>> Patch reverted.
>>
>> I now see why rocker is broken. Sorry for the churn.
>> As Scott/Roopa suggested, a new flag IFLA_BRPORT_AGING_SYNC (similar
>> to IFLA_BRPORT_LEARNING_SYNC) can be defined and stored in
>> net_bridge_port->flags when user configures it. Some thing like below.
>>
>> $ sudo bridge link set dev swp1 ageing_sync on self
>>
>> And bridge ageing logic does not age externally learnt entry *only* if
>> IFLA_BRPORT_AGING_SYNC flag on fdb entry's net_bridge_port is set.
>> This ensures that existing behavior continues to be default behavior
>> and is of no harm to rocker.
>>
>> Please let me know your comments on above approach.
>
> I am not sure you really want to pass on this burden of deciding who
> should age to the end user.  I think user should  not be forced to be
> aware of all the device and driver properties.
>
> May be a different way would be that -
>
> 1. driver specifies whether it can supporting ageing by itself (the
> chip) during registration.
> 2. driver also allows that kernel turn off the driver/device ageing.
> 3. when a bridge is created with all ports from same device which
> supports ageing (and which is currently enabled), then bridge defers
> ageing to driver. Otherwise bridge disables ageing on all
> participating devices and takes care of ageing by itself.
> 4. When ageing is disabled on a device, all other bridges that use any
> ports from that device start ageing themselves.
>
> I guess this sounds complicated, but it can ensure that user gets the
> best default behaviour based on the device. Any alternate suggestions
> ?
>
> Thanks
> vissu
>
>> --
>> Regards,
>> Siva Mannem.
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Feldman Feb. 5, 2015, 5:55 p.m. UTC | #10
On Thu, Feb 5, 2015 at 9:10 AM, Scott Feldman <sfeldma@gmail.com> wrote:
> On Thu, Feb 5, 2015 at 3:05 AM, B Viswanath <marichika4@gmail.com> wrote:
>> On 5 February 2015 at 15:56, Siva Mannem <siva.mannem.lnx@gmail.com> wrote:
>>> On Thu, Feb 5, 2015 at 1:23 PM, David Miller <davem@davemloft.net> wrote:
>>>> From: Scott Feldman <sfeldma@gmail.com>
>>>> Date: Wed, 4 Feb 2015 23:13:07 -0800
>>>>
>>>>> I'd like to see this patch reverted so we can have a more
>>>>> comprehensive discussion/solution.  With this patch applied, the only
>>>>> user (rocker) of NETDEV_SWITCH_FDB_ADD is broken.  So please undo this
>>>>> patch so rocker isn't broken and let's work on a knob to suit both
>>>>> modes: 1) let bridge manage aging, 2) let device manage aging.
>>>>
>>>> Patch reverted.

Thank you.

>>> I now see why rocker is broken. Sorry for the churn.
>>> As Scott/Roopa suggested, a new flag IFLA_BRPORT_AGING_SYNC (similar
>>> to IFLA_BRPORT_LEARNING_SYNC) can be defined and stored in
>>> net_bridge_port->flags when user configures it. Some thing like below.
>>>
>>> $ sudo bridge link set dev swp1 ageing_sync on self
>>>
>>> And bridge ageing logic does not age externally learnt entry *only* if
>>> IFLA_BRPORT_AGING_SYNC flag on fdb entry's net_bridge_port is set.
>>> This ensures that existing behavior continues to be default behavior
>>> and is of no harm to rocker.
>>>
>>> Please let me know your comments on above approach.
>>
>> I am not sure you really want to pass on this burden of deciding who
>> should age to the end user.  I think user should  not be forced to be
>> aware of all the device and driver properties.
>>
>> May be a different way would be that -
>>
>> 1. driver specifies whether it can supporting ageing by itself (the
>> chip) during registration.
>> 2. driver also allows that kernel turn off the driver/device ageing.
>> 3. when a bridge is created with all ports from same device which
>> supports ageing (and which is currently enabled), then bridge defers
>> ageing to driver. Otherwise bridge disables ageing on all
>> participating devices and takes care of ageing by itself.
>> 4. When ageing is disabled on a device, all other bridges that use any
>> ports from that device start ageing themselves.
>>
>> I guess this sounds complicated, but it can ensure that user gets the
>> best default behaviour based on the device. Any alternate suggestions
>> ?

So let me see if I can summarize the PROs and CONs of using the
default behavior of the bridge driver for ageing externally learned
FDB entries:

PROs

1) No new user knobs; just use existing bridge ageing settings for
both internal and
externally learned FDB entries.
2) Because we're using SW bridge, all offloaded switches behave the
same w.r.t. ageing.
3) Because we're using SW bridge, bug fixes and enhancements apply to
all offloaded switches.
4) Switch device model w.r.t. FDB ageing is simple to understand.
5) User's view of the ageing process is the same for internal and
external learned FDB entries.
The last-used stats make sense.
6) Leveraging what's already there...bridge ageing function is
tried-and-true....let's reuse it if we can.

CONs

1) Scale-ability.  The bridge driver can't keep up with ageing many
(10^5, for example) entries.
2) Scale-ability.  Keeping the entries periodically refreshed requires
a refresh sync from
the device to the bridge driver, and the bridge driver/kernel can't keep up.
3) With the bridge driver, the ageing settings are per-bridge, not per-port.

Did I miss any points?

I think it boils down to use-ability vs. scale-ability.  My personal
opinion is I'm skeptical there are real scale-ability issues because
the on-board switch CPU should easily be able to handle the ageing
tasks.  After all, the CPU will for the most part be idle, only
worrying about the occasional ctrl pkt (STP, LLDP, OSPF, etc).  I'm
most interested in keeping the user's experience simple, so if we can
extend the existing bridge ageing model to offloaded devices without
changing the users' experience compared to the non-offloaded case.

So let's draw out the CONs for using the bridge driver's ageing function as-is.

-scott
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
B Viswanath Feb. 5, 2015, 6:31 p.m. UTC | #11
On 5 February 2015 at 23:25, Scott Feldman <sfeldma@gmail.com> wrote:
<snip>
>>>> I now see why rocker is broken. Sorry for the churn.
>>>> As Scott/Roopa suggested, a new flag IFLA_BRPORT_AGING_SYNC (similar
>>>> to IFLA_BRPORT_LEARNING_SYNC) can be defined and stored in
>>>> net_bridge_port->flags when user configures it. Some thing like below.
>>>>
>>>> $ sudo bridge link set dev swp1 ageing_sync on self
>>>>
>>>> And bridge ageing logic does not age externally learnt entry *only* if
>>>> IFLA_BRPORT_AGING_SYNC flag on fdb entry's net_bridge_port is set.
>>>> This ensures that existing behavior continues to be default behavior
>>>> and is of no harm to rocker.
>>>>
>>>> Please let me know your comments on above approach.
>>>
>>> I am not sure you really want to pass on this burden of deciding who
>>> should age to the end user.  I think user should  not be forced to be
>>> aware of all the device and driver properties.
>>>
>>> May be a different way would be that -
>>>
>>> 1. driver specifies whether it can supporting ageing by itself (the
>>> chip) during registration.
>>> 2. driver also allows that kernel turn off the driver/device ageing.
>>> 3. when a bridge is created with all ports from same device which
>>> supports ageing (and which is currently enabled), then bridge defers
>>> ageing to driver. Otherwise bridge disables ageing on all
>>> participating devices and takes care of ageing by itself.
>>> 4. When ageing is disabled on a device, all other bridges that use any
>>> ports from that device start ageing themselves.
>>>
>>> I guess this sounds complicated, but it can ensure that user gets the
>>> best default behaviour based on the device. Any alternate suggestions
>>> ?
>
> So let me see if I can summarize the PROs and CONs of using the
> default behavior of the bridge driver for ageing externally learned
> FDB entries:
>
> PROs
>
> 1) No new user knobs; just use existing bridge ageing settings for
> both internal and
> externally learned FDB entries.
> 2) Because we're using SW bridge, all offloaded switches behave the
> same w.r.t. ageing.
> 3) Because we're using SW bridge, bug fixes and enhancements apply to
> all offloaded switches.
> 4) Switch device model w.r.t. FDB ageing is simple to understand.
> 5) User's view of the ageing process is the same for internal and
> external learned FDB entries.
> The last-used stats make sense.
> 6) Leveraging what's already there...bridge ageing function is
> tried-and-true....let's reuse it if we can.
>
> CONs
>
> 1) Scale-ability.  The bridge driver can't keep up with ageing many
> (10^5, for example) entries.
> 2) Scale-ability.  Keeping the entries periodically refreshed requires
> a refresh sync from
> the device to the bridge driver, and the bridge driver/kernel can't keep up.
> 3) With the bridge driver, the ageing settings are per-bridge, not per-port.
>
> Did I miss any points?

There is one more

4. On links with wirespeed traffic, software ageing FDB entries
independently without having the knowledge of the traffic means that
CPU will ageout entries it should not. I don't know if this breaks
some RFC, but it can certainly cause big packet loss.

Such a packet loss is considered bad, and is a primary reason why
hardware supports ageing. Otherwise there is no reason hw needs to
support it. Based on experience, I can say that it is a necessity, now
even more with 10G and 40G links.

>
> I think it boils down to use-ability vs. scale-ability.  My personal
> opinion is I'm skeptical there are real scale-ability issues because
> the on-board switch CPU should easily be able to handle the ageing
> tasks.  After all, the CPU will for the most part be idle, only
> worrying about the occasional ctrl pkt (STP, LLDP, OSPF, etc).  I'm
> most interested in keeping the user's experience simple, so if we can
> extend the existing bridge ageing model to offloaded devices without
> changing the users' experience compared to the non-offloaded case.

I agree with no changes in user experience. I think my design change
suggestion can make sure that no user experience changes for all
regular use cases, and certainly no changes for non-switch-devices.

Thanks
Vissu

>
> So let's draw out the CONs for using the bridge driver's ageing function as-is.
>
> -scott
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Feldman Feb. 6, 2015, 1:45 a.m. UTC | #12
On Thu, Feb 5, 2015 at 10:31 AM, B Viswanath <marichika4@gmail.com> wrote:
> On 5 February 2015 at 23:25, Scott Feldman <sfeldma@gmail.com> wrote:
> <snip>
>>
>> CONs
>>
>> 1) Scale-ability.  The bridge driver can't keep up with ageing many
>> (10^5, for example) entries.
>> 2) Scale-ability.  Keeping the entries periodically refreshed requires
>> a refresh sync from
>> the device to the bridge driver, and the bridge driver/kernel can't keep up.
>> 3) With the bridge driver, the ageing settings are per-bridge, not per-port.
>>
>> Did I miss any points?
>
> There is one more
>
> 4. On links with wirespeed traffic, software ageing FDB entries
> independently without having the knowledge of the traffic means that
> CPU will ageout entries it should not. I don't know if this breaks
> some RFC, but it can certainly cause big packet loss.

SW would only age out expired entries.  An entry not refreshed by HW
will expire.  But HW will refresh (to SW) active entries based on
traffic seen.  So I'm not seeing how SW will age out entries
prematurely.  I'm missing something.

Is there a race perhaps?  Maybe that's your point.  I think I see the
race.  SW has decided entry is expired and sends signal to remove
entry, while in the meantime HW got a hit on entry and wants to
refresh it.  So this would cause a service disruption if the entry was
removed and then re-added.  But this is a boundary condition on the
hairy edge of the ageing timeout window.  Does it matter in practice?

I really want to understand your case 4, but I'm not getting
something.  Is there another way to explain it, by example perhaps?

> Such a packet loss is considered bad, and is a primary reason why
> hardware supports ageing. Otherwise there is no reason hw needs to
> support it. Based on experience, I can say that it is a necessity, now
> even more with 10G and 40G links.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Siva Mannem Feb. 6, 2015, 3:27 a.m. UTC | #13
On Fri, Feb 6, 2015 at 7:15 AM, Scott Feldman <sfeldma@gmail.com> wrote:
> On Thu, Feb 5, 2015 at 10:31 AM, B Viswanath <marichika4@gmail.com> wrote:
>> On 5 February 2015 at 23:25, Scott Feldman <sfeldma@gmail.com> wrote:
>> <snip>
>>>
>>> CONs
>>>
>>> 1) Scale-ability.  The bridge driver can't keep up with ageing many
>>> (10^5, for example) entries.
>>> 2) Scale-ability.  Keeping the entries periodically refreshed requires
>>> a refresh sync from
>>> the device to the bridge driver, and the bridge driver/kernel can't keep up.
>>> 3) With the bridge driver, the ageing settings are per-bridge, not per-port.
>>>
>>> Did I miss any points?
>>
>> There is one more
>>
>> 4. On links with wirespeed traffic, software ageing FDB entries
>> independently without having the knowledge of the traffic means that
>> CPU will ageout entries it should not. I don't know if this breaks
>> some RFC, but it can certainly cause big packet loss.
>
> SW would only age out expired entries.  An entry not refreshed by HW
> will expire.  But HW will refresh (to SW) active entries based on
> traffic seen.  So I'm not seeing how SW will age out entries
> prematurely.  I'm missing something.

Hardware does not refresh(to SW) active entries based on the traffic
seen, at least on the switches that I worked on. Even if it were to
refresh, if the traffic is flowing at line rate(say x packets per
sec), the refresh rate to SW is equal to x, which is very huge for cpu
to handle.

There are only two notifications to SW.

1)learn notification(when packet arrives that has smac which is not in
hardware's FDB)
2)When the hardware ages the entry, it sends a age notification.

>
> Is there a race perhaps?  Maybe that's your point.  I think I see the
> race.  SW has decided entry is expired and sends signal to remove
> entry, while in the meantime HW got a hit on entry and wants to
> refresh it.  So this would cause a service disruption if the entry was
> removed and then re-added.  But this is a boundary condition on the
> hairy edge of the ageing timeout window.  Does it matter in practice?
>
> I really want to understand your case 4, but I'm not getting
> something.  Is there another way to explain it, by example perhaps?
>
>> Such a packet loss is considered bad, and is a primary reason why
>> hardware supports ageing. Otherwise there is no reason hw needs to
>> support it. Based on experience, I can say that it is a necessity, now
>> even more with 10G and 40G links.
Scott Feldman Feb. 7, 2015, 5:53 a.m. UTC | #14
On Thu, Feb 5, 2015 at 7:27 PM, Siva Mannem <siva.mannem.lnx@gmail.com> wrote:
> On Fri, Feb 6, 2015 at 7:15 AM, Scott Feldman <sfeldma@gmail.com> wrote:
>> On Thu, Feb 5, 2015 at 10:31 AM, B Viswanath <marichika4@gmail.com> wrote:
>>> On 5 February 2015 at 23:25, Scott Feldman <sfeldma@gmail.com> wrote:
>>> <snip>
>>>>
>>>> CONs
>>>>
>>>> 1) Scale-ability.  The bridge driver can't keep up with ageing many
>>>> (10^5, for example) entries.
>>>> 2) Scale-ability.  Keeping the entries periodically refreshed requires
>>>> a refresh sync from
>>>> the device to the bridge driver, and the bridge driver/kernel can't keep up.
>>>> 3) With the bridge driver, the ageing settings are per-bridge, not per-port.
>>>>
>>>> Did I miss any points?
>>>
>>> There is one more
>>>
>>> 4. On links with wirespeed traffic, software ageing FDB entries
>>> independently without having the knowledge of the traffic means that
>>> CPU will ageout entries it should not. I don't know if this breaks
>>> some RFC, but it can certainly cause big packet loss.
>>
>> SW would only age out expired entries.  An entry not refreshed by HW
>> will expire.  But HW will refresh (to SW) active entries based on
>> traffic seen.  So I'm not seeing how SW will age out entries
>> prematurely.  I'm missing something.
>
> Hardware does not refresh(to SW) active entries based on the traffic
> seen, at least on the switches that I worked on. Even if it were to
> refresh, if the traffic is flowing at line rate(say x packets per
> sec), the refresh rate to SW is equal to x, which is very huge for cpu
> to handle.

I meant refresh as in re-sending learn notification for entry on a
periodic basis, say once a second.  Not refreshing by presenting pkt
to SW...that wouldn't scale at all, as you say.

> There are only two notifications to SW.
>
> 1)learn notification(when packet arrives that has smac which is not in
> hardware's FDB)
> 2)When the hardware ages the entry, it sends a age notification.

I think I'm coming around to your thinking of letting HW manage ageing
on HW-learned FDB entries.  I was hoping to keep ageing in SW for the
PROs I mentioned earlier, but it's not aligning very well with real
HW.  If we modify rocker to work like real HW (that was the point of
rocker in the first place), then we can re-introduce your patch to let
bridge ignore ageing on entries marked with added_by_external_learn.
Let me work on the rocker mods and followup.

-scott
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 08bf04b..6eb94b5 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -280,7 +280,7 @@  void br_fdb_cleanup(unsigned long _data)
 
 		hlist_for_each_entry_safe(f, n, &br->hash[i], hlist) {
 			unsigned long this_timer;
-			if (f->is_static)
+			if (f->is_static || f->added_by_external_learn)
 				continue;
 			this_timer = f->updated + delay;
 			if (time_before_eq(this_timer, jiffies))