mbox series

[net-next,0/3] VLANs, DSA switches and multiple bridges

Message ID 20200218114515.GL18808@shell.armlinux.org.uk
Headers show
Series VLANs, DSA switches and multiple bridges | expand

Message

Russell King (Oracle) Feb. 18, 2020, 11:45 a.m. UTC
Hi,

This is a repost of the previously posted RFC back in December, which
did not get fully reviewed.  I've dropped the RFC tag this time as no
one really found anything too problematical in the RFC posting.

I've been trying to configure DSA for VLANs and not having much success.
The setup is quite simple:

- The main network is untagged
- The wifi network is a vlan tagged with id $VN running over the main
  network.

I have an Armada 388 Clearfog with a PCIe wifi card which I'm trying to
setup to provide wifi access to the vlan $VN network, while the switch
is also part of the main network.

However, I'm encountering problems:

1) vlan support in DSA has a different behaviour from the Linux
   software bridge implementation.

    # bridge vlan
    port    vlan ids
    lan1     1 PVID Egress Untagged
    ...

   shows the default setup - the bridge ports are all configured for
   vlan 1, untagged egress, and vlan 1 as the port vid.  Issuing:

    # ip li set dev br0 type bridge vlan_filtering 1

   with no other vlan configuration commands on a Linux software bridge
   continues to allow untagged traffic to flow across the bridge.
   
   This difference in behaviour is because the MV88E6xxx VTU is
   completely empty - because net/dsa ignores all vlan settings for
   a port if br_vlan_enabled(dp->bridge_dev) is false - this reflects
   the vlan filtering state of the bridge, not whether the bridge is
   vlan aware.
   
   What this means is that attempting to configure the bridge port
   vlans before enabling vlan filtering works for Linux software
   bridges, but fails for DSA bridges.

2) Assuming the above is sorted, we move on to the next issue, which
   is altogether more weird.  Let's take a setup where we have a
   DSA bridge with lan1..6 in a bridge device, br0, with vlan
   filtering enabled.  lan1 is the upstream port, lan2 is a downstream
   port that also wants to see traffic on vlan id $VN.

   Both lan1 and lan2 are configured for that:

     # bridge vlan add vid $VN dev lan1
     # bridge vlan add vid $VN dev lan2
     # ip li set br0 type bridge vlan_filtering 1

   Untagged traffic can now pass between all the six lan ports, and
   vlan $VN between lan1 and lan2 only.  The MV88E6xxx 8021q_mode
   debugfs file shows all lan ports are in mode "secure" - this is
   important!  /sys/class/net/br0/bridge/vlan_filtering contains 1.

   tcpdumping from another machine on lan4 shows that no $VN traffic
   reaches it.  Everything seems to be working correctly...
   
   In order to further bridge vlan $VN traffic to hostapd's wifi
   interface, things get a little more complex - we can't add hostapd's
   wifi interface to br0 directly, because hostapd will bring up the
   wifi interface and leak the main, untagged traffic onto the wifi.
   (hostapd does have vlan support, but only as a dynamic per-client
   thing, and there's no hooks I can see to allow script-based config
   of the network setup before hostapd up's the wifi interface.)

   So, what I tried was:

     # ip li add link br0 name br0.$VN type vlan id $VN
     # bridge vlan add vid $VN dev br0 self
     # ip li set dev br0.$VN up

   So far so good, we get a vlan interface on top of the bridge, and
   tcpdumping it shows we get traffic.  The 8021q_mode file has not
   changed state.  Everything still seems to be correct.

     # bridge addbr br1

   Still nothing has changed.

     # bridge addif br1 br0.$VN

   And now the 8021q_mode debugfs file shows that all ports are now in
   "disabled" mode, but /sys/class/net/br0/bridge/vlan_filtering still
   contains '1'.  In other words, br0 still thinks vlan filtering is
   enabled, but the hardware has had vlan filtering disabled.

   Adding some stack traces to an appropriate point indicates that this
   is because __switchdev_handle_port_attr_set() recurses down through
   the tree of interfaces, skipping over the vlan interface, applying
   br1's configuration to br0's ports.

   This surely can not be right - surely
   __switchdev_handle_port_attr_set() and similar should stop recursing
   down through another master bridge device?  There are probably other
   network device classes that switchdev shouldn't recurse down too.

   I've considered whether switchdev is the right level to do it, and
   I think it is - as we want the check/set callbacks to be called for
   the top level device even if it is a master bridge device, but we
   don't want to recurse through a lower master bridge device.

Comments

Florian Fainelli Feb. 18, 2020, 7:26 p.m. UTC | #1
On 2/18/20 3:45 AM, Russell King - ARM Linux admin wrote:
> Hi,
> 
> This is a repost of the previously posted RFC back in December, which
> did not get fully reviewed.  I've dropped the RFC tag this time as no
> one really found anything too problematical in the RFC posting.
> 
> I've been trying to configure DSA for VLANs and not having much success.
> The setup is quite simple:
> 
> - The main network is untagged
> - The wifi network is a vlan tagged with id $VN running over the main
>   network.
> 
> I have an Armada 388 Clearfog with a PCIe wifi card which I'm trying to
> setup to provide wifi access to the vlan $VN network, while the switch
> is also part of the main network.
> 
> However, I'm encountering problems:
> 
> 1) vlan support in DSA has a different behaviour from the Linux
>    software bridge implementation.
> 
>     # bridge vlan
>     port    vlan ids
>     lan1     1 PVID Egress Untagged
>     ...
> 
>    shows the default setup - the bridge ports are all configured for
>    vlan 1, untagged egress, and vlan 1 as the port vid.  Issuing:
> 
>     # ip li set dev br0 type bridge vlan_filtering 1
> 
>    with no other vlan configuration commands on a Linux software bridge
>    continues to allow untagged traffic to flow across the bridge.
>    
>    This difference in behaviour is because the MV88E6xxx VTU is
>    completely empty - because net/dsa ignores all vlan settings for
>    a port if br_vlan_enabled(dp->bridge_dev) is false - this reflects
>    the vlan filtering state of the bridge, not whether the bridge is
>    vlan aware.
>    
>    What this means is that attempting to configure the bridge port
>    vlans before enabling vlan filtering works for Linux software
>    bridges, but fails for DSA bridges.

FWIW, because VLAN filtering is global with b53 switches, we do actually
program every port with a valid VLAN entry into VID 0 which ensures that
standalone (non-bridged) ports, with, or without VLAN interfaces on top
(e.g.: sw0p0.N) continue to work as soon as another port gets enslaved
into a bridge that does request vlan_filtering.

> 
> 2) Assuming the above is sorted, we move on to the next issue, which
>    is altogether more weird.  Let's take a setup where we have a
>    DSA bridge with lan1..6 in a bridge device, br0, with vlan
>    filtering enabled.  lan1 is the upstream port, lan2 is a downstream
>    port that also wants to see traffic on vlan id $VN.
> 
>    Both lan1 and lan2 are configured for that:
> 
>      # bridge vlan add vid $VN dev lan1
>      # bridge vlan add vid $VN dev lan2
>      # ip li set br0 type bridge vlan_filtering 1
> 
>    Untagged traffic can now pass between all the six lan ports, and
>    vlan $VN between lan1 and lan2 only.  The MV88E6xxx 8021q_mode
>    debugfs file shows all lan ports are in mode "secure" - this is
>    important!  /sys/class/net/br0/bridge/vlan_filtering contains 1.
> 
>    tcpdumping from another machine on lan4 shows that no $VN traffic
>    reaches it.  Everything seems to be working correctly...
>    
>    In order to further bridge vlan $VN traffic to hostapd's wifi
>    interface, things get a little more complex - we can't add hostapd's
>    wifi interface to br0 directly, because hostapd will bring up the
>    wifi interface and leak the main, untagged traffic onto the wifi.
>    (hostapd does have vlan support, but only as a dynamic per-client
>    thing, and there's no hooks I can see to allow script-based config
>    of the network setup before hostapd up's the wifi interface.)
> 
>    So, what I tried was:
> 
>      # ip li add link br0 name br0.$VN type vlan id $VN
>      # bridge vlan add vid $VN dev br0 self
>      # ip li set dev br0.$VN up
> 
>    So far so good, we get a vlan interface on top of the bridge, and
>    tcpdumping it shows we get traffic.  The 8021q_mode file has not
>    changed state.  Everything still seems to be correct.
> 
>      # bridge addbr br1
> 
>    Still nothing has changed.
> 
>      # bridge addif br1 br0.$VN
> 
>    And now the 8021q_mode debugfs file shows that all ports are now in
>    "disabled" mode, but /sys/class/net/br0/bridge/vlan_filtering still
>    contains '1'.  In other words, br0 still thinks vlan filtering is
>    enabled, but the hardware has had vlan filtering disabled.
> 
>    Adding some stack traces to an appropriate point indicates that this
>    is because __switchdev_handle_port_attr_set() recurses down through
>    the tree of interfaces, skipping over the vlan interface, applying
>    br1's configuration to br0's ports.
> 
>    This surely can not be right - surely
>    __switchdev_handle_port_attr_set() and similar should stop recursing
>    down through another master bridge device?  There are probably other
>    network device classes that switchdev shouldn't recurse down too.
> 
>    I've considered whether switchdev is the right level to do it, and
>    I think it is - as we want the check/set callbacks to be called for
>    the top level device even if it is a master bridge device, but we
>    don't want to recurse through a lower master bridge device.
>
Florian Fainelli Feb. 19, 2020, midnight UTC | #2
On 2/18/20 3:45 AM, Russell King - ARM Linux admin wrote:
> Hi,
> 
> This is a repost of the previously posted RFC back in December, which
> did not get fully reviewed.  I've dropped the RFC tag this time as no
> one really found anything too problematical in the RFC posting.
> 
> I've been trying to configure DSA for VLANs and not having much success.
> The setup is quite simple:
> 
> - The main network is untagged
> - The wifi network is a vlan tagged with id $VN running over the main
>   network.
> 
> I have an Armada 388 Clearfog with a PCIe wifi card which I'm trying to
> setup to provide wifi access to the vlan $VN network, while the switch
> is also part of the main network.

Why not just revert 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa:
Don't add vlans when vlan filtering is disabled")? If a driver wants to
veto the programming of VLANs while it has ports enslaved to a bridge
that does not have VLAN filtering, it should have enough information to
not do that operation.
Russell King (Oracle) Feb. 19, 2020, 12:17 a.m. UTC | #3
On Tue, Feb 18, 2020 at 04:00:08PM -0800, Florian Fainelli wrote:
> On 2/18/20 3:45 AM, Russell King - ARM Linux admin wrote:
> > Hi,
> > 
> > This is a repost of the previously posted RFC back in December, which
> > did not get fully reviewed.  I've dropped the RFC tag this time as no
> > one really found anything too problematical in the RFC posting.
> > 
> > I've been trying to configure DSA for VLANs and not having much success.
> > The setup is quite simple:
> > 
> > - The main network is untagged
> > - The wifi network is a vlan tagged with id $VN running over the main
> >   network.
> > 
> > I have an Armada 388 Clearfog with a PCIe wifi card which I'm trying to
> > setup to provide wifi access to the vlan $VN network, while the switch
> > is also part of the main network.
> 
> Why not just revert 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa:
> Don't add vlans when vlan filtering is disabled")? If a driver wants to
> veto the programming of VLANs while it has ports enslaved to a bridge
> that does not have VLAN filtering, it should have enough information to
> not do that operation.

I do not have the knowledge to know whether reverting that commit
would be appropriate; I do not know how the non-Marvell switches will
behave with such a revert - what was the reason for the commit in
the first place?

The commit says:

    This fixes at least one corner case. There are still issues in other
    corners, such as when vlan_filtering is later enabled.

but it doesn't say what that corner case was.  So, presumably reverting
it will cause a regression of whatever that corner case was...
Florian Fainelli Feb. 19, 2020, 12:33 a.m. UTC | #4
On 2/18/20 4:17 PM, Russell King - ARM Linux admin wrote:
> On Tue, Feb 18, 2020 at 04:00:08PM -0800, Florian Fainelli wrote:
>> On 2/18/20 3:45 AM, Russell King - ARM Linux admin wrote:
>>> Hi,
>>>
>>> This is a repost of the previously posted RFC back in December, which
>>> did not get fully reviewed.  I've dropped the RFC tag this time as no
>>> one really found anything too problematical in the RFC posting.
>>>
>>> I've been trying to configure DSA for VLANs and not having much success.
>>> The setup is quite simple:
>>>
>>> - The main network is untagged
>>> - The wifi network is a vlan tagged with id $VN running over the main
>>>   network.
>>>
>>> I have an Armada 388 Clearfog with a PCIe wifi card which I'm trying to
>>> setup to provide wifi access to the vlan $VN network, while the switch
>>> is also part of the main network.
>>
>> Why not just revert 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa:
>> Don't add vlans when vlan filtering is disabled")? If a driver wants to
>> veto the programming of VLANs while it has ports enslaved to a bridge
>> that does not have VLAN filtering, it should have enough information to
>> not do that operation.
> 
> I do not have the knowledge to know whether reverting that commit
> would be appropriate; I do not know how the non-Marvell switches will
> behave with such a revert - what was the reason for the commit in
> the first place?
> 
> The commit says:
> 
>     This fixes at least one corner case. There are still issues in other
>     corners, such as when vlan_filtering is later enabled.
> 
> but it doesn't say what that corner case was.  So, presumably reverting
> it will cause a regression of whatever that corner case was...

Andrew, can you provide more details on what prompted you to do this in
the first place?
Vivien Didelot Feb. 19, 2020, 12:58 a.m. UTC | #5
On Wed, 19 Feb 2020 00:17:37 +0000, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
> On Tue, Feb 18, 2020 at 04:00:08PM -0800, Florian Fainelli wrote:
> > On 2/18/20 3:45 AM, Russell King - ARM Linux admin wrote:
> > > Hi,
> > > 
> > > This is a repost of the previously posted RFC back in December, which
> > > did not get fully reviewed.  I've dropped the RFC tag this time as no
> > > one really found anything too problematical in the RFC posting.
> > > 
> > > I've been trying to configure DSA for VLANs and not having much success.
> > > The setup is quite simple:
> > > 
> > > - The main network is untagged
> > > - The wifi network is a vlan tagged with id $VN running over the main
> > >   network.
> > > 
> > > I have an Armada 388 Clearfog with a PCIe wifi card which I'm trying to
> > > setup to provide wifi access to the vlan $VN network, while the switch
> > > is also part of the main network.
> > 
> > Why not just revert 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa:
> > Don't add vlans when vlan filtering is disabled")? If a driver wants to
> > veto the programming of VLANs while it has ports enslaved to a bridge
> > that does not have VLAN filtering, it should have enough information to
> > not do that operation.
> 
> I do not have the knowledge to know whether reverting that commit
> would be appropriate; I do not know how the non-Marvell switches will
> behave with such a revert - what was the reason for the commit in
> the first place?
> 
> The commit says:
> 
>     This fixes at least one corner case. There are still issues in other
>     corners, such as when vlan_filtering is later enabled.
> 
> but it doesn't say what that corner case was.  So, presumably reverting
> it will cause a regression of whatever that corner case was...

It is hard to care about regression when we have no idea what these "corner
cases" are. Also things like ds->vlan_filtering* were added after if I'm
not mistaken, so the drivers might have enough information now to adapt to
or reject some unsupported bridge offload.

Getting rid of this limitation would definitely be another approach worth
trying.


Thanks,

	Vivien
Andrew Lunn Feb. 19, 2020, 3:47 a.m. UTC | #6
On Wed, Feb 19, 2020 at 12:17:37AM +0000, Russell King - ARM Linux admin wrote:
> On Tue, Feb 18, 2020 at 04:00:08PM -0800, Florian Fainelli wrote:
> > On 2/18/20 3:45 AM, Russell King - ARM Linux admin wrote:
> > > Hi,
> > > 
> > > This is a repost of the previously posted RFC back in December, which
> > > did not get fully reviewed.  I've dropped the RFC tag this time as no
> > > one really found anything too problematical in the RFC posting.
> > > 
> > > I've been trying to configure DSA for VLANs and not having much success.
> > > The setup is quite simple:
> > > 
> > > - The main network is untagged
> > > - The wifi network is a vlan tagged with id $VN running over the main
> > >   network.
> > > 
> > > I have an Armada 388 Clearfog with a PCIe wifi card which I'm trying to
> > > setup to provide wifi access to the vlan $VN network, while the switch
> > > is also part of the main network.
> > 
> > Why not just revert 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa:
> > Don't add vlans when vlan filtering is disabled")? If a driver wants to
> > veto the programming of VLANs while it has ports enslaved to a bridge
> > that does not have VLAN filtering, it should have enough information to
> > not do that operation.
> 
> I do not have the knowledge to know whether reverting that commit
> would be appropriate; I do not know how the non-Marvell switches will
> behave with such a revert - what was the reason for the commit in
> the first place?
> 
> The commit says:
> 
>     This fixes at least one corner case. There are still issues in other
>     corners, such as when vlan_filtering is later enabled.
> 
> but it doesn't say what that corner case was.  So, presumably reverting
> it will cause a regression of whatever that corner case was...

Yes, sorry, bad commit message. I'm not too sure, but it could of been
that the switch was adding the VLANs to its tables, even though it
should not because filtering is disabled. And i also think the default
VLAN was not defined at that point, it only gets defined when
vlan_filtering is enabled?

       Andrew
Russell King (Oracle) Feb. 19, 2020, 9:19 a.m. UTC | #7
On Wed, Feb 19, 2020 at 04:47:30AM +0100, Andrew Lunn wrote:
> On Wed, Feb 19, 2020 at 12:17:37AM +0000, Russell King - ARM Linux admin wrote:
> > On Tue, Feb 18, 2020 at 04:00:08PM -0800, Florian Fainelli wrote:
> > > On 2/18/20 3:45 AM, Russell King - ARM Linux admin wrote:
> > > > Hi,
> > > > 
> > > > This is a repost of the previously posted RFC back in December, which
> > > > did not get fully reviewed.  I've dropped the RFC tag this time as no
> > > > one really found anything too problematical in the RFC posting.
> > > > 
> > > > I've been trying to configure DSA for VLANs and not having much success.
> > > > The setup is quite simple:
> > > > 
> > > > - The main network is untagged
> > > > - The wifi network is a vlan tagged with id $VN running over the main
> > > >   network.
> > > > 
> > > > I have an Armada 388 Clearfog with a PCIe wifi card which I'm trying to
> > > > setup to provide wifi access to the vlan $VN network, while the switch
> > > > is also part of the main network.
> > > 
> > > Why not just revert 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa:
> > > Don't add vlans when vlan filtering is disabled")? If a driver wants to
> > > veto the programming of VLANs while it has ports enslaved to a bridge
> > > that does not have VLAN filtering, it should have enough information to
> > > not do that operation.
> > 
> > I do not have the knowledge to know whether reverting that commit
> > would be appropriate; I do not know how the non-Marvell switches will
> > behave with such a revert - what was the reason for the commit in
> > the first place?
> > 
> > The commit says:
> > 
> >     This fixes at least one corner case. There are still issues in other
> >     corners, such as when vlan_filtering is later enabled.
> > 
> > but it doesn't say what that corner case was.  So, presumably reverting
> > it will cause a regression of whatever that corner case was...
> 
> Yes, sorry, bad commit message. I'm not too sure, but it could of been
> that the switch was adding the VLANs to its tables, even though it
> should not because filtering is disabled. And i also think the default
> VLAN was not defined at that point, it only gets defined when
> vlan_filtering is enabled?

It's been too long since I researched all these details, but I seem
to remember that in the Linux software bridge, vlan 1 is always
present even when vlan filtering is not enabled.

Looking at br_vlan_init():

        br->default_pvid = 1;

and nbp_vlan_init() propagates that irrespective of the bridge vlan
enable state to switchdev.  nbp_vlan_init() is called whenever any
interface is added to a bridge (in br_add_if()).

As I believe I mentioned somewhere in the commit messages or covering
message, for at least some of the Marvell DSA switches, it is safe to
add VTU entries - they do not even look at the VTU when the port has
802.1Q disabled.  Whether that is true for all Marvell's DSA switches
I don't know without trawling every functional spec, and I was hoping
that you guys would know.  I guess I need to trawl the specs.
Vivien Didelot Feb. 19, 2020, 6:07 p.m. UTC | #8
Hi Russell,

On Wed, 19 Feb 2020 09:19:00 +0000, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
> On Wed, Feb 19, 2020 at 04:47:30AM +0100, Andrew Lunn wrote:
> > On Wed, Feb 19, 2020 at 12:17:37AM +0000, Russell King - ARM Linux admin wrote:
> > > On Tue, Feb 18, 2020 at 04:00:08PM -0800, Florian Fainelli wrote:
> > > > On 2/18/20 3:45 AM, Russell King - ARM Linux admin wrote:
> > > > > Hi,
> > > > > 
> > > > > This is a repost of the previously posted RFC back in December, which
> > > > > did not get fully reviewed.  I've dropped the RFC tag this time as no
> > > > > one really found anything too problematical in the RFC posting.
> > > > > 
> > > > > I've been trying to configure DSA for VLANs and not having much success.
> > > > > The setup is quite simple:
> > > > > 
> > > > > - The main network is untagged
> > > > > - The wifi network is a vlan tagged with id $VN running over the main
> > > > >   network.
> > > > > 
> > > > > I have an Armada 388 Clearfog with a PCIe wifi card which I'm trying to
> > > > > setup to provide wifi access to the vlan $VN network, while the switch
> > > > > is also part of the main network.
> > > > 
> > > > Why not just revert 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa:
> > > > Don't add vlans when vlan filtering is disabled")? If a driver wants to
> > > > veto the programming of VLANs while it has ports enslaved to a bridge
> > > > that does not have VLAN filtering, it should have enough information to
> > > > not do that operation.
> > > 
> > > I do not have the knowledge to know whether reverting that commit
> > > would be appropriate; I do not know how the non-Marvell switches will
> > > behave with such a revert - what was the reason for the commit in
> > > the first place?
> > > 
> > > The commit says:
> > > 
> > >     This fixes at least one corner case. There are still issues in other
> > >     corners, such as when vlan_filtering is later enabled.
> > > 
> > > but it doesn't say what that corner case was.  So, presumably reverting
> > > it will cause a regression of whatever that corner case was...
> > 
> > Yes, sorry, bad commit message. I'm not too sure, but it could of been
> > that the switch was adding the VLANs to its tables, even though it
> > should not because filtering is disabled. And i also think the default
> > VLAN was not defined at that point, it only gets defined when
> > vlan_filtering is enabled?
> 
> It's been too long since I researched all these details, but I seem
> to remember that in the Linux software bridge, vlan 1 is always
> present even when vlan filtering is not enabled.
> 
> Looking at br_vlan_init():
> 
>         br->default_pvid = 1;
> 
> and nbp_vlan_init() propagates that irrespective of the bridge vlan
> enable state to switchdev.  nbp_vlan_init() is called whenever any
> interface is added to a bridge (in br_add_if()).
> 
> As I believe I mentioned somewhere in the commit messages or covering
> message, for at least some of the Marvell DSA switches, it is safe to
> add VTU entries - they do not even look at the VTU when the port has
> 802.1Q disabled.  Whether that is true for all Marvell's DSA switches
> I don't know without trawling every functional spec, and I was hoping
> that you guys would know.  I guess I need to trawl the specs.

Some switches like the Marvell 88E6060 don't have a VTU, so programming the
default PVID would return -EOPNOTSUPP. Switches supporting only global VLAN
filtering cannot have a VLAN filtering aware bridge as well as a non VLAN
filtering aware bridge spanning their ports at the same time. But all this
shouldn't be a problem because drivers inform the stack whether they support
ds->vlan_filtering per-port, globally or not. We should simply reject the
operation when vlan_filtering is being enabled on unsupported hardware.

Linux bridge is the reference for the implementation of an Ethernet bridge,
if it programs VLAN entries, supported DSA hardware must do so. I'm not a
fan of having our own bridge logic in DSA, so the limitation implemented by
2ea7a679ca2a ("net: dsa: Don't add vlans when vlan filtering is disabled")
needs to go in my opinion.


Thanks,

	Vivien
Florian Fainelli Feb. 19, 2020, 6:20 p.m. UTC | #9
On 2/19/20 10:07 AM, Vivien Didelot wrote:
> Hi Russell,
> 
> On Wed, 19 Feb 2020 09:19:00 +0000, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
>> On Wed, Feb 19, 2020 at 04:47:30AM +0100, Andrew Lunn wrote:
>>> On Wed, Feb 19, 2020 at 12:17:37AM +0000, Russell King - ARM Linux admin wrote:
>>>> On Tue, Feb 18, 2020 at 04:00:08PM -0800, Florian Fainelli wrote:
>>>>> On 2/18/20 3:45 AM, Russell King - ARM Linux admin wrote:
>>>>>> Hi,
>>>>>>
>>>>>> This is a repost of the previously posted RFC back in December, which
>>>>>> did not get fully reviewed.  I've dropped the RFC tag this time as no
>>>>>> one really found anything too problematical in the RFC posting.
>>>>>>
>>>>>> I've been trying to configure DSA for VLANs and not having much success.
>>>>>> The setup is quite simple:
>>>>>>
>>>>>> - The main network is untagged
>>>>>> - The wifi network is a vlan tagged with id $VN running over the main
>>>>>>   network.
>>>>>>
>>>>>> I have an Armada 388 Clearfog with a PCIe wifi card which I'm trying to
>>>>>> setup to provide wifi access to the vlan $VN network, while the switch
>>>>>> is also part of the main network.
>>>>>
>>>>> Why not just revert 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa:
>>>>> Don't add vlans when vlan filtering is disabled")? If a driver wants to
>>>>> veto the programming of VLANs while it has ports enslaved to a bridge
>>>>> that does not have VLAN filtering, it should have enough information to
>>>>> not do that operation.
>>>>
>>>> I do not have the knowledge to know whether reverting that commit
>>>> would be appropriate; I do not know how the non-Marvell switches will
>>>> behave with such a revert - what was the reason for the commit in
>>>> the first place?
>>>>
>>>> The commit says:
>>>>
>>>>     This fixes at least one corner case. There are still issues in other
>>>>     corners, such as when vlan_filtering is later enabled.
>>>>
>>>> but it doesn't say what that corner case was.  So, presumably reverting
>>>> it will cause a regression of whatever that corner case was...
>>>
>>> Yes, sorry, bad commit message. I'm not too sure, but it could of been
>>> that the switch was adding the VLANs to its tables, even though it
>>> should not because filtering is disabled. And i also think the default
>>> VLAN was not defined at that point, it only gets defined when
>>> vlan_filtering is enabled?
>>
>> It's been too long since I researched all these details, but I seem
>> to remember that in the Linux software bridge, vlan 1 is always
>> present even when vlan filtering is not enabled.
>>
>> Looking at br_vlan_init():
>>
>>         br->default_pvid = 1;
>>
>> and nbp_vlan_init() propagates that irrespective of the bridge vlan
>> enable state to switchdev.  nbp_vlan_init() is called whenever any
>> interface is added to a bridge (in br_add_if()).
>>
>> As I believe I mentioned somewhere in the commit messages or covering
>> message, for at least some of the Marvell DSA switches, it is safe to
>> add VTU entries - they do not even look at the VTU when the port has
>> 802.1Q disabled.  Whether that is true for all Marvell's DSA switches
>> I don't know without trawling every functional spec, and I was hoping
>> that you guys would know.  I guess I need to trawl the specs.
> 
> Some switches like the Marvell 88E6060 don't have a VTU, so programming the
> default PVID would return -EOPNOTSUPP. Switches supporting only global VLAN
> filtering cannot have a VLAN filtering aware bridge as well as a non VLAN
> filtering aware bridge spanning their ports at the same time. But all this
> shouldn't be a problem because drivers inform the stack whether they support
> ds->vlan_filtering per-port, globally or not. We should simply reject the
> operation when vlan_filtering is being enabled on unsupported hardware.
> 
> Linux bridge is the reference for the implementation of an Ethernet bridge,
> if it programs VLAN entries, supported DSA hardware must do so. I'm not a
> fan of having our own bridge logic in DSA, so the limitation implemented by
> 2ea7a679ca2a ("net: dsa: Don't add vlans when vlan filtering is disabled")
> needs to go in my opinion.

Agreed.

This also helps with switches who only support the creation of broadcast
domains via VLANs (not the case with b53 and mv88e6xxx AFAICT they have
specific egress vector controls). Because then you could put each
standalone port in say, VID (4094 - port number), and once enslaved in a
bridge, have them in VID 1 to maintain broadcast domains, whether the
bridge has VLAN filtering or not.
Vladimir Oltean Feb. 19, 2020, 6:52 p.m. UTC | #10
On Wed, 19 Feb 2020 at 02:02, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> Why not just revert 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa:
> Don't add vlans when vlan filtering is disabled")? If a driver wants to
> veto the programming of VLANs while it has ports enslaved to a bridge
> that does not have VLAN filtering, it should have enough information to
> not do that operation.
> --
> Florian

It would be worth mentioning that for sja1105 and hypothetical other
users of DSA_TAG_PROTO_8021Q, DSA doing that automatically was
helpful. VLAN manipulations are still being done from tag_8021q.c for
the purpose of DSA tagging, but the fact that the VLAN EtherType is
not 0x8100 means that from the perspective of real VLAN traffic, the
switch is VLAN unaware. DSA was the easiest place to disseminate
between VLAN requests of its own and VLAN requests coming from
switchdev.
Without that logic in DSA, a vlan-unaware bridge would be able to
destroy the configuration done for source port decoding.
Just saying - with enough logic in .port_vlan_prepare, I should still
be able to accept only what's whitelisted to work for tagging, and
then it won't matter who issued that VLAN command.

Thanks,
-Vladimir
Florian Fainelli Feb. 19, 2020, 7:18 p.m. UTC | #11
On 2/19/20 10:52 AM, Vladimir Oltean wrote:
> On Wed, 19 Feb 2020 at 02:02, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>> Why not just revert 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa:
>> Don't add vlans when vlan filtering is disabled")? If a driver wants to
>> veto the programming of VLANs while it has ports enslaved to a bridge
>> that does not have VLAN filtering, it should have enough information to
>> not do that operation.
>> --
>> Florian
> 
> It would be worth mentioning that for sja1105 and hypothetical other
> users of DSA_TAG_PROTO_8021Q, DSA doing that automatically was
> helpful. VLAN manipulations are still being done from tag_8021q.c for
> the purpose of DSA tagging, but the fact that the VLAN EtherType is
> not 0x8100 means that from the perspective of real VLAN traffic, the
> switch is VLAN unaware. DSA was the easiest place to disseminate
> between VLAN requests of its own and VLAN requests coming from
> switchdev.
> Without that logic in DSA, a vlan-unaware bridge would be able to
> destroy the configuration done for source port decoding.
> Just saying - with enough logic in .port_vlan_prepare, I should still
> be able to accept only what's whitelisted to work for tagging, and
> then it won't matter who issued that VLAN command.

I suppose I am fine with Russell's approach, but maybe its meaning
should be reversed, that is, you get VLAN objects notifications by
default for a  VLAN unaware bridge and if you do set a specific
dsa_switch flag, then you do not get those.
Russell King (Oracle) Feb. 19, 2020, 11:15 p.m. UTC | #12
On Wed, Feb 19, 2020 at 11:18:17AM -0800, Florian Fainelli wrote:
> On 2/19/20 10:52 AM, Vladimir Oltean wrote:
> > On Wed, 19 Feb 2020 at 02:02, Florian Fainelli <f.fainelli@gmail.com> wrote:
> >>
> >> Why not just revert 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa:
> >> Don't add vlans when vlan filtering is disabled")? If a driver wants to
> >> veto the programming of VLANs while it has ports enslaved to a bridge
> >> that does not have VLAN filtering, it should have enough information to
> >> not do that operation.
> >> --
> >> Florian
> > 
> > It would be worth mentioning that for sja1105 and hypothetical other
> > users of DSA_TAG_PROTO_8021Q, DSA doing that automatically was
> > helpful. VLAN manipulations are still being done from tag_8021q.c for
> > the purpose of DSA tagging, but the fact that the VLAN EtherType is
> > not 0x8100 means that from the perspective of real VLAN traffic, the
> > switch is VLAN unaware. DSA was the easiest place to disseminate
> > between VLAN requests of its own and VLAN requests coming from
> > switchdev.
> > Without that logic in DSA, a vlan-unaware bridge would be able to
> > destroy the configuration done for source port decoding.
> > Just saying - with enough logic in .port_vlan_prepare, I should still
> > be able to accept only what's whitelisted to work for tagging, and
> > then it won't matter who issued that VLAN command.
> 
> I suppose I am fine with Russell's approach, but maybe its meaning
> should be reversed, that is, you get VLAN objects notifications by
> default for a  VLAN unaware bridge and if you do set a specific
> dsa_switch flag, then you do not get those.

If we reverse it, I'll need someone to tell me which DSA switches
should not get the vlan object notifications.  Maybe also in that
case, we should deny the ability to toggle the state of
vlan_filtering as well?
Russell King (Oracle) Feb. 20, 2020, 11:35 a.m. UTC | #13
On Wed, Feb 19, 2020 at 01:07:07PM -0500, Vivien Didelot wrote:
> Hi Russell,
> 
> Some switches like the Marvell 88E6060 don't have a VTU, so programming the
> default PVID would return -EOPNOTSUPP.

The 88e6060 has its own driver separate from mv88e6xxx.

> Switches supporting only global VLAN
> filtering cannot have a VLAN filtering aware bridge as well as a non VLAN
> filtering aware bridge spanning their ports at the same time. But all this
> shouldn't be a problem because drivers inform the stack whether they support
> ds->vlan_filtering per-port, globally or not. We should simply reject the
> operation when vlan_filtering is being enabled on unsupported hardware.
> 
> Linux bridge is the reference for the implementation of an Ethernet bridge,
> if it programs VLAN entries, supported DSA hardware must do so. I'm not a
> fan of having our own bridge logic in DSA, so the limitation implemented by
> 2ea7a679ca2a ("net: dsa: Don't add vlans when vlan filtering is disabled")
> needs to go in my opinion.

... which is basically what patch 3 is doing, but in a per-driver
manner.

The checks introduced in 2ea7a679ca2a ("net: dsa: Don't add vlans when
vlan filtering is disabled") were raised up a level by c5335d737ff3
("net: dsa: check bridge VLAN in slave operations") to their present
positions, which are then touched by my patch 3 to make the checks
conditional.
Florian Fainelli Feb. 20, 2020, 6:56 p.m. UTC | #14
On 2/19/20 3:15 PM, Russell King - ARM Linux admin wrote:
> On Wed, Feb 19, 2020 at 11:18:17AM -0800, Florian Fainelli wrote:
>> On 2/19/20 10:52 AM, Vladimir Oltean wrote:
>>> On Wed, 19 Feb 2020 at 02:02, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>>
>>>> Why not just revert 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa:
>>>> Don't add vlans when vlan filtering is disabled")? If a driver wants to
>>>> veto the programming of VLANs while it has ports enslaved to a bridge
>>>> that does not have VLAN filtering, it should have enough information to
>>>> not do that operation.
>>>> --
>>>> Florian
>>>
>>> It would be worth mentioning that for sja1105 and hypothetical other
>>> users of DSA_TAG_PROTO_8021Q, DSA doing that automatically was
>>> helpful. VLAN manipulations are still being done from tag_8021q.c for
>>> the purpose of DSA tagging, but the fact that the VLAN EtherType is
>>> not 0x8100 means that from the perspective of real VLAN traffic, the
>>> switch is VLAN unaware. DSA was the easiest place to disseminate
>>> between VLAN requests of its own and VLAN requests coming from
>>> switchdev.
>>> Without that logic in DSA, a vlan-unaware bridge would be able to
>>> destroy the configuration done for source port decoding.
>>> Just saying - with enough logic in .port_vlan_prepare, I should still
>>> be able to accept only what's whitelisted to work for tagging, and
>>> then it won't matter who issued that VLAN command.
>>
>> I suppose I am fine with Russell's approach, but maybe its meaning
>> should be reversed, that is, you get VLAN objects notifications by
>> default for a  VLAN unaware bridge and if you do set a specific
>> dsa_switch flag, then you do not get those.
> 
> If we reverse it, I'll need someone to tell me which DSA switches
> should not get the vlan object notifications.  Maybe also in that
> case, we should deny the ability to toggle the state of
> vlan_filtering as well?
> 

Let's get your patch series merged. If you re-spin while addressing
Vivien's comment not to use the term "vtu", I think I would be fine with
the current approach of having to go after each driver and enabling them
where necessary.

Thanks
Russell King (Oracle) Feb. 21, 2020, 12:21 a.m. UTC | #15
On Thu, Feb 20, 2020 at 10:56:17AM -0800, Florian Fainelli wrote:
> Let's get your patch series merged. If you re-spin while addressing
> Vivien's comment not to use the term "vtu", I think I would be fine with
> the current approach of having to go after each driver and enabling them
> where necessary.

The question then becomes what to call it.  "always_allow_vlans" or
"always_allow_vlan_config" maybe?
Russell King (Oracle) March 16, 2020, 11:15 a.m. UTC | #16
On Fri, Feb 21, 2020 at 12:21:10AM +0000, Russell King - ARM Linux admin wrote:
> On Thu, Feb 20, 2020 at 10:56:17AM -0800, Florian Fainelli wrote:
> > Let's get your patch series merged. If you re-spin while addressing
> > Vivien's comment not to use the term "vtu", I think I would be fine with
> > the current approach of having to go after each driver and enabling them
> > where necessary.
> 
> The question then becomes what to call it.  "always_allow_vlans" or
> "always_allow_vlan_config" maybe?

Please note that I still have this patch pending (i.o.w., the problem
with vlans remains unfixed) as I haven't received a reply to this,
although the first two patches have been merged.
Russell King (Oracle) March 17, 2020, noon UTC | #17
On Mon, Mar 16, 2020 at 11:15:24AM +0000, Russell King - ARM Linux admin wrote:
> On Fri, Feb 21, 2020 at 12:21:10AM +0000, Russell King - ARM Linux admin wrote:
> > On Thu, Feb 20, 2020 at 10:56:17AM -0800, Florian Fainelli wrote:
> > > Let's get your patch series merged. If you re-spin while addressing
> > > Vivien's comment not to use the term "vtu", I think I would be fine with
> > > the current approach of having to go after each driver and enabling them
> > > where necessary.
> > 
> > The question then becomes what to call it.  "always_allow_vlans" or
> > "always_allow_vlan_config" maybe?
> 
> Please note that I still have this patch pending (i.o.w., the problem
> with vlans remains unfixed) as I haven't received a reply to this,
> although the first two patches have been merged.

Okay, I think three and a half weeks is way beyond a reasonable time
period to expect any kind of reply.

Since no one seems to have any idea what to name this, but can only
offer "we don't like the vtu" term, it's difficult to see what would
actually be acceptable.  So, I propose that we go with the existing
naming.

If you only know what you don't want, but not what you want, and aren't
even willing to discuss it, it makes it very much impossible to
progress.
Vladimir Oltean March 17, 2020, 2:21 p.m. UTC | #18
Hi Russell,

On Tue, 17 Mar 2020 at 14:00, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Mon, Mar 16, 2020 at 11:15:24AM +0000, Russell King - ARM Linux admin wrote:
> > On Fri, Feb 21, 2020 at 12:21:10AM +0000, Russell King - ARM Linux admin wrote:
> > > On Thu, Feb 20, 2020 at 10:56:17AM -0800, Florian Fainelli wrote:
> > > > Let's get your patch series merged. If you re-spin while addressing
> > > > Vivien's comment not to use the term "vtu", I think I would be fine with
> > > > the current approach of having to go after each driver and enabling them
> > > > where necessary.
> > >
> > > The question then becomes what to call it.  "always_allow_vlans" or
> > > "always_allow_vlan_config" maybe?
> >
> > Please note that I still have this patch pending (i.o.w., the problem
> > with vlans remains unfixed) as I haven't received a reply to this,
> > although the first two patches have been merged.
>
> Okay, I think three and a half weeks is way beyond a reasonable time
> period to expect any kind of reply.
>
> Since no one seems to have any idea what to name this, but can only
> offer "we don't like the vtu" term, it's difficult to see what would
> actually be acceptable.  So, I propose that we go with the existing
> naming.
>
> If you only know what you don't want, but not what you want, and aren't
> even willing to discuss it, it makes it very much impossible to
> progress.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up

As I said, I know why I need this blocking of bridge vlan
configuration for sja1105, but not more. For sja1105 in particular, I
fully admit that the hardware is quirky, but I can work around that
within the driver. The concern is for the other drivers where we don't
really "remember" why this workaround is in place. I think, while your
patch is definitely a punctual fix for one case that doesn't need the
workaround, it might be better for maintenance to just see exactly
what breaks, instead of introducing this opaque property.
While I don't want to speak on behalf of the maintainers, I think that
may be at least part of the reason why there is little progress being
made. Introducing some breakage which is going to be fixed better next
time might be the more appropriate thing to do.

Thanks,
-Vladimir
Russell King (Oracle) March 17, 2020, 3:12 p.m. UTC | #19
On Tue, Mar 17, 2020 at 04:21:00PM +0200, Vladimir Oltean wrote:
> Hi Russell,
> 
> On Tue, 17 Mar 2020 at 14:00, Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > On Mon, Mar 16, 2020 at 11:15:24AM +0000, Russell King - ARM Linux admin wrote:
> > > On Fri, Feb 21, 2020 at 12:21:10AM +0000, Russell King - ARM Linux admin wrote:
> > > > On Thu, Feb 20, 2020 at 10:56:17AM -0800, Florian Fainelli wrote:
> > > > > Let's get your patch series merged. If you re-spin while addressing
> > > > > Vivien's comment not to use the term "vtu", I think I would be fine with
> > > > > the current approach of having to go after each driver and enabling them
> > > > > where necessary.
> > > >
> > > > The question then becomes what to call it.  "always_allow_vlans" or
> > > > "always_allow_vlan_config" maybe?
> > >
> > > Please note that I still have this patch pending (i.o.w., the problem
> > > with vlans remains unfixed) as I haven't received a reply to this,
> > > although the first two patches have been merged.
> >
> > Okay, I think three and a half weeks is way beyond a reasonable time
> > period to expect any kind of reply.
> >
> > Since no one seems to have any idea what to name this, but can only
> > offer "we don't like the vtu" term, it's difficult to see what would
> > actually be acceptable.  So, I propose that we go with the existing
> > naming.
> >
> > If you only know what you don't want, but not what you want, and aren't
> > even willing to discuss it, it makes it very much impossible to
> > progress.
> >
> > --
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
> 
> As I said, I know why I need this blocking of bridge vlan
> configuration for sja1105, but not more. For sja1105 in particular, I
> fully admit that the hardware is quirky, but I can work around that
> within the driver. The concern is for the other drivers where we don't
> really "remember" why this workaround is in place. I think, while your
> patch is definitely a punctual fix for one case that doesn't need the
> workaround, it might be better for maintenance to just see exactly
> what breaks, instead of introducing this opaque property.
> While I don't want to speak on behalf of the maintainers, I think that
> may be at least part of the reason why there is little progress being
> made. Introducing some breakage which is going to be fixed better next
> time might be the more appropriate thing to do.

The conclusion on 21st February was that all patches should be merged,
complete with the boolean control, but there was an open question about
the name of the boolean used to enable this behaviour.

That question has not been resolved, so I'm trying to re-open discussion
of that point.  I've re-posted the patch.
Vivien Didelot March 17, 2020, 6:49 p.m. UTC | #20
Hi Russell,

On Tue, 17 Mar 2020 15:12:38 +0000, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
> On Tue, Mar 17, 2020 at 04:21:00PM +0200, Vladimir Oltean wrote:
> > Hi Russell,
> > 
> > On Tue, 17 Mar 2020 at 14:00, Russell King - ARM Linux admin
> > <linux@armlinux.org.uk> wrote:
> > >
> > > On Mon, Mar 16, 2020 at 11:15:24AM +0000, Russell King - ARM Linux admin wrote:
> > > > On Fri, Feb 21, 2020 at 12:21:10AM +0000, Russell King - ARM Linux admin wrote:
> > > > > On Thu, Feb 20, 2020 at 10:56:17AM -0800, Florian Fainelli wrote:
> > > > > > Let's get your patch series merged. If you re-spin while addressing
> > > > > > Vivien's comment not to use the term "vtu", I think I would be fine with
> > > > > > the current approach of having to go after each driver and enabling them
> > > > > > where necessary.
> > > > >
> > > > > The question then becomes what to call it.  "always_allow_vlans" or
> > > > > "always_allow_vlan_config" maybe?
> > > >
> > > > Please note that I still have this patch pending (i.o.w., the problem
> > > > with vlans remains unfixed) as I haven't received a reply to this,
> > > > although the first two patches have been merged.
> > >
> > > Okay, I think three and a half weeks is way beyond a reasonable time
> > > period to expect any kind of reply.
> > >
> > > Since no one seems to have any idea what to name this, but can only
> > > offer "we don't like the vtu" term, it's difficult to see what would
> > > actually be acceptable.  So, I propose that we go with the existing
> > > naming.
> > >
> > > If you only know what you don't want, but not what you want, and aren't
> > > even willing to discuss it, it makes it very much impossible to
> > > progress.
> > >
> > > --
> > > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > > FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
> > 
> > As I said, I know why I need this blocking of bridge vlan
> > configuration for sja1105, but not more. For sja1105 in particular, I
> > fully admit that the hardware is quirky, but I can work around that
> > within the driver. The concern is for the other drivers where we don't
> > really "remember" why this workaround is in place. I think, while your
> > patch is definitely a punctual fix for one case that doesn't need the
> > workaround, it might be better for maintenance to just see exactly
> > what breaks, instead of introducing this opaque property.
> > While I don't want to speak on behalf of the maintainers, I think that
> > may be at least part of the reason why there is little progress being
> > made. Introducing some breakage which is going to be fixed better next
> > time might be the more appropriate thing to do.
> 
> The conclusion on 21st February was that all patches should be merged,
> complete with the boolean control, but there was an open question about
> the name of the boolean used to enable this behaviour.
> 
> That question has not been resolved, so I'm trying to re-open discussion
> of that point.  I've re-posted the patch.
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up

In response to your 3/3 patch, I suggested commands to test setting up a
VLAN filtering aware bridge with your own default PVID before enslaving
DSA ports. Unfortunately you left this unanswered. I think this would be
much more interesting in order to tackle the issue you're having here with
mv88e6xxx and bridge, instead of pointing out the lack of response regarding
an alternative boolean name. That being said, "force_vlan_programming",
"always_allow_vlans", or whatever explicit non-"vtu" term is fine by me.


Thanks,

	Vivien
Russell King (Oracle) March 17, 2020, 9:24 p.m. UTC | #21
On Tue, Mar 17, 2020 at 02:49:06PM -0400, Vivien Didelot wrote:
> Hi Russell,
> 
> On Tue, 17 Mar 2020 15:12:38 +0000, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
> > On Tue, Mar 17, 2020 at 04:21:00PM +0200, Vladimir Oltean wrote:
> > > Hi Russell,
> > > 
> > > On Tue, 17 Mar 2020 at 14:00, Russell King - ARM Linux admin
> > > <linux@armlinux.org.uk> wrote:
> > > >
> > > > On Mon, Mar 16, 2020 at 11:15:24AM +0000, Russell King - ARM Linux admin wrote:
> > > > > On Fri, Feb 21, 2020 at 12:21:10AM +0000, Russell King - ARM Linux admin wrote:
> > > > > > On Thu, Feb 20, 2020 at 10:56:17AM -0800, Florian Fainelli wrote:
> > > > > > > Let's get your patch series merged. If you re-spin while addressing
> > > > > > > Vivien's comment not to use the term "vtu", I think I would be fine with
> > > > > > > the current approach of having to go after each driver and enabling them
> > > > > > > where necessary.
> > > > > >
> > > > > > The question then becomes what to call it.  "always_allow_vlans" or
> > > > > > "always_allow_vlan_config" maybe?
> > > > >
> > > > > Please note that I still have this patch pending (i.o.w., the problem
> > > > > with vlans remains unfixed) as I haven't received a reply to this,
> > > > > although the first two patches have been merged.
> > > >
> > > > Okay, I think three and a half weeks is way beyond a reasonable time
> > > > period to expect any kind of reply.
> > > >
> > > > Since no one seems to have any idea what to name this, but can only
> > > > offer "we don't like the vtu" term, it's difficult to see what would
> > > > actually be acceptable.  So, I propose that we go with the existing
> > > > naming.
> > > >
> > > > If you only know what you don't want, but not what you want, and aren't
> > > > even willing to discuss it, it makes it very much impossible to
> > > > progress.
> > > >
> > > > --
> > > > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > > > FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
> > > 
> > > As I said, I know why I need this blocking of bridge vlan
> > > configuration for sja1105, but not more. For sja1105 in particular, I
> > > fully admit that the hardware is quirky, but I can work around that
> > > within the driver. The concern is for the other drivers where we don't
> > > really "remember" why this workaround is in place. I think, while your
> > > patch is definitely a punctual fix for one case that doesn't need the
> > > workaround, it might be better for maintenance to just see exactly
> > > what breaks, instead of introducing this opaque property.
> > > While I don't want to speak on behalf of the maintainers, I think that
> > > may be at least part of the reason why there is little progress being
> > > made. Introducing some breakage which is going to be fixed better next
> > > time might be the more appropriate thing to do.
> > 
> > The conclusion on 21st February was that all patches should be merged,
> > complete with the boolean control, but there was an open question about
> > the name of the boolean used to enable this behaviour.
> > 
> > That question has not been resolved, so I'm trying to re-open discussion
> > of that point.  I've re-posted the patch.
> > 
> > -- 
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
> 
> In response to your 3/3 patch, I suggested commands to test setting up a
> VLAN filtering aware bridge with your own default PVID before enslaving
> DSA ports. Unfortunately you left this unanswered.

I don't believe I left it unanswered.  However, I'm not about to rip
apart my network to try an experiment with specific set of commands.
I did, however, experiment a lot to work out what was going on, so I
already have the knowledge.

I believe I explained in the series description that the problem only
happens when vlan filtering is enabled with a pre-existing vlan
configuration present, even the default configuration.

Enabling vlan filtering *immediately* blocks all traffic on the Marvell
switch, whether it has vlan tags or not.  Any *new* vlan modifications
then get entered into the VTU, and the Marvell switch then behaves
accordingly with those new entries as one would expect.

Any setup done *before* vlan filtering is enabled is not present in the
VTU, and so remains non-existent as far as the DSA switch is concerned.

As soon as vlan filtering is enabled, the ports are switched to "802.1Q
secure" state, which means:

- The switch will consult the VTU for the incoming packet; if no entry
  is found for the vlan number either tagged to the packet, or the
  default vlan if not, then the packet will be discarded by the switch.

So, the setup done *before* vlan filtering is enabled, which is not
programmed into the VTU, results in that traffic being lost.

> I think this would be
> much more interesting in order to tackle the issue you're having here with
> mv88e6xxx and bridge, instead of pointing out the lack of response regarding
> an alternative boolean name.

It is my understanding that Florian actively wants this merged.  No
one objected to his email.

It seems there's a disconnect *between* the DSA maintainers - I think
you need to be more effectively communicating with each other and
reading each other's emails, and pro-actively replying to stuff you
may have other views on.

> That being said, "force_vlan_programming",
> "always_allow_vlans", or whatever explicit non-"vtu" term is fine by me.

Thanks.
Vivien Didelot March 18, 2020, 2:26 a.m. UTC | #22
On Tue, 17 Mar 2020 21:24:53 +0000, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
> > In response to your 3/3 patch, I suggested commands to test setting up a
> > VLAN filtering aware bridge with your own default PVID before enslaving
> > DSA ports. Unfortunately you left this unanswered.
> 
> I don't believe I left it unanswered.  However, I'm not about to rip
> apart my network to try an experiment with specific set of commands.

In mail 3/3 I suggested to run the following snippet to configure the bridge
at creation time so that we can see clearly if the problem still occurs:

    # ip link add name br0 type bridge vlan_filtering 1 vlan_default_pvid 42
    # ip link set master br0 dev lan2 up
    # cat /sys/kernel/debug/mv88e6xxx/sw0/vtu
    vid 42      fid 1   sid 0   dpv 0 unmodified 2 untagged 10 unmodified

You skipped this, last email without reply, this feels pretty unanswered to me.

But whatever, I don't want these two commands to rip apart your network.

> It is my understanding that Florian actively wants this merged.  No
> one objected to his email.
> 
> It seems there's a disconnect *between* the DSA maintainers - I think
> you need to be more effectively communicating with each other and
> reading each other's emails, and pro-actively replying to stuff you
> may have other views on.

I'm not sure to understand what you're assuming here. As Florian said, your
patch is good to go as long as you change the boolean name to something
generic not containing "vtu", which is Marvell specific. If you really
need us to choose, then go with "force_vlan_programming" or one of your
suggestions. What matters here is that a non-mv88e6xxx user can clearly
understand what this boolean does.


Thank you,

	Vivien