diff mbox series

[BUG,FIX] net: dsa: oops in br_vlan_enabled

Message ID trinity-782c45c1-ef94-4340-8def-f949a4a929a9-1550334156901@3c-app-gmx-bs35
State RFC
Delegated to: David Miller
Headers show
Series [BUG,FIX] net: dsa: oops in br_vlan_enabled | expand

Commit Message

Frank Wunderlich Feb. 16, 2019, 4:22 p.m. UTC
Hi,

i've found an oops in 4.19.23/10, seems to be fixed anyhow in 5.0 (also works in 4.14.101)

root@bpi-r2:~# ip link add link lan0 name lan0.5 type vlan id 5
root@bpi-r2:~# ip addr add 192.168.5.200/24 brd 192.168.5.255 dev lan0.5
root@bpi-r2:~# ip link set dev lan0 up
root@bpi-r2:~# ip link set dev lan0.5 up

12: lan0.5@lan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
    link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff
    inet 192.168.5.200/24 brd 192.168.5.255 scope global lan0.5
       valid_lft forever preferred_lft forever

root@bpi-r2:~# brctl addbr bridge_name
root@bpi-r2:~# brctl addif bridge_name lan0.5
[  352.057128] bridge_name: port 1(lan0.5) entered blocking state
[  352.063065] bridge_name: port 1(lan0.5) entered disabled state
[  352.069181] device lan0.5 entered promiscuous mode
[  352.074018] device lan0 entered promiscuous mode
[  352.078906] Unable to handle kernel NULL pointer dereference at virtual address 00000558
...
[  352.493085] [<bf0fde88>] (br_vlan_enabled [bridge]) from [<bf12c234>] (dsa_port_vlan_add+0x60/0xbc [dsa_core])
[  352.503050] [<bf12c234>] (dsa_port_vlan_add [dsa_core]) from [<bf12cb64>] (dsa_slave_port_obj_add+0x4c/0x50 [dsa_core])
[  352.513776] [<bf12cb64>] (dsa_slave_port_obj_add [dsa_core]) from [<c0b4e2d4>] (__switchdev_port_obj_add+0x50/0xc4)
[  352.524138] [<c0b4e2d4>] (__switchdev_port_obj_add) from [<c0b4e324>] (__switchdev_port_obj_add+0xa0/0xc4)
[  352.533721] [<c0b4e324>] (__switchdev_port_obj_add) from [<c0b4e3a8>] (switchdev_port_obj_add_now+0x60/0x130)
[  352.543562] [<c0b4e3a8>] (switchdev_port_obj_add_now) from [<c0b4e7e4>] (switchdev_port_obj_add+0x44/0x190)
[  352.553284] [<c0b4e7e4>] (switchdev_port_obj_add) from [<bf1013d0>] (br_switchdev_port_vlan_add+0x60/0x7c [bridge])
[  352.563733] [<bf1013d0>] (br_switchdev_port_vlan_add [bridge]) from [<bf0ff250>] (__vlan_add+0xb0/0x620 [bridge])
[  352.574007] [<bf0ff250>] (__vlan_add [bridge]) from [<bf0ffd04>] (nbp_vlan_add+0xc4/0x150 [bridge])
[  352.583073] [<bf0ffd04>] (nbp_vlan_add [bridge]) from [<bf0ffec4>] (nbp_vlan_init+0x134/0x164 [bridge])
[  352.592482] [<bf0ffec4>] (nbp_vlan_init [bridge]) from [<bf0edd4c>] (br_add_if+0x40c/0x5fc [bridge])
[  352.601632] [<bf0edd4c>] (br_add_if [bridge]) from [<bf0eeb14>] (add_del_if+0x6c/0x80 [bridge])
[  352.610351] [<bf0eeb14>] (add_del_if [bridge]) from [<bf0ef5b0>] (br_dev_ioctl+0x7c/0x9c [bridge])
[  352.619290] [<bf0ef5b0>] (br_dev_ioctl [bridge]) from [<c09583d4>] (dev_ifsioc+0x184/0x324)
[  352.627582] [<c09583d4>] (dev_ifsioc) from [<c09589e8>] (dev_ioctl+0x32c/0x5cc)
[  352.634837] [<c09589e8>] (dev_ioctl) from [<c090913c>] (sock_ioctl+0x3bc/0x580)


since my 4.19.23 kernel is modified a bit i tried with 4.19.10 without my net modifications and it is still reproducable with steps above (create a vlan on dsa-user-port and then use it in a bridge)

i fixed it with these changes:


i've found in a Patch from florian/vivien: https://www.mail-archive.com/netdev@vger.kernel.org/msg281415.html

Strange that 5.0-rc1 does not crash,because these 2 code-sections are unchanged: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/dsa/port.c#n255 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/bridge/br_vlan.c#n788

maybe you know why only 4.19 is affected...

regards Frank

Comments

Florian Fainelli Feb. 17, 2019, 5:13 p.m. UTC | #1
Hi Frank,

On 2/16/2019 8:22 AM, Frank Wunderlich wrote:
> Hi,
> 
> i've found an oops in 4.19.23/10, seems to be fixed anyhow in 5.0 (also works in 4.14.101)
> 
> root@bpi-r2:~# ip link add link lan0 name lan0.5 type vlan id 5
> root@bpi-r2:~# ip addr add 192.168.5.200/24 brd 192.168.5.255 dev lan0.5
> root@bpi-r2:~# ip link set dev lan0 up
> root@bpi-r2:~# ip link set dev lan0.5 up

So that these steps don't involve a bridge, and because we don't (yet)
implment ndo_vlan_rx_{add,kill}_vid() netdevice_ops, there is no VLAN
programming, it's all software

> 
> 12: lan0.5@lan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
>     link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.5.200/24 brd 192.168.5.255 scope global lan0.5
>        valid_lft forever preferred_lft forever
> 
> root@bpi-r2:~# brctl addbr bridge_name
> root@bpi-r2:~# brctl addif bridge_name lan0.5

Unless you changed the bridge to have VLAN filtering/awareness with:

echo 1 > /sys/class/net/bridge_name/bridge/vlan_filtering

There will be no VLAN configuration/objects pushed to DSA since commit
2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa: Don't add vlans
when vlan filtering is disabled") so I am not sure how you got into that
situation because prior to pushing a VLAN object, the port must be part
of a bridge, so the steps typically look like:

- port_bridge_join which assigns dp->bridge_dev
- port_vlan_add

Does your 4.19.23 kernel somehow change how VLAN objects are pushed down
the switch driver?

> [  352.057128] bridge_name: port 1(lan0.5) entered blocking state
> [  352.063065] bridge_name: port 1(lan0.5) entered disabled state
> [  352.069181] device lan0.5 entered promiscuous mode
> [  352.074018] device lan0 entered promiscuous mode
> [  352.078906] Unable to handle kernel NULL pointer dereference at virtual address 00000558
> ...
> [  352.493085] [<bf0fde88>] (br_vlan_enabled [bridge]) from [<bf12c234>] (dsa_port_vlan_add+0x60/0xbc [dsa_core])
> [  352.503050] [<bf12c234>] (dsa_port_vlan_add [dsa_core]) from [<bf12cb64>] (dsa_slave_port_obj_add+0x4c/0x50 [dsa_core])
> [  352.513776] [<bf12cb64>] (dsa_slave_port_obj_add [dsa_core]) from [<c0b4e2d4>] (__switchdev_port_obj_add+0x50/0xc4)
> [  352.524138] [<c0b4e2d4>] (__switchdev_port_obj_add) from [<c0b4e324>] (__switchdev_port_obj_add+0xa0/0xc4)
> [  352.533721] [<c0b4e324>] (__switchdev_port_obj_add) from [<c0b4e3a8>] (switchdev_port_obj_add_now+0x60/0x130)
> [  352.543562] [<c0b4e3a8>] (switchdev_port_obj_add_now) from [<c0b4e7e4>] (switchdev_port_obj_add+0x44/0x190)
> [  352.553284] [<c0b4e7e4>] (switchdev_port_obj_add) from [<bf1013d0>] (br_switchdev_port_vlan_add+0x60/0x7c [bridge])
> [  352.563733] [<bf1013d0>] (br_switchdev_port_vlan_add [bridge]) from [<bf0ff250>] (__vlan_add+0xb0/0x620 [bridge])
> [  352.574007] [<bf0ff250>] (__vlan_add [bridge]) from [<bf0ffd04>] (nbp_vlan_add+0xc4/0x150 [bridge])
> [  352.583073] [<bf0ffd04>] (nbp_vlan_add [bridge]) from [<bf0ffec4>] (nbp_vlan_init+0x134/0x164 [bridge])
> [  352.592482] [<bf0ffec4>] (nbp_vlan_init [bridge]) from [<bf0edd4c>] (br_add_if+0x40c/0x5fc [bridge])
> [  352.601632] [<bf0edd4c>] (br_add_if [bridge]) from [<bf0eeb14>] (add_del_if+0x6c/0x80 [bridge])
> [  352.610351] [<bf0eeb14>] (add_del_if [bridge]) from [<bf0ef5b0>] (br_dev_ioctl+0x7c/0x9c [bridge])
> [  352.619290] [<bf0ef5b0>] (br_dev_ioctl [bridge]) from [<c09583d4>] (dev_ifsioc+0x184/0x324)
> [  352.627582] [<c09583d4>] (dev_ifsioc) from [<c09589e8>] (dev_ioctl+0x32c/0x5cc)
> [  352.634837] [<c09589e8>] (dev_ioctl) from [<c090913c>] (sock_ioctl+0x3bc/0x580)
> 
> 
> since my 4.19.23 kernel is modified a bit i tried with 4.19.10 without my net modifications and it is still reproducable with steps above (create a vlan on dsa-user-port and then use it in a bridge)
> 
> i fixed it with these changes:
> 
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index ed0595459df1..962887752ae8 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -255,8 +255,9 @@ int dsa_port_vlan_add(struct dsa_port *dp,
>         if (netif_is_bridge_master(vlan->obj.orig_dev))
>                 return -EOPNOTSUPP;
>  
> -       if (br_vlan_enabled(dp->bridge_dev))
> -               return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_ADD, &info);
> +       printk(KERN_ALERT "DEBUG: Passed %s %d 0x%x \n",__FUNCTION__,__LINE__,(unsigned int)dp->bridge_dev);
> +       if (!dp->bridge_dev || br_vlan_enabled(dp->bridge_dev))
> +               return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_DEL, &info);
>  
>         return 0;
>  }
> @@ -273,7 +274,7 @@ int dsa_port_vlan_del(struct dsa_port *dp,
>         if (netif_is_bridge_master(vlan->obj.orig_dev))
>                 return -EOPNOTSUPP;
>  
> -       if (br_vlan_enabled(dp->bridge_dev))
> +       if (!dp->bridge_dev || br_vlan_enabled(dp->bridge_dev))
>                 return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_DEL, &info);
>  
>         return 0;
> 
> i've found in a Patch from florian/vivien: https://www.mail-archive.com/netdev@vger.kernel.org/msg281415.html
> 
> Strange that 5.0-rc1 does not crash,because these 2 code-sections are unchanged: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/dsa/port.c#n255 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/bridge/br_vlan.c#n788
> 
> maybe you know why only 4.19 is affected...
> 
> regards Frank
>
Florian Fainelli Feb. 17, 2019, 11:20 p.m. UTC | #2
On 2/17/2019 9:13 AM, Florian Fainelli wrote:
> Hi Frank,
> 
> On 2/16/2019 8:22 AM, Frank Wunderlich wrote:
>> Hi,
>>
>> i've found an oops in 4.19.23/10, seems to be fixed anyhow in 5.0 (also works in 4.14.101)
>>
>> root@bpi-r2:~# ip link add link lan0 name lan0.5 type vlan id 5
>> root@bpi-r2:~# ip addr add 192.168.5.200/24 brd 192.168.5.255 dev lan0.5
>> root@bpi-r2:~# ip link set dev lan0 up
>> root@bpi-r2:~# ip link set dev lan0.5 up
> 
> So that these steps don't involve a bridge, and because we don't (yet)
> implment ndo_vlan_rx_{add,kill}_vid() netdevice_ops, there is no VLAN
> programming, it's all software
> 
>>
>> 12: lan0.5@lan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
>>     link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff
>>     inet 192.168.5.200/24 brd 192.168.5.255 scope global lan0.5
>>        valid_lft forever preferred_lft forever
>>
>> root@bpi-r2:~# brctl addbr bridge_name
>> root@bpi-r2:~# brctl addif bridge_name lan0.5
> 
> Unless you changed the bridge to have VLAN filtering/awareness with:
> 
> echo 1 > /sys/class/net/bridge_name/bridge/vlan_filtering
> 
> There will be no VLAN configuration/objects pushed to DSA since commit
> 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa: Don't add vlans
> when vlan filtering is disabled") so I am not sure how you got into that
> situation because prior to pushing a VLAN object, the port must be part
> of a bridge, so the steps typically look like:
> 
> - port_bridge_join which assigns dp->bridge_dev
> - port_vlan_add
> 
> Does your 4.19.23 kernel somehow change how VLAN objects are pushed down
> the switch driver?
> 
>> [  352.057128] bridge_name: port 1(lan0.5) entered blocking state
>> [  352.063065] bridge_name: port 1(lan0.5) entered disabled state
>> [  352.069181] device lan0.5 entered promiscuous mode
>> [  352.074018] device lan0 entered promiscuous mode
>> [  352.078906] Unable to handle kernel NULL pointer dereference at virtual address 00000558
>> ...
>> [  352.493085] [<bf0fde88>] (br_vlan_enabled [bridge]) from [<bf12c234>] (dsa_port_vlan_add+0x60/0xbc [dsa_core])
>> [  352.503050] [<bf12c234>] (dsa_port_vlan_add [dsa_core]) from [<bf12cb64>] (dsa_slave_port_obj_add+0x4c/0x50 [dsa_core])
>> [  352.513776] [<bf12cb64>] (dsa_slave_port_obj_add [dsa_core]) from [<c0b4e2d4>] (__switchdev_port_obj_add+0x50/0xc4)
>> [  352.524138] [<c0b4e2d4>] (__switchdev_port_obj_add) from [<c0b4e324>] (__switchdev_port_obj_add+0xa0/0xc4)
>> [  352.533721] [<c0b4e324>] (__switchdev_port_obj_add) from [<c0b4e3a8>] (switchdev_port_obj_add_now+0x60/0x130)
>> [  352.543562] [<c0b4e3a8>] (switchdev_port_obj_add_now) from [<c0b4e7e4>] (switchdev_port_obj_add+0x44/0x190)
>> [  352.553284] [<c0b4e7e4>] (switchdev_port_obj_add) from [<bf1013d0>] (br_switchdev_port_vlan_add+0x60/0x7c [bridge])
>> [  352.563733] [<bf1013d0>] (br_switchdev_port_vlan_add [bridge]) from [<bf0ff250>] (__vlan_add+0xb0/0x620 [bridge])
>> [  352.574007] [<bf0ff250>] (__vlan_add [bridge]) from [<bf0ffd04>] (nbp_vlan_add+0xc4/0x150 [bridge])
>> [  352.583073] [<bf0ffd04>] (nbp_vlan_add [bridge]) from [<bf0ffec4>] (nbp_vlan_init+0x134/0x164 [bridge])
>> [  352.592482] [<bf0ffec4>] (nbp_vlan_init [bridge]) from [<bf0edd4c>] (br_add_if+0x40c/0x5fc [bridge])
>> [  352.601632] [<bf0edd4c>] (br_add_if [bridge]) from [<bf0eeb14>] (add_del_if+0x6c/0x80 [bridge])
>> [  352.610351] [<bf0eeb14>] (add_del_if [bridge]) from [<bf0ef5b0>] (br_dev_ioctl+0x7c/0x9c [bridge])
>> [  352.619290] [<bf0ef5b0>] (br_dev_ioctl [bridge]) from [<c09583d4>] (dev_ifsioc+0x184/0x324)
>> [  352.627582] [<c09583d4>] (dev_ifsioc) from [<c09589e8>] (dev_ioctl+0x32c/0x5cc)
>> [  352.634837] [<c09589e8>] (dev_ioctl) from [<c090913c>] (sock_ioctl+0x3bc/0x580)
>>
>>
>> since my 4.19.23 kernel is modified a bit i tried with 4.19.10 without my net modifications and it is still reproducable with steps above (create a vlan on dsa-user-port and then use it in a bridge)
>>
>> i fixed it with these changes:

Your fix is undoing the commit from Andrew I just referenced, so it is
definitively not the right fix because it will push VLAN objects down to
a non-VLAN aware bridge, not that this is really a problem, but it
should not be happening anyway.

The problem appears to be the following though: you are enslaving a VLAN
device, which does not have switchdev_ops, so we recurse into the lower
device, which is the DSA network device, which does have switchdev_ops
defined. Once we are there we check the orig_dev against being a bridge
master network device, but we are not checking that it's a VLAN device
so we end-up assuming it is a DSA network device, we de-reference
garbage by checking br_vlan_enabled(dp->bridge_dev) because there is
simply no such data structure associated with the VLAN device.

The attached patch should help.

This is no longer a problem in newer kernels because the switchdev
operations use a notifier which checks the target network device to be DSA.
Frank Wunderlich Feb. 18, 2019, 5:11 p.m. UTC | #3
Hi,

tried your Patch but crashes the same way:

[  107.416972] [<bf0fde88>] (br_vlan_enabled [bridge]) from [<bf14e234>] (dsa_po
rt_vlan_add+0x60/0xbc [dsa_core])                                               
[  107.426939] [<bf14e234>] (dsa_port_vlan_add [dsa_core]) from [<bf14eba4>] (ds
a_slave_port_obj_add+0x64/0x68 [dsa_core])                                      
[  107.437666] [<bf14eba4>] (dsa_slave_port_obj_add [dsa_core]) from [<c0b4e684>
] (__switchdev_port_obj_add+0x50/0xc4)                                          
[  107.448029] [<c0b4e684>] (__switchdev_port_obj_add) from [<c0b4e6d4>] (__swit
chdev_port_obj_add+0xa0/0xc4)                                                   
[  107.457612] [<c0b4e6d4>] (__switchdev_port_obj_add) from [<c0b4e758>] (switch
dev_port_obj_add_now+0x60/0x130)                                                
[  107.467453] [<c0b4e758>] (switchdev_port_obj_add_now) from [<c0b4eb94>] (swit
chdev_port_obj_add+0x44/0x190)                                                  
[  107.477181] [<c0b4eb94>] (switchdev_port_obj_add) from [<bf1013d0>] (br_switc
hdev_port_vlan_add+0x60/0x7c [bridge])                                          
[  107.487644] [<bf1013d0>] (br_switchdev_port_vlan_add [bridge]) from [<bf0ff25
0>] (__vlan_add+0xb0/0x620 [bridge])

regards Frank


> Gesendet: Montag, 18. Februar 2019 um 00:20 Uhr
> Von: "Florian Fainelli" <f.fainelli@gmail.com>
> An: "Frank Wunderlich" <frank-w@public-files.de>, netdev@vger.kernel.org, "Andrew Lunn" <andrew@lunn.ch>, "Vivien Didelot" <vivien.didelot@gmail.com>
> Betreff: Re: [BUG] [FIX] net: dsa: oops in br_vlan_enabled
>
> 
> 
> On 2/17/2019 9:13 AM, Florian Fainelli wrote:
> > Hi Frank,
> > 
> > On 2/16/2019 8:22 AM, Frank Wunderlich wrote:
> >> Hi,
> >>
> >> i've found an oops in 4.19.23/10, seems to be fixed anyhow in 5.0 (also works in 4.14.101)
> >>
> >> root@bpi-r2:~# ip link add link lan0 name lan0.5 type vlan id 5
> >> root@bpi-r2:~# ip addr add 192.168.5.200/24 brd 192.168.5.255 dev lan0.5
> >> root@bpi-r2:~# ip link set dev lan0 up
> >> root@bpi-r2:~# ip link set dev lan0.5 up
> > 
> > So that these steps don't involve a bridge, and because we don't (yet)
> > implment ndo_vlan_rx_{add,kill}_vid() netdevice_ops, there is no VLAN
> > programming, it's all software
> > 
> >>
> >> 12: lan0.5@lan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
> >>     link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff
> >>     inet 192.168.5.200/24 brd 192.168.5.255 scope global lan0.5
> >>        valid_lft forever preferred_lft forever
> >>
> >> root@bpi-r2:~# brctl addbr bridge_name
> >> root@bpi-r2:~# brctl addif bridge_name lan0.5
> > 
> > Unless you changed the bridge to have VLAN filtering/awareness with:
> > 
> > echo 1 > /sys/class/net/bridge_name/bridge/vlan_filtering
> > 
> > There will be no VLAN configuration/objects pushed to DSA since commit
> > 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa: Don't add vlans
> > when vlan filtering is disabled") so I am not sure how you got into that
> > situation because prior to pushing a VLAN object, the port must be part
> > of a bridge, so the steps typically look like:
> > 
> > - port_bridge_join which assigns dp->bridge_dev
> > - port_vlan_add
> > 
> > Does your 4.19.23 kernel somehow change how VLAN objects are pushed down
> > the switch driver?
> > 
> >> [  352.057128] bridge_name: port 1(lan0.5) entered blocking state
> >> [  352.063065] bridge_name: port 1(lan0.5) entered disabled state
> >> [  352.069181] device lan0.5 entered promiscuous mode
> >> [  352.074018] device lan0 entered promiscuous mode
> >> [  352.078906] Unable to handle kernel NULL pointer dereference at virtual address 00000558
> >> ...
> >> [  352.493085] [<bf0fde88>] (br_vlan_enabled [bridge]) from [<bf12c234>] (dsa_port_vlan_add+0x60/0xbc [dsa_core])
> >> [  352.503050] [<bf12c234>] (dsa_port_vlan_add [dsa_core]) from [<bf12cb64>] (dsa_slave_port_obj_add+0x4c/0x50 [dsa_core])
> >> [  352.513776] [<bf12cb64>] (dsa_slave_port_obj_add [dsa_core]) from [<c0b4e2d4>] (__switchdev_port_obj_add+0x50/0xc4)
> >> [  352.524138] [<c0b4e2d4>] (__switchdev_port_obj_add) from [<c0b4e324>] (__switchdev_port_obj_add+0xa0/0xc4)
> >> [  352.533721] [<c0b4e324>] (__switchdev_port_obj_add) from [<c0b4e3a8>] (switchdev_port_obj_add_now+0x60/0x130)
> >> [  352.543562] [<c0b4e3a8>] (switchdev_port_obj_add_now) from [<c0b4e7e4>] (switchdev_port_obj_add+0x44/0x190)
> >> [  352.553284] [<c0b4e7e4>] (switchdev_port_obj_add) from [<bf1013d0>] (br_switchdev_port_vlan_add+0x60/0x7c [bridge])
> >> [  352.563733] [<bf1013d0>] (br_switchdev_port_vlan_add [bridge]) from [<bf0ff250>] (__vlan_add+0xb0/0x620 [bridge])
> >> [  352.574007] [<bf0ff250>] (__vlan_add [bridge]) from [<bf0ffd04>] (nbp_vlan_add+0xc4/0x150 [bridge])
> >> [  352.583073] [<bf0ffd04>] (nbp_vlan_add [bridge]) from [<bf0ffec4>] (nbp_vlan_init+0x134/0x164 [bridge])
> >> [  352.592482] [<bf0ffec4>] (nbp_vlan_init [bridge]) from [<bf0edd4c>] (br_add_if+0x40c/0x5fc [bridge])
> >> [  352.601632] [<bf0edd4c>] (br_add_if [bridge]) from [<bf0eeb14>] (add_del_if+0x6c/0x80 [bridge])
> >> [  352.610351] [<bf0eeb14>] (add_del_if [bridge]) from [<bf0ef5b0>] (br_dev_ioctl+0x7c/0x9c [bridge])
> >> [  352.619290] [<bf0ef5b0>] (br_dev_ioctl [bridge]) from [<c09583d4>] (dev_ifsioc+0x184/0x324)
> >> [  352.627582] [<c09583d4>] (dev_ifsioc) from [<c09589e8>] (dev_ioctl+0x32c/0x5cc)
> >> [  352.634837] [<c09589e8>] (dev_ioctl) from [<c090913c>] (sock_ioctl+0x3bc/0x580)
> >>
> >>
> >> since my 4.19.23 kernel is modified a bit i tried with 4.19.10 without my net modifications and it is still reproducable with steps above (create a vlan on dsa-user-port and then use it in a bridge)
> >>
> >> i fixed it with these changes:
> 
> Your fix is undoing the commit from Andrew I just referenced, so it is
> definitively not the right fix because it will push VLAN objects down to
> a non-VLAN aware bridge, not that this is really a problem, but it
> should not be happening anyway.
> 
> The problem appears to be the following though: you are enslaving a VLAN
> device, which does not have switchdev_ops, so we recurse into the lower
> device, which is the DSA network device, which does have switchdev_ops
> defined. Once we are there we check the orig_dev against being a bridge
> master network device, but we are not checking that it's a VLAN device
> so we end-up assuming it is a DSA network device, we de-reference
> garbage by checking br_vlan_enabled(dp->bridge_dev) because there is
> simply no such data structure associated with the VLAN device.
> 
> The attached patch should help.
> 
> This is no longer a problem in newer kernels because the switchdev
> operations use a notifier which checks the target network device to be DSA.
> -- 
> Florian
>
Florian Fainelli Feb. 18, 2019, 7:09 p.m. UTC | #4
On February 18, 2019 9:11:27 AM PST, Frank Wunderlich <frank-w@public-files.de> wrote:
>Hi,
>
>tried your Patch but crashes the same way:

Yes the lower_dev targeted by switched does point to the DSA slave network device, but we really don't have dp->bridge_dev assigned since the physical DSA port was not enslaved in the bridge. Will follow up with a correct version late today.

>
>[  107.416972] [<bf0fde88>] (br_vlan_enabled [bridge]) from
>[<bf14e234>] (dsa_po
>rt_vlan_add+0x60/0xbc [dsa_core])                                      
>        
>[  107.426939] [<bf14e234>] (dsa_port_vlan_add [dsa_core]) from
>[<bf14eba4>] (ds
>a_slave_port_obj_add+0x64/0x68 [dsa_core])                             
>        
>[  107.437666] [<bf14eba4>] (dsa_slave_port_obj_add [dsa_core]) from
>[<c0b4e684>
>] (__switchdev_port_obj_add+0x50/0xc4)                                 
>        
>[  107.448029] [<c0b4e684>] (__switchdev_port_obj_add) from
>[<c0b4e6d4>] (__swit
>chdev_port_obj_add+0xa0/0xc4)                                          
>        
>[  107.457612] [<c0b4e6d4>] (__switchdev_port_obj_add) from
>[<c0b4e758>] (switch
>dev_port_obj_add_now+0x60/0x130)                                       
>        
>[  107.467453] [<c0b4e758>] (switchdev_port_obj_add_now) from
>[<c0b4eb94>] (swit
>chdev_port_obj_add+0x44/0x190)                                         
>        
>[  107.477181] [<c0b4eb94>] (switchdev_port_obj_add) from [<bf1013d0>]
>(br_switc
>hdev_port_vlan_add+0x60/0x7c [bridge])                                 
>        
>[  107.487644] [<bf1013d0>] (br_switchdev_port_vlan_add [bridge]) from
>[<bf0ff25
>0>] (__vlan_add+0xb0/0x620 [bridge])
>
>regards Frank
>
>
>> Gesendet: Montag, 18. Februar 2019 um 00:20 Uhr
>> Von: "Florian Fainelli" <f.fainelli@gmail.com>
>> An: "Frank Wunderlich" <frank-w@public-files.de>,
>netdev@vger.kernel.org, "Andrew Lunn" <andrew@lunn.ch>, "Vivien
>Didelot" <vivien.didelot@gmail.com>
>> Betreff: Re: [BUG] [FIX] net: dsa: oops in br_vlan_enabled
>>
>> 
>> 
>> On 2/17/2019 9:13 AM, Florian Fainelli wrote:
>> > Hi Frank,
>> > 
>> > On 2/16/2019 8:22 AM, Frank Wunderlich wrote:
>> >> Hi,
>> >>
>> >> i've found an oops in 4.19.23/10, seems to be fixed anyhow in 5.0
>(also works in 4.14.101)
>> >>
>> >> root@bpi-r2:~# ip link add link lan0 name lan0.5 type vlan id 5
>> >> root@bpi-r2:~# ip addr add 192.168.5.200/24 brd 192.168.5.255 dev
>lan0.5
>> >> root@bpi-r2:~# ip link set dev lan0 up
>> >> root@bpi-r2:~# ip link set dev lan0.5 up
>> > 
>> > So that these steps don't involve a bridge, and because we don't
>(yet)
>> > implment ndo_vlan_rx_{add,kill}_vid() netdevice_ops, there is no
>VLAN
>> > programming, it's all software
>> > 
>> >>
>> >> 12: lan0.5@lan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500
>qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
>> >>     link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff
>> >>     inet 192.168.5.200/24 brd 192.168.5.255 scope global lan0.5
>> >>        valid_lft forever preferred_lft forever
>> >>
>> >> root@bpi-r2:~# brctl addbr bridge_name
>> >> root@bpi-r2:~# brctl addif bridge_name lan0.5
>> > 
>> > Unless you changed the bridge to have VLAN filtering/awareness
>with:
>> > 
>> > echo 1 > /sys/class/net/bridge_name/bridge/vlan_filtering
>> > 
>> > There will be no VLAN configuration/objects pushed to DSA since
>commit
>> > 2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa: Don't add
>vlans
>> > when vlan filtering is disabled") so I am not sure how you got into
>that
>> > situation because prior to pushing a VLAN object, the port must be
>part
>> > of a bridge, so the steps typically look like:
>> > 
>> > - port_bridge_join which assigns dp->bridge_dev
>> > - port_vlan_add
>> > 
>> > Does your 4.19.23 kernel somehow change how VLAN objects are pushed
>down
>> > the switch driver?
>> > 
>> >> [  352.057128] bridge_name: port 1(lan0.5) entered blocking state
>> >> [  352.063065] bridge_name: port 1(lan0.5) entered disabled state
>> >> [  352.069181] device lan0.5 entered promiscuous mode
>> >> [  352.074018] device lan0 entered promiscuous mode
>> >> [  352.078906] Unable to handle kernel NULL pointer dereference at
>virtual address 00000558
>> >> ...
>> >> [  352.493085] [<bf0fde88>] (br_vlan_enabled [bridge]) from
>[<bf12c234>] (dsa_port_vlan_add+0x60/0xbc [dsa_core])
>> >> [  352.503050] [<bf12c234>] (dsa_port_vlan_add [dsa_core]) from
>[<bf12cb64>] (dsa_slave_port_obj_add+0x4c/0x50 [dsa_core])
>> >> [  352.513776] [<bf12cb64>] (dsa_slave_port_obj_add [dsa_core])
>from [<c0b4e2d4>] (__switchdev_port_obj_add+0x50/0xc4)
>> >> [  352.524138] [<c0b4e2d4>] (__switchdev_port_obj_add) from
>[<c0b4e324>] (__switchdev_port_obj_add+0xa0/0xc4)
>> >> [  352.533721] [<c0b4e324>] (__switchdev_port_obj_add) from
>[<c0b4e3a8>] (switchdev_port_obj_add_now+0x60/0x130)
>> >> [  352.543562] [<c0b4e3a8>] (switchdev_port_obj_add_now) from
>[<c0b4e7e4>] (switchdev_port_obj_add+0x44/0x190)
>> >> [  352.553284] [<c0b4e7e4>] (switchdev_port_obj_add) from
>[<bf1013d0>] (br_switchdev_port_vlan_add+0x60/0x7c [bridge])
>> >> [  352.563733] [<bf1013d0>] (br_switchdev_port_vlan_add [bridge])
>from [<bf0ff250>] (__vlan_add+0xb0/0x620 [bridge])
>> >> [  352.574007] [<bf0ff250>] (__vlan_add [bridge]) from
>[<bf0ffd04>] (nbp_vlan_add+0xc4/0x150 [bridge])
>> >> [  352.583073] [<bf0ffd04>] (nbp_vlan_add [bridge]) from
>[<bf0ffec4>] (nbp_vlan_init+0x134/0x164 [bridge])
>> >> [  352.592482] [<bf0ffec4>] (nbp_vlan_init [bridge]) from
>[<bf0edd4c>] (br_add_if+0x40c/0x5fc [bridge])
>> >> [  352.601632] [<bf0edd4c>] (br_add_if [bridge]) from [<bf0eeb14>]
>(add_del_if+0x6c/0x80 [bridge])
>> >> [  352.610351] [<bf0eeb14>] (add_del_if [bridge]) from
>[<bf0ef5b0>] (br_dev_ioctl+0x7c/0x9c [bridge])
>> >> [  352.619290] [<bf0ef5b0>] (br_dev_ioctl [bridge]) from
>[<c09583d4>] (dev_ifsioc+0x184/0x324)
>> >> [  352.627582] [<c09583d4>] (dev_ifsioc) from [<c09589e8>]
>(dev_ioctl+0x32c/0x5cc)
>> >> [  352.634837] [<c09589e8>] (dev_ioctl) from [<c090913c>]
>(sock_ioctl+0x3bc/0x580)
>> >>
>> >>
>> >> since my 4.19.23 kernel is modified a bit i tried with 4.19.10
>without my net modifications and it is still reproducable with steps
>above (create a vlan on dsa-user-port and then use it in a bridge)
>> >>
>> >> i fixed it with these changes:
>> 
>> Your fix is undoing the commit from Andrew I just referenced, so it
>is
>> definitively not the right fix because it will push VLAN objects down
>to
>> a non-VLAN aware bridge, not that this is really a problem, but it
>> should not be happening anyway.
>> 
>> The problem appears to be the following though: you are enslaving a
>VLAN
>> device, which does not have switchdev_ops, so we recurse into the
>lower
>> device, which is the DSA network device, which does have
>switchdev_ops
>> defined. Once we are there we check the orig_dev against being a
>bridge
>> master network device, but we are not checking that it's a VLAN
>device
>> so we end-up assuming it is a DSA network device, we de-reference
>> garbage by checking br_vlan_enabled(dp->bridge_dev) because there is
>> simply no such data structure associated with the VLAN device.
>> 
>> The attached patch should help.
>> 
>> This is no longer a problem in newer kernels because the switchdev
>> operations use a notifier which checks the target network device to
>be DSA.
>> -- 
>> Florian
>>
diff mbox series

Patch

diff --git a/net/dsa/port.c b/net/dsa/port.c
index ed0595459df1..962887752ae8 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -255,8 +255,9 @@  int dsa_port_vlan_add(struct dsa_port *dp,
        if (netif_is_bridge_master(vlan->obj.orig_dev))
                return -EOPNOTSUPP;
 
-       if (br_vlan_enabled(dp->bridge_dev))
-               return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_ADD, &info);
+       printk(KERN_ALERT "DEBUG: Passed %s %d 0x%x \n",__FUNCTION__,__LINE__,(unsigned int)dp->bridge_dev);
+       if (!dp->bridge_dev || br_vlan_enabled(dp->bridge_dev))
+               return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_DEL, &info);
 
        return 0;
 }
@@ -273,7 +274,7 @@  int dsa_port_vlan_del(struct dsa_port *dp,
        if (netif_is_bridge_master(vlan->obj.orig_dev))
                return -EOPNOTSUPP;
 
-       if (br_vlan_enabled(dp->bridge_dev))
+       if (!dp->bridge_dev || br_vlan_enabled(dp->bridge_dev))
                return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_DEL, &info);
 
        return 0;