diff mbox series

bonding: do not enslave CAN devices

Message ID 20200130133046.2047-1-socketcan@hartkopp.net
State Changes Requested
Delegated to: David Miller
Headers show
Series bonding: do not enslave CAN devices | expand

Commit Message

Oliver Hartkopp Jan. 30, 2020, 1:30 p.m. UTC
Since commit 8df9ffb888c ("can: make use of preallocated can_ml_priv for per
device struct can_dev_rcv_lists") the device specific CAN receive filter lists
are stored in netdev_priv() and dev->ml_priv points to these filters.

In the bug report Syzkaller enslaved a vxcan1 CAN device and accessed the
bonding device with a PF_CAN socket which lead to a crash due to an access of
an unhandled bond_dev->ml_priv pointer.

Deny to enslave CAN devices by the bonding driver as the resulting bond_dev
pretends to be a CAN device by copying dev->type without really being one.

Reported-by: syzbot+c3ea30e1e2485573f953@syzkaller.appspotmail.com
Fixes: 8df9ffb888c ("can: make use of preallocated can_ml_priv for per
device struct can_dev_rcv_lists")
Cc: linux-stable <stable@vger.kernel.org> # >= v5.4
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
---
 drivers/net/bonding/bond_main.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Sabrina Dubroca Jan. 30, 2020, 1:41 p.m. UTC | #1
Hello,

2020-01-30, 14:30:46 +0100, Oliver Hartkopp wrote:
> Since commit 8df9ffb888c ("can: make use of preallocated can_ml_priv for per
> device struct can_dev_rcv_lists") the device specific CAN receive filter lists
> are stored in netdev_priv() and dev->ml_priv points to these filters.
> 
> In the bug report Syzkaller enslaved a vxcan1 CAN device and accessed the
> bonding device with a PF_CAN socket which lead to a crash due to an access of
> an unhandled bond_dev->ml_priv pointer.
> 
> Deny to enslave CAN devices by the bonding driver as the resulting bond_dev
> pretends to be a CAN device by copying dev->type without really being one.

Does the team driver have the same problem?
Oliver Hartkopp Jan. 30, 2020, 1:57 p.m. UTC | #2
On 30/01/2020 14.41, Sabrina Dubroca wrote:

> 2020-01-30, 14:30:46 +0100, Oliver Hartkopp wrote:
>> Since commit 8df9ffb888c ("can: make use of preallocated can_ml_priv for per
>> device struct can_dev_rcv_lists") the device specific CAN receive filter lists
>> are stored in netdev_priv() and dev->ml_priv points to these filters.
>>
>> In the bug report Syzkaller enslaved a vxcan1 CAN device and accessed the
>> bonding device with a PF_CAN socket which lead to a crash due to an access of
>> an unhandled bond_dev->ml_priv pointer.
>>
>> Deny to enslave CAN devices by the bonding driver as the resulting bond_dev
>> pretends to be a CAN device by copying dev->type without really being one.
> 
> Does the team driver have the same problem?

Good point!

 From a first look into team_setup_by_port() in team.c I would say YES :-)

Thanks for watching out! I would suggest to wait for some more feedback 
and upstream of this fix.

Best regards,
Oliver
Oliver Hartkopp Feb. 4, 2020, 5:11 p.m. UTC | #3
Any updates, reviews, acks on this?

As pointed out by Sabrina here 
https://marc.info/?l=linux-netdev&m=158039302905460&w=2
the issue is also relevant for the TEAM driver.

Best,
Oliver

On 30/01/2020 14.30, Oliver Hartkopp wrote:
> Since commit 8df9ffb888c ("can: make use of preallocated can_ml_priv for per
> device struct can_dev_rcv_lists") the device specific CAN receive filter lists
> are stored in netdev_priv() and dev->ml_priv points to these filters.
> 
> In the bug report Syzkaller enslaved a vxcan1 CAN device and accessed the
> bonding device with a PF_CAN socket which lead to a crash due to an access of
> an unhandled bond_dev->ml_priv pointer.
> 
> Deny to enslave CAN devices by the bonding driver as the resulting bond_dev
> pretends to be a CAN device by copying dev->type without really being one.
> 
> Reported-by: syzbot+c3ea30e1e2485573f953@syzkaller.appspotmail.com
> Fixes: 8df9ffb888c ("can: make use of preallocated can_ml_priv for per
> device struct can_dev_rcv_lists")
> Cc: linux-stable <stable@vger.kernel.org> # >= v5.4
> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
> ---
>   drivers/net/bonding/bond_main.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 48d5ec770b94..4b781a7dfd96 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1475,6 +1475,18 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
>   		return -EPERM;
>   	}
>   
> +	/* CAN network devices hold device specific filter lists in
> +	 * netdev_priv() where dev->ml_priv sets a reference to.
> +	 * As bonding assumes to have some ethernet-like device it doesn't
> +	 * take care about these CAN specific filter lists today.
> +	 * So we deny the enslaving of CAN interfaces here.
> +	 */
> +	if (slave_dev->type == ARPHRD_CAN) {
> +		NL_SET_ERR_MSG(extack, "CAN devices can not be enslaved");
> +		slave_err(bond_dev, slave_dev, "no bonding on CAN devices\n");
> +		return -EINVAL;
> +	}
> +
>   	/* set bonding device ether type by slave - bonding netdevices are
>   	 * created with ether_setup, so when the slave type is not ARPHRD_ETHER
>   	 * there is a need to override some of the type dependent attribs/funcs.
>
Marc Kleine-Budde Feb. 25, 2020, 8:32 p.m. UTC | #4
On 1/30/20 2:30 PM, Oliver Hartkopp wrote:
> Since commit 8df9ffb888c ("can: make use of preallocated can_ml_priv for per
> device struct can_dev_rcv_lists") the device specific CAN receive filter lists
> are stored in netdev_priv() and dev->ml_priv points to these filters.
> 
> In the bug report Syzkaller enslaved a vxcan1 CAN device and accessed the
> bonding device with a PF_CAN socket which lead to a crash due to an access of
> an unhandled bond_dev->ml_priv pointer.
> 
> Deny to enslave CAN devices by the bonding driver as the resulting bond_dev
> pretends to be a CAN device by copying dev->type without really being one.
> 
> Reported-by: syzbot+c3ea30e1e2485573f953@syzkaller.appspotmail.com
> Fixes: 8df9ffb888c ("can: make use of preallocated can_ml_priv for per
> device struct can_dev_rcv_lists")
> Cc: linux-stable <stable@vger.kernel.org> # >= v5.4
> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>

What's the preferred to upstream this? I could take this via the
linux-can tree.

regards,
Marc

> ---
>  drivers/net/bonding/bond_main.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 48d5ec770b94..4b781a7dfd96 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1475,6 +1475,18 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
>  		return -EPERM;
>  	}
>  
> +	/* CAN network devices hold device specific filter lists in
> +	 * netdev_priv() where dev->ml_priv sets a reference to.
> +	 * As bonding assumes to have some ethernet-like device it doesn't
> +	 * take care about these CAN specific filter lists today.
> +	 * So we deny the enslaving of CAN interfaces here.
> +	 */
> +	if (slave_dev->type == ARPHRD_CAN) {
> +		NL_SET_ERR_MSG(extack, "CAN devices can not be enslaved");
> +		slave_err(bond_dev, slave_dev, "no bonding on CAN devices\n");
> +		return -EINVAL;
> +	}
> +
>  	/* set bonding device ether type by slave - bonding netdevices are
>  	 * created with ether_setup, so when the slave type is not ARPHRD_ETHER
>  	 * there is a need to override some of the type dependent attribs/funcs.
>
David Miller Feb. 27, 2020, 4:23 a.m. UTC | #5
From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Tue, 25 Feb 2020 21:32:41 +0100

> On 1/30/20 2:30 PM, Oliver Hartkopp wrote:
>> Since commit 8df9ffb888c ("can: make use of preallocated can_ml_priv for per
>> device struct can_dev_rcv_lists") the device specific CAN receive filter lists
>> are stored in netdev_priv() and dev->ml_priv points to these filters.
>> 
>> In the bug report Syzkaller enslaved a vxcan1 CAN device and accessed the
>> bonding device with a PF_CAN socket which lead to a crash due to an access of
>> an unhandled bond_dev->ml_priv pointer.
>> 
>> Deny to enslave CAN devices by the bonding driver as the resulting bond_dev
>> pretends to be a CAN device by copying dev->type without really being one.
>> 
>> Reported-by: syzbot+c3ea30e1e2485573f953@syzkaller.appspotmail.com
>> Fixes: 8df9ffb888c ("can: make use of preallocated can_ml_priv for per
>> device struct can_dev_rcv_lists")
>> Cc: linux-stable <stable@vger.kernel.org> # >= v5.4
>> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
> 
> What's the preferred to upstream this? I could take this via the
> linux-can tree.

What I don't get is why the PF_CAN is blindly dereferencing a device
assuming what is behind bond_dev->ml_priv.

If it assumes a device it access is CAN then it should check the
device by comparing the netdev_ops or via some other means.

This restriction seems arbitrary.
Oliver Hartkopp March 2, 2020, 8:45 a.m. UTC | #6
On 27/02/2020 05.23, David Miller wrote:
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Date: Tue, 25 Feb 2020 21:32:41 +0100
> 
>> On 1/30/20 2:30 PM, Oliver Hartkopp wrote:
>>> Since commit 8df9ffb888c ("can: make use of preallocated can_ml_priv for per
>>> device struct can_dev_rcv_lists") the device specific CAN receive filter lists
>>> are stored in netdev_priv() and dev->ml_priv points to these filters.
>>>
>>> In the bug report Syzkaller enslaved a vxcan1 CAN device and accessed the
>>> bonding device with a PF_CAN socket which lead to a crash due to an access of
>>> an unhandled bond_dev->ml_priv pointer.
>>>
>>> Deny to enslave CAN devices by the bonding driver as the resulting bond_dev
>>> pretends to be a CAN device by copying dev->type without really being one.
>>>
>>> Reported-by: syzbot+c3ea30e1e2485573f953@syzkaller.appspotmail.com
>>> Fixes: 8df9ffb888c ("can: make use of preallocated can_ml_priv for per
>>> device struct can_dev_rcv_lists")
>>> Cc: linux-stable <stable@vger.kernel.org> # >= v5.4
>>> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
>> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
>>
>> What's the preferred to upstream this? I could take this via the
>> linux-can tree.
> 
> What I don't get is why the PF_CAN is blindly dereferencing a device
> assuming what is behind bond_dev->ml_priv.
> 
> If it assumes a device it access is CAN then it should check the
> device by comparing the netdev_ops or via some other means.

Yes we do.

> This restriction seems arbitrary.

Since commit 8df9ffb888c the data structures for the CAN filters have 
been moved from net/can/af_can.c into netdev->ml_priv.

PF_CAN only works with CAN interfaces and therefore always checks 
dev->type to be ARPHRD_CAN before accessing netdev->ml_priv.

Bonding and Team driver copy most of the device data structures to 
create bonding/team devices.
They copy dev->type but *not* dev->ml_priv.
That leads to the problematic ml_priv access after passing the dev->type 
check ...

I don't know yet whether it makes sense to have CAN bonding/team 
devices. But if so we would need some more investigation. For now 
disabling CAN interfaces for bonding/team devices seems to be reasonable.

Regards,
Oliver
David Miller March 2, 2020, 7:12 p.m. UTC | #7
From: Oliver Hartkopp <socketcan@hartkopp.net>
Date: Mon, 2 Mar 2020 09:45:41 +0100

> I don't know yet whether it makes sense to have CAN bonding/team
> devices. But if so we would need some more investigation. For now
> disabling CAN interfaces for bonding/team devices seems to be
> reasonable.

Every single interesting device that falls into a special use case
like CAN is going to be tempted to add a similar check.

I don't want to set this precedence.

Check that the devices you get passed are actually CAN devices, it's
easy, just compare the netdev_ops and make sure they equal the CAN
ones.
Marc Kleine-Budde March 6, 2020, 2:12 p.m. UTC | #8
On 3/2/20 8:12 PM, David Miller wrote:
> From: Oliver Hartkopp <socketcan@hartkopp.net>
> Date: Mon, 2 Mar 2020 09:45:41 +0100
> 
>> I don't know yet whether it makes sense to have CAN bonding/team
>> devices. But if so we would need some more investigation. For now
>> disabling CAN interfaces for bonding/team devices seems to be
>> reasonable.
> 
> Every single interesting device that falls into a special use case
> like CAN is going to be tempted to add a similar check.
> 
> I don't want to set this precedence.
> 
> Check that the devices you get passed are actually CAN devices, it's
> easy, just compare the netdev_ops and make sure they equal the CAN
> ones.

Sorry, I'm not really sure how to implement this check.

Should I maintain a list of all netdev_ops of all the CAN devices (=
whitelist) and the compare against that list? Having a global list of
pointers to network devices remind me of the old days of kernel-2.4.

regards,
Marc
Dmitry Vyukov March 6, 2020, 5:34 p.m. UTC | #9
On Fri, Mar 6, 2020 at 3:12 PM Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>
> On 3/2/20 8:12 PM, David Miller wrote:
> > From: Oliver Hartkopp <socketcan@hartkopp.net>
> > Date: Mon, 2 Mar 2020 09:45:41 +0100
> >
> >> I don't know yet whether it makes sense to have CAN bonding/team
> >> devices. But if so we would need some more investigation. For now
> >> disabling CAN interfaces for bonding/team devices seems to be
> >> reasonable.
> >
> > Every single interesting device that falls into a special use case
> > like CAN is going to be tempted to add a similar check.
> >
> > I don't want to set this precedence.
> >
> > Check that the devices you get passed are actually CAN devices, it's
> > easy, just compare the netdev_ops and make sure they equal the CAN
> > ones.
>
> Sorry, I'm not really sure how to implement this check.
>
> Should I maintain a list of all netdev_ops of all the CAN devices (=
> whitelist) and the compare against that list? Having a global list of
> pointers to network devices remind me of the old days of kernel-2.4.

I think Dave means something like this:

$ grep "netdev_ops == " drivers/net/*/*.c net/*/*.c
drivers/net/hyperv/netvsc_drv.c: if (event_dev->netdev_ops == &device_ops)
drivers/net/ppp/ppp_generic.c: if (dev->netdev_ops == &ppp_netdev_ops)
net/dsa/slave.c: return dev->netdev_ops == &dsa_slave_netdev_ops;
net/openvswitch/vport-internal_dev.c: return netdev->netdev_ops ==
&internal_dev_netdev_ops;
David Miller March 7, 2020, 5:13 a.m. UTC | #10
From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Fri, 6 Mar 2020 15:12:48 +0100

> On 3/2/20 8:12 PM, David Miller wrote:
>> From: Oliver Hartkopp <socketcan@hartkopp.net>
>> Date: Mon, 2 Mar 2020 09:45:41 +0100
>> 
>>> I don't know yet whether it makes sense to have CAN bonding/team
>>> devices. But if so we would need some more investigation. For now
>>> disabling CAN interfaces for bonding/team devices seems to be
>>> reasonable.
>> 
>> Every single interesting device that falls into a special use case
>> like CAN is going to be tempted to add a similar check.
>> 
>> I don't want to set this precedence.
>> 
>> Check that the devices you get passed are actually CAN devices, it's
>> easy, just compare the netdev_ops and make sure they equal the CAN
>> ones.
> 
> Sorry, I'm not really sure how to implement this check.

Like this:

if (netdev->ops != &can_netdev_ops)
	return;

Done.
Marc Kleine-Budde March 9, 2020, 10:25 a.m. UTC | #11
On 3/7/20 6:13 AM, David Miller wrote:
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Date: Fri, 6 Mar 2020 15:12:48 +0100
> 
>> On 3/2/20 8:12 PM, David Miller wrote:
>>> From: Oliver Hartkopp <socketcan@hartkopp.net>
>>> Date: Mon, 2 Mar 2020 09:45:41 +0100
>>>
>>>> I don't know yet whether it makes sense to have CAN bonding/team
>>>> devices. But if so we would need some more investigation. For now
>>>> disabling CAN interfaces for bonding/team devices seems to be
>>>> reasonable.
>>>
>>> Every single interesting device that falls into a special use case
>>> like CAN is going to be tempted to add a similar check.
>>>
>>> I don't want to set this precedence.
>>>
>>> Check that the devices you get passed are actually CAN devices, it's
>>> easy, just compare the netdev_ops and make sure they equal the CAN
>>> ones.
>>
>> Sorry, I'm not really sure how to implement this check.
> 
> Like this:
> 
> if (netdev->ops != &can_netdev_ops)
> 	return;

There is no single can_netdev_ops. The netdev_ops are per CAN-network
driver. But the ml_priv is used in the generic CAN code.

regards,
Marc
Oleksij Rempel March 13, 2020, 9:56 a.m. UTC | #12
On Mon, Mar 09, 2020 at 11:25:50AM +0100, Marc Kleine-Budde wrote:
> On 3/7/20 6:13 AM, David Miller wrote:
> > From: Marc Kleine-Budde <mkl@pengutronix.de>
> > Date: Fri, 6 Mar 2020 15:12:48 +0100
> > 
> >> On 3/2/20 8:12 PM, David Miller wrote:
> >>> From: Oliver Hartkopp <socketcan@hartkopp.net>
> >>> Date: Mon, 2 Mar 2020 09:45:41 +0100
> >>>
> >>>> I don't know yet whether it makes sense to have CAN bonding/team
> >>>> devices. But if so we would need some more investigation. For now
> >>>> disabling CAN interfaces for bonding/team devices seems to be
> >>>> reasonable.
> >>>
> >>> Every single interesting device that falls into a special use case
> >>> like CAN is going to be tempted to add a similar check.
> >>>
> >>> I don't want to set this precedence.
> >>>
> >>> Check that the devices you get passed are actually CAN devices, it's
> >>> easy, just compare the netdev_ops and make sure they equal the CAN
> >>> ones.
> >>
> >> Sorry, I'm not really sure how to implement this check.
> > 
> > Like this:
> > 
> > if (netdev->ops != &can_netdev_ops)
> > 	return;
> 
> There is no single can_netdev_ops. The netdev_ops are per CAN-network
> driver. But the ml_priv is used in the generic CAN code.

ping,

are there any other ways or ideas how to solve this issue?

Regards,
Oleksij
Oliver Hartkopp March 21, 2020, 2 p.m. UTC | #13
+ Sabrina Dubroca (takes care of team driver which has the same issue)

On 13/03/2020 10.56, Oleksij Rempel wrote:
> On Mon, Mar 09, 2020 at 11:25:50AM +0100, Marc Kleine-Budde wrote:
>> On 3/7/20 6:13 AM, David Miller wrote:

>>> Like this:
>>>
>>> if (netdev->ops != &can_netdev_ops)
>>> 	return;
>>
>> There is no single can_netdev_ops. The netdev_ops are per CAN-network
>> driver. But the ml_priv is used in the generic CAN code.
> 
> ping,
> 
> are there any other ways or ideas how to solve this issue?

Well, IMO the patch from
https://marc.info/?l=linux-can&m=158039108004683
is still valid.

Although the attribution that commit 8df9ffb888c ("can: make use of 
preallocated can_ml_priv for per device struct can_dev_rcv_lists")
introduced the problem could be removed.

Even before this commit dev->ml_priv of CAN interfaces has been used to 
store the filter lists. And either the bonding and the team driver do 
not take care of ml_priv.

They pretend to be CAN interfaces by faking dev->type to be ARPHRD_CAN - 
but they are not. When we dereference dev->ml_priv in (badly) faked CAN 
devices we run into the reported issue.

So the approach is to tell bonding and team driver to keep the fingers 
away from CAN interfaces like we do with some ARPHRD_INFINIBAND setups 
in bond_enslave() too.

Regards,
Oliver
diff mbox series

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 48d5ec770b94..4b781a7dfd96 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1475,6 +1475,18 @@  int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 		return -EPERM;
 	}
 
+	/* CAN network devices hold device specific filter lists in
+	 * netdev_priv() where dev->ml_priv sets a reference to.
+	 * As bonding assumes to have some ethernet-like device it doesn't
+	 * take care about these CAN specific filter lists today.
+	 * So we deny the enslaving of CAN interfaces here.
+	 */
+	if (slave_dev->type == ARPHRD_CAN) {
+		NL_SET_ERR_MSG(extack, "CAN devices can not be enslaved");
+		slave_err(bond_dev, slave_dev, "no bonding on CAN devices\n");
+		return -EINVAL;
+	}
+
 	/* set bonding device ether type by slave - bonding netdevices are
 	 * created with ether_setup, so when the slave type is not ARPHRD_ETHER
 	 * there is a need to override some of the type dependent attribs/funcs.