
Understanding mutual exclusion between rtnl_lock and rcu_read_lock

Message ID 1486099283.21871.73.camel@edumazet-glaptop3.roam.corp.google.com
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Feb. 3, 2017, 5:21 a.m. UTC
On Thu, 2017-02-02 at 15:52 -0800, Alexander Duyck wrote:
> On Thu, Feb 2, 2017 at 3:47 PM, Joel Cunningham <joel.cunningham@me.com> wrote:
> > Hi,
> >
> > I’m studying the synchronization used on different parts of struct net_device and I’m struggling to understand how structure member modifications in dev_ioctl are synchronized.  Getters in dev_ifsioc_locked() are only holding rcu_read_lock() while setters in dev_ifsioc() are holding rtnl_lock, but not using RCU APIs.  I was specifically looking at SIOCGIFHWADDR/SIOCSIFHWADDR.  What’s to prevent one CPU from executing a getter while another CPU executes a setter, possibly resulting in a torn read/write?  I didn’t see anything in rtnl_lock() that would wait for rcu_read_lock() critical sections (on other CPUs) to finish before acquiring the mutex.
> >
> > Is there something about dev_ioctl that prevents parallel execution, or is there something about the RCU implementation I still don’t understand?
> >
> > Thanks,
> >
> > Joel
> 
> My advice would be to spend more time familiarizing yourself with RCU.
> The advantage of RCU is that it allows for updates while other threads
> are accessing the data.  The rtnl_lock is just meant to prevent
> multiple writers from updating the data simultaneously.  So between
> writers the rtnl_lock is used to keep things synchronized, but between
> writers and readers the mechanism that is meant to protect the data
> and keep it sane is RCU.
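Alexander's point can be made concrete with a minimal user-space sketch of the RCU publish/subscribe pattern. Everything here is an assumption for illustration: the names (struct hw_addr, hw_addr_set(), hw_addr_get(), hw_addr_teardown()) are hypothetical, C11 release/acquire atomics model rcu_assign_pointer()/rcu_dereference(), a pthread mutex stands in for rtnl_lock, and old copies are simply kept until teardown instead of being freed after a grace period. The key property is that the writer never modifies an object readers can see; it fills a new copy and only then publishes the pointer, which an in-place write to a net_device field does not do.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define ADDR_LEN 6

/* Hypothetical RCU-protected object. Once published, bytes[] is never
 * modified, so readers cannot observe a half-written address. */
struct hw_addr {
	unsigned char bytes[ADDR_LEN];
	struct hw_addr *next_retired;  /* links old copies awaiting reclamation */
};

/* Published pointer; plays the role of an __rcu-annotated field. */
static _Atomic(struct hw_addr *) cur_addr;
/* Serialises writers only; stands in for rtnl_lock. */
static pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;
/* Old copies. Real RCU code would free them after synchronize_rcu() or
 * via call_rcu(); this sketch keeps them until teardown. */
static struct hw_addr *retired_head;

/* Writer: build a complete new copy, then publish it. */
int hw_addr_set(const unsigned char *src)
{
	struct hw_addr *fresh = malloc(sizeof(*fresh));

	if (!fresh)
		return -1;
	memcpy(fresh->bytes, src, ADDR_LEN);   /* fully initialise first */
	fresh->next_retired = NULL;

	pthread_mutex_lock(&writer_lock);
	struct hw_addr *old = atomic_load_explicit(&cur_addr, memory_order_relaxed);
	/* Release store: models rcu_assign_pointer(). */
	atomic_store_explicit(&cur_addr, fresh, memory_order_release);
	if (old) {
		/* Readers may still be copying old->bytes, so do not free it. */
		old->next_retired = retired_head;
		retired_head = old;
	}
	pthread_mutex_unlock(&writer_lock);
	return 0;
}

/* Reader: takes no lock (a kernel reader would sit inside rcu_read_lock()).
 * Acquire load: models rcu_dereference(). */
int hw_addr_get(unsigned char *dst)
{
	struct hw_addr *p = atomic_load_explicit(&cur_addr, memory_order_acquire);

	if (!p)
		return -1;
	memcpy(dst, p->bytes, ADDR_LEN);
	return 0;
}

/* Frees everything; only safe once no reader can still be running. */
void hw_addr_teardown(void)
{
	pthread_mutex_lock(&writer_lock);
	free(atomic_exchange(&cur_addr, NULL));
	while (retired_head) {
		struct hw_addr *next = retired_head->next_retired;

		free(retired_head);
		retired_head = next;
	}
	pthread_mutex_unlock(&writer_lock);
}
```

Any number of threads may call hw_addr_get() while a writer holds writer_lock; each reader sees either the old or the new address, never a mix of the two.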

Note that sometimes we do not properly handle the case where a field can be
written by a writer holding RTNL (or the socket lock, or something else)
while readers access it without holding that lock.

We often assume the compiler won't do something stupid, but sometimes it
can (for example, re-load a field it already read, or split a store).

We definitely should scrutinize such accesses more closely, or maybe add
__rcu-like annotations to catch potential problems earlier.
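The compiler concern is easiest to see on a double fetch: a value is checked, then used, and a plain field access lets the compiler load it twice. The sketch below is a hypothetical user-space reduction of the vnet_hdr_sz case. struct queue, MIN_HDR_SZ (10 here, the size of struct virtio_net_hdr) and both functions are invented for illustration, and READ_ONCE()/WRITE_ONCE() are simplified volatile-cast macros in the spirit of the kernel's scalar versions; they rely on GCC/Clang's __typeof__, not on ISO C.

```c
#include <stddef.h>

/* Simplified stand-ins: a volatile access forces exactly one load/store. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define MIN_HDR_SZ 10  /* hypothetical; mirrors sizeof(struct virtio_net_hdr) */

struct queue {         /* hypothetical, modelled on struct macvtap_queue */
	int hdr_sz;        /* written by an ioctl, read by the data path */
};

/* Setter (ioctl side): validate, then publish with a single store. */
int queue_set_hdr_sz(struct queue *q, int s)
{
	if (s < MIN_HDR_SZ)
		return -1;
	WRITE_ONCE(q->hdr_sz, s);
	return 0;
}

/* Data path: take ONE snapshot and use only the local copy afterwards.
 * With a plain q->hdr_sz the compiler may re-load the field after the
 * length check, so a concurrent setter could make the check and the
 * subtraction see different values. */
long queue_payload_len(struct queue *q, size_t len)
{
	int hdr_len = READ_ONCE(q->hdr_sz);

	if (len < (size_t)hdr_len)
		return -1;     /* too short to hold the header */
	return (long)(len - (size_t)hdr_len);
}
```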

We recently found an issue in drivers/net/macvtap.c and
drivers/net/tun.c using q->vnet_hdr_sz without proper annotation.

The macvtap patch would be:

Comments

Joel Cunningham Feb. 3, 2017, 4:26 p.m. UTC | #1
> On Feb 2, 2017, at 11:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> On Thu, 2017-02-02 at 15:52 -0800, Alexander Duyck wrote:
>> On Thu, Feb 2, 2017 at 3:47 PM, Joel Cunningham <joel.cunningham@me.com> wrote:
>>> Hi,
>>> 
>>> I’m studying the synchronization used on different parts of struct net_device and I’m struggling to understand how structure member modifications in dev_ioctl are synchronized.  Getters in dev_ifsioc_locked() are only holding rcu_read_lock() while setters in dev_ifsioc() are holding rtnl_lock, but not using RCU APIs.  I was specifically looking at SIOCGIFHWADDR/SIOCSIFHWADDR.  What’s to prevent one CPU from executing a getter while another CPU executes a setter, possibly resulting in a torn read/write?  I didn’t see anything in rtnl_lock() that would wait for rcu_read_lock() critical sections (on other CPUs) to finish before acquiring the mutex.
>>>
>>> Is there something about dev_ioctl that prevents parallel execution, or is there something about the RCU implementation I still don’t understand?
>>> 
>>> Thanks,
>>> 
>>> Joel
>> 
>> My advice would be to spend more time familiarizing yourself with RCU.
>> The advantage of RCU is that it allows for updates while other threads
>> are accessing the data.  The rtnl_lock is just meant to prevent
>> multiple writers from updating the data simultaneously.  So between
>> writers the rtnl_lock is used to keep things synchronized, but between
>> writers and readers the mechanism that is meant to protect the data
>> and keep it sane is RCU.
> 
> Note that sometimes we do not properly handle the case where a field can be
> written by a writer holding RTNL (or the socket lock, or something else)
> while readers access it without holding that lock.
> 
> We often assume the compiler won't do something stupid, but sometimes it
> can (for example, re-load a field it already read, or split a store).
> 
> We definitely should scrutinize such accesses more closely, or maybe add
> __rcu-like annotations to catch potential problems earlier.

This is my hunch from looking at dev_ioctl().  For some of the other fields, there is additional support to detect a write during the read, but not for any of the ioctls handled in dev_ifsioc_locked().  For example, for SIOCGIFNAME, dev_ifname() calls netdev_get_name() to copy dev->name, and netdev_get_name() uses the devnet_rename_seq seqcount to detect whether another thread called dev_change_name() and updated the name during the copy, retrying if so.
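The seqcount technique used for dev->name can be sketched in user space as follows. This is a model, not kernel code: struct named, named_set() and named_get() are hypothetical, every byte is a C11 atomic so that discarded concurrent reads are not data races, the fence placement follows the usual seqlock recipe for the C11 memory model, and the writer is assumed to be serialised externally (as RTNL does for dev_change_name()). The reader retries until it sees the same even sequence number before and after its copy.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NAME_LEN 16   /* mirrors IFNAMSIZ */

/* Hypothetical seqcount-protected name. */
struct named {
	atomic_uint seq;              /* odd while a write is in progress */
	_Atomic char name[NAME_LEN];
};

/* Writer: callers must already exclude other writers (RTNL in the kernel). */
void named_set(struct named *n, const char *src)
{
	unsigned s = atomic_load_explicit(&n->seq, memory_order_relaxed);
	size_t i = 0;

	atomic_store_explicit(&n->seq, s + 1, memory_order_relaxed); /* now odd */
	atomic_thread_fence(memory_order_release);
	for (; i < NAME_LEN - 1 && src[i]; i++)
		atomic_store_explicit(&n->name[i], src[i], memory_order_relaxed);
	for (; i < NAME_LEN; i++)      /* pad, always leaving a trailing NUL */
		atomic_store_explicit(&n->name[i], '\0', memory_order_relaxed);
	atomic_store_explicit(&n->seq, s + 2, memory_order_release); /* even */
}

/* Reader: lock-free copy into dst (NAME_LEN bytes), retried on overlap. */
void named_get(struct named *n, char *dst)
{
	unsigned s0, s1;

	do {
		s0 = atomic_load_explicit(&n->seq, memory_order_acquire);
		for (size_t i = 0; i < NAME_LEN; i++)
			dst[i] = atomic_load_explicit(&n->name[i], memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s1 = atomic_load_explicit(&n->seq, memory_order_relaxed);
	} while ((s0 & 1) || s0 != s1);  /* writer active or finished meanwhile */
}
```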

I found more examples of accessing net_device fields in net-sysfs.c, and these instances all acquire dev_base_lock or rtnl_lock before reading the fields.  Maybe dev_ioctl() should be implemented the same way.
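The net-sysfs approach amounts to making the getter take the same lock the setter holds. A minimal sketch under stated assumptions: struct fake_dev and its functions are hypothetical, and a pthread mutex stands in for rtnl_lock (a reader/writer lock such as dev_base_lock would follow the same shape, with readers taking it shared).

```c
#include <pthread.h>
#include <string.h>

#define ADDR_LEN 6

/* Hypothetical device whose hardware address is guarded by one lock. */
struct fake_dev {
	pthread_mutex_t lock;           /* stands in for rtnl_lock */
	unsigned char addr[ADDR_LEN];
};

void fake_dev_init(struct fake_dev *d)
{
	pthread_mutex_init(&d->lock, NULL);
	memset(d->addr, 0, ADDR_LEN);
}

/* Setter: like SIOCSIFHWADDR under RTNL. */
void fake_dev_set_addr(struct fake_dev *d, const unsigned char *src)
{
	pthread_mutex_lock(&d->lock);
	memcpy(d->addr, src, ADDR_LEN);
	pthread_mutex_unlock(&d->lock);
}

/* Getter takes the same lock, so it can never see half of an update. */
void fake_dev_get_addr(struct fake_dev *d, unsigned char *dst)
{
	pthread_mutex_lock(&d->lock);
	memcpy(dst, d->addr, ADDR_LEN);
	pthread_mutex_unlock(&d->lock);
}
```

The trade-off is that getters now wait for writers, which is the cost that using only rcu_read_lock() in dev_ifsioc_locked() appears intended to avoid.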

> 
> We recently found an issue in drivers/net/macvtap.c and
> drivers/net/tun.c using q->vnet_hdr_sz without proper annotation.
> 
> The macvtap patch would be:
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 4026185658381df004a7d641e2be7bcb9a45b509..d11a807565acf371f9bbb4afbfaca1aacd000138 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -681,7 +681,7 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
> 	size_t linear;
> 
> 	if (q->flags & IFF_VNET_HDR) {
> -		vnet_hdr_len = q->vnet_hdr_sz;
> +		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
> 
> 		err = -EINVAL;
> 		if (len < vnet_hdr_len)
> @@ -820,7 +820,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
> 
> 	if (q->flags & IFF_VNET_HDR) {
> 		struct virtio_net_hdr vnet_hdr;
> -		vnet_hdr_len = q->vnet_hdr_sz;
> +		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
> 		if (iov_iter_count(iter) < vnet_hdr_len)
> 			return -EINVAL;
> 
> @@ -1090,7 +1090,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
> 		if (s < (int)sizeof(struct virtio_net_hdr))
> 			return -EINVAL;
> 
> -		q->vnet_hdr_sz = s;
> +		WRITE_ONCE(q->vnet_hdr_sz, s);
> 		return 0;
> 
> 	case TUNGETVNETLE:

Joel

Patch

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 4026185658381df004a7d641e2be7bcb9a45b509..d11a807565acf371f9bbb4afbfaca1aacd000138 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -681,7 +681,7 @@  static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
 	size_t linear;
 
 	if (q->flags & IFF_VNET_HDR) {
-		vnet_hdr_len = q->vnet_hdr_sz;
+		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
 
 		err = -EINVAL;
 		if (len < vnet_hdr_len)
@@ -820,7 +820,7 @@  static ssize_t macvtap_put_user(struct macvtap_queue *q,
 
 	if (q->flags & IFF_VNET_HDR) {
 		struct virtio_net_hdr vnet_hdr;
-		vnet_hdr_len = q->vnet_hdr_sz;
+		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
 		if (iov_iter_count(iter) < vnet_hdr_len)
 			return -EINVAL;
 
@@ -1090,7 +1090,7 @@  static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		if (s < (int)sizeof(struct virtio_net_hdr))
 			return -EINVAL;
 
-		q->vnet_hdr_sz = s;
+		WRITE_ONCE(q->vnet_hdr_sz, s);
 		return 0;
 
 	case TUNGETVNETLE: