Message ID | 1486099283.21871.73.camel@edumazet-glaptop3.roam.corp.google.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
> On Feb 2, 2017, at 11:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> On Thu, 2017-02-02 at 15:52 -0800, Alexander Duyck wrote:
>> On Thu, Feb 2, 2017 at 3:47 PM, Joel Cunningham <joel.cunningham@me.com> wrote:
>>> Hi,
>>>
>>> I’m studying the synchronization used on different parts of struct net_device and I’m struggling to understand how structure member modifications in dev_ioctl are synchronized. Getters in dev_ifsioc_locked() are only holding rcu_read_lock() while setters in dev_ifsioc() are holding rtnl_lock, but not using RCU APIs. I was specifically looking at SIOCGIFHWADDR/SIOCSIFHWADDR. What’s to prevent one CPU from executing a getter and another CPU from executing a setter resulting in possibly a torn read/write? I didn’t see anything in rtnl_lock() that would wait for any rcu_read_lock() critical sections (on other CPUs) to finish before acquiring the mutex.
>>>
>>> Is there something about dev_ioctl that prevents parallel execution? or maybe something I still don’t understand about the RCU implementation?
>>>
>>> Thanks,
>>>
>>> Joel
>>
>> My advice would be to spend more time familiarizing yourself with RCU.
>> The advantage of RCU is that it allows for updates while other threads
>> are accessing the data. The rtnl_lock is just meant to prevent
>> multiple writers from updating the data simultaneously. So between
>> writers the rtnl_lock is used to keep things synchronized, but between
>> writers and readers the mechanism that is meant to protect the data
>> and keep it sane is RCU.
>
> Note that sometimes we do not properly handle the case where one field can be
> written by a writer holding RTNL (or socket lock or something else)
>
> We often believe compiler won't do something stupid, but it can
> sometimes.
>
> We definitely should scrutinize things a bit more, or maybe add __rcu
> like annotations to catch potential problems earlier.

This is my hunch from looking at dev_ioctl().
For some of the other fields, there is additional support to detect a write during the read, but not for any of the ioctls handled in dev_ifsioc_locked(). For example, for SIOCGIFNAME, dev_ifname() calls netdev_get_name() to copy dev->name, which uses the devnet_rename_seq seqcount to detect whether another thread called dev_change_name() and updated the name. I found more examples of accessing net_device fields in net-sysfs.c, and these instances all acquire dev_base_lock/rtnl_lock before reading fields. Maybe dev_ioctl should be implemented this way.

>
> We recently found an issue in drivers/net/macvtap.c and
> drivers/net/tun.c using q->vnet_hdr_sz without proper annotation.
>
> macvtap patch would be :
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 4026185658381df004a7d641e2be7bcb9a45b509..d11a807565acf371f9bbb4afbfaca1aacd000138 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -681,7 +681,7 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>  	size_t linear;
>
>  	if (q->flags & IFF_VNET_HDR) {
> -		vnet_hdr_len = q->vnet_hdr_sz;
> +		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
>
>  		err = -EINVAL;
>  		if (len < vnet_hdr_len)
> @@ -820,7 +820,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
>
>  	if (q->flags & IFF_VNET_HDR) {
>  		struct virtio_net_hdr vnet_hdr;
> -		vnet_hdr_len = q->vnet_hdr_sz;
> +		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
>  		if (iov_iter_count(iter) < vnet_hdr_len)
>  			return -EINVAL;
>
> @@ -1090,7 +1090,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  		if (s < (int)sizeof(struct virtio_net_hdr))
>  			return -EINVAL;
>
> -		q->vnet_hdr_sz = s;
> +		WRITE_ONCE(q->vnet_hdr_sz, s);
>  		return 0;
>
>  	case TUNGETVNETLE:

Joel
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 4026185658381df004a7d641e2be7bcb9a45b509..d11a807565acf371f9bbb4afbfaca1aacd000138 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -681,7 +681,7 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
 	size_t linear;
 
 	if (q->flags & IFF_VNET_HDR) {
-		vnet_hdr_len = q->vnet_hdr_sz;
+		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
 
 		err = -EINVAL;
 		if (len < vnet_hdr_len)
@@ -820,7 +820,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
 
 	if (q->flags & IFF_VNET_HDR) {
 		struct virtio_net_hdr vnet_hdr;
-		vnet_hdr_len = q->vnet_hdr_sz;
+		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
 		if (iov_iter_count(iter) < vnet_hdr_len)
 			return -EINVAL;
 
@@ -1090,7 +1090,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		if (s < (int)sizeof(struct virtio_net_hdr))
 			return -EINVAL;
 
-		q->vnet_hdr_sz = s;
+		WRITE_ONCE(q->vnet_hdr_sz, s);
 		return 0;
 
 	case TUNGETVNETLE:
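To see why the patch snapshots the field, here is a small userspace sketch. The `READ_ONCE`/`WRITE_ONCE` macros below are simplified stand-ins built on a volatile cast (assuming GCC or Clang `__typeof__`), not the kernel's definitions, and `struct queue` and `check_len()` are hypothetical. The point is that the length check and the later arithmetic use the same value, even if another thread changes the field in between.

```c
/* Simplified userspace stand-ins for the kernel macros. Assumption: the
 * compiler supports __typeof__. The volatile cast makes the compiler emit
 * exactly one access per use instead of re-reading or caching the field. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical miniature of the per-queue state touched by the patch. */
struct queue {
	int vnet_hdr_sz;
};

/* Reader: take one snapshot, then validate and use only that snapshot.
 * Returns the payload length, or -1 if len is shorter than the header. */
static int check_len(struct queue *q, int len)
{
	int vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);

	if (len < vnet_hdr_len)
		return -1;

	/* Same value as the one just checked, so this cannot go negative. */
	return len - vnet_hdr_len;
}

/* Writer: a single store, mirroring the ioctl path in the patch. */
static void set_hdr_sz(struct queue *q, int s)
{
	WRITE_ONCE(q->vnet_hdr_sz, s);
}
```

Without the snapshot, a compiler is free to load `q->vnet_hdr_sz` once for the comparison and again for the subtraction, so a concurrent `set_hdr_sz()` could make the two disagree. These macros only constrain the compiler; they do not add ordering between different variables.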