Patchwork NBD: Move from a global spinlock to one lock per NBD device

Submitter Nicholas Thomas
Date Feb. 5, 2013, 4:14 p.m.
Message ID <1360080867.18869.30.camel@eboracum.office.bytemark.co.uk>
Permalink /patch/218287/
State Not Applicable
Delegated to: David Miller

Comments

Nicholas Thomas - Feb. 5, 2013, 4:14 p.m.
This patch is entirely based on a submission by Jerry Chu, incorporating
suggestions from Eric Dumazet. Modern, faster NICs make the original comment
on why a single lock is preferable incorrect; moving to one lock per NBD
device removes a performance bottleneck.

Original patch: http://permalink.gmane.org/gmane.linux.network/207233
    
Signed-off-by: Nick Thomas <nick@bytemark.co.uk>


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin LaHaise - Feb. 5, 2013, 4:22 p.m.
On Tue, Feb 05, 2013 at 04:14:27PM +0000, Nicholas Thomas wrote:
> This patch is entirely based on a submission by Jerry Chu, incorporating
> suggestions from Eric Dumazet. Modern, faster NICs make the original comment
> on why a single lock is preferable incorrect; moving to one lock per NBD
> device removes a performance bottleneck.
> 
> Original patch: http://permalink.gmane.org/gmane.linux.network/207233
...
> +static spinlock_t *nbd_locks __read_mostly;
...

This is about the worst way to split up a lock possible.  Most (all?) of the 
spinlocks across nbd devices are on the same cacheline, so performance will 
be limited by the rate of cacheline bounces for the lock.  It would be far 
better to embed the spinlock in the data structures that it will be 
protecting to avoid this expensive false-sharing.

		-ben
Paul Clements - Feb. 5, 2013, 9:31 p.m.
On Tue, Feb 5, 2013 at 11:22 AM, Benjamin LaHaise <bcrl@kvack.org> wrote:
> On Tue, Feb 05, 2013 at 04:14:27PM +0000, Nicholas Thomas wrote:
>> [...]
>> +static spinlock_t *nbd_locks __read_mostly;
> ...
>
> This is about the worst way to split up a lock possible.  Most (all?) of the
> spinlocks across nbd devices are on the same cacheline, so performance will
> be limited by the rate of cacheline bounces for the lock.  It would be far
> better to embed the spinlock in the data structures that it will be
> protecting to avoid this expensive false-sharing.

Yes, embedding this in the nbd_device would be much better, no?

--
Paul

Patch

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 043ddcc..d285b64 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -57,20 +57,9 @@  static unsigned int debugflags;
 
 static unsigned int nbds_max = 16;
 static struct nbd_device *nbd_dev;
+static spinlock_t *nbd_locks __read_mostly;
 static int max_part;
 
-/*
- * Use just one lock (or at most 1 per NIC). Two arguments for this:
- * 1. Each NIC is essentially a synchronization point for all servers
- *    accessed through that NIC so there's no need to have more locks
- *    than NICs anyway.
- * 2. More locks lead to more "Dirty cache line bouncing" which will slow
- *    down each lock to the point where they're actually slower than just
- *    a single lock.
- * Thanks go to Jens Axboe and Al Viro for their LKML emails explaining this!
- */
-static DEFINE_SPINLOCK(nbd_lock);
-
 #ifndef NDEBUG
 static const char *ioctl_cmd_to_ascii(int cmd)
 {
@@ -781,6 +770,15 @@  static int __init nbd_init(void)
 	if (!nbd_dev)
 		return -ENOMEM;
 
+	nbd_locks = kcalloc(nbds_max, sizeof(*nbd_locks), GFP_KERNEL);
+	if (!nbd_locks) {
+		kfree(nbd_dev);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < nbds_max; i++)
+		spin_lock_init(&nbd_locks[i]);
+
 	part_shift = 0;
 	if (max_part > 0) {
 		part_shift = fls(max_part);
@@ -812,7 +810,7 @@  static int __init nbd_init(void)
 		 * every gendisk to have its very own request_queue struct.
 		 * These structs are big so we dynamically allocate them.
 		 */
-		disk->queue = blk_init_queue(do_nbd_request, &nbd_lock);
+		disk->queue = blk_init_queue(do_nbd_request, &nbd_locks[i]);
 		if (!disk->queue) {
 			put_disk(disk);
 			goto out;
@@ -863,6 +861,7 @@  out:
 		put_disk(nbd_dev[i].disk);
 	}
 	kfree(nbd_dev);
+	kfree(nbd_locks);
 	return err;
 }
 
@@ -880,6 +879,7 @@  static void __exit nbd_cleanup(void)
 	}
 	unregister_blkdev(NBD_MAJOR, "nbd");
 	kfree(nbd_dev);
+	kfree(nbd_locks);
 	printk(KERN_INFO "nbd: unregistered device at major %d\n", NBD_MAJOR);
 }