From patchwork Mon Sep  9 15:13:17 2013
X-Patchwork-Submitter: "Mike D. Day"
X-Patchwork-Id: 273583
From: Mike Day
To: qemu-devel@nongnu.org
Cc: Mike Day, Paolo Bonzini, Paul McKenney, Mathieu Desnoyers,
    Anthony Liguori
Date: Mon, 9 Sep 2013 11:13:17 -0400
Message-Id: <1378739597-24681-1-git-send-email-ncmike@ncultra.org>
Subject: [Qemu-devel] [PATCH] Convert ram_list to RCU DQ V4.2

Changes from V4.1:
 * Correct memory barriers for ram_list globals.
Changes from V4:
 * Rebased on https://github.com/bonzini/qemu/tree/rcu
   commit 965f3b2aac93bca6df50c86fb17a06b3c856fa30

Changes from V3:
 * Now passes virt-test -t qemu.
 * Uses call_rcu instead of call_rcu1.
 * Completely removed the ram_list mutex and its locking functions.
 * Cleaned up some comments.

Changes from V2:
 * Added an rcu reclaim function to free ram blocks.
 * Reduced the number of comments.
 * Made the rcu locking policy consistent for different caller scenarios.
 * rcu updates to ram_block are now protected only by the iothread mutex.

Changes from V1:
 * Omitted locks and rcu critical sections in some functions that read
   or write the ram_list but are called in a protected context (the
   caller holds the iothread lock, the ram_list mutex, or an rcu
   critical section).

Allow "unlocked" reads of the ram_list by using an RCU-enabled DQ.
Most readers of the list no longer need to hold the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ; the difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has been tested with virt-test (-t qemu)
under KVM and passes 15/15 migration tests.

To apply this patch, base it on Paolo Bonzini's rcu tree and also
apply the RCU DQ patch (second link below):

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

(An illustrative sketch of the read/reclaim pattern this patch adopts
follows the diff.)

Signed-off-by: Mike Day
---
 arch_init.c            | 107 ++++++++++++++++++------------
 exec.c                 | 167 +++++++++++++++++++++----------------------
 include/exec/cpu-all.h |  13 ++--
 3 files changed, 161 insertions(+), 126 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 0471cd5..e24d29e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include "trace.h"
 #include "exec/cpu-all.h"
 #include "hw/acpi/acpi.h"
+#include "qemu/rcu_queue.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -399,7 +400,8 @@ static void migration_bitmap_sync(void)
     trace_migration_bitmap_sync_start();
     address_space_sync_dirty_bitmap(&address_space_memory);
 
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
             if (memory_region_test_and_clear_dirty(block->mr,
                                                    addr, TARGET_PAGE_SIZE,
@@ -408,6 +410,8 @@ static void migration_bitmap_sync(void)
             }
         }
     }
+    rcu_read_unlock();
+
     trace_migration_bitmap_sync_end(migration_dirty_pages
                                     - num_dirty_pages_init);
     num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
@@ -447,6 +451,8 @@
  *
  * Returns: The number of bytes written.
  *          0 means no dirty pages
+ *
+ * assumes that the caller has rcu-locked the ram_list
  */
 static int ram_save_block(QEMUFile *f, bool last_stage)
@@ -458,8 +464,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
     MemoryRegion *mr;
     ram_addr_t current_addr;
 
+
     if (!block)
-        block = QTAILQ_FIRST(&ram_list.blocks);
+        block = QLIST_FIRST_RCU(&ram_list.blocks);
 
     while (true) {
         mr = block->mr;
@@ -470,9 +477,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
         }
         if (offset >= block->length) {
             offset = 0;
-            block = QTAILQ_NEXT(block, next);
+            block = QLIST_NEXT_RCU(block, next);
             if (!block) {
-                block = QTAILQ_FIRST(&ram_list.blocks);
+                block = QLIST_FIRST_RCU(&ram_list.blocks);
                 complete_round = true;
                 ram_bulk_stage = false;
             }
@@ -527,9 +534,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
             }
         }
     }
+
     last_seen_block = block;
     last_offset = offset;
-
     return bytes_sent;
 }
@@ -566,10 +573,10 @@ uint64_t ram_bytes_total(void)
 {
     RAMBlock *block;
     uint64_t total = 0;
-
-    QTAILQ_FOREACH(block, &ram_list.blocks, next)
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next)
         total += block->length;
-
+    rcu_read_unlock();
     return total;
 }
@@ -601,12 +608,22 @@ static void reset_ram_globals(void)
     last_seen_block = NULL;
     last_sent_block = NULL;
     last_offset = 0;
-    last_version = ram_list.version;
     ram_bulk_stage = true;
+    smp_wmb();
+    last_version = ram_list.version;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+
+/* ram_save_* functions each implement a long-running RCU critical
+ * section.  When rcu-reclaims in the code start to become numerous
+ * it will be necessary to reduce the granularity of these critical
+ * sections.
+ *
+ * (ram_save_setup, ram_save_iterate, and ram_save_complete.)
+ */
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
@@ -632,7 +649,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     }
 
     qemu_mutex_lock_iothread();
-    qemu_mutex_lock_ramlist();
+    rcu_read_lock();
     bytes_transferred = 0;
     reset_ram_globals();
 
@@ -642,13 +659,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
     qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         qemu_put_byte(f, strlen(block->idstr));
         qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
         qemu_put_be64(f, block->length);
     }
 
-    qemu_mutex_unlock_ramlist();
+    rcu_read_unlock();
 
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
@@ -658,6 +675,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
+
 static int ram_save_iterate(QEMUFile *f, void *opaque)
 {
     int ret;
@@ -665,16 +683,14 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     int64_t t0;
     int total_sent = 0;
 
-    qemu_mutex_lock_ramlist();
-
-    if (ram_list.version != last_version) {
+    if (atomic_rcu_read(&ram_list.version) != last_version) {
         reset_ram_globals();
     }
 
     ram_control_before_iterate(f, RAM_CONTROL_ROUND);
-
     t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     i = 0;
+    rcu_read_lock();
     while ((ret = qemu_file_rate_limit(f)) == 0) {
         int bytes_sent;
 
@@ -701,9 +717,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
         }
         i++;
     }
-
-    qemu_mutex_unlock_ramlist();
-
+    rcu_read_unlock();
     /*
      * Must occur before EOS (or any QEMUFile operation)
     * because of RDMA protocol.
@@ -718,19 +732,19 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     total_sent += 8;
     bytes_transferred += total_sent;
-
     return total_sent;
 }
 
 static int ram_save_complete(QEMUFile *f, void *opaque)
 {
-    qemu_mutex_lock_ramlist();
+
     migration_bitmap_sync();
 
     ram_control_before_iterate(f, RAM_CONTROL_FINISH);
 
     /* try transferring iterative blocks of memory */
+    rcu_read_lock();
 
     /* flush all remaining blocks regardless of rate limiting */
     while (true) {
         int bytes_sent;
@@ -742,11 +756,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         }
         bytes_transferred += bytes_sent;
     }
-
+    rcu_read_unlock();
     ram_control_after_iterate(f, RAM_CONTROL_FINISH);
     migration_end();
 
-    qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
     return 0;
@@ -808,6 +821,9 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
     return rc;
 }
 
+/* Must be called from within a rcu critical section.
+ * Returns a pointer from within the RCU-protected ram_list.
+ */
 static inline void *host_from_stream_offset(QEMUFile *f,
                                             ram_addr_t offset,
                                             int flags)
@@ -829,9 +845,10 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     qemu_get_buffer(f, (uint8_t *)id, len);
     id[len] = 0;
 
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-        if (!strncmp(id, block->idstr, sizeof(id)))
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (!strncmp(id, block->idstr, sizeof(id))) {
             return memory_region_get_ram_ptr(block->mr) + offset;
+        }
     }
 
     fprintf(stderr, "Can't find block %s!\n", id);
@@ -868,7 +885,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     if (version_id < 4 || version_id > 4) {
         return -EINVAL;
     }
-
+    /* this implements a long-running RCU critical section.
+     * When rcu reclaims in the code start to become numerous
+     * it will be necessary to reduce the granularity of this critical
+     * section.
+     */
+    rcu_read_lock();
     do {
         addr = qemu_get_be64(f);
@@ -890,21 +912,19 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             qemu_get_buffer(f, (uint8_t *)id, len);
             id[len] = 0;
             length = qemu_get_be64(f);
-
-            QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+            QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
                 if (!strncmp(id, block->idstr, sizeof(id))) {
                     if (block->length != length) {
                         fprintf(stderr,
                                 "Length mismatch: %s: " RAM_ADDR_FMT
                                 " in != " RAM_ADDR_FMT "\n", id, length,
                                 block->length);
-                        ret =  -EINVAL;
+                        ret = -EINVAL;
                         goto done;
                     }
                     break;
                 }
             }
-
             if (!block) {
                 fprintf(stderr, "Unknown ramblock \"%s\", cannot "
                         "accept migration\n", id);
@@ -917,30 +937,30 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             }
         }
 
+        /* Call host_from_stream_offset while holding an rcu read lock.
+         * It returns a pointer from within the rcu-protected ram_list.
+         */
         if (flags & RAM_SAVE_FLAG_COMPRESS) {
-            void *host;
             uint8_t ch;
-
-            host = host_from_stream_offset(f, addr, flags);
+            void *host = host_from_stream_offset(f, addr, flags);
             if (!host) {
-                return -EINVAL;
+                ret = -EINVAL;
+                goto done;
             }
-
             ch = qemu_get_byte(f);
             ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
         } else if (flags & RAM_SAVE_FLAG_PAGE) {
-            void *host;
-
-            host = host_from_stream_offset(f, addr, flags);
+            void *host = host_from_stream_offset(f, addr, flags);
             if (!host) {
-                return -EINVAL;
+                ret = -EINVAL;
+                goto done;
             }
-
             qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
         } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
             void *host = host_from_stream_offset(f, addr, flags);
             if (!host) {
-                return -EINVAL;
+                ret = -EINVAL;
+                goto done;
             }
 
             if (load_xbzrle(f, addr, host) < 0) {
@@ -958,6 +978,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     } while (!(flags & RAM_SAVE_FLAG_EOS));
 
 done:
+    rcu_read_unlock();
     DPRINTF("Completed load of VM with exit code %d seq iteration "
             "%" PRIu64 "\n", ret, seq_iter);
     return ret;
@@ -1126,8 +1147,8 @@ void do_acpitable_option(const QemuOpts *opts)
     acpi_table_add(opts, &err);
     if (err) {
-        error_report("Wrong acpi table provided: %s",
-                     error_get_pretty(err));
+        fprintf(stderr, "Wrong acpi table provided: %s\n",
+                error_get_pretty(err));
         error_free(err);
         exit(1);
     }
diff --git a/exec.c b/exec.c
index c0780ca..fbf3b2b 100644
--- a/exec.c
+++ b/exec.c
@@ -46,7 +46,7 @@
 #endif
 #include "exec/cpu-all.h"
 #include "qemu/tls.h"
-
+#include "qemu/rcu_queue.h"
 #include "exec/cputlb.h"
 #include "translate-all.h"
 
@@ -57,7 +57,10 @@
 #if !defined(CONFIG_USER_ONLY)
 static int in_migration;
 
-RAMList ram_list = { .blocks = QTAILQ_HEAD_INITIALIZER(ram_list.blocks) };
+/* ram_list is read under rcu_read_lock()/rcu_read_unlock().  Writes
+ * are protected by a lock, currently the iothread lock.
+ */
+RAMList ram_list = { .blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks) };
 
 static MemoryRegion *system_memory;
 static MemoryRegion *system_io;
@@ -329,7 +332,6 @@ void cpu_exec_init_all(void)
 {
     tls_alloc_current_cpu_var();
 #if !defined(CONFIG_USER_ONLY)
-    qemu_mutex_init(&ram_list.mutex);
     memory_map_init();
     io_mem_init();
 #endif
@@ -885,7 +887,7 @@ static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
         now = remain;
         if (int128_lt(remain.size, page_size)) {
             register_subpage(d, &now);
-        } else if (remain.offset_within_address_space & ~TARGET_PAGE_MASK) {
+        } else if (remain.offset_within_region & ~TARGET_PAGE_MASK) {
             now.size = page_size;
             register_subpage(d, &now);
         } else {
@@ -901,16 +903,6 @@ void qemu_flush_coalesced_mmio_buffer(void)
         kvm_flush_coalesced_mmio_buffer();
 }
 
-void qemu_mutex_lock_ramlist(void)
-{
-    qemu_mutex_lock(&ram_list.mutex);
-}
-
-void qemu_mutex_unlock_ramlist(void)
-{
-    qemu_mutex_unlock(&ram_list.mutex);
-}
-
 #if defined(__linux__) && !defined(TARGET_S390X)
 
 #include <sys/vfs.h>
@@ -1021,17 +1013,24 @@ static ram_addr_t find_ram_offset(ram_addr_t size)
     RAMBlock *block, *next_block;
     ram_addr_t offset = RAM_ADDR_MAX, mingap = RAM_ADDR_MAX;
 
+    /* ram_list must be protected by a mutex (for writes), or
+     * an rcu critical section (for reads).  Currently this code
+     * is called with the iothread lock held.  If that changes,
+     * make sure to protect ram_list with an rcu critical section.
+     */
+
     assert(size != 0); /* it would hand out same offset multiple times */
 
-    if (QTAILQ_EMPTY(&ram_list.blocks))
+    if (QLIST_EMPTY_RCU(&ram_list.blocks)) {
         return 0;
+    }
 
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         ram_addr_t end, next = RAM_ADDR_MAX;
 
         end = block->offset + block->length;
 
-        QTAILQ_FOREACH(next_block, &ram_list.blocks, next) {
+        QLIST_FOREACH_RCU(next_block, &ram_list.blocks, next) {
             if (next_block->offset >= end) {
                 next = MIN(next, next_block->offset);
             }
@@ -1056,9 +1055,11 @@ ram_addr_t last_ram_offset(void)
     RAMBlock *block;
     ram_addr_t last = 0;
 
-    QTAILQ_FOREACH(block, &ram_list.blocks, next)
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         last = MAX(last, block->offset + block->length);
-
+    }
+    rcu_read_unlock();
     return last;
 }
@@ -1083,7 +1084,12 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
     RAMBlock *new_block, *block;
 
     new_block = NULL;
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+
+    /* Assumes that the iothread lock is taken ... if that changes,
+     * add an rcu_read_lock()/unlock pair when traversing the
+     * ram list
+     */
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         if (block->offset == addr) {
             new_block = block;
             break;
@@ -1102,15 +1108,13 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
     pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
 
     /* This assumes the iothread lock is taken here too. */
-    qemu_mutex_lock_ramlist();
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
             fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
                     new_block->idstr);
             abort();
         }
     }
-    qemu_mutex_unlock_ramlist();
 }
 
 static int memory_try_enable_merging(void *addr, size_t len)
@@ -1123,16 +1127,16 @@ static int memory_try_enable_merging(void *addr, size_t len)
     return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE);
 }
 
+/* Called with the iothread lock being held */
+
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    MemoryRegion *mr)
 {
-    RAMBlock *block, *new_block;
+    RAMBlock *block, *new_block, *last_block = 0;
 
     size = TARGET_PAGE_ALIGN(size);
     new_block = g_malloc0(sizeof(*new_block));
 
-    /* This assumes the iothread lock is taken here too. */
-    qemu_mutex_lock_ramlist();
     new_block->mr = mr;
     new_block->offset = find_ram_offset(size);
     if (host) {
@@ -1165,21 +1169,24 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     new_block->length = size;
 
     /* Keep the list sorted from biggest to smallest block.
      */
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        last_block = block;
         if (block->length < new_block->length) {
             break;
         }
     }
     if (block) {
-        QTAILQ_INSERT_BEFORE(block, new_block, next);
+        QLIST_INSERT_BEFORE_RCU(block, new_block, next);
     } else {
-        QTAILQ_INSERT_TAIL(&ram_list.blocks, new_block, next);
+        if (last_block) {
+            QLIST_INSERT_AFTER_RCU(last_block, new_block, next);
+        } else { /* list is empty */
+            QLIST_INSERT_HEAD_RCU(&ram_list.blocks, new_block, next);
+        }
     }
     ram_list.mru_block = NULL;
-
+    smp_wmb();
     ram_list.version++;
-    qemu_mutex_unlock_ramlist();
-
     ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
                                     last_ram_offset() >> TARGET_PAGE_BITS);
     memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
@@ -1204,30 +1211,29 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
 {
     RAMBlock *block;
 
-    /* This assumes the iothread lock is taken here too. */
-    qemu_mutex_lock_ramlist();
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+    /* This assumes the iothread lock is taken here too. */
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
-            QTAILQ_REMOVE(&ram_list.blocks, block, next);
+            QLIST_REMOVE_RCU(block, next);
             ram_list.mru_block = NULL;
+            smp_wmb();
             ram_list.version++;
-            g_free(block);
+            call_rcu(block, (void (*)(struct RAMBlock *))g_free, rcu);
             break;
         }
     }
-    qemu_mutex_unlock_ramlist();
 }
 
+/* called with the iothread lock held */
 void qemu_ram_free(ram_addr_t addr)
 {
     RAMBlock *block;
 
-    /* This assumes the iothread lock is taken here too. */
-    qemu_mutex_lock_ramlist();
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
-            QTAILQ_REMOVE(&ram_list.blocks, block, next);
+            QLIST_REMOVE_RCU(block, next);
             ram_list.mru_block = NULL;
+            smp_wmb();
             ram_list.version++;
             if (block->flags & RAM_PREALLOC_MASK) {
                 ;
@@ -1249,12 +1255,10 @@ void qemu_ram_free(ram_addr_t addr)
                     qemu_anon_ram_free(block->host, block->length);
                 }
             }
-            g_free(block);
+            call_rcu(block, (void (*)(struct RAMBlock *))g_free, rcu);
             break;
         }
     }
-    qemu_mutex_unlock_ramlist();
-
 }
 
 #ifndef _WIN32
@@ -1265,7 +1269,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
     int flags;
     void *area, *vaddr;
 
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         offset = addr - block->offset;
         if (offset < block->length) {
             vaddr = block->host + offset;
@@ -1313,7 +1317,6 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
                 memory_try_enable_merging(vaddr, length);
                 qemu_ram_setup_dump(vaddr, length);
             }
-            return;
         }
     }
 }
@@ -1323,23 +1326,21 @@ static RAMBlock *qemu_get_ram_block(ram_addr_t addr)
 {
     RAMBlock *block;
 
-    /* The list is protected by the iothread lock here. */
-    block = ram_list.mru_block;
+    /* This assumes the iothread lock is taken here too. */
+
+    block = atomic_rcu_read(&ram_list.mru_block);
     if (block && addr - block->offset < block->length) {
-        goto found;
+        return block;
     }
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         if (addr - block->offset < block->length) {
-            goto found;
+            atomic_rcu_set(&ram_list.mru_block, block);
+            return block;
         }
     }
 
     fprintf(stderr, "Bad ram offset %" PRIx64 "\n", (uint64_t)addr);
     abort();
-
-found:
-    ram_list.mru_block = block;
-    return block;
 }
 
 /* Return a host pointer to ram allocated with qemu_ram_alloc.
@@ -1378,8 +1379,8 @@ static void *qemu_safe_ram_ptr(ram_addr_t addr)
 {
     RAMBlock *block;
 
-    /* The list is protected by the iothread lock here. */
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+    /* This assumes the iothread lock is taken here too. */
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         if (addr - block->offset < block->length) {
             if (xen_enabled()) {
                 /* We need to check if the requested address is in the RAM
@@ -1399,14 +1400,19 @@ static void *qemu_safe_ram_ptr(ram_addr_t addr)
 
     fprintf(stderr, "Bad ram offset %" PRIx64 "\n", (uint64_t)addr);
     abort();
-
     return NULL;
 }
 
 /* Return a host pointer to guest's ram. Similar to qemu_get_ram_ptr
- * but takes a size argument */
+ * but takes a size argument.
+ *
+ * The returned pointer is not protected by RCU, so the caller
+ * must have other means of protecting the pointer, such as a
+ * reference.
+ */
 static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
 {
+    void *ptr = NULL;
     if (*size == 0) {
         return NULL;
     }
@@ -1414,12 +1420,14 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
         return xen_map_cache(addr, *size, 1);
     } else {
         RAMBlock *block;
-
-        QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        rcu_read_lock();
+        QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
             if (addr - block->offset < block->length) {
                 if (addr - block->offset + *size > block->length)
                     *size = block->length - addr + block->offset;
-                return block->host + (addr - block->offset);
+                ptr = block->host + (addr - block->offset);
+                rcu_read_unlock();
+                return ptr;
             }
         }
 
@@ -1429,37 +1437,43 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
 }
 
 /* Some of the softmmu routines need to translate from a host pointer
-   (typically a TLB entry) back to a ram offset. */
+ * (typically a TLB entry) back to a ram offset.
+ *
+ * Note that the returned MemoryRegion is not RCU-protected.
+ */
 MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
 {
     RAMBlock *block;
     uint8_t *host = ptr;
+    MemoryRegion *mr = NULL;
 
     if (xen_enabled()) {
         *ram_addr = xen_ram_addr_from_mapcache(ptr);
         return qemu_get_ram_block(*ram_addr)->mr;
     }
-
-    block = ram_list.mru_block;
+    rcu_read_lock();
+    block = atomic_rcu_read(&ram_list.mru_block);
     if (block && block->host && host - block->host < block->length) {
-        goto found;
+        *ram_addr = block->offset + (host - block->host);
+        mr = block->mr;
+        goto unlock_out;
     }
 
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
        /* This case append when the block is not mapped.
         */
         if (block->host == NULL) {
             continue;
         }
         if (host - block->host < block->length) {
-            goto found;
+            *ram_addr = block->offset + (host - block->host);
+            mr = block->mr;
+            goto unlock_out;
         }
     }
 
-    return NULL;
-
-found:
-    *ram_addr = block->offset + (host - block->host);
-    return block->mr;
+unlock_out:
+    rcu_read_unlock();
+    return mr;
 }
 
 static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
@@ -2709,9 +2723,10 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
 void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
 {
     RAMBlock *block;
-
-    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         func(block->host, block->offset, block->length, opaque);
     }
+    rcu_read_unlock();
 }
 #endif
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index e088089..e68b86b 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -24,6 +24,7 @@
 #include "qemu/thread.h"
 #include "qom/cpu.h"
 #include "qemu/tls.h"
+#include "qemu/rcu.h"
 
 /* some important defines:
  *
@@ -448,28 +449,26 @@ extern ram_addr_t ram_size;
 #define RAM_PREALLOC_MASK   (1 << 0)
 
 typedef struct RAMBlock {
+    struct rcu_head rcu;
     struct MemoryRegion *mr;
     uint8_t *host;
     ram_addr_t offset;
     ram_addr_t length;
     uint32_t flags;
     char idstr[256];
-    /* Reads can take either the iothread or the ramlist lock.
-     * Writes must take both locks.
-     */
-    QTAILQ_ENTRY(RAMBlock) next;
+    /* Writes must hold the iothread lock */
+    QLIST_ENTRY(RAMBlock) next;
 #if defined(__linux__) && !defined(TARGET_S390X)
     int fd;
 #endif
 } RAMBlock;
 
 typedef struct RAMList {
-    QemuMutex mutex;
     /* Protected by the iothread lock.  */
     uint8_t *phys_dirty;
     RAMBlock *mru_block;
-    /* Protected by the ramlist lock.  */
-    QTAILQ_HEAD(, RAMBlock) blocks;
+    /* RCU-enabled, writes protected by the iothread lock. */
+    QLIST_HEAD(, RAMBlock) blocks;
     uint32_t version;
 } RAMList;
 extern RAMList ram_list;
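
---

For reviewers unfamiliar with the pattern, here is a minimal sketch of
the read and reclaim discipline the patch adopts throughout. It is
illustrative only and not part of the patch: dump_blocks() and
remove_block() are hypothetical helpers, while rcu_read_lock(),
QLIST_FOREACH_RCU(), smp_wmb(), and call_rcu() are the primitives from
Paolo's rcu tree that this patch assumes are applied.

    /* Read side: no mutex.  The critical section does not block
     * writers; it only delays reclamation of unlinked blocks until
     * every reader that might still see them has finished.
     */
    static void dump_blocks(void)
    {
        RAMBlock *block;

        rcu_read_lock();
        QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
            printf("%s: " RAM_ADDR_FMT " bytes\n",
                   block->idstr, block->length);
        }
        rcu_read_unlock();
    }

    /* Update side: the caller holds the iothread lock, which now
     * serializes all writers.  g_free() is deferred by call_rcu()
     * until every reader that could hold a pointer to the unlinked
     * block has left its critical section.
     */
    static void remove_block(RAMBlock *block)
    {
        QLIST_REMOVE_RCU(block, next);
        ram_list.mru_block = NULL;
        smp_wmb();           /* order the list/mru updates ...         */
        ram_list.version++;  /* ... before readers see the new version */
        call_rcu(block, (void (*)(struct RAMBlock *))g_free, rcu);
    }

The same division of labor appears in every function touched above:
readers bracket list traversal with rcu_read_lock()/rcu_read_unlock()
(or rely on the caller's protected context), and the writers in exec.c
publish with the RCU insert/remove macros and reclaim via call_rcu().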