From patchwork Fri Sep 23 07:31:49 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 673964 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3sgRPr6rkCz9t0p for ; Fri, 23 Sep 2016 18:30:24 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b=AyN2Za2R; dkim-atps=neutral Received: from localhost ([::1]:42111 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bnLsE-0003pj-Ed for incoming@patchwork.ozlabs.org; Fri, 23 Sep 2016 04:30:22 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36267) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bnKz1-0005sA-Eo for qemu-devel@nongnu.org; Fri, 23 Sep 2016 03:33:22 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bnKyz-0005rt-6G for qemu-devel@nongnu.org; Fri, 23 Sep 2016 03:33:18 -0400 Received: from mail-wm0-f65.google.com ([74.125.82.65]:34333) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bnKyy-0005rk-TX for qemu-devel@nongnu.org; Fri, 23 Sep 2016 03:33:17 -0400 Received: by mail-wm0-f65.google.com with SMTP id l132so1282541wmf.1 for ; Fri, 23 Sep 2016 00:33:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=cnwKZs0spha+O8mGG4t6PZsib1k9r1HdOLSUTO4XVVA=; b=AyN2Za2ROty1LVzlNma+o7XLyk5KAiITUtqxx0t+mxBIDkZtrQgUQaZaDPvRWrN4u/ EO2SBxYfkPh5EZugF1bZMWKmfFQ0/IXoU3ReksbGH6SiWCp25g2Qip/NENrcBDU51uTk /l6Rt6U0lB5aQRGOHkBHmah+XKrsjXOSwBezh7C1oDEXBDS+YY42Xx5SOIACFjIHDHSm e5ZOiPOJSuF50OIjGqfV5YeCDGPCRs+2XyhcpJgQ+r9f54lbc8iRqUr650cNHDC0N/XI uGCgpxrIX3KM3RNxW0MF4GzU/pEtHX6fu9O5AtRzUSlr+esDjvCCTsp2L4tJYa/GGwfx DvTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=cnwKZs0spha+O8mGG4t6PZsib1k9r1HdOLSUTO4XVVA=; b=MsMJS4FolhpTNeIXChilrriaHdgUPAtopLlm9WDHM9ErATAw2k9yF34BYffjdSbKwP lDKnkJwWPOOxkQYe/w9PODTE/qO/5Iny8DRSOQekVouxnsFaKpwZTDDEtzP8aEIIFgP0 CWbe6fQyqknas+jZSQ8OghJc44Q8ilov5VeQ30GJyMnw3APlZCIevKjXtDuiH+Dro0DM B9rKz1J/uC4tIgJ0gZleanEC9Iu9yi4ew6bzuJYMeGy5b9xY4iAkR5dPMr9Un6GYaIRo UTT6hZtq0dIn4damk2lygtYeLMPtsMZ5jKxViQTDlLiWZpRQkQvKxS3DLSNwqmCLQ3nM FQjw== X-Gm-Message-State: AE9vXwNWtK14n26lWU1i2oIOqLTOW4Q08zq2kj1W5ekX+qZTiE7f4rT0ikzv0k7zR9hQdQ== X-Received: by 10.194.125.44 with SMTP id mn12mr5419526wjb.147.1474615936278; Fri, 23 Sep 2016 00:32:16 -0700 (PDT) Received: from donizetti.lan (94-39-176-182.adsl-ull.clienti.tiscali.it. [94.39.176.182]) by smtp.gmail.com with ESMTPSA id q10sm1691267wme.6.2016.09.23.00.32.14 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 23 Sep 2016 00:32:15 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Date: Fri, 23 Sep 2016 09:31:49 +0200 Message-Id: <1474615909-17069-17-git-send-email-pbonzini@redhat.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1474615909-17069-1-git-send-email-pbonzini@redhat.com> References: <1474615909-17069-1-git-send-email-pbonzini@redhat.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 74.125.82.65 Subject: [Qemu-devel] [PATCH 16/16] cpus-common: lock-free fast path for cpu_exec_start/end X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: serge.fdrv@gmail.com, cota@braap.org, alex.bennee@linaro.org, sergey.fedorov@linaro.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Set cpu->running without taking the cpu_list lock, only requiring it if there is a concurrent exclusive section. This requires adding a new field to CPUState, which records whether a running CPU is being counted in pending_cpus. When an exclusive section is started concurrently with cpu_exec_start, cpu_exec_start can use the new field to determine if it has to wait for the end of the exclusive section. Likewise, cpu_exec_end can use it to see if start_exclusive is waiting for that CPU. This a separate patch for easier bisection of issues. Signed-off-by: Paolo Bonzini --- cpus-common.c | 95 ++++++++++++++++++++++++++++++++++++++-------- docs/tcg-exclusive.promela | 53 ++++++++++++++++++++++++-- include/qom/cpu.h | 5 ++- 3 files changed, 133 insertions(+), 20 deletions(-) diff --git a/cpus-common.c b/cpus-common.c index 38b1d55..618eab8 100644 --- a/cpus-common.c +++ b/cpus-common.c @@ -28,6 +28,9 @@ static QemuCond exclusive_cond; static QemuCond exclusive_resume; static QemuCond qemu_work_cond; +/* >= 1 if a thread is inside start_exclusive/end_exclusive. Written + * under qemu_cpu_list_lock, read with atomic operations. + */ static int pending_cpus; void qemu_init_cpu_list(void) @@ -177,18 +180,26 @@ static inline void exclusive_idle(void) void start_exclusive(void) { CPUState *other_cpu; + int running_cpus; qemu_mutex_lock(&qemu_cpu_list_lock); exclusive_idle(); /* Make all other cpus stop executing. */ - pending_cpus = 1; + atomic_set(&pending_cpus, 1); + + /* Write pending_cpus before reading other_cpu->running. */ + smp_mb(); + running_cpus = 0; CPU_FOREACH(other_cpu) { - if (other_cpu->running) { - pending_cpus++; + if (atomic_read(&other_cpu->running)) { + other_cpu->has_waiter = true; + running_cpus++; qemu_cpu_kick(other_cpu); } } + + atomic_set(&pending_cpus, running_cpus + 1); while (pending_cpus > 1) { qemu_cond_wait(&exclusive_cond, &qemu_cpu_list_lock); } @@ -203,7 +214,7 @@ void start_exclusive(void) void end_exclusive(void) { qemu_mutex_lock(&qemu_cpu_list_lock); - pending_cpus = 0; + atomic_set(&pending_cpus, 0); qemu_cond_broadcast(&exclusive_resume); qemu_mutex_unlock(&qemu_cpu_list_lock); } @@ -211,24 +222,78 @@ void end_exclusive(void) /* Wait for exclusive ops to finish, and begin cpu execution. */ void cpu_exec_start(CPUState *cpu) { - qemu_mutex_lock(&qemu_cpu_list_lock); - exclusive_idle(); - cpu->running = true; - qemu_mutex_unlock(&qemu_cpu_list_lock); + atomic_set(&cpu->running, true); + + /* Write cpu->running before reading pending_cpus. */ + smp_mb(); + + /* 1. start_exclusive saw cpu->running == true and pending_cpus >= 1. + * After taking the lock we'll see cpu->has_waiter == true and run---not + * for long because start_exclusive kicked us. cpu_exec_end will + * decrement pending_cpus and signal the waiter. + * + * 2. start_exclusive saw cpu->running == false but pending_cpus >= 1. + * This includes the case when an exclusive item is running now. + * Then we'll see cpu->has_waiter == false and wait for the item to + * complete. + * + * 3. pending_cpus == 0. Then start_exclusive is definitely going to + * see cpu->running == true, and it will kick the CPU. + */ + if (unlikely(atomic_read(&pending_cpus))) { + qemu_mutex_lock(&qemu_cpu_list_lock); + if (!cpu->has_waiter) { + /* Not counted in pending_cpus, let the exclusive item + * run. Since we have the lock, just set cpu->running to true + * while holding it; no need to check pending_cpus again. + */ + cpu->running = false; + exclusive_idle(); + /* Now pending_cpus is zero. */ + cpu->running = true; + } else { + /* Counted in pending_cpus, go ahead and release the + * waiter at cpu_exec_end. + */ + } + qemu_mutex_unlock(&qemu_cpu_list_lock); + } } /* Mark cpu as not executing, and release pending exclusive ops. */ void cpu_exec_end(CPUState *cpu) { - qemu_mutex_lock(&qemu_cpu_list_lock); - cpu->running = false; - if (pending_cpus > 1) { - pending_cpus--; - if (pending_cpus == 1) { - qemu_cond_signal(&exclusive_cond); + atomic_set(&cpu->running, false); + + /* Write cpu->running before reading pending_cpus. */ + smp_mb(); + + /* 1. start_exclusive saw cpu->running == true. Then it will increment + * pending_cpus and wait for exclusive_cond. After taking the lock + * we'll see cpu->has_waiter == true. + * + * 2. start_exclusive saw cpu->running == false but here pending_cpus >= 1. + * This includes the case when an exclusive item started after setting + * cpu->running to false and before we read pending_cpus. Then we'll see + * cpu->has_waiter == false and not touch pending_cpus. The next call to + * cpu_exec_start will run exclusive_idle if still necessary, thus waiting + * for the item to complete. + * + * 3. pending_cpus == 0. Then start_exclusive is definitely going to + * see cpu->running == false, and it can ignore this CPU until the + * next cpu_exec_start. + */ + if (unlikely(atomic_read(&pending_cpus))) { + qemu_mutex_lock(&qemu_cpu_list_lock); + if (cpu->has_waiter) { + cpu->has_waiter = false; + atomic_set(&pending_cpus, pending_cpus - 1); + if (pending_cpus == 1) { + qemu_cond_signal(&exclusive_cond); + } } + qemu_mutex_unlock(&qemu_cpu_list_lock); } - qemu_mutex_unlock(&qemu_cpu_list_lock); } void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data) diff --git a/docs/tcg-exclusive.promela b/docs/tcg-exclusive.promela index a8896e5..8361cc2 100644 --- a/docs/tcg-exclusive.promela +++ b/docs/tcg-exclusive.promela @@ -12,7 +12,8 @@ * spin -a docs/event.promela * ./a.out -a * - * Tunable processor macros: N_CPUS, N_EXCLUSIVE, N_CYCLES, TEST_EXPENSIVE. + * Tunable processor macros: N_CPUS, N_EXCLUSIVE, N_CYCLES, USE_MUTEX, + * TEST_EXPENSIVE. */ // Define the missing parameters for the model @@ -21,8 +22,10 @@ #warning defaulting to 2 CPU processes #endif -// the expensive test is not so expensive for <= 3 CPUs -#if N_CPUS <= 3 +// the expensive test is not so expensive for <= 2 CPUs +// If the mutex is used, it's also cheap (300 MB / 4 seconds) for 3 CPUs +// For 3 CPUs and the lock-free option it needs 1.5 GB of RAM +#if N_CPUS <= 2 || (N_CPUS <= 3 && defined USE_MUTEX) #define TEST_EXPENSIVE #endif @@ -106,6 +109,8 @@ byte has_waiter[N_CPUS]; COND_BROADCAST(exclusive_resume); \ MUTEX_UNLOCK(mutex); +#ifdef USE_MUTEX +// Simple version using mutexes #define cpu_exec_start(id) \ MUTEX_LOCK(mutex); \ exclusive_idle(); \ @@ -126,6 +131,48 @@ byte has_waiter[N_CPUS]; :: else -> skip; \ fi; \ MUTEX_UNLOCK(mutex); +#else +// Wait-free fast path, only needs mutex when concurrent with +// an exclusive section +#define cpu_exec_start(id) \ + running[id] = 1; \ + if \ + :: pending_cpus -> { \ + MUTEX_LOCK(mutex); \ + if \ + :: !has_waiter[id] -> { \ + running[id] = 0; \ + exclusive_idle(); \ + running[id] = 1; \ + } \ + :: else -> skip; \ + fi; \ + MUTEX_UNLOCK(mutex); \ + } \ + :: else -> skip; \ + fi; + +#define cpu_exec_end(id) \ + running[id] = 0; \ + if \ + :: pending_cpus -> { \ + MUTEX_LOCK(mutex); \ + if \ + :: has_waiter[id] -> { \ + has_waiter[id] = 0; \ + pending_cpus--; \ + if \ + :: pending_cpus == 1 -> COND_BROADCAST(exclusive_cond); \ + :: else -> skip; \ + fi; \ + } \ + :: else -> skip; \ + fi; \ + MUTEX_UNLOCK(mutex); \ + } \ + :: else -> skip; \ + fi +#endif // Promela processes diff --git a/include/qom/cpu.h b/include/qom/cpu.h index 5dfe74a..22b54d6 100644 --- a/include/qom/cpu.h +++ b/include/qom/cpu.h @@ -242,7 +242,8 @@ struct qemu_work_item; * @nr_threads: Number of threads within this CPU. * @numa_node: NUMA node this CPU is belonging to. * @host_tid: Host thread ID. - * @running: #true if CPU is currently running; + * @running: #true if CPU is currently running (lockless). + * @has_waiter: #true if a CPU is currently waiting for the cpu_exec_end; * valid under cpu_list_lock. * @created: Indicates whether the CPU thread has been successfully created. * @interrupt_request: Indicates a pending interrupt request. @@ -296,7 +297,7 @@ struct CPUState { #endif int thread_id; uint32_t host_tid; - bool running; + bool running, has_waiter; struct QemuCond *halt_cond; bool thread_kicked; bool created;