From patchwork Thu Nov  7 16:12:04 2013
X-Patchwork-Submitter: Luis Henriques
X-Patchwork-Id: 289869
From: Luis Henriques
To: Peter Zijlstra
Cc: Michael Neuling, Mathieu Desnoyers, Frederic Weisbecker,
    michael@ellerman.id.au, kernel-team@lists.ubuntu.com, anton@samba.org,
    benh@kernel.crashing.org, Victor Kaplansky, Paul McKenney, Ingo Molnar
Subject: [3.5.y.z extended stable] Patch "perf: Fix perf ring buffer memory
    ordering" has been added to staging queue
Date: Thu, 7 Nov 2013 16:12:04 +0000
Message-Id: <1383840724-7764-1-git-send-email-luis.henriques@canonical.com>
X-Extended-Stable: 3.5
List-Id: Kernel team discussions

This is a note to let you know that I have just added a patch titled

    perf: Fix perf ring buffer memory ordering

to the linux-3.5.y-queue branch of the 3.5.y.z extended stable tree which
can be found at:

 http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.5.y-queue

If you, or anyone else, feels it should not be added to this tree, please
reply to this email.

For more information about the 3.5.y.z tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Luis

------

From 35fa0b9669c5300ae42dfe5455fb8b1e3e7349c8 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra
Date: Mon, 28 Oct 2013 13:55:29 +0100
Subject: perf: Fix perf ring buffer memory ordering

commit bf378d341e4873ed928dc3c636252e6895a21f50 upstream.

The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.
Reported-by: Victor Kaplansky
Tested-by: Victor Kaplansky
Signed-off-by: Peter Zijlstra
Cc: Mathieu Desnoyers
Cc: michael@ellerman.id.au
Cc: Paul McKenney
Cc: Michael Neuling
Cc: Frederic Weisbecker
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar
[ luis: backported to 3.5:
  - file rename: include/uapi/linux/perf_event.h -> include/linux/perf_event.h ]
Signed-off-by: Luis Henriques
---
 include/linux/perf_event.h  | 12 +++++++-----
 kernel/events/ring_buffer.c | 31 +++++++++++++++++++++++++++----
 2 files changed, 34 insertions(+), 9 deletions(-)

-- 
1.8.3.2

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3faf0d4..7e72637 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -393,13 +393,15 @@ struct perf_event_mmap_page {
 	/*
 	 * Control data for the mmap() data buffer.
 	 *
-	 * User-space reading the @data_head value should issue an rmb(), on
-	 * SMP capable platforms, after reading this value -- see
-	 * perf_event_wakeup().
+	 * User-space reading the @data_head value should issue an smp_rmb(),
+	 * after reading this value.
 	 *
 	 * When the mapping is PROT_WRITE the @data_tail value should be
-	 * written by userspace to reflect the last read data. In this case
-	 * the kernel will not over-write unread data.
+	 * written by userspace to reflect the last read data, after issueing
+	 * an smp_mb() to separate the data read from the ->data_tail store.
+	 * In this case the kernel will not over-write unread data.
+	 *
+	 * See perf_output_put_handle() for the data ordering.
 	 */
 	__u64	data_head;		/* head in the data section */
 	__u64	data_tail;		/* user-space written tail */
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 6ddaba4..4636ecc 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -75,10 +75,31 @@ again:
 		goto out;
 
 	/*
-	 * Publish the known good head. Rely on the full barrier implied
-	 * by atomic_dec_and_test() order the rb->head read and this
-	 * write.
+	 * Since the mmap() consumer (userspace) can run on a different CPU:
+	 *
+	 *   kernel				user
+	 *
+	 *   READ ->data_tail			READ ->data_head
+	 *   smp_mb()	(A)			smp_rmb()	(C)
+	 *   WRITE $data			READ $data
+	 *   smp_wmb()	(B)			smp_mb()	(D)
+	 *   STORE ->data_head			WRITE ->data_tail
+	 *
+	 * Where A pairs with D, and B pairs with C.
+	 *
+	 * I don't think A needs to be a full barrier because we won't in fact
+	 * write data until we see the store from userspace. So we simply don't
+	 * issue the data WRITE until we observe it. Be conservative for now.
+	 *
+	 * OTOH, D needs to be a full barrier since it separates the data READ
+	 * from the tail WRITE.
+	 *
+	 * For B a WMB is sufficient since it separates two WRITEs, and for C
+	 * an RMB is sufficient since it separates two READs.
+	 *
+	 * See perf_output_begin().
 	 */
+	smp_wmb();
 	rb->user_page->data_head = head;
 
 	/*
@@ -142,9 +163,11 @@ int perf_output_begin(struct perf_output_handle *handle,
 	 * Userspace could choose to issue a mb() before updating the
 	 * tail pointer. So that all reads will be completed before the
 	 * write is issued.
+	 *
+	 * See perf_output_put_handle().
 	 */
 	tail = ACCESS_ONCE(rb->user_page->data_tail);
-	smp_rmb();
+	smp_mb();
 	offset = head = local_read(&rb->head);
 	head += size;
 	if (unlikely(!perf_output_space(rb, tail, offset, head)))
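
For reference, below is a minimal, purely illustrative sketch of the
user-space side of the ordering described above (barriers C and D in the
diagram in perf_output_put_handle()). It is not part of the patch: the
drain()/consume() names are hypothetical, rmb()/mb() are conservatively
mapped to GCC's __sync_synchronize() full barrier rather than the lighter
architecture-specific primitives a real tool would use, and records that
wrap past the end of the buffer are ignored. It assumes the data area is
a power-of-two number of bytes, as perf's mmap'ed buffers are.

#include <stdint.h>
#include <linux/perf_event.h>

/* Conservative stand-ins for the smp_rmb()/smp_mb() the comment asks for. */
#define rmb()	__sync_synchronize()
#define mb()	__sync_synchronize()

/* pg points at the mmap'ed perf_event_mmap_page, data at the data pages. */
static void drain(struct perf_event_mmap_page *pg, const char *data,
		  uint64_t size,
		  void (*consume)(const struct perf_event_header *))
{
	uint64_t head = pg->data_head;		/* READ ->data_head  */
	rmb();					/* (C)               */

	uint64_t tail = pg->data_tail;
	while (tail < head) {
		const struct perf_event_header *hdr =
			(const void *)(data + (tail & (size - 1)));
		consume(hdr);			/* READ $data        */
		tail += hdr->size;
	}

	mb();					/* (D)               */
	pg->data_tail = tail;			/* WRITE ->data_tail */
}

A real consumer would additionally read data_head with a volatile
(ACCESS_ONCE-style) load and handle records that straddle the end of the
ring; the point here is only the placement of the two barriers relative
to the head read, the data reads and the tail write.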