From patchwork Tue Apr 15 13:48:11 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Rui Y" X-Patchwork-Id: 339286 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 39D96140083 for ; Wed, 16 Apr 2014 00:00:07 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751304AbaDON7h (ORCPT ); Tue, 15 Apr 2014 09:59:37 -0400 Received: from mga09.intel.com ([134.134.136.24]:37545 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754887AbaDON7c (ORCPT ); Tue, 15 Apr 2014 09:59:32 -0400 Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga102.jf.intel.com with ESMTP; 15 Apr 2014 06:54:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.97,864,1389772800"; d="scan'208";a="513523045" Received: from wr-optiplex-9010.bj.intel.com ([10.238.155.112]) by fmsmga001.fm.intel.com with ESMTP; 15 Apr 2014 06:59:25 -0700 From: Rui Wang To: bhelgaas@google.com, tony.luck@intel.com, rafael.j.wysocki@intel.com Cc: ruiv.wang@gmail.com, chaohong.guo@intel.com, dave.hansen@intel.com, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, Rui Wang Subject: [PATCH v3 5/5] IO Hook: Tracing hw access Date: Tue, 15 Apr 2014 21:48:11 +0800 Message-Id: <1397569691-24691-6-git-send-email-rui.y.wang@intel.com> X-Mailer: git-send-email 1.7.5.4 In-Reply-To: <1397569691-24691-1-git-send-email-rui.y.wang@intel.com> References: <1397569691-24691-1-git-send-email-rui.y.wang@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org A new attribute "tc" can be defined for any Register Override so that IO Hook can report each read from (or write to) the specified area in mem, io, msr, and pciconf spaces. A trace event named 'iohook_event' is defined to report the accesses. For example echo "0x383ffff00000-4000[0/0]tc" > /sys/kernel/debug/tracing/iohook/mem or trace_event=iohook_event iohook.mem="383ffff00000-4000[0/0]tc" as boot parameters in grub can be used to trace any access to physical address region starting from 0x383ffff00000 with a length of 0x4000. A typical trace event output looks like the following: bash $ sudo cat /sys/kernel/debug/tracing/trace ... swapper/0-1 [018] .... 11.594157: iohook_event: pci_read_config(0000:00:14.00, 0x6, 0x2) <== 0x290 swapper/0-1 [018] .... 11.599914: iohook_event: pci_read_config(0000:00:14.00, 0x34, 0x1) <== 0x70 modprobe-613 [013] d... 12.447408: iohook_event: pci_write_config(0000:00:14.00, 0x4, 0x2) ==> 0x6 modprobe-613 [013] .... 12.452565: iohook_event: pci_read_config(0000:00:14.00, 0xd, 0x1) <== 0x0 modprobe-613 [013] d... 12.458302: iohook_event: pci_write_config(0000:00:14.00, 0xd, 0x1) ==> 0x40 modprobe-613 [013] .... 12.477935: iohook_event: read(0x383ffff00000, 0x4) <== 0x1000080 modprobe-613 [013] .... 12.477936: iohook_event: read(0x383ffff00018, 0x4) <== 0x2000 modprobe-613 [013] d... 12.478001: iohook_event: write(0x383ffff02030, 0x4) ==> 0x37822000 modprobe-613 [013] d... 12.478001: iohook_event: write(0x383ffff02034, 0x4) ==> 0x0 ... Signed-off-by: Rui Wang --- Documentation/PCI/iohook.txt | 54 +++++++- drivers/misc/iohook/iohook.c | 319 +++++++++++++++++++++++++++++++---------- include/trace/events/iohook.h | 58 ++++++++ 3 files changed, 352 insertions(+), 79 deletions(-) create mode 100644 include/trace/events/iohook.h diff --git a/Documentation/PCI/iohook.txt b/Documentation/PCI/iohook.txt index 9b3f232..8e0d2aa 100644 --- a/Documentation/PCI/iohook.txt +++ b/Documentation/PCI/iohook.txt @@ -104,7 +104,10 @@ where attribute - used to specify the attribute of the overridden bits. It can be ro, rw, wc, rc to mean read-only, read-write, - write-clear, and read-clear respectively. + write-clear, and read-clear respectively. A special attr- + ibute tc is defined to trace hw access through the trace + event named iohook_event. When tc is used as the attribute + value/mask are ignored. domain - pci domain number bus - pci bus number @@ -299,3 +302,52 @@ bash # echo 1 > trigger dmesg shows: pcieport 0000:00:05.0: AER: Corrected error received: id=0500 +3.4 Tracing hardware access by the OS + +I/O Hook can be used to trace access to hw registers by the OS. When compared +to mmio trace, this IO Hook trace has both benefits and shortcomings: + a) I/O Hook is SMP safe while mmio trace isn't. + b) mmio trace can trace arbitrary assembly instruction accessing + arbitrary address, while IO Hook can only trace standard + functions like readl(), inb(), pci_read_config_byte(), etc. + C) I/O Hook can trace IO ports and PCI Config space, while mmio trace + cannot. + +A trace event named 'iohook_event' is defined to report the accesses. A +special attribute tc can be used to specify the register to be traced. +For example in order to trace access to physical address region starting +from 0x383ffff00000 with a length of 0x4000, the following cmd sequence is used: + +bash # echo "0x383ffff00000-4000[0/0]tc" > /sys/kernel/debug/tracing/iohook/mem +bash # echo 1 > /sys/kernel/debug/iohook/trigger + +To trace hw access during boot, the following boot parameters can be used in +grub: + iohook.mem, iohook.io, iohook.pciconf, iohook.msr +For example + trace_event=iohook_event iohook.mem="383ffff00000-4000[0/0]tc" as boot +parameters in grub can be used to trace the same physical address region +starting from 0x383ffff00000. A typical trace event output looks like the +following: + +bash $ sudo cat /sys/kernel/debug/tracing/trace +... +swapper/0-1 [018] .... 11.594157: iohook_event: pci_read_config(0000:00:14.00, 0x6, 0x2) <== 0x290 + +swapper/0-1 [018] .... 11.599914: iohook_event: pci_read_config(0000:00:14.00, 0x34, 0x1) <== 0x70 + +modprobe-613 [013] d... 12.447408: iohook_event: pci_write_config(0000:00:14.00, 0x4, 0x2) ==> 0x6 + +modprobe-613 [013] .... 12.452565: iohook_event: pci_read_config(0000:00:14.00, 0xd, 0x1) <== 0x0 + +modprobe-613 [013] d... 12.458302: iohook_event: pci_write_config(0000:00:14.00, 0xd, 0x1) ==> 0x40 + +modprobe-613 [013] .... 12.477935: iohook_event: read(0x383ffff00000, 0x4) <== 0x1000080 + +modprobe-613 [013] .... 12.477936: iohook_event: read(0x383ffff00018, 0x4) <== 0x2000 + +modprobe-613 [013] d... 12.478001: iohook_event: write(0x383ffff02030, 0x4) ==> 0x37822000 + +modprobe-613 [013] d... 12.478001: iohook_event: write(0x383ffff02034, 0x4) ==> 0x0 +... + diff --git a/drivers/misc/iohook/iohook.c b/drivers/misc/iohook/iohook.c index f81f553..af8b2cf 100644 --- a/drivers/misc/iohook/iohook.c +++ b/drivers/misc/iohook/iohook.c @@ -25,6 +25,9 @@ #include #include "iohook.h" +#define CREATE_TRACE_POINTS +#include + MODULE_LICENSE("GPL"); static struct dentry *hook_irq_dentry; @@ -42,6 +45,37 @@ struct static_key ovrdhw_enabled = STATIC_KEY_INIT_FALSE; EXPORT_SYMBOL(ovrdhw_enabled); int g_ovrd_on; + +char *hook_func_names[] = { + "read", + "write", + "in", + "out", + "pci_read_config", + "pci_write_config", + "rdmsr", + "wrmsr" +}; + +const char *iohook_trace_parse_addr(struct trace_seq *p, int type, u64 address) +{ + const char *ret = p->buffer + p->len; + + if (type == PCI_RD || type == PCI_WR) + trace_seq_printf(p, "%04x:%02x:%02x.%02x, 0x%x", + PCI_DECODE_DOMAIN(address), + PCI_DECODE_BUSN(address), + PCI_SLOT(PCI_DECODE_DEVFN(address)), + PCI_FUNC(PCI_DECODE_DEVFN(address)), + PCI_DECODE_POS(address)); + else + trace_seq_printf(p, "0x%llx", address); + trace_seq_putc(p, 0); + + return ret; + +} + /* query a Register Override given spaceid and index */ struct reg_ovrd *iohook_query_ovrd(int spaceid, int idx) { @@ -330,6 +364,85 @@ static u64 v2p(u64 vaddr) return slow_virt_to_phys((void *)vaddr); } +static int pci_rd(struct pci_bus *pcib, u64 address, int len, void *data) +{ + unsigned int devfn = 0, pos = 0; + unsigned long flags; + int res; + + devfn = PCI_DECODE_DEVFN(address); + pos = PCI_DECODE_POS(address); + + raw_spin_lock_irqsave(&pci_lock, flags); + res = pcib->ops->read(pcib, devfn, pos, len, + (u32 *)data); + raw_spin_unlock_irqrestore(&pci_lock, flags); + return res; +} + +static int pci_wr(struct pci_bus *pcib, u64 address, int len, u32 value) +{ + unsigned int devfn = 0, pos = 0; + unsigned long flags; + int res; + + devfn = PCI_DECODE_DEVFN(address); + pos = PCI_DECODE_POS(address); + + raw_spin_lock_irqsave(&pci_lock, flags); + res = pcib->ops->write(pcib, devfn, pos, len, value); + raw_spin_unlock_irqrestore(&pci_lock, flags); + return res; +} + +static int msr_rd(void *bus, u64 address, u64 *data) +{ + struct msr_regs_info *rv; + unsigned int msr; + int res; + + rv = (struct msr_regs_info *)bus; + msr = address & 0xffffffff; + if (rv && rv->regs) { /* rdmsr_safe_regs() */ + u32 low, high; + + rv->err = rdmsr_safe_regs(rv->regs); + low = rv->regs[0]; /* eax */ + high = rv->regs[2]; /* edx */ + *data = low | ((u64)high << 32); + res = rv->err; + } else if (rv) { /* rv->regs == NULL */ + *data = native_do_read_msr_safe(msr, + &rv->err); + res = rv->err; + } else { /* rv == NULL */ + *data = native_do_read_msr(msr); + res = 0; + } + + return res; +} + +static void msr_wr(void *bus, u64 address, u64 value) +{ + struct msr_regs_info *rv; + unsigned int msr, low, high; + + rv = (struct msr_regs_info *)bus; + msr = address & 0xffffffff; + low = value & 0xffffffff; + high = value >> 32; + + if (rv && rv->regs) { /* wrmsr_safe_regs() */ + rv->err = wrmsr_safe_regs(rv->regs); + } else if (rv) { /* rv->regs == NULL */ + rv->err = native_do_write_msr_safe(msr, + low, high); + } else { /* rv == NULL */ + native_do_write_msr(msr, low, high); + } +} + /* shift left if i>=0, otherwise shift right */ #define BYTE_SHIFT(value, i) \ ((i) >= 0 ? (value) << (i)*8 : (value) >> (-i)*8) @@ -338,11 +451,9 @@ int read_ovrd_common(int spaceid, u64 address, int len, void *value, void *bus) { struct list_head *ovrd_list; struct reg_ovrd *ovrd_reg; - struct pci_bus *pcib; - unsigned long lock_flags = 0, flags = 0; + unsigned long lock_flags = 0; u64 faddress, vaddr = 0; u64 data, bit_mask, attrib, val; - unsigned int devfn = 0, pos = 0; int i, flength, res, ret; ret = -EINVAL; @@ -358,8 +469,6 @@ int read_ovrd_common(int spaceid, u64 address, int len, void *value, void *bus) } else if (spaceid == OVRD_SPACE_IO) { ovrd_list = &ovrd_io_reg_map; } else if (spaceid == OVRD_SPACE_PCICONF) { - devfn = PCI_DECODE_DEVFN(address); - pos = PCI_DECODE_POS(address); ovrd_list = &ovrd_pci_conf_reg_map; } else if (spaceid == OVRD_SPACE_MSR) { unsigned int cpuid; @@ -398,6 +507,27 @@ int read_ovrd_common(int spaceid, u64 address, int len, void *value, void *bus) lock_flags); /* at least one byte falls into the overridden range */ + + if (attrib == OVRD_TC) { + /* trace hw access */ + if (spaceid == OVRD_SPACE_MEM) { + ret = mem_read(vaddr, len, &data); + trace_iohook_event(MM_RD, len, address, data); + } else if (spaceid == OVRD_SPACE_IO) { + ret = io_read(address, len, &data); + trace_iohook_event(IO_RD, len, address, data); + } else if (spaceid == OVRD_SPACE_PCICONF) { + ret = pci_rd(bus, address, len, &data); + trace_iohook_event(PCI_RD, len, address, data); + } else if (spaceid == OVRD_SPACE_MSR) { + ret = msr_rd(bus, address, &data); + trace_iohook_event(MSR_RD, len, address, data); + } else + trace_iohook_event(MSR_WR + 1, len, address, + data); + goto read_done; + } + data = 0; ret = 0; if (!(address >= faddress && address+len <= faddress+flength && @@ -409,33 +539,9 @@ int read_ovrd_common(int spaceid, u64 address, int len, void *value, void *bus) } else if (spaceid == OVRD_SPACE_IO) { res = io_read(address, len, &data); } else if (spaceid == OVRD_SPACE_PCICONF) { - raw_spin_lock_irqsave(&pci_lock, flags); - pcib = (struct pci_bus *)bus; - res = pcib->ops->read(pcib, devfn, pos, len, - (u32 *)&data); - raw_spin_unlock_irqrestore(&pci_lock, flags); + res = pci_rd(bus, address, len, &data); } else if (spaceid == OVRD_SPACE_MSR) { - struct msr_regs_info *rv; - unsigned int msr; - - rv = (struct msr_regs_info *)bus; - msr = address & 0xffffffff; - if (rv && rv->regs) { /* rdmsr_safe_regs() */ - u32 low, high; - - rv->err = rdmsr_safe_regs(rv->regs); - low = rv->regs[0]; /* eax */ - high = rv->regs[2]; /* edx */ - data = low | ((u64)high << 32); - res = rv->err; - } else if (rv) { /* rv->regs == NULL */ - data = native_do_read_msr_safe(msr, - &rv->err); - res = rv->err; - } else { /* rv == NULL */ - data = native_do_read_msr(msr); - res = 0; - } + res = msr_rd(bus, address, &data); } else goto out; @@ -475,6 +581,7 @@ int read_ovrd_common(int spaceid, u64 address, int len, void *value, void *bus) } } +read_done: switch (len) { case 1: *(u8 *)value = (u8)data; @@ -490,11 +597,10 @@ int read_ovrd_common(int spaceid, u64 address, int len, void *value, void *bus) break; default: ret = -EINVAL; - goto out; + break; } - raw_spin_lock_irqsave(&io_hook_lock, - lock_flags); + goto out; } raw_spin_unlock_irqrestore(&io_hook_lock, lock_flags); @@ -507,8 +613,7 @@ int write_ovrd_common(int spaceid, u64 address, int len, void *data, void *bus) { struct list_head *ovrd_list; struct reg_ovrd *ovrd_reg; - struct pci_bus *pcib; - unsigned long lock_flags = 0, flags = 0; + unsigned long lock_flags = 0; u64 faddress, vaddr = 0; u64 bit_mask, val, attrib; unsigned int devfn = 0, pos = 0; @@ -567,42 +672,28 @@ int write_ovrd_common(int spaceid, u64 address, int len, void *data, void *bus) ret = 0; - if (!(address >= faddress && address+len <= faddress+flength && - bit_mask == (u64)((1<ops->write(pcib, devfn, pos, len, - (u32)value); - raw_spin_unlock_irqrestore(&pci_lock, flags); + pci_wr(bus, address, len, (u32)value); raw_spin_lock_irqsave(&io_hook_lock, lock_flags); + trace_iohook_event(PCI_WR, len, address, value); } else if (spaceid == OVRD_SPACE_MSR) { - struct msr_regs_info *rv; - unsigned int msr, low, high; - - rv = (struct msr_regs_info *)bus; - msr = address & 0xffffffff; - low = value & 0xffffffff; - high = value >> 32; - - if (rv && rv->regs) { /* wrmsr_safe_regs() */ - rv->err = wrmsr_safe_regs(rv->regs); - } else if (rv) { /* rv->regs == NULL */ - rv->err = native_do_write_msr_safe(msr, - low, high); - } else { /* rv == NULL */ - native_do_write_msr(msr, low, high); - } + msr_wr(bus, address, value); + trace_iohook_event(MSR_WR, len, address, value); } else - break; + trace_iohook_event(MSR_WR + 2, len, address, + value); + break; } for (i = 0; i < len; i++) { @@ -634,10 +725,30 @@ int write_ovrd_common(int spaceid, u64 address, int len, void *data, void *bus) } } + /* finished using ovrd_reg->, safe to unlock */ + raw_spin_unlock_irqrestore(&io_hook_lock, lock_flags); + + if (!(address >= faddress && address+len <= faddress+flength && + bit_mask == (u64)((1< +#include + +#define MM_RD 0 +#define MM_WR 1 +#define IO_RD 2 +#define IO_WR 3 +#define PCI_RD 4 +#define PCI_WR 5 +#define MSR_RD 6 +#define MSR_WR 7 + +extern char *hook_func_names[]; + +const char *iohook_trace_parse_addr(struct trace_seq*, int, u64); +#define __parse_addr iohook_trace_parse_addr(p, __entry->type, \ + __entry->address) + +/** + * @type: type of hw access (MM_RD, MM_WR, IO_RD, PCI_RD ...) + */ +TRACE_EVENT(iohook_event, + TP_PROTO(const int type, + const int len, + const u64 address, + const u64 value), + + TP_ARGS(type, len, address, value), + + TP_STRUCT__entry( + __field( int, type) + __field( int, len) + __field( u64, address) + __field( u64, value) + ), + + TP_fast_assign( + __entry->type = type; + __entry->len = len; + __entry->address = address; + __entry->value = value; + ), + + TP_printk("%s(%s, 0x%x) %s 0x%llx\n", __entry->type > MSR_WR ? + "unknown" : hook_func_names[__entry->type], __parse_addr, + __entry->len, __entry->type % 2 ? "==>" : "<==", /* event/odd */ + __entry->value & ((1ull << (8 * __entry->len)) - 1)) +); +#endif /* _TRACE_IOHOOK_H */ + +/* This part must be outside protection */ +#include