Message ID | 20210619034043.199220-1-tientzu@chromium.org |
---|---|
Headers | show |
Series | Restricted DMA | expand |
On Sat, 19 Jun 2021, Claire Chang wrote: > Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct > initialization to make the code reusable. > > Signed-off-by: Claire Chang <tientzu@chromium.org> > Reviewed-by: Christoph Hellwig <hch@lst.de> > Tested-by: Stefano Stabellini <sstabellini@kernel.org> > Tested-by: Will Deacon <will@kernel.org> Acked-by: Stefano Stabellini <sstabellini@kernel.org> > --- > kernel/dma/swiotlb.c | 50 ++++++++++++++++++++++---------------------- > 1 file changed, 25 insertions(+), 25 deletions(-) > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c > index 52e2ac526757..1f9b2b9e7490 100644 > --- a/kernel/dma/swiotlb.c > +++ b/kernel/dma/swiotlb.c > @@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void) > memset(vaddr, 0, bytes); > } > > -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose) > +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, > + unsigned long nslabs, bool late_alloc) > { > + void *vaddr = phys_to_virt(start); > unsigned long bytes = nslabs << IO_TLB_SHIFT, i; > + > + mem->nslabs = nslabs; > + mem->start = start; > + mem->end = mem->start + bytes; > + mem->index = 0; > + mem->late_alloc = late_alloc; > + spin_lock_init(&mem->lock); > + for (i = 0; i < mem->nslabs; i++) { > + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); > + mem->slots[i].orig_addr = INVALID_PHYS_ADDR; > + mem->slots[i].alloc_size = 0; > + } > + memset(vaddr, 0, bytes); > +} > + > +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose) > +{ > struct io_tlb_mem *mem; > size_t alloc_size; > > @@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose) > if (!mem) > panic("%s: Failed to allocate %zu bytes align=0x%lx\n", > __func__, alloc_size, PAGE_SIZE); > - mem->nslabs = nslabs; > - mem->start = __pa(tlb); > - mem->end = mem->start + bytes; > - mem->index = 0; > - spin_lock_init(&mem->lock); > - for (i = 0; i < mem->nslabs; i++) { > - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); > - mem->slots[i].orig_addr = INVALID_PHYS_ADDR; > - mem->slots[i].alloc_size = 0; > - } > + > + swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false); > > io_tlb_default_mem = mem; > if (verbose) > @@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size) > int > swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs) > { > - unsigned long bytes = nslabs << IO_TLB_SHIFT, i; > struct io_tlb_mem *mem; > + unsigned long bytes = nslabs << IO_TLB_SHIFT; > > if (swiotlb_force == SWIOTLB_NO_FORCE) > return 0; > @@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs) > if (!mem) > return -ENOMEM; > > - mem->nslabs = nslabs; > - mem->start = virt_to_phys(tlb); > - mem->end = mem->start + bytes; > - mem->index = 0; > - mem->late_alloc = 1; > - spin_lock_init(&mem->lock); > - for (i = 0; i < mem->nslabs; i++) { > - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); > - mem->slots[i].orig_addr = INVALID_PHYS_ADDR; > - mem->slots[i].alloc_size = 0; > - } > - > + memset(mem, 0, sizeof(*mem)); > set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT); > - memset(tlb, 0, bytes); > + swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true); > > io_tlb_default_mem = mem; > swiotlb_print_info(); > -- > 2.32.0.288.g62a8d224e6-goog >
On Sat, Jun 19, 2021 at 11:40:31AM +0800, Claire Chang wrote: > This series implements mitigations for lack of DMA access control on > systems without an IOMMU, which could result in the DMA accessing the > system memory at unexpected times and/or unexpected addresses, possibly > leading to data leakage or corruption. > > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is > not behind an IOMMU. As PCI-e, by design, gives the device full access to > system memory, a vulnerability in the Wi-Fi firmware could easily escalate > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a > full chain of exploits; [2], [3]). > > To mitigate the security concerns, we introduce restricted DMA. Restricted > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a > specially allocated region and does memory allocation from the same region. > The feature on its own provides a basic level of protection against the DMA > overwriting buffer contents at unexpected times. However, to protect > against general data leakage and system memory corruption, the system needs > to provide a way to restrict the DMA to a predefined memory region (this is > usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]). > > [1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html > [1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html > [2] https://blade.tencent.com/en/advisories/qualpwn/ > [3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/ > [4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132 Heya Claire, I put all your patches on https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/log/?h=devel/for-linus-5.14 Please double-check that they all look ok. Thank you!
On Wed, Jun 23, 2021 at 4:38 PM Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > > On Sat, Jun 19, 2021 at 11:40:31AM +0800, Claire Chang wrote: > > This series implements mitigations for lack of DMA access control on > > systems without an IOMMU, which could result in the DMA accessing the > > system memory at unexpected times and/or unexpected addresses, possibly > > leading to data leakage or corruption. > > > > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is > > not behind an IOMMU. As PCI-e, by design, gives the device full access to > > system memory, a vulnerability in the Wi-Fi firmware could easily escalate > > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a > > full chain of exploits; [2], [3]). > > > > To mitigate the security concerns, we introduce restricted DMA. Restricted > > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a > > specially allocated region and does memory allocation from the same region. > > The feature on its own provides a basic level of protection against the DMA > > overwriting buffer contents at unexpected times. However, to protect > > against general data leakage and system memory corruption, the system needs > > to provide a way to restrict the DMA to a predefined memory region (this is > > usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]). > > > > [1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html > > [1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html > > [2] https://blade.tencent.com/en/advisories/qualpwn/ > > [3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/ > > [4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132 > > Heya Claire, > > I put all your patches on > https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/log/?h=devel/for-linus-5.14 > > Please double-check that they all look ok. > > Thank you! They look fine. Thank you!
On 6/18/2021 11:40 PM, Claire Chang wrote: > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and > use it to determine whether to bounce the data or not. This will be > useful later to allow for different pools. > > Signed-off-by: Claire Chang <tientzu@chromium.org> > Reviewed-by: Christoph Hellwig <hch@lst.de> > Tested-by: Stefano Stabellini <sstabellini@kernel.org> > Tested-by: Will Deacon <will@kernel.org> > Acked-by: Stefano Stabellini <sstabellini@kernel.org> Reverting the rest of the series up to this patch fixed a boot crash with NVMe on today's linux-next. [ 22.286574][ T7] Unable to handle kernel paging request at virtual address dfff80000000000e [ 22.295225][ T7] Mem abort info: [ 22.298743][ T7] ESR = 0x96000004 [ 22.302496][ T7] EC = 0x25: DABT (current EL), IL = 32 bits [ 22.308525][ T7] SET = 0, FnV = 0 [ 22.312274][ T7] EA = 0, S1PTW = 0 [ 22.316131][ T7] FSC = 0x04: level 0 translation fault [ 22.321704][ T7] Data abort info: [ 22.325278][ T7] ISV = 0, ISS = 0x00000004 [ 22.329840][ T7] CM = 0, WnR = 0 [ 22.333503][ T7] [dfff80000000000e] address between user and kernel address ranges [ 22.338543][ T256] igb 0006:01:00.0: Intel(R) Gigabit Ethernet Network Connection [ 22.341400][ T7] Internal error: Oops: 96000004 [#1] SMP [ 22.348915][ T256] igb 0006:01:00.0: eth0: (PCIe:2.5Gb/s:Width x1) 4c:38:d5:09:c8:83 [ 22.354458][ T7] Modules linked in: igb(+) i2c_algo_bit nvme mlx5_core(+) i2c_core nvme_core firmware_class [ 22.362512][ T256] igb 0006:01:00.0: eth0: PBA No: G69016-004 [ 22.372287][ T7] CPU: 13 PID: 7 Comm: kworker/u64:0 Not tainted 5.13.0-rc7-next-20210623+ #47 [ 22.372293][ T7] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020 [ 22.372298][ T7] Workqueue: nvme-reset-wq nvme_reset_work [nvme] [ 22.378145][ T256] igb 0006:01:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s) [ 22.386901][ T7] [ 22.386905][ T7] pstate: 10000005 (nzcV daif -PAN -UAO -TCO BTYPE=--) [ 22.386910][ T7] pc : dma_direct_map_sg+0x304/0x8f0 is_swiotlb_force_bounce at /usr/src/linux-next/./include/linux/swiotlb.h:119 (inlined by) dma_direct_map_page at /usr/src/linux-next/kernel/dma/direct.h:90 (inlined by) dma_direct_map_sg at /usr/src/linux-next/kernel/dma/direct.c:428 [ 22.386919][ T7] lr : dma_map_sg_attrs+0x6c/0x118 [ 22.386924][ T7] sp : ffff80001dc8eac0 [ 22.386926][ T7] x29: ffff80001dc8eac0 x28: ffff0000199e70b0 x27: 0000000000000000 [ 22.386935][ T7] x26: ffff000847ee7000 x25: ffff80001158e570 x24: 0000000000000002 [ 22.386943][ T7] x23: dfff800000000000 x22: 0000000000000100 x21: ffff0000199e7460 [ 22.386951][ T7] x20: ffff0000199e7488 x19: 0000000000000001 x18: ffff000010062670 [ 22.386955][ T253] Unable to handle kernel paging request at virtual address dfff80000000000e [ 22.386958][ T7] x17: ffff8000109f6a90 x16: ffff8000109e1b4c x15: ffff800009303420 [ 22.386965][ T253] Mem abort info: [ 22.386967][ T7] x14: 0000000000000001 x13: ffff80001158e000 [ 22.386970][ T253] ESR = 0x96000004 [ 22.386972][ T7] x12: 1fffe00108fdce01 [ 22.386975][ T253] EC = 0x25: DABT (current EL), IL = 32 bits [ 22.386976][ T7] x11: 1fffe00108fdce03 x10: ffff000847ee700c x9 : 0000000000000004 [ 22.386981][ T253] SET = 0, FnV = 0 [ 22.386983][ T7] [ 22.386985][ T7] x8 : ffff700003b91d72 [ 22.386986][ T253] EA = 0, S1PTW = 0 [ 22.386987][ T7] x7 : 0000000000000000 x6 : 000000000000000e [ 22.386990][ T253] FSC = 0x04: level 0 translation fault [ 22.386992][ T7] [ 22.386994][ T7] x5 : dfff800000000000 [ 22.386995][ T253] Data abort info: [ 22.386997][ T7] x4 : 00000008c7ede000 [ 22.386999][ T253] ISV = 0, ISS = 0x00000004 [ 22.386999][ T7] x3 : 00000008c7ede000 [ 22.387003][ T7] x2 : 0000000000001000 [ 22.387003][ T253] CM = 0, WnR = 0 [ 22.387006][ T7] x1 : 0000000000000000 x0 : 0000000000000071 [ 22.387008][ T253] [dfff80000000000e] address between user and kernel address ranges [ 22.387011][ T7] [ 22.387013][ T7] Call trace: [ 22.387016][ T7] dma_direct_map_sg+0x304/0x8f0 [ 22.387022][ T7] dma_map_sg_attrs+0x6c/0x118 [ 22.387026][ T7] nvme_map_data+0x2ec/0x21d8 [nvme] [ 22.387040][ T7] nvme_queue_rq+0x274/0x3f0 [nvme] [ 22.387052][ T7] blk_mq_dispatch_rq_list+0x2ec/0x18a0 [ 22.387060][ T7] __blk_mq_sched_dispatch_requests+0x2a0/0x3e8 [ 22.387065][ T7] blk_mq_sched_dispatch_requests+0xa4/0x100 [ 22.387070][ T7] __blk_mq_run_hw_queue+0x148/0x1d8 [ 22.387075][ T7] __blk_mq_delay_run_hw_queue+0x3f8/0x730 [ 22.414539][ T269] igb 0006:01:00.0 enP6p1s0: renamed from eth0 [ 22.418957][ T7] blk_mq_run_hw_queue+0x148/0x248 [ 22.418969][ T7] blk_mq_sched_insert_request+0x2a4/0x330 [ 22.418975][ T7] blk_execute_rq_nowait+0xc8/0x118 [ 22.418981][ T7] blk_execute_rq+0xd4/0x188 [ 22.453203][ T255] udevadm (255) used greatest stack depth: 57408 bytes left [ 22.456504][ T7] __nvme_submit_sync_cmd+0x4e0/0x730 [nvme_core] [ 22.673245][ T7] nvme_identify_ctrl.isra.0+0x124/0x1e0 [nvme_core] [ 22.679784][ T7] nvme_init_identify+0x90/0x1868 [nvme_core] [ 22.685713][ T7] nvme_init_ctrl_finish+0x1a8/0xb88 [nvme_core] [ 22.691903][ T7] nvme_reset_work+0xe5c/0x2aa4 [nvme] [ 22.697219][ T7] process_one_work+0x7e4/0x19a0 [ 22.702005][ T7] worker_thread+0x334/0xae0 [ 22.706442][ T7] kthread+0x3bc/0x470 [ 22.710359][ T7] ret_from_fork+0x10/0x18 [ 22.714627][ T7] Code: f941ef81 9101c420 1200080e d343fc06 (38f768c6) [ 22.721407][ T7] ---[ end trace 1f3c4181ae408676 ]--- [ 22.726712][ T7] Kernel panic - not syncing: Oops: Fatal exception [ 22.733169][ T7] SMP: stopping secondary CPUs [ 23.765164][ T7] SMP: failed to stop secondary CPUs 13,15 [ 23.770818][ T7] Kernel Offset: disabled [ 23.774991][ T7] CPU features: 0x00000251,20000846 [ 23.780034][ T7] Memory Limit: none [ 23.783794][ T7] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]--- > --- > drivers/xen/swiotlb-xen.c | 2 +- > include/linux/swiotlb.h | 11 +++++++++++ > kernel/dma/direct.c | 2 +- > kernel/dma/direct.h | 2 +- > kernel/dma/swiotlb.c | 4 ++++ > 5 files changed, 18 insertions(+), 3 deletions(-) > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c > index 0c6ed09f8513..4730a146fa35 100644 > --- a/drivers/xen/swiotlb-xen.c > +++ b/drivers/xen/swiotlb-xen.c > @@ -369,7 +369,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, > if (dma_capable(dev, dev_addr, size, true) && > !range_straddles_page_boundary(phys, size) && > !xen_arch_need_swiotlb(dev, phys, dev_addr) && > - swiotlb_force != SWIOTLB_FORCE) > + !is_swiotlb_force_bounce(dev)) > goto done; > > /* > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h > index dd1c30a83058..8d8855c77d9a 100644 > --- a/include/linux/swiotlb.h > +++ b/include/linux/swiotlb.h > @@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force; > * unmap calls. > * @debugfs: The dentry to debugfs. > * @late_alloc: %true if allocated using the page allocator > + * @force_bounce: %true if swiotlb bouncing is forced > */ > struct io_tlb_mem { > phys_addr_t start; > @@ -94,6 +95,7 @@ struct io_tlb_mem { > spinlock_t lock; > struct dentry *debugfs; > bool late_alloc; > + bool force_bounce; > struct io_tlb_slot { > phys_addr_t orig_addr; > size_t alloc_size; > @@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) > return mem && paddr >= mem->start && paddr < mem->end; > } > > +static inline bool is_swiotlb_force_bounce(struct device *dev) > +{ > + return dev->dma_io_tlb_mem->force_bounce; > +} > + > void __init swiotlb_exit(void); > unsigned int swiotlb_max_segment(void); > size_t swiotlb_max_mapping_size(struct device *dev); > @@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) > { > return false; > } > +static inline bool is_swiotlb_force_bounce(struct device *dev) > +{ > + return false; > +} > static inline void swiotlb_exit(void) > { > } > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c > index 7a88c34d0867..a92465b4eb12 100644 > --- a/kernel/dma/direct.c > +++ b/kernel/dma/direct.c > @@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev) > { > /* If SWIOTLB is active, use its maximum mapping size */ > if (is_swiotlb_active(dev) && > - (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE)) > + (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev))) > return swiotlb_max_mapping_size(dev); > return SIZE_MAX; > } > diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h > index 13e9e7158d94..4632b0f4f72e 100644 > --- a/kernel/dma/direct.h > +++ b/kernel/dma/direct.h > @@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev, > phys_addr_t phys = page_to_phys(page) + offset; > dma_addr_t dma_addr = phys_to_dma(dev, phys); > > - if (unlikely(swiotlb_force == SWIOTLB_FORCE)) > + if (is_swiotlb_force_bounce(dev)) > return swiotlb_map(dev, phys, size, dir, attrs); > > if (unlikely(!dma_capable(dev, dma_addr, size, true))) { > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c > index 8a120f42340b..0d294bbf274c 100644 > --- a/kernel/dma/swiotlb.c > +++ b/kernel/dma/swiotlb.c > @@ -179,6 +179,10 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, > mem->end = mem->start + bytes; > mem->index = 0; > mem->late_alloc = late_alloc; > + > + if (swiotlb_force == SWIOTLB_FORCE) > + mem->force_bounce = true; > + > spin_lock_init(&mem->lock); > for (i = 0; i < mem->nslabs; i++) { > mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); >
On Wed, Jun 23, 2021 at 12:39:29PM -0400, Qian Cai wrote: > > > On 6/18/2021 11:40 PM, Claire Chang wrote: > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and > > use it to determine whether to bounce the data or not. This will be > > useful later to allow for different pools. > > > > Signed-off-by: Claire Chang <tientzu@chromium.org> > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > Tested-by: Stefano Stabellini <sstabellini@kernel.org> > > Tested-by: Will Deacon <will@kernel.org> > > Acked-by: Stefano Stabellini <sstabellini@kernel.org> > > Reverting the rest of the series up to this patch fixed a boot crash with NVMe on today's linux-next. Hmm, so that makes patch 7 the suspicious one, right? Looking at that one more closely, it looks like swiotlb_find_slots() takes 'alloc_size + offset' as its 'alloc_size' parameter from swiotlb_tbl_map_single() and initialises 'mem->slots[i].alloc_size' based on 'alloc_size + offset', which looks like a change in behaviour from the old code, which didn't include the offset there. swiotlb_release_slots() then adds the offset back on afaict, so we end up accounting for it twice and possibly unmap more than we're supposed to? Will
On 6/23/2021 2:37 PM, Will Deacon wrote: > On Wed, Jun 23, 2021 at 12:39:29PM -0400, Qian Cai wrote: >> >> >> On 6/18/2021 11:40 PM, Claire Chang wrote: >>> Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and >>> use it to determine whether to bounce the data or not. This will be >>> useful later to allow for different pools. >>> >>> Signed-off-by: Claire Chang <tientzu@chromium.org> >>> Reviewed-by: Christoph Hellwig <hch@lst.de> >>> Tested-by: Stefano Stabellini <sstabellini@kernel.org> >>> Tested-by: Will Deacon <will@kernel.org> >>> Acked-by: Stefano Stabellini <sstabellini@kernel.org> >> >> Reverting the rest of the series up to this patch fixed a boot crash with NVMe on today's linux-next. > > Hmm, so that makes patch 7 the suspicious one, right? Will, no. It is rather patch #6 (this patch). Only the patch from #6 to #12 were reverted to fix the issue. Also, looking at this offset of the crash, pc : dma_direct_map_sg+0x304/0x8f0 is_swiotlb_force_bounce at /usr/src/linux-next/./include/linux/swiotlb.h:119 is_swiotlb_force_bounce() was the new function introduced in this patch here. +static inline bool is_swiotlb_force_bounce(struct device *dev) +{ + return dev->dma_io_tlb_mem->force_bounce; +} > > Looking at that one more closely, it looks like swiotlb_find_slots() takes > 'alloc_size + offset' as its 'alloc_size' parameter from > swiotlb_tbl_map_single() and initialises 'mem->slots[i].alloc_size' based > on 'alloc_size + offset', which looks like a change in behaviour from the > old code, which didn't include the offset there. > > swiotlb_release_slots() then adds the offset back on afaict, so we end up > accounting for it twice and possibly unmap more than we're supposed to? > > Will >
On Wed, Jun 23, 2021 at 02:44:34PM -0400, Qian Cai wrote: > is_swiotlb_force_bounce at /usr/src/linux-next/./include/linux/swiotlb.h:119 > > is_swiotlb_force_bounce() was the new function introduced in this patch here. > > +static inline bool is_swiotlb_force_bounce(struct device *dev) > +{ > + return dev->dma_io_tlb_mem->force_bounce; > +} To me the crash looks like dev->dma_io_tlb_mem is NULL. Can you turn this into : return dev->dma_io_tlb_mem && dev->dma_io_tlb_mem->force_bounce; for a quick debug check?
On Thu, Jun 24, 2021 at 1:43 PM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, Jun 23, 2021 at 02:44:34PM -0400, Qian Cai wrote: > > is_swiotlb_force_bounce at /usr/src/linux-next/./include/linux/swiotlb.h:119 > > > > is_swiotlb_force_bounce() was the new function introduced in this patch here. > > > > +static inline bool is_swiotlb_force_bounce(struct device *dev) > > +{ > > + return dev->dma_io_tlb_mem->force_bounce; > > +} > > To me the crash looks like dev->dma_io_tlb_mem is NULL. Can you > turn this into : > > return dev->dma_io_tlb_mem && dev->dma_io_tlb_mem->force_bounce; > > for a quick debug check? I just realized that dma_io_tlb_mem might be NULL like Christoph pointed out since swiotlb might not get initialized. However, `Unable to handle kernel paging request at virtual address dfff80000000000e` looks more like the address is garbage rather than NULL? I wonder if that's because dev->dma_io_tlb_mem is not assigned properly (which means device_initialize is not called?).
On 2021-06-24 07:05, Claire Chang wrote: > On Thu, Jun 24, 2021 at 1:43 PM Christoph Hellwig <hch@lst.de> wrote: >> >> On Wed, Jun 23, 2021 at 02:44:34PM -0400, Qian Cai wrote: >>> is_swiotlb_force_bounce at /usr/src/linux-next/./include/linux/swiotlb.h:119 >>> >>> is_swiotlb_force_bounce() was the new function introduced in this patch here. >>> >>> +static inline bool is_swiotlb_force_bounce(struct device *dev) >>> +{ >>> + return dev->dma_io_tlb_mem->force_bounce; >>> +} >> >> To me the crash looks like dev->dma_io_tlb_mem is NULL. Can you >> turn this into : >> >> return dev->dma_io_tlb_mem && dev->dma_io_tlb_mem->force_bounce; >> >> for a quick debug check? > > I just realized that dma_io_tlb_mem might be NULL like Christoph > pointed out since swiotlb might not get initialized. > However, `Unable to handle kernel paging request at virtual address > dfff80000000000e` looks more like the address is garbage rather than > NULL? > I wonder if that's because dev->dma_io_tlb_mem is not assigned > properly (which means device_initialize is not called?). What also looks odd is that the base "address" 0xdfff800000000000 is held in a couple of registers, but the offset 0xe looks too small to match up to any relevant structure member in that dereference chain :/ Robin.
On Thu, Jun 24, 2021 at 12:14:39PM +0100, Robin Murphy wrote: > On 2021-06-24 07:05, Claire Chang wrote: > > On Thu, Jun 24, 2021 at 1:43 PM Christoph Hellwig <hch@lst.de> wrote: > > > > > > On Wed, Jun 23, 2021 at 02:44:34PM -0400, Qian Cai wrote: > > > > is_swiotlb_force_bounce at /usr/src/linux-next/./include/linux/swiotlb.h:119 > > > > > > > > is_swiotlb_force_bounce() was the new function introduced in this patch here. > > > > > > > > +static inline bool is_swiotlb_force_bounce(struct device *dev) > > > > +{ > > > > + return dev->dma_io_tlb_mem->force_bounce; > > > > +} > > > > > > To me the crash looks like dev->dma_io_tlb_mem is NULL. Can you > > > turn this into : > > > > > > return dev->dma_io_tlb_mem && dev->dma_io_tlb_mem->force_bounce; > > > > > > for a quick debug check? > > > > I just realized that dma_io_tlb_mem might be NULL like Christoph > > pointed out since swiotlb might not get initialized. > > However, `Unable to handle kernel paging request at virtual address > > dfff80000000000e` looks more like the address is garbage rather than > > NULL? > > I wonder if that's because dev->dma_io_tlb_mem is not assigned > > properly (which means device_initialize is not called?). > > What also looks odd is that the base "address" 0xdfff800000000000 is held in > a couple of registers, but the offset 0xe looks too small to match up to any > relevant structure member in that dereference chain :/ FWIW, I've managed to trigger a NULL dereference locally when swiotlb hasn't been initialised but we dereference 'dev->dma_io_tlb_mem', so I think Christoph's suggestion is needed regardless. But I agree that it won't help with the issue reported by Qian Cai. Qian Cai: please can you share your .config and your command line? Thanks, Will
On 2021-06-24 12:18, Will Deacon wrote: > On Thu, Jun 24, 2021 at 12:14:39PM +0100, Robin Murphy wrote: >> On 2021-06-24 07:05, Claire Chang wrote: >>> On Thu, Jun 24, 2021 at 1:43 PM Christoph Hellwig <hch@lst.de> wrote: >>>> >>>> On Wed, Jun 23, 2021 at 02:44:34PM -0400, Qian Cai wrote: >>>>> is_swiotlb_force_bounce at /usr/src/linux-next/./include/linux/swiotlb.h:119 >>>>> >>>>> is_swiotlb_force_bounce() was the new function introduced in this patch here. >>>>> >>>>> +static inline bool is_swiotlb_force_bounce(struct device *dev) >>>>> +{ >>>>> + return dev->dma_io_tlb_mem->force_bounce; >>>>> +} >>>> >>>> To me the crash looks like dev->dma_io_tlb_mem is NULL. Can you >>>> turn this into : >>>> >>>> return dev->dma_io_tlb_mem && dev->dma_io_tlb_mem->force_bounce; >>>> >>>> for a quick debug check? >>> >>> I just realized that dma_io_tlb_mem might be NULL like Christoph >>> pointed out since swiotlb might not get initialized. >>> However, `Unable to handle kernel paging request at virtual address >>> dfff80000000000e` looks more like the address is garbage rather than >>> NULL? >>> I wonder if that's because dev->dma_io_tlb_mem is not assigned >>> properly (which means device_initialize is not called?). >> >> What also looks odd is that the base "address" 0xdfff800000000000 is held in >> a couple of registers, but the offset 0xe looks too small to match up to any >> relevant structure member in that dereference chain :/ > > FWIW, I've managed to trigger a NULL dereference locally when swiotlb hasn't > been initialised but we dereference 'dev->dma_io_tlb_mem', so I think > Christoph's suggestion is needed regardless. Ack to that - for SWIOTLB_NO_FORCE, io_tlb_default_mem will remain NULL. The massive jump in KernelCI baseline failures as of yesterday looks like every arm64 machine with less than 4GB of RAM blowing up... Robin. > But I agree that it won't help > with the issue reported by Qian Cai. > > Qian Cai: please can you share your .config and your command line? > > Thanks, > > Will >
On 6/24/2021 7:48 AM, Will Deacon wrote: > Ok, diff below which attempts to tackle the offset issue I mentioned as > well. Qian Cai -- please can you try with these changes? This works fine. > > Will > > --->8 > > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h > index 175b6c113ed8..39284ff2a6cd 100644 > --- a/include/linux/swiotlb.h > +++ b/include/linux/swiotlb.h > @@ -116,7 +116,9 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) > > static inline bool is_swiotlb_force_bounce(struct device *dev) > { > - return dev->dma_io_tlb_mem->force_bounce; > + struct io_tlb_mem *mem = dev->dma_io_tlb_mem; > + > + return mem && mem->force_bounce; > } > > void __init swiotlb_exit(void); > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c > index 44be8258e27b..0ffbaae9fba2 100644 > --- a/kernel/dma/swiotlb.c > +++ b/kernel/dma/swiotlb.c > @@ -449,6 +449,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, > dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1); > unsigned int nslots = nr_slots(alloc_size), stride; > unsigned int index, wrap, count = 0, i; > + unsigned int offset = swiotlb_align_offset(dev, orig_addr); > unsigned long flags; > > BUG_ON(!nslots); > @@ -497,7 +498,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, > for (i = index; i < index + nslots; i++) { > mem->slots[i].list = 0; > mem->slots[i].alloc_size = > - alloc_size - ((i - index) << IO_TLB_SHIFT); > + alloc_size - (offset + ((i - index) << IO_TLB_SHIFT)); > } > for (i = index - 1; > io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && >
On Thu, Jun 24, 2021 at 10:10:51AM -0400, Qian Cai wrote: > > > On 6/24/2021 7:48 AM, Will Deacon wrote: > > Ok, diff below which attempts to tackle the offset issue I mentioned as > > well. Qian Cai -- please can you try with these changes? > > This works fine. Cool. Let me squash this patch in #6 and rebase the rest of them. Claire, could you check the devel/for-linus-5.14 say by end of today to double check that I didn't mess anything up please? Will, Thank you for generating the fix! I am going to run it on x86 and Xen to make sure all is good (granted last time I ran devel/for-linus-5.14 on that setup I didn't see any errors so I need to double check I didn't do something silly like run a wrong kernel). > > > > > Will > > > > --->8 > > > > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h > > index 175b6c113ed8..39284ff2a6cd 100644 > > --- a/include/linux/swiotlb.h > > +++ b/include/linux/swiotlb.h > > @@ -116,7 +116,9 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) > > > > static inline bool is_swiotlb_force_bounce(struct device *dev) > > { > > - return dev->dma_io_tlb_mem->force_bounce; > > + struct io_tlb_mem *mem = dev->dma_io_tlb_mem; > > + > > + return mem && mem->force_bounce; > > } > > > > void __init swiotlb_exit(void); > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c > > index 44be8258e27b..0ffbaae9fba2 100644 > > --- a/kernel/dma/swiotlb.c > > +++ b/kernel/dma/swiotlb.c > > @@ -449,6 +449,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, > > dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1); > > unsigned int nslots = nr_slots(alloc_size), stride; > > unsigned int index, wrap, count = 0, i; > > + unsigned int offset = swiotlb_align_offset(dev, orig_addr); > > unsigned long flags; > > > > BUG_ON(!nslots); > > @@ -497,7 +498,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, > > for (i = index; i < index + nslots; i++) { > > mem->slots[i].list = 0; > > mem->slots[i].alloc_size = > > - alloc_size - ((i - index) << IO_TLB_SHIFT); > > + alloc_size - (offset + ((i - index) << IO_TLB_SHIFT)); > > } > > for (i = index - 1; > > io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && > >
On Thu, Jun 24, 2021 at 11:56 PM Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > > On Thu, Jun 24, 2021 at 10:10:51AM -0400, Qian Cai wrote: > > > > > > On 6/24/2021 7:48 AM, Will Deacon wrote: > > > Ok, diff below which attempts to tackle the offset issue I mentioned as > > > well. Qian Cai -- please can you try with these changes? > > > > This works fine. > > Cool. Let me squash this patch in #6 and rebase the rest of them. > > Claire, could you check the devel/for-linus-5.14 say by end of today to > double check that I didn't mess anything up please? I just submitted v15 here (https://lore.kernel.org/patchwork/cover/1451322/) in case it's helpful. I'll double check of course. Thanks for the efforts! > > Will, > > Thank you for generating the fix! I am going to run it on x86 and Xen > to make sure all is good (granted last time I ran devel/for-linus-5.14 > on that setup I didn't see any errors so I need to double check > I didn't do something silly like run a wrong kernel). > > > > > > > > > > Will > > > > > > --->8 > > > > > > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h > > > index 175b6c113ed8..39284ff2a6cd 100644 > > > --- a/include/linux/swiotlb.h > > > +++ b/include/linux/swiotlb.h > > > @@ -116,7 +116,9 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) > > > > > > static inline bool is_swiotlb_force_bounce(struct device *dev) > > > { > > > - return dev->dma_io_tlb_mem->force_bounce; > > > + struct io_tlb_mem *mem = dev->dma_io_tlb_mem; > > > + > > > + return mem && mem->force_bounce; > > > } > > > > > > void __init swiotlb_exit(void); > > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c > > > index 44be8258e27b..0ffbaae9fba2 100644 > > > --- a/kernel/dma/swiotlb.c > > > +++ b/kernel/dma/swiotlb.c > > > @@ -449,6 +449,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, > > > dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1); > > > unsigned int nslots = nr_slots(alloc_size), stride; > > > unsigned int index, wrap, count = 0, i; > > > + unsigned int offset = swiotlb_align_offset(dev, orig_addr); > > > unsigned long flags; > > > > > > BUG_ON(!nslots); > > > @@ -497,7 +498,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, > > > for (i = index; i < index + nslots; i++) { > > > mem->slots[i].list = 0; > > > mem->slots[i].alloc_size = > > > - alloc_size - ((i - index) << IO_TLB_SHIFT); > > > + alloc_size - (offset + ((i - index) << IO_TLB_SHIFT)); > > > } > > > for (i = index - 1; > > > io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && > > >
On Thu, Jun 24, 2021 at 11:58:57PM +0800, Claire Chang wrote: > On Thu, Jun 24, 2021 at 11:56 PM Konrad Rzeszutek Wilk > <konrad.wilk@oracle.com> wrote: > > > > On Thu, Jun 24, 2021 at 10:10:51AM -0400, Qian Cai wrote: > > > > > > > > > On 6/24/2021 7:48 AM, Will Deacon wrote: > > > > Ok, diff below which attempts to tackle the offset issue I mentioned as > > > > well. Qian Cai -- please can you try with these changes? > > > > > > This works fine. > > > > Cool. Let me squash this patch in #6 and rebase the rest of them. > > > > Claire, could you check the devel/for-linus-5.14 say by end of today to > > double check that I didn't mess anything up please? > > I just submitted v15 here > (https://lore.kernel.org/patchwork/cover/1451322/) in case it's > helpful. Oh! Nice! > I'll double check of course. Thanks for the efforts! I ended up using your patch #6 and #7. Please double-check.