Patchwork [3/3] ide: move dcache flushing to generic ide code

login
register
mail settings
Submitter Sebastian Siewior
Date Feb. 28, 2010, 3:35 p.m.
Message ID <1267371341-16684-4-git-send-email-sebastian@breakpoint.cc>
Download mbox | patch
Permalink /patch/46505/
State Rejected
Delegated to: David Miller
Headers show

Comments

Sebastian Siewior - Feb. 28, 2010, 3:35 p.m.
the pio callbacks are called with different kind of buffers. It could be
a straight kernel addr, kernel stack or a kmaped highmem page.
Some of this break the virt_to_page() assumptions.
This patch moves the dcache flush from architecture code into generic
ide code. ide_pio_bytes() is the only place where user pages might be
written as far as I can see.
The dcache flush is avoided in two cases:
- data which is written to the device (i.e. they are comming from the
  userland)
- pages without a mapping. Those requests should be issued by
  vfs and not go to the userland.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 arch/mips/include/asm/mach-generic/ide.h |   21 ---------------------
 arch/sparc/include/asm/ide.h             |   14 --------------
 drivers/ide/ide-taskfile.c               |    7 +++++++
 3 files changed, 7 insertions(+), 35 deletions(-)
David Miller - March 1, 2010, 2:34 a.m.
From: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Date: Sun, 28 Feb 2010 16:35:41 +0100

> the pio callbacks are called with different kind of buffers. It could be
> a straight kernel addr, kernel stack or a kmaped highmem page.
> Some of this break the virt_to_page() assumptions.
> This patch moves the dcache flush from architecture code into generic
> ide code. ide_pio_bytes() is the only place where user pages might be
> written as far as I can see.
> The dcache flush is avoided in two cases:
> - data which is written to the device (i.e. they are comming from the
>   userland)

This needs a flush on sparc, otherwise an alias now exists in the
kernel side copy of the buffer.  The D-cache flush is intentionally
unconditional for PIO mode.  I definitely don't want to take the same
risks you guys seem to be willing to take for this optimization which
is of questionable value.

I also, intrinsically, really don't like these changes.

For one thing, you're optimizing PIO mode.

Secondly, IDE is in deep maintainence mode, if you want to optimize
cache flushing do it in the ATA layer.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sebastian Siewior - March 1, 2010, 7:58 p.m.
* David Miller | 2010-02-28 18:34:17 [-0800]:

>From: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
>Date: Sun, 28 Feb 2010 16:35:41 +0100
>
>> the pio callbacks are called with different kind of buffers. It could be
>> a straight kernel addr, kernel stack or a kmaped highmem page.
>> Some of this break the virt_to_page() assumptions.
>> This patch moves the dcache flush from architecture code into generic
>> ide code. ide_pio_bytes() is the only place where user pages might be
>> written as far as I can see.
>> The dcache flush is avoided in two cases:
>> - data which is written to the device (i.e. they are comming from the
>>   userland)
>
>This needs a flush on sparc, otherwise an alias now exists in the
>kernel side copy of the buffer.  The D-cache flush is intentionally
>unconditional for PIO mode.  I definitely don't want to take the same
>risks you guys seem to be willing to take for this optimization which
>is of questionable value.
It is not us guys it is just me. And if it is a stupid thing to do then
I get probably punched by Ralf as well once he gets back.
The part I don't get is why you have to flush dcache after the copy
process. I would understand a flush before reading. The dcache alias in
kernel shouldn't hurt since it is not used anymore. Unless we read twice
from the same page. Is this the trick?

>I also, intrinsically, really don't like these changes.
>
>For one thing, you're optimizing PIO mode.
>
>Secondly, IDE is in deep maintainence mode, if you want to optimize
>cache flushing do it in the ATA layer.
This patch patch was mostly driven by the fact that the buffer can be a
normal kernel mapping or a virtual address. The latter doesn't work with
virt_to_page().
Anyway I should probably spent more time getting ATA layer wokring than
on the IDE layer since it is somehow working since patch#1.

>
>Thanks.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - March 2, 2010, 12:39 a.m.
From: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Date: Mon, 1 Mar 2010 20:58:58 +0100

> The part I don't get is why you have to flush dcache after the copy
> process. I would understand a flush before reading. The dcache alias in
> kernel shouldn't hurt since it is not used anymore. Unless we read twice
> from the same page. Is this the trick?

Anything that puts the data into the cache on the kernel
side is a problem.  The page is still potentially in user
space, as a result there will be thus two active mappings
in the cache, one in the kernel side and one in the user
side.  The user can then do various operations which can
access either mapping.

Writing to it via write() system call, writing to it via
mmap(), making the kernel write to it by doing a read()
with the buffer pointing into the mmap() area.

All we need is a modification on either side for the other
one to potentially become stale.

>>Secondly, IDE is in deep maintainence mode, if you want to optimize
>>cache flushing do it in the ATA layer.
> This patch patch was mostly driven by the fact that the buffer can be a
> normal kernel mapping or a virtual address. The latter doesn't work with
> virt_to_page().
> Anyway I should probably spent more time getting ATA layer wokring than
> on the IDE layer since it is somehow working since patch#1.

All buffers passed to the architecture IDE ops should be PAGE_OFFSET
relative kernel virtual addresses for which __pa() works.

Are kmap()'d things ending up here?

It all works out on sparc64 because we don't have highmem and
kernel stacks are just in normal kernel pages.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sebastian Siewior - March 2, 2010, 9:14 p.m.
* David Miller | 2010-03-01 16:39:59 [-0800]:

>All buffers passed to the architecture IDE ops should be PAGE_OFFSET
>relative kernel virtual addresses for which __pa() works.
>
>Are kmap()'d things ending up here?
Ah, bounce buffers are used. So no, there no highmem which get kmap()ed.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ralf Baechle - March 25, 2010, 5:17 p.m.
On Mon, Mar 01, 2010 at 04:39:59PM -0800, David Miller wrote:

> From: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
> Date: Mon, 1 Mar 2010 20:58:58 +0100
> 
> > The part I don't get is why you have to flush dcache after the copy
> > process. I would understand a flush before reading. The dcache alias in
> > kernel shouldn't hurt since it is not used anymore. Unless we read twice
> > from the same page. Is this the trick?
> 
> Anything that puts the data into the cache on the kernel
> side is a problem.  The page is still potentially in user
> space, as a result there will be thus two active mappings
> in the cache, one in the kernel side and one in the user
> side.  The user can then do various operations which can
> access either mapping.
> 
> Writing to it via write() system call, writing to it via
> mmap(), making the kernel write to it by doing a read()
> with the buffer pointing into the mmap() area.
> 
> All we need is a modification on either side for the other
> one to potentially become stale.
> 
> >>Secondly, IDE is in deep maintainence mode, if you want to optimize
> >>cache flushing do it in the ATA layer.
> > This patch patch was mostly driven by the fact that the buffer can be a
> > normal kernel mapping or a virtual address. The latter doesn't work with
> > virt_to_page().
> > Anyway I should probably spent more time getting ATA layer wokring than
> > on the IDE layer since it is somehow working since patch#1.
> 
> All buffers passed to the architecture IDE ops should be PAGE_OFFSET
> relative kernel virtual addresses for which __pa() works.
> 
> Are kmap()'d things ending up here?
> 
> It all works out on sparc64 because we don't have highmem and
> kernel stacks are just in normal kernel pages.

Highmem on MIPS has always been a bastard child though that will probably
change now that 32-bit embedded systems are coming with more memory than
can be mapped as lowmem.

The system in question that Sebastian is talking about has the IDE cache
problems because it has no aliases.  Neither the IDE code nor the cache
flush layer does any cacheflushes because there are no cache aliases [1] to
deal with.

So eventually code from a page loaded from an IDE disk gets executed and
if the CPU's I-cache refilled from the L2 or memory, not from the D-cache
stale data may get executed.

In an earlier mail in this thread you called the MIPS ins/outs risky.
Maybe - but we have to.  At least at the time when this code was written
the IDE ins/outs variants was possibly being called with interrupts
disabled.  The normal MIPS cache flushing functions are based on
smp_call_function() on SMP systems [2] so can't be used to do cache
maintenance hence we have to avoid them.

  Ralf

[1] Configuring page size to 16k or 64k would get rid of aliases on all
    MIPS processors with VIPT D-caches, so introduce the same situation.
    Such configurations are getting common now.
[2] Because the CACHE instruction only affects the local CPU cache.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/mips/include/asm/mach-generic/ide.h b/arch/mips/include/asm/mach-generic/ide.h
index 9360586..7370845 100644
--- a/arch/mips/include/asm/mach-generic/ide.h
+++ b/arch/mips/include/asm/mach-generic/ide.h
@@ -20,35 +20,16 @@ 
 #include <asm/processor.h>
 
 /* MIPS port and memory-mapped I/O string operations.  */
-static inline void __ide_set_pages_dirty(const void *addr, unsigned long size)
-{
-	unsigned long end = (unsigned long)addr + size;
-
-	if (!(cpu_has_dc_aliases || !cpu_has_ic_fills_f_dc))
-		return;
-
-	while ((unsigned long)addr < end) {
-		struct page *p = virt_to_page(addr);
-		struct address_space *mapping = page_mapping(p);
-
-		if (mapping && mapping_mapped(mapping))
-			SetPageDcacheDirty(p);
-		addr += PAGE_SIZE;
-	}
-}
-
 static inline void __ide_insw(unsigned long port, void *addr,
 	unsigned int count)
 {
 	insw(port, addr, count);
-	__ide_set_pages_dirty(addr, count * 2);
 }
 
 static inline void __ide_insl(unsigned long port, void *addr,
 	unsigned int count)
 {
 	insl(port, addr, count);
-	__ide_set_pages_dirty(addr, count * 4);
 }
 
 static inline void __ide_outsw(unsigned long port, const void *addr,
@@ -66,13 +47,11 @@  static inline void __ide_outsl(unsigned long port, const void *addr,
 static inline void __ide_mm_insw(void __iomem *port, void *addr, u32 count)
 {
 	readsw(port, addr, count);
-	__ide_set_pages_dirty(addr, count * 2);
 }
 
 static inline void __ide_mm_insl(void __iomem *port, void *addr, u32 count)
 {
 	readsl(port, addr, count);
-	__ide_set_pages_dirty(addr, count * 4);
 }
 
 static inline void __ide_mm_outsw(void __iomem *port, void *addr, u32 count)
diff --git a/arch/sparc/include/asm/ide.h b/arch/sparc/include/asm/ide.h
index b7af3d6..c1037ca 100644
--- a/arch/sparc/include/asm/ide.h
+++ b/arch/sparc/include/asm/ide.h
@@ -34,9 +34,6 @@ 
 
 static inline void __ide_insw(void __iomem *port, void *dst, u32 count)
 {
-#if defined(CONFIG_SPARC64) && defined(DCACHE_ALIASING_POSSIBLE)
-	unsigned long end = (unsigned long)dst + (count << 1);
-#endif
 	u16 *ps = dst;
 	u32 *pi;
 
@@ -56,17 +53,10 @@  static inline void __ide_insw(void __iomem *port, void *dst, u32 count)
 	ps = (u16 *)pi;
 	if(count)
 		*ps++ = __raw_readw(port);
-
-#if defined(CONFIG_SPARC64) && defined(DCACHE_ALIASING_POSSIBLE)
-	__flush_dcache_range((unsigned long)dst, end);
-#endif
 }
 
 static inline void __ide_outsw(void __iomem *port, const void *src, u32 count)
 {
-#if defined(CONFIG_SPARC64) && defined(DCACHE_ALIASING_POSSIBLE)
-	unsigned long end = (unsigned long)src + (count << 1);
-#endif
 	const u16 *ps = src;
 	const u32 *pi;
 
@@ -86,10 +76,6 @@  static inline void __ide_outsw(void __iomem *port, const void *src, u32 count)
 	ps = (const u16 *)pi;
 	if(count)
 		__raw_writew(*ps, port);
-
-#if defined(CONFIG_SPARC64) && defined(DCACHE_ALIASING_POSSIBLE)
-	__flush_dcache_range((unsigned long)src, end);
-#endif
 }
 
 #endif /* __KERNEL__ */
diff --git a/drivers/ide/ide-taskfile.c b/drivers/ide/ide-taskfile.c
index cc8633c..95a9922 100644
--- a/drivers/ide/ide-taskfile.c
+++ b/drivers/ide/ide-taskfile.c
@@ -273,6 +273,13 @@  void ide_pio_bytes(ide_drive_t *drive, struct ide_cmd *cmd,
 		if (page_is_high)
 			local_irq_restore(flags);
 
+		if (!write) {
+			struct address_space *mapping = page_mapping(page);
+
+			if (mapping && mapping_mapped(mapping))
+				flush_dcache_page(page);
+		}
+
 		len -= nr_bytes;
 	}
 }