diff mbox

[for,2.6.33?,1/1] ata: call flush_dcache_page() around PIO data transfers in libata-aff.c

Message ID 201002022211.o12MB8cJ017441@imap1.linux-foundation.org
State Not Applicable
Delegated to: David Miller
Headers show

Commit Message

Andrew Morton Feb. 2, 2010, 10:11 p.m. UTC
From: Catalin Marinas <catalin.marinas@arm.com>

Depending on the direction of the transfer, flush_dcache_page() must be
called either before (ATA_TFLAG_WRITE) or after (!ATA_TFLAG_WRITE) the
data copying to avoid D-cache aliasing with user space or I-D cache
coherency issues (when reading data from an ATA device using PIO, the
kernel dirties the D-cache but there is no flush_dcache_page() required on
Harvard architectures).

This patch allows the ARM boards to use a rootfs on CompactFlash with the
PATA platform driver.

As Anfei Zhou mentioned in a recent patch ("flush dcache before writing
into page to avoid alias"), on some architectures there may be a
performance benefit in differentiating the flush_dcache_page() calls based
on whether the kernel or the user page needs flushing.

IMHO, we should differentiate based on the direction (kernel reading or
writing from/to such page).  In the ARM case with PIPT Harvard caches
(newer processors), the kernel reading from a page that may be mapped in
user space shouldn't need cache flushing.  The kernel writing to such page
would require D-cache flushing because of coherency with the I-cache. 
Currently on ARM, the latter happens in both cases.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/ata/libata-sff.c |    6 ++++++
 1 file changed, 6 insertions(+)

Comments

James Bottomley Feb. 2, 2010, 10:58 p.m. UTC | #1
On Tue, 2010-02-02 at 14:11 -0800, akpm@linux-foundation.org wrote:
> From: Catalin Marinas <catalin.marinas@arm.com>
> 
> Depending on the direction of the transfer, flush_dcache_page() must be
> called either before (ATA_TFLAG_WRITE) or after (!ATA_TFLAG_WRITE) the
> data copying to avoid D-cache aliasing with user space or I-D cache
> coherency issues (when reading data from an ATA device using PIO, the
> kernel dirties the D-cache but there is no flush_dcache_page() required on
> Harvard architectures).
> 
> This patch allows the ARM boards to use a rootfs on CompactFlash with the
> PATA platform driver.
> 
> As Anfei Zhou mentioned in a recent patch ("flush dcache before writing
> into page to avoid alias"), on some architectures there may be a
> performance benefit in differentiating the flush_dcache_page() calls based
> on whether the kernel or the user page needs flushing.
> 
> IMHO, we should differentiate based on the direction (kernel reading or
> writing from/to such page).  In the ARM case with PIPT Harvard caches
> (newer processors), the kernel reading from a page that may be mapped in
> user space shouldn't need cache flushing.  The kernel writing to such page
> would require D-cache flushing because of coherency with the I-cache. 
> Currently on ARM, the latter happens in both cases.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jeff Garzik <jgarzik@pobox.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: <stable@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  drivers/ata/libata-sff.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff -puN drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc drivers/ata/libata-sff.c
> --- a/drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc
> +++ a/drivers/ata/libata-sff.c
> @@ -874,6 +874,9 @@ static void ata_pio_sector(struct ata_qu
>  
>  	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
>  
> +	if (do_write)
> +		flush_dcache_page(page);
> +

This looks wrong; the upper layers should already have made the page
aliases coherent from user to kernel by calling flush_dcache_page (in
__get_user_pages()), so the aliases should already be up to date and
this flush is spurious.

>  	if (PageHighMem(page)) {
>  		unsigned long flags;
>  
> @@ -893,6 +896,9 @@ static void ata_pio_sector(struct ata_qu
>  				       do_write);
>  	}
>  
> +	if (!do_write)
> +		flush_dcache_page(page);
> +

OK, so this too looks wrong for two reasons

     1. it's over flushing.  Even after the write to the page by the
        kernel PIO, the only alias that is dirty should be the kernel,
        so this needs a flush_kernel_dcache_page() to empty the kernel
        alias.  It is possible user space will have speculated over the
        user aliases, but there's stuff further up the block stack to
        bring this back into coherence.
     2. If the page really is in highmem, the flush has to happen along
        the kernel alias, which you just lost because this flush is
        happening after the kunmap_atomic(), so it has to occur
        somewhere between the PIO operation and the kunmap.

>  	qc->curbytes += qc->sect_size;
>  	qc->cursg_ofs += qc->sect_size;
>  

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton Feb. 2, 2010, 11:05 p.m. UTC | #2
On Tue, 02 Feb 2010 16:58:38 -0600
James Bottomley <James.Bottomley@suse.de> wrote:

> On Tue, 2010-02-02 at 14:11 -0800, akpm@linux-foundation.org wrote:
> > From: Catalin Marinas <catalin.marinas@arm.com>
> > 
> > Depending on the direction of the transfer, flush_dcache_page() must be
> > called either before (ATA_TFLAG_WRITE) or after (!ATA_TFLAG_WRITE) the
> > data copying to avoid D-cache aliasing with user space or I-D cache
> > coherency issues (when reading data from an ATA device using PIO, the
> > kernel dirties the D-cache but there is no flush_dcache_page() required on
> > Harvard architectures).
> > 
> > This patch allows the ARM boards to use a rootfs on CompactFlash with the
> > PATA platform driver.
> > 
> > As Anfei Zhou mentioned in a recent patch ("flush dcache before writing
> > into page to avoid alias"), on some architectures there may be a
> > performance benefit in differentiating the flush_dcache_page() calls based
> > on whether the kernel or the user page needs flushing.
> > 
> > IMHO, we should differentiate based on the direction (kernel reading or
> > writing from/to such page).  In the ARM case with PIPT Harvard caches
> > (newer processors), the kernel reading from a page that may be mapped in
> > user space shouldn't need cache flushing.  The kernel writing to such page
> > would require D-cache flushing because of coherency with the I-cache. 
> > Currently on ARM, the latter happens in both cases.
> > 
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Jeff Garzik <jgarzik@pobox.com>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: <stable@kernel.org>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > ---
> > 
> >  drivers/ata/libata-sff.c |    6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff -puN drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc drivers/ata/libata-sff.c
> > --- a/drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc
> > +++ a/drivers/ata/libata-sff.c
> > @@ -874,6 +874,9 @@ static void ata_pio_sector(struct ata_qu
> >  
> >  	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
> >  
> > +	if (do_write)
> > +		flush_dcache_page(page);
> > +
> 
> This looks wrong; the upper layers should already have made the page
> aliases coherent from user to kernel by calling flush_dcache_page (in
> __get_user_pages()), so the aliases should already be up to date and
> this flush is spurious.

The upper layers don't know that the CPU touched the data!  If the
driver did a DMA transfer then such a flush is unneeded, so we don't do
it.

> >  	if (PageHighMem(page)) {
> >  		unsigned long flags;
> >  
> > @@ -893,6 +896,9 @@ static void ata_pio_sector(struct ata_qu
> >  				       do_write);
> >  	}
> >  
> > +	if (!do_write)
> > +		flush_dcache_page(page);
> > +
> 
> OK, so this too looks wrong for two reasons
> 
>      1. it's over flushing.  Even after the write to the page by the
>         kernel PIO, the only alias that is dirty should be the kernel,
>         so this needs a flush_kernel_dcache_page() to empty the kernel
>         alias.  It is possible user space will have speculated over the
>         user aliases, but there's stuff further up the block stack to
>         bring this back into coherence.
>      2. If the page really is in highmem, the flush has to happen along
>         the kernel alias, which you just lost because this flush is
>         happening after the kunmap_atomic(), so it has to occur
>         somewhere between the PIO operation and the kunmap.
> 
> >  	qc->curbytes += qc->sect_size;
> >  	qc->cursg_ofs += qc->sect_size;
> >  
> 
> James
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik Feb. 2, 2010, 11:14 p.m. UTC | #3
On 02/02/2010 06:05 PM, Andrew Morton wrote:
> On Tue, 02 Feb 2010 16:58:38 -0600
> James Bottomley<James.Bottomley@suse.de>  wrote:
>
>> On Tue, 2010-02-02 at 14:11 -0800, akpm@linux-foundation.org wrote:
>>> From: Catalin Marinas<catalin.marinas@arm.com>
>>>
>>> Depending on the direction of the transfer, flush_dcache_page() must be
>>> called either before (ATA_TFLAG_WRITE) or after (!ATA_TFLAG_WRITE) the
>>> data copying to avoid D-cache aliasing with user space or I-D cache
>>> coherency issues (when reading data from an ATA device using PIO, the
>>> kernel dirties the D-cache but there is no flush_dcache_page() required on
>>> Harvard architectures).
>>>
>>> This patch allows the ARM boards to use a rootfs on CompactFlash with the
>>> PATA platform driver.
>>>
>>> As Anfei Zhou mentioned in a recent patch ("flush dcache before writing
>>> into page to avoid alias"), on some architectures there may be a
>>> performance benefit in differentiating the flush_dcache_page() calls based
>>> on whether the kernel or the user page needs flushing.
>>>
>>> IMHO, we should differentiate based on the direction (kernel reading or
>>> writing from/to such page).  In the ARM case with PIPT Harvard caches
>>> (newer processors), the kernel reading from a page that may be mapped in
>>> user space shouldn't need cache flushing.  The kernel writing to such page
>>> would require D-cache flushing because of coherency with the I-cache.
>>> Currently on ARM, the latter happens in both cases.
>>>
>>> Signed-off-by: Catalin Marinas<catalin.marinas@arm.com>
>>> Cc: Jeff Garzik<jgarzik@pobox.com>
>>> Cc: Tejun Heo<tj@kernel.org>
>>> Cc:<stable@kernel.org>
>>> Signed-off-by: Andrew Morton<akpm@linux-foundation.org>
>>> ---
>>>
>>>   drivers/ata/libata-sff.c |    6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff -puN drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc drivers/ata/libata-sff.c
>>> --- a/drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc
>>> +++ a/drivers/ata/libata-sff.c
>>> @@ -874,6 +874,9 @@ static void ata_pio_sector(struct ata_qu
>>>
>>>   	DPRINTK("data %s\n", qc->tf.flags&  ATA_TFLAG_WRITE ? "write" : "read");
>>>
>>> +	if (do_write)
>>> +		flush_dcache_page(page);
>>> +
>>
>> This looks wrong; the upper layers should already have made the page
>> aliases coherent from user to kernel by calling flush_dcache_page (in
>> __get_user_pages()), so the aliases should already be up to date and
>> this flush is spurious.
>
> The upper layers don't know that the CPU touched the data!  If the
> driver did a DMA transfer then such a flush is unneeded, so we don't do
> it.

The patch in question only affects PIO transfers, not DMA.  Data is 
transferred from a kernel buffer to hardware via out[bwl] via

	page data -> CPU register -> out[bwl]

or, data is transferred from hardware to a kernel buffer via

	in[bwl] -> CPU register -> page data

So what are the flushing rules given those conditions?

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Feb. 2, 2010, 11:14 p.m. UTC | #4
On Tue, 2010-02-02 at 15:05 -0800, Andrew Morton wrote:
> On Tue, 02 Feb 2010 16:58:38 -0600
> James Bottomley <James.Bottomley@suse.de> wrote:
> 
> > On Tue, 2010-02-02 at 14:11 -0800, akpm@linux-foundation.org wrote:
> > > From: Catalin Marinas <catalin.marinas@arm.com>
> > > 
> > > Depending on the direction of the transfer, flush_dcache_page() must be
> > > called either before (ATA_TFLAG_WRITE) or after (!ATA_TFLAG_WRITE) the
> > > data copying to avoid D-cache aliasing with user space or I-D cache
> > > coherency issues (when reading data from an ATA device using PIO, the
> > > kernel dirties the D-cache but there is no flush_dcache_page() required on
> > > Harvard architectures).
> > > 
> > > This patch allows the ARM boards to use a rootfs on CompactFlash with the
> > > PATA platform driver.
> > > 
> > > As Anfei Zhou mentioned in a recent patch ("flush dcache before writing
> > > into page to avoid alias"), on some architectures there may be a
> > > performance benefit in differentiating the flush_dcache_page() calls based
> > > on whether the kernel or the user page needs flushing.
> > > 
> > > IMHO, we should differentiate based on the direction (kernel reading or
> > > writing from/to such page).  In the ARM case with PIPT Harvard caches
> > > (newer processors), the kernel reading from a page that may be mapped in
> > > user space shouldn't need cache flushing.  The kernel writing to such page
> > > would require D-cache flushing because of coherency with the I-cache. 
> > > Currently on ARM, the latter happens in both cases.
> > > 
> > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Jeff Garzik <jgarzik@pobox.com>
> > > Cc: Tejun Heo <tj@kernel.org>
> > > Cc: <stable@kernel.org>
> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > > ---
> > > 
> > >  drivers/ata/libata-sff.c |    6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff -puN drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc drivers/ata/libata-sff.c
> > > --- a/drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc
> > > +++ a/drivers/ata/libata-sff.c
> > > @@ -874,6 +874,9 @@ static void ata_pio_sector(struct ata_qu
> > >  
> > >  	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
> > >  
> > > +	if (do_write)
> > > +		flush_dcache_page(page);
> > > +
> > 
> > This looks wrong; the upper layers should already have made the page
> > aliases coherent from user to kernel by calling flush_dcache_page (in
> > __get_user_pages()), so the aliases should already be up to date and
> > this flush is spurious.
> 
> The upper layers don't know that the CPU touched the data!

The upper layers know to flush the user aliases.

>   If the
> driver did a DMA transfer then such a flush is unneeded, so we don't do
> it.

The point is that this is a write case (transfer from kernel to device)
so the underlying data should remain stable.  Once __get_user_pages()
has cohered the aliases, nothing should decohere them because nothing
should be altering the data in the page.

The danger case for VI caches is the read case because there you are
altering data via the kernel aliases.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Feb. 2, 2010, 11:21 p.m. UTC | #5
On Tue, 2010-02-02 at 18:14 -0500, Jeff Garzik wrote:
> On 02/02/2010 06:05 PM, Andrew Morton wrote:
> > On Tue, 02 Feb 2010 16:58:38 -0600
> > James Bottomley<James.Bottomley@suse.de>  wrote:
> >
> >> On Tue, 2010-02-02 at 14:11 -0800, akpm@linux-foundation.org wrote:
> >>> From: Catalin Marinas<catalin.marinas@arm.com>
> >>>
> >>> Depending on the direction of the transfer, flush_dcache_page() must be
> >>> called either before (ATA_TFLAG_WRITE) or after (!ATA_TFLAG_WRITE) the
> >>> data copying to avoid D-cache aliasing with user space or I-D cache
> >>> coherency issues (when reading data from an ATA device using PIO, the
> >>> kernel dirties the D-cache but there is no flush_dcache_page() required on
> >>> Harvard architectures).
> >>>
> >>> This patch allows the ARM boards to use a rootfs on CompactFlash with the
> >>> PATA platform driver.
> >>>
> >>> As Anfei Zhou mentioned in a recent patch ("flush dcache before writing
> >>> into page to avoid alias"), on some architectures there may be a
> >>> performance benefit in differentiating the flush_dcache_page() calls based
> >>> on whether the kernel or the user page needs flushing.
> >>>
> >>> IMHO, we should differentiate based on the direction (kernel reading or
> >>> writing from/to such page).  In the ARM case with PIPT Harvard caches
> >>> (newer processors), the kernel reading from a page that may be mapped in
> >>> user space shouldn't need cache flushing.  The kernel writing to such page
> >>> would require D-cache flushing because of coherency with the I-cache.
> >>> Currently on ARM, the latter happens in both cases.
> >>>
> >>> Signed-off-by: Catalin Marinas<catalin.marinas@arm.com>
> >>> Cc: Jeff Garzik<jgarzik@pobox.com>
> >>> Cc: Tejun Heo<tj@kernel.org>
> >>> Cc:<stable@kernel.org>
> >>> Signed-off-by: Andrew Morton<akpm@linux-foundation.org>
> >>> ---
> >>>
> >>>   drivers/ata/libata-sff.c |    6 ++++++
> >>>   1 file changed, 6 insertions(+)
> >>>
> >>> diff -puN drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc drivers/ata/libata-sff.c
> >>> --- a/drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc
> >>> +++ a/drivers/ata/libata-sff.c
> >>> @@ -874,6 +874,9 @@ static void ata_pio_sector(struct ata_qu
> >>>
> >>>   	DPRINTK("data %s\n", qc->tf.flags&  ATA_TFLAG_WRITE ? "write" : "read");
> >>>
> >>> +	if (do_write)
> >>> +		flush_dcache_page(page);
> >>> +
> >>
> >> This looks wrong; the upper layers should already have made the page
> >> aliases coherent from user to kernel by calling flush_dcache_page (in
> >> __get_user_pages()), so the aliases should already be up to date and
> >> this flush is spurious.
> >
> > The upper layers don't know that the CPU touched the data!  If the
> > driver did a DMA transfer then such a flush is unneeded, so we don't do
> > it.
> 
> The patch in question only affects PIO transfers, not DMA.  Data is 
> transferred from a kernel buffer to hardware via out[bwl] via
> 
> 	page data -> CPU register -> out[bwl]

No flush required here ... all the aliases should be up to date after
__get_user_pages()

> or, data is transferred from hardware to a kernel buffer via
> 
> 	in[bwl] -> CPU register -> page data

This writes to the page via the kernel alias, so the kernel alias
becomes dirty.  In order to make the underlying page up to date, the
kernel alias must be flushed  this flush *must* occur while the page is
mapped along the kernel alias.  In order that the user aliases see the
correct data, any speculation they may have done before the kernel alias
was flushed must be invalidated (depending on architectures ... some
architectures won't speculate reads in this case).  This invalidation
*should* be occurring further up the stack .. but if the architecture
has implemented this invalidation in the dma ops, then
flush_dcache_page() may be needed.

> So what are the flushing rules given those conditions?

See above.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 2, 2010, 11:21 p.m. UTC | #6
From: Jeff Garzik <jgarzik@pobox.com>
Date: Tue, 02 Feb 2010 18:14:40 -0500

> The patch in question only affects PIO transfers, not DMA.  Data is
> transferred from a kernel buffer to hardware via out[bwl] via
> 
> 	page data -> CPU register -> out[bwl]
> 
> or, data is transferred from hardware to a kernel buffer via
> 
> 	in[bwl] -> CPU register -> page data
> 
> So what are the flushing rules given those conditions?

Any time you touch a page cache page with the cpu, you have
to get it purged from aliasing caches before it gets mapped
into userspace or similar.

That's the problem on these machines.

Actually, sparc64 is probably susceptible to the same problem.

And it's not an issue in the IDE layer, know why? :-)

In the IDE layer we have arch specific ide_*() interfaces which is
where all of this PIO dma flushing stuff deserves to live.

ATA should do something similar instead of randomly scattering
flush_dcache_page() calls all over the place.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox Feb. 2, 2010, 11:30 p.m. UTC | #7
> ATA should do something similar instead of randomly scattering
> flush_dcache_page() calls all over the place.

ATA has an sff_data_xfer callback for standard style interfaces for
exactly the same reason that IDE did. That should be the only place that
needs touching for most controllers (either the callback point or in the
callback).

The non-standard ones that do bits of PIO will need fixing in place
because they are sufficiently weird they don't fit any standard format.
Fortunately they are almost entirely platform specific and/or DMA.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Feb. 2, 2010, 11:32 p.m. UTC | #8
On Tue, 2010-02-02 at 15:21 -0800, David Miller wrote:
> From: Jeff Garzik <jgarzik@pobox.com>
> Date: Tue, 02 Feb 2010 18:14:40 -0500
> 
> > The patch in question only affects PIO transfers, not DMA.  Data is
> > transferred from a kernel buffer to hardware via out[bwl] via
> > 
> > 	page data -> CPU register -> out[bwl]
> > 
> > or, data is transferred from hardware to a kernel buffer via
> > 
> > 	in[bwl] -> CPU register -> page data
> > 
> > So what are the flushing rules given those conditions?
> 
> Any time you touch a page cache page with the cpu, you have
> to get it purged from aliasing caches before it gets mapped
> into userspace or similar.
> 
> That's the problem on these machines.
> 
> Actually, sparc64 is probably susceptible to the same problem.
> 
> And it's not an issue in the IDE layer, know why? :-)
> 
> In the IDE layer we have arch specific ide_*() interfaces which is
> where all of this PIO dma flushing stuff deserves to live.
> 
> ATA should do something similar instead of randomly scattering
> flush_dcache_page() calls all over the place.

Agree with this, except it shouldn't be ide specific ... any driver that
does PIO (we have some examples in SCSI) is vulnerable to this issue and
should have the same fix.

The ide interfaces are very low level .. they're the ide_in/out[sw] APIs
which aren't necessarily the most useful (some of the SCSI "PIO" stuff
actually gets fed in via memory transfers to registers rather than port
in/out).  

What it looks like we need is kmap_for_pio() and kunmap_for_pio() which
take flags (or API extensions) indicating whether we're transferring to
or from the device, which return the correct kernel address to the page
(mapping it if it's in highmem) and perform the correct
flushes/invalidates.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 2, 2010, 11:39 p.m. UTC | #9
From: James Bottomley <James.Bottomley@suse.de>
Date: Tue, 02 Feb 2010 17:32:48 -0600

> What it looks like we need is kmap_for_pio() and kunmap_for_pio() which
> take flags (or API extensions) indicating whether we're transferring to
> or from the device, which return the correct kernel address to the page
> (mapping it if it's in highmem) and perform the correct
> flushes/invalidates.

That sounds like a good idea.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Catalin Marinas Feb. 3, 2010, 10:07 a.m. UTC | #10
On Tue, 2010-02-02 at 22:58 +0000, James Bottomley wrote:
> On Tue, 2010-02-02 at 14:11 -0800, akpm@linux-foundation.org wrote:
> > From: Catalin Marinas <catalin.marinas@arm.com>
> >
> > Depending on the direction of the transfer, flush_dcache_page() must be
> > called either before (ATA_TFLAG_WRITE) or after (!ATA_TFLAG_WRITE) the
> > data copying to avoid D-cache aliasing with user space or I-D cache
> > coherency issues (when reading data from an ATA device using PIO, the
> > kernel dirties the D-cache but there is no flush_dcache_page() required on
> > Harvard architectures).
> >
> > This patch allows the ARM boards to use a rootfs on CompactFlash with the
> > PATA platform driver.
> >
> > As Anfei Zhou mentioned in a recent patch ("flush dcache before writing
> > into page to avoid alias"), on some architectures there may be a
> > performance benefit in differentiating the flush_dcache_page() calls based
> > on whether the kernel or the user page needs flushing.
> >
> > IMHO, we should differentiate based on the direction (kernel reading or
> > writing from/to such page).  In the ARM case with PIPT Harvard caches
> > (newer processors), the kernel reading from a page that may be mapped in
> > user space shouldn't need cache flushing.  The kernel writing to such page
> > would require D-cache flushing because of coherency with the I-cache.
> > Currently on ARM, the latter happens in both cases.
[...]
> > diff -puN drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc drivers/ata/libata-sff.c
> > --- a/drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc
> > +++ a/drivers/ata/libata-sff.c
> > @@ -874,6 +874,9 @@ static void ata_pio_sector(struct ata_qu
> > 
> >       DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
> > 
> > +     if (do_write)
> > +             flush_dcache_page(page);
> > +
> 
> This looks wrong; the upper layers should already have made the page
> aliases coherent from user to kernel by calling flush_dcache_page (in
> __get_user_pages()), so the aliases should already be up to date and
> this flush is spurious.

In my case it was working fine even without this. I just added this for
completeness following the cachetlb.txt document.

> >       if (PageHighMem(page)) {
> >               unsigned long flags;
> > 
> > @@ -893,6 +896,9 @@ static void ata_pio_sector(struct ata_qu
> >                                      do_write);
> >       }
> > 
> > +     if (!do_write)
> > +             flush_dcache_page(page);
> > +
> 
> OK, so this too looks wrong for two reasons
> 
>      1. it's over flushing.  Even after the write to the page by the
>         kernel PIO, the only alias that is dirty should be the kernel,
>         so this needs a flush_kernel_dcache_page() to empty the kernel
>         alias.  It is possible user space will have speculated over the
>         user aliases, but there's stuff further up the block stack to
>         bring this back into coherence.

In the ARM case, it's not necessarily the user D-cache aliasing (which
on my current hardware does not exist) but the I and D cache coherency.

With non-aliasing D-cache on ARM, the flush_kernel_dcache_page() is a
no-op. If we consider the I-cache as being an alias, this function would
need to be implemented. So far this issue has been handled by
flush_dcache_page() (lazily).

>      2. If the page really is in highmem, the flush has to happen along
>         the kernel alias, which you just lost because this flush is
>         happening after the kunmap_atomic(), so it has to occur
>         somewhere between the PIO operation and the kunmap.

The kunmap_atomic(), at least in the ARM case, seems to do the cache
flushing. My cache coherency issues show up with highmem disabled.
Catalin Marinas Feb. 3, 2010, 10:18 a.m. UTC | #11
On Tue, 2010-02-02 at 23:32 +0000, James Bottomley wrote:
> On Tue, 2010-02-02 at 15:21 -0800, David Miller wrote:
> > From: Jeff Garzik <jgarzik@pobox.com>
> > Date: Tue, 02 Feb 2010 18:14:40 -0500
> >
> > > The patch in question only affects PIO transfers, not DMA.  Data is
> > > transferred from a kernel buffer to hardware via out[bwl] via
> > >
> > >     page data -> CPU register -> out[bwl]
> > >
> > > or, data is transferred from hardware to a kernel buffer via
> > >
> > >     in[bwl] -> CPU register -> page data
> > >
> > > So what are the flushing rules given those conditions?
> >
> > Any time you touch a page cache page with the cpu, you have
> > to get it purged from aliasing caches before it gets mapped
> > into userspace or similar.
> >
> > That's the problem on these machines.
> >
> > Actually, sparc64 is probably susceptible to the same problem.
> >
> > And it's not an issue in the IDE layer, know why? :-)
> >
> > In the IDE layer we have arch specific ide_*() interfaces which is
> > where all of this PIO dma flushing stuff deserves to live.
> >
> > ATA should do something similar instead of randomly scattering
> > flush_dcache_page() calls all over the place.
> 
> Agree with this, except it shouldn't be ide specific ... any driver that
> does PIO (we have some examples in SCSI) is vulnerable to this issue and
> should have the same fix.
> 
> The ide interfaces are very low level .. they're the ide_in/out[sw] APIs
> which aren't necessarily the most useful (some of the SCSI "PIO" stuff
> actually gets fed in via memory transfers to registers rather than port
> in/out). 
> 
> What it looks like we need is kmap_for_pio() and kunmap_for_pio() which
> take flags (or API extensions) indicating whether we're transferring to
> or from the device, which return the correct kernel address to the page
> (mapping it if it's in highmem) and perform the correct
> flushes/invalidates.

In a similar patch for dealing with a PIO USB HCD driver (ISP1760) I
suggested something like pio_map_single/pio_unmap_single similar to the
dma_* API. The problem with the HCD drivers is that the page information
is lost when they are called. They only have access to a
urb->transfer_buffer pointer and you can never know whether it is a
highmem one or not. Hence I think an API that deals directly with
pointers rather than page structures would be better.

If you are OK with the idea, I'm happy to propose a patch on linux-arch.

Thanks for the feedback so far.
James Bottomley Feb. 3, 2010, 4:40 p.m. UTC | #12
On Wed, 2010-02-03 at 10:18 +0000, Catalin Marinas wrote:
> On Tue, 2010-02-02 at 23:32 +0000, James Bottomley wrote:
> > On Tue, 2010-02-02 at 15:21 -0800, David Miller wrote:
> > > From: Jeff Garzik <jgarzik@pobox.com>
> > > Date: Tue, 02 Feb 2010 18:14:40 -0500
> > >
> > > > The patch in question only affects PIO transfers, not DMA.  Data is
> > > > transferred from a kernel buffer to hardware via out[bwl] via
> > > >
> > > >     page data -> CPU register -> out[bwl]
> > > >
> > > > or, data is transferred from hardware to a kernel buffer via
> > > >
> > > >     in[bwl] -> CPU register -> page data
> > > >
> > > > So what are the flushing rules given those conditions?
> > >
> > > Any time you touch a page cache page with the cpu, you have
> > > to get it purged from aliasing caches before it gets mapped
> > > into userspace or similar.
> > >
> > > That's the problem on these machines.
> > >
> > > Actually, sparc64 is probably susceptible to the same problem.
> > >
> > > And it's not an issue in the IDE layer, know why? :-)
> > >
> > > In the IDE layer we have arch specific ide_*() interfaces which is
> > > where all of this PIO dma flushing stuff deserves to live.
> > >
> > > ATA should do something similar instead of randomly scattering
> > > flush_dcache_page() calls all over the place.
> > 
> > Agree with this, except it shouldn't be ide specific ... any driver that
> > does PIO (we have some examples in SCSI) is vulnerable to this issue and
> > should have the same fix.
> > 
> > The ide interfaces are very low level .. they're the ide_in/out[sw] APIs
> > which aren't necessarily the most useful (some of the SCSI "PIO" stuff
> > actually gets fed in via memory transfers to registers rather than port
> > in/out). 
> > 
> > What it looks like we need is kmap_for_pio() and kunmap_for_pio() which
> > take flags (or API extensions) indicating whether we're transferring to
> > or from the device, which return the correct kernel address to the page
> > (mapping it if it's in highmem) and perform the correct
> > flushes/invalidates.
> 
> In a similar patch for dealing with a PIO USB HCD driver (ISP1760) I
> suggested something like pio_map_single/pio_unmap_single similar to the
> dma_* API. The problem with the HCD drivers is that the page information
> is lost when they are called. They only have access to a
> urb->transfer_buffer pointer and you can never know whether it is a
> highmem one or not. Hence I think an API that deals directly with
> pointers rather than page structures would be better.

Actually, thinking about the semantics, the normal kmap has to have
these flushes in the map and unmap anyway ... otherwise the page gets
decohered if you do normal read/write on it.  The only information we
don't have in current kmap is whether we're mapping for read/write or
both.  So the API I'd propose is keep current kmap as is for the read
and write case, and introduce a kmap_for_read and kmap_for_write (with
corresponding umaps, and obviously atomics).

The fix to libata looks to be just that it should kmap all the time
rather than trying to fiddle with the page to see if its higmem.  For
kmap on a normal page, we should just return the offset map address and
do all the flushing.

James


> If you are OK with the idea, I'm happy to propose a patch on linux-arch.
> 
> Thanks for the feedback so far.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik Feb. 3, 2010, 5 p.m. UTC | #13
On 02/03/2010 11:40 AM, James Bottomley wrote:
> The fix to libata looks to be just that it should kmap all the time
> rather than trying to fiddle with the page to see if its higmem.  For
> kmap on a normal page, we should just return the offset map address and
> do all the flushing.

libata tests PageHighMem() because it was measurably faster to do things 
the current way (which includes local_irq_save/restore, only for 
highmem) on boxes where it actually matters.

It seems more efficient to add a flush where necessary, than 
unconditionally punish everyone...

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton Feb. 3, 2010, 5:06 p.m. UTC | #14
On Wed, 03 Feb 2010 12:00:58 -0500 Jeff Garzik <jgarzik@pobox.com> wrote:

> On 02/03/2010 11:40 AM, James Bottomley wrote:
> > The fix to libata looks to be just that it should kmap all the time
> > rather than trying to fiddle with the page to see if its higmem.  For
> > kmap on a normal page, we should just return the offset map address and
> > do all the flushing.
> 
> libata tests PageHighMem() because it was measurably faster to do things 
> the current way (which includes local_irq_save/restore, only for 
> highmem) on boxes where it actually matters.
> 
> It seems more efficient to add a flush where necessary, than 
> unconditionally punish everyone...

kmap_atomic() tests PageHighMem() too - it's pretty lightweight for
lowmem pages.


Anyway, I'd draw your attention to this claim in the changelog: "This
patch allows the ARM boards to use a rootfs on CompactFlash with the
PATA platform driver." Immediate-term, we should be looking for a small
fix for this issue which is acceptable for 2.6.33 and 2.6.32 and earlier.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 3, 2010, 5:09 p.m. UTC | #15
From: James Bottomley <James.Bottomley@suse.de>
Date: Wed, 03 Feb 2010 10:40:54 -0600

> Actually, thinking about the semantics, the normal kmap has to have
> these flushes in the map and unmap anyway ... otherwise the page gets
> decohered if you do normal read/write on it.  The only information we
> don't have in current kmap is whether we're mapping for read/write or
> both.  So the API I'd propose is keep current kmap as is for the read
> and write case, and introduce a kmap_for_read and kmap_for_write (with
> corresponding umaps, and obviously atomics).

This would basically force non-highmem platforms to implement the
kmap_*() interfaces.

I understand that MIPS does this already to deal with cache aliasing.

But that's currently a choice, and sparc64 for example has no need
to do things that way.  It would in fact be more expensive than
how sparc64 currently handles cache aliasing.

So I'm more in support of a new interface.  Platforms like sparc64
can implement it and then get rid of the ide_*() string op cache
flushes and instead we put the new API calls in the right spots.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik Feb. 3, 2010, 5:15 p.m. UTC | #16
On 02/03/2010 12:06 PM, Andrew Morton wrote:
> On Wed, 03 Feb 2010 12:00:58 -0500 Jeff Garzik<jgarzik@pobox.com>  wrote:
>
>> On 02/03/2010 11:40 AM, James Bottomley wrote:
>>> The fix to libata looks to be just that it should kmap all the time
>>> rather than trying to fiddle with the page to see if its higmem.  For
>>> kmap on a normal page, we should just return the offset map address and
>>> do all the flushing.
>>
>> libata tests PageHighMem() because it was measurably faster to do things
>> the current way (which includes local_irq_save/restore, only for
>> highmem) on boxes where it actually matters.
>>
>> It seems more efficient to add a flush where necessary, than
>> unconditionally punish everyone...
>
> kmap_atomic() tests PageHighMem() too - it's pretty lightweight for
> lowmem pages.

Note the lack of local_irq_save/restore in our code, though...  These 
PIO xfers are __slow__, from the perspective of a CPU manufactured in 
the past decade; you are definitely disabling local interrupts for a 
long time.  I suppose we could do

	if (high mem)
		local irq save
		kmap
		xfer
		kunmap
		local irq restore
	else
		kmap
		xfer
		kunmap

does that solve the problem for ARM, for 2.6.33?


> Anyway, I'd draw your attention to this claim in the changelog: "This
> patch allows the ARM boards to use a rootfs on CompactFlash with the
> PATA platform driver." Immediate-term, we should be looking for a small
> fix for this issue which is acceptable for 2.6.33 and 2.6.32 and earlier.

Sure...  see above.  hopefully one that does not punish -everybody- 
though.  It would be sad to unconditionally slow down millions of volume 
platform (read: x86) users for some embedded boards.

	Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton Feb. 3, 2010, 5:20 p.m. UTC | #17
On Wed, 03 Feb 2010 12:15:46 -0500 Jeff Garzik <jgarzik@pobox.com> wrote:

> On 02/03/2010 12:06 PM, Andrew Morton wrote:
> > On Wed, 03 Feb 2010 12:00:58 -0500 Jeff Garzik<jgarzik@pobox.com>  wrote:
> >
> >> On 02/03/2010 11:40 AM, James Bottomley wrote:
> >>> The fix to libata looks to be just that it should kmap all the time
> >>> rather than trying to fiddle with the page to see if its higmem.  For
> >>> kmap on a normal page, we should just return the offset map address and
> >>> do all the flushing.
> >>
> >> libata tests PageHighMem() because it was measurably faster to do things
> >> the current way (which includes local_irq_save/restore, only for
> >> highmem) on boxes where it actually matters.
> >>
> >> It seems more efficient to add a flush where necessary, than
> >> unconditionally punish everyone...
> >
> > kmap_atomic() tests PageHighMem() too - it's pretty lightweight for
> > lowmem pages.
> 
> Note the lack of local_irq_save/restore in our code, though...  These 
> PIO xfers are __slow__, from the perspective of a CPU manufactured in 
> the past decade; you are definitely disabling local interrupts for a 
> long time.  I suppose we could do
> 
> 	if (high mem)
> 		local irq save
> 		kmap
> 		xfer
> 		kunmap
> 		local irq restore
> 	else
> 		kmap
> 		xfer
> 		kunmap
> 
> does that solve the problem for ARM, for 2.6.33?
> 

It's unclear (to me) why that code is using KM_IRQ0 at all.  Can't it
use a non-irq kmap slot?

> > Anyway, I'd draw your attention to this claim in the changelog: "This
> > patch allows the ARM boards to use a rootfs on CompactFlash with the
> > PATA platform driver." Immediate-term, we should be looking for a small
> > fix for this issue which is acceptable for 2.6.33 and 2.6.32 and earlier.
> 
> Sure...  see above.  hopefully one that does not punish -everybody- 
> though.  It would be sad to unconditionally slow down millions of volume 
> platform (read: x86) users for some embedded boards.

Well. 
ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc.patch
is a no-op on x86.  It only has an effect on architectures which
implement flush_dcache_page().  And I expect flush_dcache_page() is
pretty light on those architectures, when compared with a PIO-mode
transfer.



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik Feb. 3, 2010, 5:29 p.m. UTC | #18
On 02/03/2010 12:20 PM, Andrew Morton wrote:
> On Wed, 03 Feb 2010 12:15:46 -0500 Jeff Garzik<jgarzik@pobox.com>  wrote:
>
>> On 02/03/2010 12:06 PM, Andrew Morton wrote:
>>> On Wed, 03 Feb 2010 12:00:58 -0500 Jeff Garzik<jgarzik@pobox.com>   wrote:
>>>
>>>> On 02/03/2010 11:40 AM, James Bottomley wrote:
>>>>> The fix to libata looks to be just that it should kmap all the time
>>>>> rather than trying to fiddle with the page to see if its higmem.  For
>>>>> kmap on a normal page, we should just return the offset map address and
>>>>> do all the flushing.
>>>>
>>>> libata tests PageHighMem() because it was measurably faster to do things
>>>> the current way (which includes local_irq_save/restore, only for
>>>> highmem) on boxes where it actually matters.
>>>>
>>>> It seems more efficient to add a flush where necessary, than
>>>> unconditionally punish everyone...
>>>
>>> kmap_atomic() tests PageHighMem() too - it's pretty lightweight for
>>> lowmem pages.
>>
>> Note the lack of local_irq_save/restore in our code, though...  These
>> PIO xfers are __slow__, from the perspective of a CPU manufactured in
>> the past decade; you are definitely disabling local interrupts for a
>> long time.  I suppose we could do
>>
>> 	if (high mem)
>> 		local irq save
>> 		kmap
>> 		xfer
>> 		kunmap
>> 		local irq restore
>> 	else
>> 		kmap
>> 		xfer
>> 		kunmap
>>
>> does that solve the problem for ARM, for 2.6.33?
>>
>
> It's unclear (to me) why that code is using KM_IRQ0 at all.  Can't it
> use a non-irq kmap slot?

libata may have to transfer data in response to an interrupt.  That is 
normal interrupt-driven PIO -- although it should be noted that the code 
supports polled PIO as well.


>>> Anyway, I'd draw your attention to this claim in the changelog: "This
>>> patch allows the ARM boards to use a rootfs on CompactFlash with the
>>> PATA platform driver." Immediate-term, we should be looking for a small
>>> fix for this issue which is acceptable for 2.6.33 and 2.6.32 and earlier.
>>
>> Sure...  see above.  hopefully one that does not punish -everybody-
>> though.  It would be sad to unconditionally slow down millions of volume
>> platform (read: x86) users for some embedded boards.
>
> Well.
> ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc.patch
> is a no-op on x86.  It only has an effect on architectures which
> implement flush_dcache_page().  And I expect flush_dcache_page() is
> pretty light on those architectures, when compared with a PIO-mode
> transfer.

I don't object to the patch...  as long as the arch people are happy. 
Arch people seem to be the ones complaining, though.

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Feb. 3, 2010, 5:39 p.m. UTC | #19
On Wed, 2010-02-03 at 12:29 -0500, Jeff Garzik wrote:
> On 02/03/2010 12:20 PM, Andrew Morton wrote:
> > On Wed, 03 Feb 2010 12:15:46 -0500 Jeff Garzik<jgarzik@pobox.com>  wrote:
> >
> >> On 02/03/2010 12:06 PM, Andrew Morton wrote:
> >>> On Wed, 03 Feb 2010 12:00:58 -0500 Jeff Garzik<jgarzik@pobox.com>   wrote:
> >>>
> >>>> On 02/03/2010 11:40 AM, James Bottomley wrote:
> >>>>> The fix to libata looks to be just that it should kmap all the time
> >>>>> rather than trying to fiddle with the page to see if its higmem.  For
> >>>>> kmap on a normal page, we should just return the offset map address and
> >>>>> do all the flushing.
> >>>>
> >>>> libata tests PageHighMem() because it was measurably faster to do things
> >>>> the current way (which includes local_irq_save/restore, only for
> >>>> highmem) on boxes where it actually matters.
> >>>>
> >>>> It seems more efficient to add a flush where necessary, than
> >>>> unconditionally punish everyone...
> >>>
> >>> kmap_atomic() tests PageHighMem() too - it's pretty lightweight for
> >>> lowmem pages.
> >>
> >> Note the lack of local_irq_save/restore in our code, though...  These
> >> PIO xfers are __slow__, from the perspective of a CPU manufactured in
> >> the past decade; you are definitely disabling local interrupts for a
> >> long time.  I suppose we could do
> >>
> >> 	if (high mem)
> >> 		local irq save
> >> 		kmap
> >> 		xfer
> >> 		kunmap
> >> 		local irq restore
> >> 	else
> >> 		kmap
> >> 		xfer
> >> 		kunmap
> >>
> >> does that solve the problem for ARM, for 2.6.33?
> >>
> >
> > It's unclear (to me) why that code is using KM_IRQ0 at all.  Can't it
> > use a non-irq kmap slot?
> 
> libata may have to transfer data in response to an interrupt.  That is 
> normal interrupt-driven PIO -- although it should be noted that the code 
> supports polled PIO as well.
> 
> 
> >>> Anyway, I'd draw your attention to this claim in the changelog: "This
> >>> patch allows the ARM boards to use a rootfs on CompactFlash with the
> >>> PATA platform driver." Immediate-term, we should be looking for a small
> >>> fix for this issue which is acceptable for 2.6.33 and 2.6.32 and earlier.
> >>
> >> Sure...  see above.  hopefully one that does not punish -everybody-
> >> though.  It would be sad to unconditionally slow down millions of volume
> >> platform (read: x86) users for some embedded boards.
> >
> > Well.
> > ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc.patch
> > is a no-op on x86.  It only has an effect on architectures which
> > implement flush_dcache_page().  And I expect flush_dcache_page() is
> > pretty light on those architectures, when compared with a PIO-mode
> > transfer.
> 
> I don't object to the patch...  as long as the arch people are happy. 
> Arch people seem to be the ones complaining, though

Apart from the contents, which is looking like sprinkle mainline with
random flushes, I'm unhappy that something which could be better
implemented by thinking about the problem is now being rammed through as
a must have bug fix.

We got this piece of crap in the same way:

commit 2d4dc890b5c8fabd818a8586607e6843c4375e62
Author: Ilya Loginov <isloginov@gmail.com>
Date:   Thu Nov 26 09:16:19 2009 +0100

    block: add helpers to run flush_dcache_page() against a bio and a
request's pages

Which is another race to flush everywhere until my coherence problem
goes away.

This scattershot approach to coherency is really damaging in the long
term because all these random flushes are going to mask real problems we
may or may not have in the arch APIs ... and worse, they'll mask
mostly ... there'll likely be times when the masking is incomplete and
we're left with incredibly hard to debug data loss or binary crashes.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox Feb. 3, 2010, 5:40 p.m. UTC | #20
> 	if (high mem)
> 		local irq save
> 		kmap
> 		xfer
> 		kunmap
> 		local irq restore

At the very least you need to do

	irq save
	kmap
	copy to local buffer
	kunmap
	local irq restore
	xfer

and the mirror image for read.


Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox Feb. 3, 2010, 5:46 p.m. UTC | #21
And indeed there was a patch I proposed in 2008 for this bounce
buffer latency: See the archive

Date: Fri, 29 Feb 2008 13:51:06 +0000
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
To: linux-ide@vger.kernel.org, jeff@garzik.org
Subject: [RFC PATCH] libata: PIO via bounce buffer

although it doesn't deal with the dcache coherency issue and seems to
need a little tweaking to apply due to other changes

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Catalin Marinas Feb. 3, 2010, 5:49 p.m. UTC | #22
On Wed, 2010-02-03 at 17:15 +0000, Jeff Garzik wrote:
> On 02/03/2010 12:06 PM, Andrew Morton wrote:
> > On Wed, 03 Feb 2010 12:00:58 -0500 Jeff Garzik<jgarzik@pobox.com>  wrote:
> >
> >> On 02/03/2010 11:40 AM, James Bottomley wrote:
> >>> The fix to libata looks to be just that it should kmap all the time
> >>> rather than trying to fiddle with the page to see if its higmem.  For
> >>> kmap on a normal page, we should just return the offset map address and
> >>> do all the flushing.
> >>
> >> libata tests PageHighMem() because it was measurably faster to do things
> >> the current way (which includes local_irq_save/restore, only for
> >> highmem) on boxes where it actually matters.
> >>
> >> It seems more efficient to add a flush where necessary, than
> >> unconditionally punish everyone...
> >
> > kmap_atomic() tests PageHighMem() too - it's pretty lightweight for
> > lowmem pages.
> 
> Note the lack of local_irq_save/restore in our code, though...  These
> PIO xfers are __slow__, from the perspective of a CPU manufactured in
> the past decade; you are definitely disabling local interrupts for a
> long time.  I suppose we could do
> 
>         if (high mem)
>                 local irq save
>                 kmap
>                 xfer
>                 kunmap
>                 local irq restore
>         else
>                 kmap
>                 xfer
>                 kunmap
> 
> does that solve the problem for ARM, for 2.6.33?

No, it doesn't. If the page isn't a highmem one, the kunmap doesn't do
anything to the caches (and I also have highmem disabled). My logic was
based on a statement in cachetlb.txt that flush_dcache_page() should be
called if the kernel writes to a page cache page, hence the patch (as it
was pointed out, the first call to flush_dcache_page() isn't needed).
Jeff Garzik Feb. 3, 2010, 5:52 p.m. UTC | #23
On 02/03/2010 12:46 PM, Alan Cox wrote:
> And indeed there was a patch I proposed in 2008 for this bounce
> buffer latency: See the archive
>
> Date: Fri, 29 Feb 2008 13:51:06 +0000
> From: Alan Cox<alan@lxorguk.ukuu.org.uk>
> To: linux-ide@vger.kernel.org, jeff@garzik.org
> Subject: [RFC PATCH] libata: PIO via bounce buffer
>
> although it doesn't deal with the dcache coherency issue and seems to
> need a little tweaking to apply due to other changes

Yeah, I definitely liked the idea.  I wonder if we could do the 
allocation during port_start rather than at the time of bounce?  One 
maximally-sized buffer per device should not be too punishing on the 
system, IMO.

In any case, it is a nice approach to pursue.

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik Feb. 3, 2010, 5:54 p.m. UTC | #24
On 02/03/2010 12:49 PM, Catalin Marinas wrote:
> No, it doesn't. If the page isn't a highmem one, the kunmap doesn't do
> anything to the caches (and I also have highmem disabled). My logic was
> based on a statement in cachetlb.txt that flush_dcache_page() should be
> called if the kernel writes to a page cache page, hence the patch (as it
> was pointed out, the first call to flush_dcache_page() isn't needed).

Well, if those are the API rules, libata certainly must follow them...

	Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox Feb. 3, 2010, 6 p.m. UTC | #25
On Wed, 03 Feb 2010 12:52:34 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> On 02/03/2010 12:46 PM, Alan Cox wrote:
> > And indeed there was a patch I proposed in 2008 for this bounce
> > buffer latency: See the archive
> >
> > Date: Fri, 29 Feb 2008 13:51:06 +0000
> > From: Alan Cox<alan@lxorguk.ukuu.org.uk>
> > To: linux-ide@vger.kernel.org, jeff@garzik.org
> > Subject: [RFC PATCH] libata: PIO via bounce buffer
> >
> > although it doesn't deal with the dcache coherency issue and seems to
> > need a little tweaking to apply due to other changes
> 
> Yeah, I definitely liked the idea.  I wonder if we could do the 
> allocation during port_start rather than at the time of bounce? 

I actually got better numbers using kmalloc - no idea why - perhaps we
get a hot page ?
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik Feb. 3, 2010, 6:12 p.m. UTC | #26
On 02/03/2010 01:00 PM, Alan Cox wrote:
> On Wed, 03 Feb 2010 12:52:34 -0500
> Jeff Garzik<jgarzik@pobox.com>  wrote:
>
>> On 02/03/2010 12:46 PM, Alan Cox wrote:
>>> And indeed there was a patch I proposed in 2008 for this bounce
>>> buffer latency: See the archive
>>>
>>> Date: Fri, 29 Feb 2008 13:51:06 +0000
>>> From: Alan Cox<alan@lxorguk.ukuu.org.uk>
>>> To: linux-ide@vger.kernel.org, jeff@garzik.org
>>> Subject: [RFC PATCH] libata: PIO via bounce buffer
>>>
>>> although it doesn't deal with the dcache coherency issue and seems to
>>> need a little tweaking to apply due to other changes
>>
>> Yeah, I definitely liked the idea.  I wonder if we could do the
>> allocation during port_start rather than at the time of bounce?
>
> I actually got better numbers using kmalloc - no idea why - perhaps we
> get a hot page ?

Seems logical.  It surprises me, though, that being lockless and 
completely avoiding the kmalloc infrastructure wasn't a bigger win.  Was 
this measured on UP?

If your approach has the lowest cost, let's get it in...  I like 
decisions informed by hard data.  :)

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Catalin Marinas Feb. 4, 2010, 2:33 p.m. UTC | #27
On Wed, 2010-02-03 at 17:39 +0000, James Bottomley wrote:
> On Wed, 2010-02-03 at 12:29 -0500, Jeff Garzik wrote:
> > On 02/03/2010 12:20 PM, Andrew Morton wrote:
> > > On Wed, 03 Feb 2010 12:15:46 -0500 Jeff Garzik<jgarzik@pobox.com>  wrote:
> > >> On 02/03/2010 12:06 PM, Andrew Morton wrote:
> > >>> Anyway, I'd draw your attention to this claim in the changelog: "This
> > >>> patch allows the ARM boards to use a rootfs on CompactFlash with the
> > >>> PATA platform driver." Immediate-term, we should be looking for a small
> > >>> fix for this issue which is acceptable for 2.6.33 and 2.6.32 and earlier.
> > >>
> > >> Sure...  see above.  hopefully one that does not punish -everybody-
> > >> though.  It would be sad to unconditionally slow down millions of volume
> > >> platform (read: x86) users for some embedded boards.
> > >
> > > Well.
> > > ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc.patch
> > > is a no-op on x86.  It only has an effect on architectures which
> > > implement flush_dcache_page().  And I expect flush_dcache_page() is
> > > pretty light on those architectures, when compared with a PIO-mode
> > > transfer.
> >
> > I don't object to the patch...  as long as the arch people are happy.
> > Arch people seem to be the ones complaining, though
> 
> Apart from the contents, which is looking like sprinkle mainline with
> random flushes, I'm unhappy that something which could be better
> implemented by thinking about the problem is now being rammed through as
> a must have bug fix.
> 
> We got this piece of crap in the same way:
> 
> commit 2d4dc890b5c8fabd818a8586607e6843c4375e62
> Author: Ilya Loginov <isloginov@gmail.com>
> Date:   Thu Nov 26 09:16:19 2009 +0100
> 
>     block: add helpers to run flush_dcache_page() against a bio and a
> request's pages
> 
> Which is another race to flush everywhere until my coherence problem
> goes away.
> 
> This scattershot approach to coherency is really damaging in the long
> term because all these random flushes are going to mask real problems we
> may or may not have in the arch APIs ... and worse, they'll mask
> mostly ... there'll likely be times when the masking is incomplete and
> we're left with incredibly hard to debug data loss or binary crashes.

I agree that this could be solved in a better way and I'm in favour of
API improvements. The current API and description in cachetlb.txt
suggest that flush_dcache_page() is the best approach when modifying
page cache pages.

AFAICT, there are two main use cases of flush_dcache_page() (could be
more but that's what affects ARM): (1) filesystems modifying page cache
pages (NFS, cramfs etc.) and (2) drivers doing PIO (block, HCD) that may
write directly to page cache pages. Point (2) is a newer addition (as
the one above) to fix coherency problems on Harvard architectures.

As long as you use filesystems like cramfs that call
flush_dcache_page(), the PIO block driver doesn't need to do any
flushing. That's one of the reasons why MTD didn't need any for a long
time. Once you start using filesystems like ext2, there is no
flush_dcache_page() called by the VFS layer, so we thought about moving
this into the driver (for a long time I had a dirty hack in
mpage_end_io_read() to call flush_dcache_page()).

If we state that flush_dcache_page() should not be used in driver code,
than we need additional API for this (like pio_map_page etc. to be in
line with the dma_map_* functions). This would allow architectures to
implement them however they like.

I can go on linux-arch with a proposed patch if this sounds acceptable.

An alternative may be to force this flushing in kunmap(). Is it
guaranteed that page cache pages are only accessed via kmap/kunmap?
There are a few cases in the kernel where the code checks whether the
page is a highmem one before using these functions.

Regards.
James Bottomley Feb. 4, 2010, 3:01 p.m. UTC | #28
On Thu, 2010-02-04 at 14:33 +0000, Catalin Marinas wrote:
> On Wed, 2010-02-03 at 17:39 +0000, James Bottomley wrote:
> > On Wed, 2010-02-03 at 12:29 -0500, Jeff Garzik wrote:
> > > On 02/03/2010 12:20 PM, Andrew Morton wrote:
> > > > On Wed, 03 Feb 2010 12:15:46 -0500 Jeff Garzik<jgarzik@pobox.com>  wrote:
> > > >> On 02/03/2010 12:06 PM, Andrew Morton wrote:
> > > >>> Anyway, I'd draw your attention to this claim in the changelog: "This
> > > >>> patch allows the ARM boards to use a rootfs on CompactFlash with the
> > > >>> PATA platform driver." Immediate-term, we should be looking for a small
> > > >>> fix for this issue which is acceptable for 2.6.33 and 2.6.32 and earlier.
> > > >>
> > > >> Sure...  see above.  hopefully one that does not punish -everybody-
> > > >> though.  It would be sad to unconditionally slow down millions of volume
> > > >> platform (read: x86) users for some embedded boards.
> > > >
> > > > Well.
> > > > ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc.patch
> > > > is a no-op on x86.  It only has an effect on architectures which
> > > > implement flush_dcache_page().  And I expect flush_dcache_page() is
> > > > pretty light on those architectures, when compared with a PIO-mode
> > > > transfer.
> > >
> > > I don't object to the patch...  as long as the arch people are happy.
> > > Arch people seem to be the ones complaining, though
> > 
> > Apart from the contents, which is looking like sprinkle mainline with
> > random flushes, I'm unhappy that something which could be better
> > implemented by thinking about the problem is now being rammed through as
> > a must have bug fix.
> > 
> > We got this piece of crap in the same way:
> > 
> > commit 2d4dc890b5c8fabd818a8586607e6843c4375e62
> > Author: Ilya Loginov <isloginov@gmail.com>
> > Date:   Thu Nov 26 09:16:19 2009 +0100
> > 
> >     block: add helpers to run flush_dcache_page() against a bio and a
> > request's pages
> > 
> > Which is another race to flush everywhere until my coherence problem
> > goes away.
> > 
> > This scattershot approach to coherency is really damaging in the long
> > term because all these random flushes are going to mask real problems we
> > may or may not have in the arch APIs ... and worse, they'll mask
> > mostly ... there'll likely be times when the masking is incomplete and
> > we're left with incredibly hard to debug data loss or binary crashes.
> 
> I agree that this could be solved in a better way and I'm in favour of
> API improvements. The current API and description in cachetlb.txt
> suggest that flush_dcache_page() is the best approach when modifying
> page cache pages.
> 
> AFAICT, there are two main use cases of flush_dcache_page() (could be
> more but that's what affects ARM): (1) filesystems modifying page cache
> pages (NFS, cramfs etc.) and (2) drivers doing PIO (block, HCD) that may
> write directly to page cache pages. Point (2) is a newer addition (as
> the one above) to fix coherency problems on Harvard architectures.
> 
> As long as you use filesystems like cramfs that call
> flush_dcache_page(), the PIO block driver doesn't need to do any
> flushing. That's one of the reasons why MTD didn't need any for a long
> time. Once you start using filesystems like ext2, there is no
> flush_dcache_page() called by the VFS layer, so we thought about moving
> this into the driver (for a long time I had a dirty hack in
> mpage_end_io_read() to call flush_dcache_page()).
> 
> If we state that flush_dcache_page() should not be used in driver code,
> than we need additional API for this (like pio_map_page etc. to be in
> line with the dma_map_* functions). This would allow architectures to
> implement them however they like.
> 
> I can go on linux-arch with a proposed patch if this sounds acceptable.

I think I'd prefer some type of kmap_ variant.  The reason is that all
the semantics are going to be the same as kmap ... down to having the
slots for atomic.  All we need is the extra flag giving the direction of
transfer and, of course, what you get back is a virtual address pointer
(all the dma operations return physical address handles).

> An alternative may be to force this flushing in kunmap(). Is it
> guaranteed that page cache pages are only accessed via kmap/kunmap?
> There are a few cases in the kernel where the code checks whether the
> page is a highmem one before using these functions.

So code which checks for higmem really shouldn't.  However DaveM makes
the valid point that this would be problematic on platforms that have no
highmem.  But an additional variant like kmap_pio which all platforms
can intercept looks more acceptable.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Catalin Marinas Feb. 4, 2010, 3:39 p.m. UTC | #29
On Thu, 2010-02-04 at 15:01 +0000, James Bottomley wrote:
> On Thu, 2010-02-04 at 14:33 +0000, Catalin Marinas wrote:
> > On Wed, 2010-02-03 at 17:39 +0000, James Bottomley wrote:
> > > On Wed, 2010-02-03 at 12:29 -0500, Jeff Garzik wrote:
> > > > On 02/03/2010 12:20 PM, Andrew Morton wrote:
> > > > > On Wed, 03 Feb 2010 12:15:46 -0500 Jeff Garzik<jgarzik@pobox.com>  wrote:
> > > > >> On 02/03/2010 12:06 PM, Andrew Morton wrote:
> > > > >>> Anyway, I'd draw your attention to this claim in the changelog: "This
> > > > >>> patch allows the ARM boards to use a rootfs on CompactFlash with the
> > > > >>> PATA platform driver." Immediate-term, we should be looking for a small
> > > > >>> fix for this issue which is acceptable for 2.6.33 and 2.6.32 and earlier.
> > > > >>
> > > > >> Sure...  see above.  hopefully one that does not punish -everybody-
> > > > >> though.  It would be sad to unconditionally slow down millions of volume
> > > > >> platform (read: x86) users for some embedded boards.
> > > > >
> > > > > Well.
> > > > > ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc.patch
> > > > > is a no-op on x86.  It only has an effect on architectures which
> > > > > implement flush_dcache_page().  And I expect flush_dcache_page() is
> > > > > pretty light on those architectures, when compared with a PIO-mode
> > > > > transfer.
> > > >
> > > > I don't object to the patch...  as long as the arch people are happy.
> > > > Arch people seem to be the ones complaining, though
> > >
> > > Apart from the contents, which is looking like sprinkle mainline with
> > > random flushes, I'm unhappy that something which could be better
> > > implemented by thinking about the problem is now being rammed through as
> > > a must have bug fix.
> > >
> > > We got this piece of crap in the same way:
> > >
> > > commit 2d4dc890b5c8fabd818a8586607e6843c4375e62
> > > Author: Ilya Loginov <isloginov@gmail.com>
> > > Date:   Thu Nov 26 09:16:19 2009 +0100
> > >
> > >     block: add helpers to run flush_dcache_page() against a bio and a
> > > request's pages
> > >
> > > Which is another race to flush everywhere until my coherence problem
> > > goes away.
> > >
> > > This scattershot approach to coherency is really damaging in the long
> > > term because all these random flushes are going to mask real problems we
> > > may or may not have in the arch APIs ... and worse, they'll mask
> > > mostly ... there'll likely be times when the masking is incomplete and
> > > we're left with incredibly hard to debug data loss or binary crashes.
> >
> > I agree that this could be solved in a better way and I'm in favour of
> > API improvements. The current API and description in cachetlb.txt
> > suggest that flush_dcache_page() is the best approach when modifying
> > page cache pages.
> >
> > AFAICT, there are two main use cases of flush_dcache_page() (could be
> > more but that's what affects ARM): (1) filesystems modifying page cache
> > pages (NFS, cramfs etc.) and (2) drivers doing PIO (block, HCD) that may
> > write directly to page cache pages. Point (2) is a newer addition (as
> > the one above) to fix coherency problems on Harvard architectures.
> >
> > As long as you use filesystems like cramfs that call
> > flush_dcache_page(), the PIO block driver doesn't need to do any
> > flushing. That's one of the reasons why MTD didn't need any for a long
> > time. Once you start using filesystems like ext2, there is no
> > flush_dcache_page() called by the VFS layer, so we thought about moving
> > this into the driver (for a long time I had a dirty hack in
> > mpage_end_io_read() to call flush_dcache_page()).
> >
> > If we state that flush_dcache_page() should not be used in driver code,
> > than we need additional API for this (like pio_map_page etc. to be in
> > line with the dma_map_* functions). This would allow architectures to
> > implement them however they like.
> >
> > I can go on linux-arch with a proposed patch if this sounds acceptable.
> 
> I think I'd prefer some type of kmap_ variant.  The reason is that all
> the semantics are going to be the same as kmap ... down to having the
> slots for atomic.  All we need is the extra flag giving the direction of
> transfer and, of course, what you get back is a virtual address pointer
> (all the dma operations return physical address handles).

Would a kmap_pio API always require a page structure as argument?

There is a similar situation for HCD drivers and USB mass storage which
I raised recently. The idea that the USB mass storage code should do the
cache flushing was rejected as this has to be pushed down to the HCD
driver. However, the HCD driver doesn't get page information, only a
urb->transfer_buffer pointer to which it may do either DMA or PIO (both
may actually happen in the same driver as an optimisation).

If we enforce the use of a kmap_pio API in the USB mass storage, it
wouldn't be optimal since this code is not aware of whether the HCD
driver would do PIO or DMA.

Looking at the HCD drivers, there are more similarities to the DMA API
like pio_map_page, pio_map_single etc. I personally don't care much
about how the API is named but rather where such API should be called
from.

Regards.
James Bottomley Feb. 4, 2010, 9:36 p.m. UTC | #30
On Thu, 2010-02-04 at 15:39 +0000, Catalin Marinas wrote:
> On Thu, 2010-02-04 at 15:01 +0000, James Bottomley wrote:
> > On Thu, 2010-02-04 at 14:33 +0000, Catalin Marinas wrote:
> > > On Wed, 2010-02-03 at 17:39 +0000, James Bottomley wrote:
> > > > On Wed, 2010-02-03 at 12:29 -0500, Jeff Garzik wrote:
> > > > > On 02/03/2010 12:20 PM, Andrew Morton wrote:
> > > > > > On Wed, 03 Feb 2010 12:15:46 -0500 Jeff Garzik<jgarzik@pobox.com>  wrote:
> > > > > >> On 02/03/2010 12:06 PM, Andrew Morton wrote:
> > > > > >>> Anyway, I'd draw your attention to this claim in the changelog: "This
> > > > > >>> patch allows the ARM boards to use a rootfs on CompactFlash with the
> > > > > >>> PATA platform driver." Immediate-term, we should be looking for a small
> > > > > >>> fix for this issue which is acceptable for 2.6.33 and 2.6.32 and earlier.
> > > > > >>
> > > > > >> Sure...  see above.  hopefully one that does not punish -everybody-
> > > > > >> though.  It would be sad to unconditionally slow down millions of volume
> > > > > >> platform (read: x86) users for some embedded boards.
> > > > > >
> > > > > > Well.
> > > > > > ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc.patch
> > > > > > is a no-op on x86.  It only has an effect on architectures which
> > > > > > implement flush_dcache_page().  And I expect flush_dcache_page() is
> > > > > > pretty light on those architectures, when compared with a PIO-mode
> > > > > > transfer.
> > > > >
> > > > > I don't object to the patch...  as long as the arch people are happy.
> > > > > Arch people seem to be the ones complaining, though
> > > >
> > > > Apart from the contents, which is looking like sprinkle mainline with
> > > > random flushes, I'm unhappy that something which could be better
> > > > implemented by thinking about the problem is now being rammed through as
> > > > a must have bug fix.
> > > >
> > > > We got this piece of crap in the same way:
> > > >
> > > > commit 2d4dc890b5c8fabd818a8586607e6843c4375e62
> > > > Author: Ilya Loginov <isloginov@gmail.com>
> > > > Date:   Thu Nov 26 09:16:19 2009 +0100
> > > >
> > > >     block: add helpers to run flush_dcache_page() against a bio and a
> > > > request's pages
> > > >
> > > > Which is another race to flush everywhere until my coherence problem
> > > > goes away.
> > > >
> > > > This scattershot approach to coherency is really damaging in the long
> > > > term because all these random flushes are going to mask real problems we
> > > > may or may not have in the arch APIs ... and worse, they'll mask
> > > > mostly ... there'll likely be times when the masking is incomplete and
> > > > we're left with incredibly hard to debug data loss or binary crashes.
> > >
> > > I agree that this could be solved in a better way and I'm in favour of
> > > API improvements. The current API and description in cachetlb.txt
> > > suggest that flush_dcache_page() is the best approach when modifying
> > > page cache pages.
> > >
> > > AFAICT, there are two main use cases of flush_dcache_page() (could be
> > > more but that's what affects ARM): (1) filesystems modifying page cache
> > > pages (NFS, cramfs etc.) and (2) drivers doing PIO (block, HCD) that may
> > > write directly to page cache pages. Point (2) is a newer addition (as
> > > the one above) to fix coherency problems on Harvard architectures.
> > >
> > > As long as you use filesystems like cramfs that call
> > > flush_dcache_page(), the PIO block driver doesn't need to do any
> > > flushing. That's one of the reasons why MTD didn't need any for a long
> > > time. Once you start using filesystems like ext2, there is no
> > > flush_dcache_page() called by the VFS layer, so we thought about moving
> > > this into the driver (for a long time I had a dirty hack in
> > > mpage_end_io_read() to call flush_dcache_page()).
> > >
> > > If we state that flush_dcache_page() should not be used in driver code,
> > > than we need additional API for this (like pio_map_page etc. to be in
> > > line with the dma_map_* functions). This would allow architectures to
> > > implement them however they like.
> > >
> > > I can go on linux-arch with a proposed patch if this sounds acceptable.
> > 
> > I think I'd prefer some type of kmap_ variant.  The reason is that all
> > the semantics are going to be the same as kmap ... down to having the
> > slots for atomic.  All we need is the extra flag giving the direction of
> > transfer and, of course, what you get back is a virtual address pointer
> > (all the dma operations return physical address handles).
> 
> Would a kmap_pio API always require a page structure as argument?

This seems a reasonable requirement.

> There is a similar situation for HCD drivers and USB mass storage which
> I raised recently. The idea that the USB mass storage code should do the
> cache flushing was rejected as this has to be pushed down to the HCD
> driver. However, the HCD driver doesn't get page information, only a
> urb->transfer_buffer pointer to which it may do either DMA or PIO (both
> may actually happen in the same driver as an optimisation).

You'd want to do dma_map or kmap at a point after the DMA or PIO
decision is made.

> If we enforce the use of a kmap_pio API in the USB mass storage, it
> wouldn't be optimal since this code is not aware of whether the HCD
> driver would do PIO or DMA.

So, if the decision isn't made until the HCD driver, then surely this
belongs there.

> Looking at the HCD drivers, there are more similarities to the DMA API
> like pio_map_page, pio_map_single etc. I personally don't care much
> about how the API is named but rather where such API should be called
> from.

the API usage is similar to the dma one.  The point I was making is the
semantics are similar to kmap because of the slot and atomic/non atomic
requirements.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff -puN drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc drivers/ata/libata-sff.c
--- a/drivers/ata/libata-sff.c~ata-call-flush_dcache_page-around-pio-data-transfers-in-libata-affc
+++ a/drivers/ata/libata-sff.c
@@ -874,6 +874,9 @@  static void ata_pio_sector(struct ata_qu
 
 	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
 
+	if (do_write)
+		flush_dcache_page(page);
+
 	if (PageHighMem(page)) {
 		unsigned long flags;
 
@@ -893,6 +896,9 @@  static void ata_pio_sector(struct ata_qu
 				       do_write);
 	}
 
+	if (!do_write)
+		flush_dcache_page(page);
+
 	qc->curbytes += qc->sect_size;
 	qc->cursg_ofs += qc->sect_size;