[4/4] drm/vgem: flush page during page fault

Message ID	20180117003559.67837-4-gurchetansingh@chromium.org
State	Rejected
Series	[1/4] drm: rename {drm_clflush_sg, drm_clflush_pages}
	[2/4] drm: add additional parameter in drm_flush_pages() and drm_flush_sg()
	[3/4] drm: add ARM flush implementations
	[4/4] drm/vgem: flush page during page fault

Headers

From: Gurchetan Singh <gurchetansingh@chromium.org>
To: dri-devel@lists.freedesktop.org
Cc: thierry.reding@gmail.com, heiko@sntech.de, daniel.vetter@intel.com,
	chris@chris-wilson.co.uk, linux-tegra@vger.kernel.org,
	Gurchetan Singh <gurchetansingh@chromium.org>
Subject: [PATCH 4/4] drm/vgem: flush page during page fault
Date: Tue, 16 Jan 2018 16:35:59 -0800
Message-Id: <20180117003559.67837-4-gurchetansingh@chromium.org>
In-Reply-To: <20180117003559.67837-1-gurchetansingh@chromium.org>
References: <20180117003559.67837-1-gurchetansingh@chromium.org>
Sender: linux-tegra-owner@vger.kernel.org
Precedence: bulk

Commit Message

Gurchetan Singh Jan. 17, 2018, 12:35 a.m. UTC

This is required to use buffers allocated by vgem on AMD and ARM devices.
We're experiencing a case where eviction of the cache races with userspace
writes. To fix this, flush the cache after retrieving a page.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
---
 drivers/gpu/drm/vgem/vgem_drv.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Daniel Vetter Jan. 17, 2018, 8:39 a.m. UTC | #1

On Tue, Jan 16, 2018 at 04:35:59PM -0800, Gurchetan Singh wrote:
> This is required to use buffers allocated by vgem on AMD and ARM devices.
> We're experiencing a case where eviction of the cache races with userspace
> writes. To fix this, flush the cache after retrieving a page.
> 
> Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
> ---
>  drivers/gpu/drm/vgem/vgem_drv.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
> index 35bfdfb746a7..fb263969f02d 100644
> --- a/drivers/gpu/drm/vgem/vgem_drv.c
> +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> @@ -112,6 +112,7 @@ static int vgem_gem_fault(struct vm_fault *vmf)
>  				break;
>  		}
>  
> +		drm_flush_pages(obj->base.dev->dev, &page, 1);

Uh ... what exactly are you doing?

Asking because the entire "who's responsible for coherency" story is
entirely undefined still when doing buffer sharing :-/ What is clear is
that currently vgem entirely ignores this (there's no
begin/end_cpu_access callback), mostly because the shared dma-buf support
in drm_prime.c also entirely ignores this. And doing a one-time only
flushing in your fault handler is definitely not going to fix this (at
least not if you do anything else than one-shot uploads).
-Daniel

>  	}
>  	return ret;
>  }
> -- 
> 2.13.5
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

Daniel Vetter Jan. 18, 2018, 7:38 a.m. UTC | #2

On Wed, Jan 17, 2018 at 11:49 PM, Gurchetan Singh
<gurchetansingh@chromium.org> wrote:
>
> On Wed, Jan 17, 2018 at 12:39 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>>
>> On Tue, Jan 16, 2018 at 04:35:59PM -0800, Gurchetan Singh wrote:
>> > This is required to use buffers allocated by vgem on AMD and ARM
>> > devices.
>> > We're experiencing a case where eviction of the cache races with
>> > userspace
>> > writes. To fix this, flush the cache after retrieving a page.
>> >
>> > Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
>> > ---
>> >  drivers/gpu/drm/vgem/vgem_drv.c | 1 +
>> >  1 file changed, 1 insertion(+)
>> >
>> > diff --git a/drivers/gpu/drm/vgem/vgem_drv.c
>> > b/drivers/gpu/drm/vgem/vgem_drv.c
>> > index 35bfdfb746a7..fb263969f02d 100644
>> > --- a/drivers/gpu/drm/vgem/vgem_drv.c
>> > +++ b/drivers/gpu/drm/vgem/vgem_drv.c
>> > @@ -112,6 +112,7 @@ static int vgem_gem_fault(struct vm_fault *vmf)
>> >                               break;
>> >               }
>> >
>> > +             drm_flush_pages(obj->base.dev->dev, &page, 1);
>>
>> Uh ... what exactly are you doing?
>>
>> Asking because the entire "who's responsible for coherency" story is
>> entirely undefined still when doing buffer sharing :-/ What is clear is
>> that currently vgem entirely ignores this (there's no
>> begin/end_cpu_access callback), mostly because the shared dma-buf support
>> in drm_prime.c also entirely ignores this.
>
>
>
> This patch isn't trying to address the case of a dma-buf imported into vgem.
> It's trying to address the case when a buffer is created by
> vgem_gem_dumb_create, mapped by vgem_gem_dumb_map and then accessed by user
> space.  Since the page retrieved by shmem_read_mapping_page during the page
> fault may still be in the cache, we're experiencing incorrect data in
> buffer.  Here's the test case we're running:
>
> https://chromium.googlesource.com/chromiumos/platform/drm-tests/+/master/vgem_test.c

404s over here (Internal url?).

> It fails on line 210 on AMD and ARM devices (not Intel though).

So you _do_ import it on the other device driver as a dma-buf (and
export it from vgem)? Because coherency isn't well-defined for dma-buf
no matter who the exporter/importer is.

>> And doing a one-time only
>> flushing in your fault handler is definitely not going to fix this (at
>> least not if you do anything else than one-shot uploads).
>
>
> There used to be a vgem_gem_get_pages function, but that's been removed.
> I don't know where else to flush in this situation.

dma_buf begin/end cpu access. Even exposed as an ioctl when you're
using the dma-buf mmap stuff. Vgem doesn't have that, which means
as-is the dumb mmap support of vgem can't really support this if you
want to do explicit flushing.

What would work is uncached/writecombine/coherent dma memory. But then
we're in the middle of the entire "who's responsible" story.
-Daniel

>
>>
>> -Daniel
>>
>> >       }
>> >       return ret;
>> >  }
>> > --
>> > 2.13.5
>> >
>> > _______________________________________________
>> > dri-devel mailing list
>> > dri-devel@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
>
>

Daniel Vetter Jan. 30, 2018, 9:14 a.m. UTC | #3

On Thu, Jan 18, 2018 at 09:23:31AM -0800, Gurchetan Singh wrote:
> On Wed, Jan 17, 2018 at 11:38 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Wed, Jan 17, 2018 at 11:49 PM, Gurchetan Singh
> > <gurchetansingh@chromium.org> wrote:
> > >
> > > On Wed, Jan 17, 2018 at 12:39 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > >>
> > >> On Tue, Jan 16, 2018 at 04:35:59PM -0800, Gurchetan Singh wrote:
> > >> > This is required to use buffers allocated by vgem on AMD and ARM
> > >> > devices.
> > >> > We're experiencing a case where eviction of the cache races with
> > >> > userspace
> > >> > writes. To fix this, flush the cache after retrieving a page.
> > >> >
> > >> > Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
> > >> > ---
> > >> >  drivers/gpu/drm/vgem/vgem_drv.c | 1 +
> > >> >  1 file changed, 1 insertion(+)
> > >> >
> > >> > diff --git a/drivers/gpu/drm/vgem/vgem_drv.c
> > >> > b/drivers/gpu/drm/vgem/vgem_drv.c
> > >> > index 35bfdfb746a7..fb263969f02d 100644
> > >> > --- a/drivers/gpu/drm/vgem/vgem_drv.c
> > >> > +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> > >> > @@ -112,6 +112,7 @@ static int vgem_gem_fault(struct vm_fault *vmf)
> > >> >                               break;
> > >> >               }
> > >> >
> > >> > +             drm_flush_pages(obj->base.dev->dev, &page, 1);
> > >>
> > >> Uh ... what exactly are you doing?
> > >>
> > >> Asking because the entire "who's responsible for coherency" story is
> > >> entirely undefined still when doing buffer sharing :-/ What is clear is
> > >> that currently vgem entirely ignores this (there's no
> > >> begin/end_cpu_access callback), mostly because the shared dma-buf
> > >> support in drm_prime.c also entirely ignores this.
> > >
> > >
> > >
> > > > This patch isn't trying to address the case of a dma-buf imported into
> > > > vgem. It's trying to address the case when a buffer is created by
> > > > vgem_gem_dumb_create, mapped by vgem_gem_dumb_map and then accessed by
> > > > user space.  Since the page retrieved by shmem_read_mapping_page during
> > > > the page fault may still be in the cache, we're experiencing incorrect
> > > > data in buffer.  Here's the test case we're running:
> > > >
> > > > https://chromium.googlesource.com/chromiumos/platform/drm-tests/+/master/vgem_test.c
> >
> > 404s over here (Internal url?).
> 
> 
> Hmm ... I was able to access that link without being logged in to any
> accounts.
> 
> > It fails on line 210 on AMD and ARM devices (not Intel though).
> >
> > So you _do_ import it on the other device driver as a dma-buf (and
> > export it from vgem)?
> 
> 
> Those portions of the test work fine (if the platform has a drm_clflush
> implementation).  vgem_prime_pin calls drm_clflush_pages for the exporting
> case.  Typically, ARM drivers flush the cache after drm_gem_get_pages() and
> only do WC mappings, so import works.  For Intel, there is some hardware
> level coherency involved.  The problem is vgem doesn't flush the cache on
> ARM/AMD when getting pages for the non-export/non-import case (when
> faulting after a vgem_gem_dumb_map, not during dma-buf mmap) -- i.e., during
> regular use of the buffer.

So if the idea is that the vgem buffers should be accessed using WC, then
we need to switch the dumb_map stuff to do WC. Flushing once is not going
to fix things if you write again (afaik CrOS only does write-once and then
throws buffers away again, but not sure).

The other option is to throw dumb_map into the wind and only support
dma-buf mmapping, which has a special ioctl for range flushing (so that we
could flush before/after each access as needed). A gross hack would be to
keep using dumb_map but abuse the dma-buf flushing ioctl for the flushing.

The problem ofc is that there's no agreement between importers/exporters
on who should flush when and where. Fixing that is way too much work, so
personally I think the simplest clean fix is something along the lines of
using the dma-buf flush ioctl (DMA_BUF_IOCTL_SYNC).
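
For illustration, the userspace side of that suggestion could look roughly
like the sketch below. It uses the DMA_BUF_IOCTL_SYNC ioctl and struct
dma_buf_sync from <linux/dma-buf.h>; the helper names dmabuf_cpu_sync() and
dmabuf_write() are hypothetical, and the sketch assumes the buffer has
already been exported as a dma-buf fd and mmap()ed. As noted earlier in the
thread, vgem has no begin/end_cpu_access callback, so on vgem as it stands
these calls would not flush anything; this only shows how CPU access is
meant to be bracketed once the exporter implements them.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>
#include <linux/types.h>

/*
 * Issue DMA_BUF_IOCTL_SYNC on a dma-buf fd. The ioctl may be interrupted,
 * so restart it on EINTR/EAGAIN. Returns 0 on success, -1 with errno set
 * on failure.
 */
static int dmabuf_cpu_sync(int dmabuf_fd, __u64 flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/*
 * Hypothetical helper: bracket a single CPU write into an mmap()ed dma-buf
 * with SYNC_START/SYNC_END, so the exporter gets a chance to flush or
 * invalidate around every access rather than once at fault time.
 */
static int dmabuf_write(int dmabuf_fd, void *map, size_t offset,
			const void *src, size_t len)
{
	if (dmabuf_cpu_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE))
		return -1;

	memcpy((char *)map + offset, src, len);

	return dmabuf_cpu_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
}
```

The point of the START/END pair is that it runs on every CPU access, which
is what a one-time flush in the fault handler cannot provide.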

Cheers, Daniel

> 
> >
> > >> And doing a one-time only
> > >> flushing in your fault handler is definitely not going to fix this (at
> > >> least not if you do anything else than one-shot uploads).
> > >
> > >
> > > > There used to be a vgem_gem_get_pages function, but that's been
> > > > removed.
> > > I don't know where else to flush in this situation.
> >
> > dma_buf begin/end cpu access. Even exposed as an ioctl when you're
> > using the dma-buf mmap stuff. Vgem doesn't have that, which means
> > as-is the dumb mmap support of vgem can't really support this if you
> > want to do explicit flushing.
> 
> 
> > What would work is uncached/writecombine/coherent dma memory. But then
> > we're in the middle of the entire "who's responsible" story.
> > -Daniel
> >
> > >
> > >>
> > >> -Daniel
> > >>
> > >> >       }
> > >> >       return ret;
> > >> >  }
> > >> > --
> > >> > 2.13.5
> > >> >
> > >> > _______________________________________________
> > >> > dri-devel mailing list
> > >> > dri-devel@lists.freedesktop.org
> > >> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > >>
> > >> --
> > >> Daniel Vetter
> > >> Software Engineer, Intel Corporation
> > >> http://blog.ffwll.ch
> > >
> > >
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> >

diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
index 35bfdfb746a7..fb263969f02d 100644
--- a/drivers/gpu/drm/vgem/vgem_drv.c
+++ b/drivers/gpu/drm/vgem/vgem_drv.c
@@ -112,6 +112,7 @@  static int vgem_gem_fault(struct vm_fault *vmf)
 				break;
 		}
 
+		drm_flush_pages(obj->base.dev->dev, &page, 1);
 	}
 	return ret;
 }
