
[4/5] drm/tegra: Restrict IOVA space to DMA mask

Message ID 20190123093951.24908-5-thierry.reding@gmail.com
State Superseded
Headers show
Series drm/tegra: Fix IOVA space on Tegra186 and later | expand

Commit Message

Thierry Reding Jan. 23, 2019, 9:39 a.m. UTC
From: Thierry Reding <treding@nvidia.com>

On Tegra186 and later, the ARM SMMU provides an input address space that
is 48 bits wide. However, memory clients can only address up to 40 bits.
If the geometry is used as-is, allocations of IOVA space can end up in a
region that cannot be addressed by the memory clients.

To fix this, restrict the IOVA space to the DMA mask of the host1x
device. Note that, technically, the IOVA space needs to be restricted to
the intersection of the DMA masks for all clients that are attached to
the IOMMU domain. In practice using the DMA mask of the host1x device is
sufficient because all host1x clients share the same DMA mask.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Dmitry Osipenko Jan. 23, 2019, 1:41 p.m. UTC | #1
23.01.2019 12:39, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> On Tegra186 and later, the ARM SMMU provides an input address space that
> is 48 bits wide. However, memory clients can only address up to 40 bits.
> If the geometry is used as-is, allocations of IOVA space can end up in a
> region that cannot be addressed by the memory clients.
> 
> To fix this, restrict the IOVA space to the DMA mask of the host1x
> device. Note that, technically, the IOVA space needs to be restricted to
> the intersection of the DMA masks for all clients that are attached to
> the IOMMU domain. In practice using the DMA mask of the host1x device is
> sufficient because all host1x clients share the same DMA mask.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/gpu/drm/tegra/drm.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index 271c7a5fc954..0c5f1e6a0446 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>  
>  	if (tegra->domain) {
>  		u64 carveout_start, carveout_end, gem_start, gem_end;
> +		u64 dma_mask = dma_get_mask(&device->dev);
>  		dma_addr_t start, end;
>  		unsigned long order;
>  
> -		start = tegra->domain->geometry.aperture_start;
> -		end = tegra->domain->geometry.aperture_end;
> +		start = tegra->domain->geometry.aperture_start & dma_mask;
> +		end = tegra->domain->geometry.aperture_end & dma_mask;
>  
>  		gem_start = start;
>  		gem_end = end - CARVEOUT_SZ;
> 

Wow, so the IOVA could address >32 bits on later Tegras. AFAIK, there is currently no support for properly programming 64-bit addresses in the driver code, so wouldn't it make sense to force the IOVA mask to 32 bits for now and hope that the second half of the address registers happens to be 0x00000000 in HW?
Thierry Reding Jan. 23, 2019, 2:04 p.m. UTC | #2
On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
> 23.01.2019 12:39, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > On Tegra186 and later, the ARM SMMU provides an input address space that
> > is 48 bits wide. However, memory clients can only address up to 40 bits.
> > If the geometry is used as-is, allocations of IOVA space can end up in a
> > region that cannot be addressed by the memory clients.
> > 
> > To fix this, restrict the IOVA space to the DMA mask of the host1x
> > device. Note that, technically, the IOVA space needs to be restricted to
> > the intersection of the DMA masks for all clients that are attached to
> > the IOMMU domain. In practice using the DMA mask of the host1x device is
> > sufficient because all host1x clients share the same DMA mask.
> > 
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > ---
> >  drivers/gpu/drm/tegra/drm.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> > index 271c7a5fc954..0c5f1e6a0446 100644
> > --- a/drivers/gpu/drm/tegra/drm.c
> > +++ b/drivers/gpu/drm/tegra/drm.c
> > @@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
> >  
> >  	if (tegra->domain) {
> >  		u64 carveout_start, carveout_end, gem_start, gem_end;
> > +		u64 dma_mask = dma_get_mask(&device->dev);
> >  		dma_addr_t start, end;
> >  		unsigned long order;
> >  
> > -		start = tegra->domain->geometry.aperture_start;
> > -		end = tegra->domain->geometry.aperture_end;
> > +		start = tegra->domain->geometry.aperture_start & dma_mask;
> > +		end = tegra->domain->geometry.aperture_end & dma_mask;
> >  
> >  		gem_start = start;
> >  		gem_end = end - CARVEOUT_SZ;
> > 
> 
> Wow, so the IOVA could address >32 bits on later Tegras. AFAIK,
> there is currently no support for properly programming 64-bit
> addresses in the driver code, so wouldn't it make sense to force the
> IOVA mask to 32 bits for now and hope that the second half of the
> address registers happens to be 0x00000000 in HW?

I think this restriction only applies to display at this point. In
practice you'd be hard put to trigger that case because IOVA memory is
allocated from the bottom, so you'd actually need to use up to 4 GiB of
IOVA space before hitting that.

That said, I vaguely remember typing up the patch to support writing the
WINBUF_START_ADDR_HI register and friends, but it looks as if that was
never merged.

I'll try to dig out that patch (or rewrite it, shouldn't be very
difficult) and make it part of this series. I'd rather fix that issue
than arbitrarily restrict the IOVA space, because that's likely to come
back and bite us at some point.

Thierry
Dmitry Osipenko Jan. 23, 2019, 2:33 p.m. UTC | #3
23.01.2019 17:04, Thierry Reding wrote:
> On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
>> 23.01.2019 12:39, Thierry Reding wrote:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> On Tegra186 and later, the ARM SMMU provides an input address space that
>>> is 48 bits wide. However, memory clients can only address up to 40 bits.
>>> If the geometry is used as-is, allocations of IOVA space can end up in a
>>> region that cannot be addressed by the memory clients.
>>>
>>> To fix this, restrict the IOVA space to the DMA mask of the host1x
>>> device. Note that, technically, the IOVA space needs to be restricted to
>>> the intersection of the DMA masks for all clients that are attached to
>>> the IOMMU domain. In practice using the DMA mask of the host1x device is
>>> sufficient because all host1x clients share the same DMA mask.
>>>
>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>> ---
>>>  drivers/gpu/drm/tegra/drm.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>>> index 271c7a5fc954..0c5f1e6a0446 100644
>>> --- a/drivers/gpu/drm/tegra/drm.c
>>> +++ b/drivers/gpu/drm/tegra/drm.c
>>> @@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>>  
>>>  	if (tegra->domain) {
>>>  		u64 carveout_start, carveout_end, gem_start, gem_end;
>>> +		u64 dma_mask = dma_get_mask(&device->dev);
>>>  		dma_addr_t start, end;
>>>  		unsigned long order;
>>>  
>>> -		start = tegra->domain->geometry.aperture_start;
>>> -		end = tegra->domain->geometry.aperture_end;
>>> +		start = tegra->domain->geometry.aperture_start & dma_mask;
>>> +		end = tegra->domain->geometry.aperture_end & dma_mask;
>>>  
>>>  		gem_start = start;
>>>  		gem_end = end - CARVEOUT_SZ;
>>>
>>
>> Wow, so the IOVA could address >32 bits on later Tegras. AFAIK,
>> there is currently no support for properly programming 64-bit
>> addresses in the driver code, so wouldn't it make sense to force the
>> IOVA mask to 32 bits for now and hope that the second half of the
>> address registers happens to be 0x00000000 in HW?
> 
> I think this restriction only applies to display at this point. In
> practice you'd be hard put to trigger that case because IOVA memory is
> allocated from the bottom, so you'd actually need to use up to 4 GiB of
> IOVA space before hitting that.
> 
> That said, I vaguely remember typing up the patch to support writing the
> WINBUF_START_ADDR_HI register and friends, but it looks as if that was
> never merged.
> 
> I'll try to dig out that patch (or rewrite it, shouldn't be very
> difficult) and make it part of this series. I'd rather fix that issue
> than arbitrarily restrict the IOVA space, because that's likely to come
> back and bite us at some point.

You could also try changing the IOVA allocation logic and see what fails :)
Dmitry Osipenko Jan. 23, 2019, 3:55 p.m. UTC | #4
23.01.2019 17:04, Thierry Reding wrote:
> On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
>> 23.01.2019 12:39, Thierry Reding wrote:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> On Tegra186 and later, the ARM SMMU provides an input address space that
>>> is 48 bits wide. However, memory clients can only address up to 40 bits.
>>> If the geometry is used as-is, allocations of IOVA space can end up in a
>>> region that cannot be addressed by the memory clients.
>>>
>>> To fix this, restrict the IOVA space to the DMA mask of the host1x
>>> device. Note that, technically, the IOVA space needs to be restricted to
>>> the intersection of the DMA masks for all clients that are attached to
>>> the IOMMU domain. In practice using the DMA mask of the host1x device is
>>> sufficient because all host1x clients share the same DMA mask.
>>>
>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>> ---
>>>  drivers/gpu/drm/tegra/drm.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>>> index 271c7a5fc954..0c5f1e6a0446 100644
>>> --- a/drivers/gpu/drm/tegra/drm.c
>>> +++ b/drivers/gpu/drm/tegra/drm.c
>>> @@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>>  
>>>  	if (tegra->domain) {
>>>  		u64 carveout_start, carveout_end, gem_start, gem_end;
>>> +		u64 dma_mask = dma_get_mask(&device->dev);
>>>  		dma_addr_t start, end;
>>>  		unsigned long order;
>>>  
>>> -		start = tegra->domain->geometry.aperture_start;
>>> -		end = tegra->domain->geometry.aperture_end;
>>> +		start = tegra->domain->geometry.aperture_start & dma_mask;
>>> +		end = tegra->domain->geometry.aperture_end & dma_mask;
>>>  
>>>  		gem_start = start;
>>>  		gem_end = end - CARVEOUT_SZ;
>>>
>>
>> Wow, so the IOVA could address >32 bits on later Tegras. AFAIK,
>> there is currently no support for properly programming 64-bit
>> addresses in the driver code, so wouldn't it make sense to force the
>> IOVA mask to 32 bits for now and hope that the second half of the
>> address registers happens to be 0x00000000 in HW?
> 
> I think this restriction only applies to display at this point. In
> practice you'd be hard put to trigger that case because IOVA memory is
> allocated from the bottom, so you'd actually need to use up to 4 GiB of
> IOVA space before hitting that.
> 
> That said, I vaguely remember typing up the patch to support writing the
> WINBUF_START_ADDR_HI register and friends, but it looks as if that was
> never merged.
> 
> I'll try to dig out that patch (or rewrite it, shouldn't be very
> difficult) and make it part of this series. I'd rather fix that issue
> than arbitrarily restrict the IOVA space, because that's likely to come
> back and bite us at some point.

Looking at falcon.c, it writes a 32-bit address only. Something needs to be done about that as well; it seems there is a FALCON_DMATRFMOFFS register to facilitate >32-bit addressing.

Secondly, right now there is no support for patching >32-bit BO addresses in the host1x command stream. The current UAPI may not work well in that case, though that's not really a problem given the staging nature of the UAPI. Something to consider for the UAPI de-staging.
Dmitry Osipenko Jan. 23, 2019, 7:42 p.m. UTC | #5
23.01.2019 18:55, Dmitry Osipenko wrote:
> 23.01.2019 17:04, Thierry Reding wrote:
>> On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
>>> 23.01.2019 12:39, Thierry Reding wrote:
>>>> From: Thierry Reding <treding@nvidia.com>
>>>>
>>>> On Tegra186 and later, the ARM SMMU provides an input address space that
>>>> is 48 bits wide. However, memory clients can only address up to 40 bits.
>>>> If the geometry is used as-is, allocations of IOVA space can end up in a
>>>> region that cannot be addressed by the memory clients.
>>>>
>>>> To fix this, restrict the IOVA space to the DMA mask of the host1x
>>>> device. Note that, technically, the IOVA space needs to be restricted to
>>>> the intersection of the DMA masks for all clients that are attached to
>>>> the IOMMU domain. In practice using the DMA mask of the host1x device is
>>>> sufficient because all host1x clients share the same DMA mask.
>>>>
>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>>> ---
>>>>  drivers/gpu/drm/tegra/drm.c | 5 +++--
>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>>>> index 271c7a5fc954..0c5f1e6a0446 100644
>>>> --- a/drivers/gpu/drm/tegra/drm.c
>>>> +++ b/drivers/gpu/drm/tegra/drm.c
>>>> @@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>>>  
>>>>  	if (tegra->domain) {
>>>>  		u64 carveout_start, carveout_end, gem_start, gem_end;
>>>> +		u64 dma_mask = dma_get_mask(&device->dev);
>>>>  		dma_addr_t start, end;
>>>>  		unsigned long order;
>>>>  
>>>> -		start = tegra->domain->geometry.aperture_start;
>>>> -		end = tegra->domain->geometry.aperture_end;
>>>> +		start = tegra->domain->geometry.aperture_start & dma_mask;
>>>> +		end = tegra->domain->geometry.aperture_end & dma_mask;
>>>>  
>>>>  		gem_start = start;
>>>>  		gem_end = end - CARVEOUT_SZ;
>>>>
>>>
>>> Wow, so the IOVA could address >32 bits on later Tegras. AFAIK,
>>> there is currently no support for properly programming 64-bit
>>> addresses in the driver code, so wouldn't it make sense to force the
>>> IOVA mask to 32 bits for now and hope that the second half of the
>>> address registers happens to be 0x00000000 in HW?
>>
>> I think this restriction only applies to display at this point. In
>> practice you'd be hard put to trigger that case because IOVA memory is
>> allocated from the bottom, so you'd actually need to use up to 4 GiB of
>> IOVA space before hitting that.
>>
>> That said, I vaguely remember typing up the patch to support writing the
>> WINBUF_START_ADDR_HI register and friends, but it looks as if that was
>> never merged.
>>
>> I'll try to dig out that patch (or rewrite it, shouldn't be very
>> difficult) and make it part of this series. I'd rather fix that issue
>> than arbitrarily restrict the IOVA space, because that's likely to come
>> back and bite us at some point.
> 
> Looking at falcon.c, it writes a 32-bit address only. Something needs to be done about that as well; it seems there is a FALCON_DMATRFMOFFS register to facilitate >32-bit addressing.

Although, scratch that about FALCON_DMATRFMOFFS, it must be for something else. Mikko, could you please clarify whether the Falcon can load firmware only from a 32-bit AS?
Mikko Perttunen Jan. 24, 2019, 10:24 a.m. UTC | #6
On 23.1.2019 21.42, Dmitry Osipenko wrote:
> 23.01.2019 18:55, Dmitry Osipenko wrote:
>> 23.01.2019 17:04, Thierry Reding wrote:
>>> On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
>>>> 23.01.2019 12:39, Thierry Reding wrote:
>>>>> From: Thierry Reding <treding@nvidia.com>
>>>>>
>>>>> On Tegra186 and later, the ARM SMMU provides an input address space that
>>>>> is 48 bits wide. However, memory clients can only address up to 40 bits.
>>>>> If the geometry is used as-is, allocations of IOVA space can end up in a
>>>>> region that cannot be addressed by the memory clients.
>>>>>
>>>>> To fix this, restrict the IOVA space to the DMA mask of the host1x
>>>>> device. Note that, technically, the IOVA space needs to be restricted to
>>>>> the intersection of the DMA masks for all clients that are attached to
>>>>> the IOMMU domain. In practice using the DMA mask of the host1x device is
>>>>> sufficient because all host1x clients share the same DMA mask.
>>>>>
>>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>>>> ---
>>>>>   drivers/gpu/drm/tegra/drm.c | 5 +++--
>>>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>>>>> index 271c7a5fc954..0c5f1e6a0446 100644
>>>>> --- a/drivers/gpu/drm/tegra/drm.c
>>>>> +++ b/drivers/gpu/drm/tegra/drm.c
>>>>> @@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>>>>   
>>>>>   	if (tegra->domain) {
>>>>>   		u64 carveout_start, carveout_end, gem_start, gem_end;
>>>>> +		u64 dma_mask = dma_get_mask(&device->dev);
>>>>>   		dma_addr_t start, end;
>>>>>   		unsigned long order;
>>>>>   
>>>>> -		start = tegra->domain->geometry.aperture_start;
>>>>> -		end = tegra->domain->geometry.aperture_end;
>>>>> +		start = tegra->domain->geometry.aperture_start & dma_mask;
>>>>> +		end = tegra->domain->geometry.aperture_end & dma_mask;
>>>>>   
>>>>>   		gem_start = start;
>>>>>   		gem_end = end - CARVEOUT_SZ;
>>>>>
>>>>
>>>> Wow, so the IOVA could address >32 bits on later Tegras. AFAIK,
>>>> there is currently no support for properly programming 64-bit
>>>> addresses in the driver code, so wouldn't it make sense to force the
>>>> IOVA mask to 32 bits for now and hope that the second half of the
>>>> address registers happens to be 0x00000000 in HW?
>>>
>>> I think this restriction only applies to display at this point. In
>>> practice you'd be hard put to trigger that case because IOVA memory is
>>> allocated from the bottom, so you'd actually need to use up to 4 GiB of
>>> IOVA space before hitting that.
>>>
>>> That said, I vaguely remember typing up the patch to support writing the
>>> WINBUF_START_ADDR_HI register and friends, but it looks as if that was
>>> never merged.
>>>
>>> I'll try to dig out that patch (or rewrite it, shouldn't be very
>>> difficult) and make it part of this series. I'd rather fix that issue
>>> than arbitrarily restrict the IOVA space, because that's likely to come
>>> back and bite us at some point.
>>
>> Looking at falcon.c, it writes a 32-bit address only. Something needs to be done about that as well; it seems there is a FALCON_DMATRFMOFFS register to facilitate >32-bit addressing.
> 
> Although, scratch that about FALCON_DMATRFMOFFS, it must be for something else. Mikko, could you please clarify whether the Falcon can load firmware only from a 32-bit AS?
> 

The DMA base address is set using DMATRFBASE, which requires 256B 
alignment, meaning 40 bits are available for the address. The 
DMATRFFBOFFS I believe is then used as a 32-bit offset to that value.

Mikko
Dmitry Osipenko Jan. 24, 2019, 1:15 p.m. UTC | #7
24.01.2019 13:24, Mikko Perttunen wrote:
> On 23.1.2019 21.42, Dmitry Osipenko wrote:
>> 23.01.2019 18:55, Dmitry Osipenko wrote:
>>> 23.01.2019 17:04, Thierry Reding wrote:
>>>> On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
>>>>> 23.01.2019 12:39, Thierry Reding wrote:
>>>>>> From: Thierry Reding <treding@nvidia.com>
>>>>>>
>>>>>> On Tegra186 and later, the ARM SMMU provides an input address space that
>>>>>> is 48 bits wide. However, memory clients can only address up to 40 bits.
>>>>>> If the geometry is used as-is, allocations of IOVA space can end up in a
>>>>>> region that cannot be addressed by the memory clients.
>>>>>>
>>>>>> To fix this, restrict the IOVA space to the DMA mask of the host1x
>>>>>> device. Note that, technically, the IOVA space needs to be restricted to
>>>>>> the intersection of the DMA masks for all clients that are attached to
>>>>>> the IOMMU domain. In practice using the DMA mask of the host1x device is
>>>>>> sufficient because all host1x clients share the same DMA mask.
>>>>>>
>>>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/tegra/drm.c | 5 +++--
>>>>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>>>>>> index 271c7a5fc954..0c5f1e6a0446 100644
>>>>>> --- a/drivers/gpu/drm/tegra/drm.c
>>>>>> +++ b/drivers/gpu/drm/tegra/drm.c
>>>>>> @@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>>>>>  
>>>>>>  	if (tegra->domain) {
>>>>>>  		u64 carveout_start, carveout_end, gem_start, gem_end;
>>>>>> +		u64 dma_mask = dma_get_mask(&device->dev);
>>>>>>  		dma_addr_t start, end;
>>>>>>  		unsigned long order;
>>>>>>  
>>>>>> -		start = tegra->domain->geometry.aperture_start;
>>>>>> -		end = tegra->domain->geometry.aperture_end;
>>>>>> +		start = tegra->domain->geometry.aperture_start & dma_mask;
>>>>>> +		end = tegra->domain->geometry.aperture_end & dma_mask;
>>>>>>  
>>>>>>  		gem_start = start;
>>>>>>  		gem_end = end - CARVEOUT_SZ;
>>>>>>
>>>>>
>>>>> Wow, so the IOVA could address >32 bits on later Tegras. AFAIK,
>>>>> there is currently no support for properly programming 64-bit
>>>>> addresses in the driver code, so wouldn't it make sense to force the
>>>>> IOVA mask to 32 bits for now and hope that the second half of the
>>>>> address registers happens to be 0x00000000 in HW?
>>>>
>>>> I think this restriction only applies to display at this point. In
>>>> practice you'd be hard put to trigger that case because IOVA memory is
>>>> allocated from the bottom, so you'd actually need to use up to 4 GiB of
>>>> IOVA space before hitting that.
>>>>
>>>> That said, I vaguely remember typing up the patch to support writing the
>>>> WINBUF_START_ADDR_HI register and friends, but it looks as if that was
>>>> never merged.
>>>>
>>>> I'll try to dig out that patch (or rewrite it, shouldn't be very
>>>> difficult) and make it part of this series. I'd rather fix that issue
>>>> than arbitrarily restrict the IOVA space, because that's likely to come
>>>> back and bite us at some point.
>>>
>>> Looking at falcon.c, it writes a 32-bit address only. Something needs to be done about that as well; it seems there is a FALCON_DMATRFMOFFS register to facilitate >32-bit addressing.
>>
>> Although, scratch that about FALCON_DMATRFMOFFS, it must be for something else. Mikko, could you please clarify whether the Falcon can load firmware only from a 32-bit AS?
>>
> 
> The DMA base address is set using DMATRFBASE, which requires 256B alignment, meaning 40 bits are available for the address. The DMATRFFBOFFS I believe is then used as a 32-bit offset to that value.

The TRM (up to T196) suggests that DMATRFMOFFS is a 16-bit offset. Is that a TRM bug, or am I missing something?

I suppose it should be fine to just reserve the carveout from the bottom of the IOVA space and be done with it.
Mikko Perttunen Jan. 24, 2019, 1:27 p.m. UTC | #8
On 24.1.2019 15.15, Dmitry Osipenko wrote:
> 24.01.2019 13:24, Mikko Perttunen wrote:
>> On 23.1.2019 21.42, Dmitry Osipenko wrote:
>>> 23.01.2019 18:55, Dmitry Osipenko wrote:
>>>> 23.01.2019 17:04, Thierry Reding wrote:
>>>>> On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
>>>>>> 23.01.2019 12:39, Thierry Reding wrote:
>>>>>>> From: Thierry Reding <treding@nvidia.com>
>>>>>>>
>>>>>>> On Tegra186 and later, the ARM SMMU provides an input address space that
>>>>>>> is 48 bits wide. However, memory clients can only address up to 40 bits.
>>>>>>> If the geometry is used as-is, allocations of IOVA space can end up in a
>>>>>>> region that cannot be addressed by the memory clients.
>>>>>>>
>>>>>>> To fix this, restrict the IOVA space to the DMA mask of the host1x
>>>>>>> device. Note that, technically, the IOVA space needs to be restricted to
>>>>>>> the intersection of the DMA masks for all clients that are attached to
>>>>>>> the IOMMU domain. In practice using the DMA mask of the host1x device is
>>>>>>> sufficient because all host1x clients share the same DMA mask.
>>>>>>>
>>>>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>>>>>> ---
>>>>>>>    drivers/gpu/drm/tegra/drm.c | 5 +++--
>>>>>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>>>>>>> index 271c7a5fc954..0c5f1e6a0446 100644
>>>>>>> --- a/drivers/gpu/drm/tegra/drm.c
>>>>>>> +++ b/drivers/gpu/drm/tegra/drm.c
>>>>>>> @@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>>>>>>  
>>>>>>>  	if (tegra->domain) {
>>>>>>>  		u64 carveout_start, carveout_end, gem_start, gem_end;
>>>>>>> +		u64 dma_mask = dma_get_mask(&device->dev);
>>>>>>>  		dma_addr_t start, end;
>>>>>>>  		unsigned long order;
>>>>>>>  
>>>>>>> -		start = tegra->domain->geometry.aperture_start;
>>>>>>> -		end = tegra->domain->geometry.aperture_end;
>>>>>>> +		start = tegra->domain->geometry.aperture_start & dma_mask;
>>>>>>> +		end = tegra->domain->geometry.aperture_end & dma_mask;
>>>>>>>  
>>>>>>>  		gem_start = start;
>>>>>>>  		gem_end = end - CARVEOUT_SZ;
>>>>>>>
>>>>>>
>>>>>> Wow, so the IOVA could address >32 bits on later Tegras. AFAIK,
>>>>>> there is currently no support for properly programming 64-bit
>>>>>> addresses in the driver code, so wouldn't it make sense to force the
>>>>>> IOVA mask to 32 bits for now and hope that the second half of the
>>>>>> address registers happens to be 0x00000000 in HW?
>>>>>
>>>>> I think this restriction only applies to display at this point. In
>>>>> practice you'd be hard put to trigger that case because IOVA memory is
>>>>> allocated from the bottom, so you'd actually need to use up to 4 GiB of
>>>>> IOVA space before hitting that.
>>>>>
>>>>> That said, I vaguely remember typing up the patch to support writing the
>>>>> WINBUF_START_ADDR_HI register and friends, but it looks as if that was
>>>>> never merged.
>>>>>
>>>>> I'll try to dig out that patch (or rewrite it, shouldn't be very
>>>>> difficult) and make it part of this series. I'd rather fix that issue
>>>>> than arbitrarily restrict the IOVA space, because that's likely to come
>>>>> back and bite us at some point.
>>>>
>>>> Looking at falcon.c, it writes a 32-bit address only. Something needs to be done about that as well; it seems there is a FALCON_DMATRFMOFFS register to facilitate >32-bit addressing.
>>>
>>> Although, scratch that about FALCON_DMATRFMOFFS, it must be for something else. Mikko, could you please clarify whether the Falcon can load firmware only from a 32-bit AS?
>>>
>>
>> The DMA base address is set using DMATRFBASE, which requires 256B alignment, meaning 40 bits are available for the address. The DMATRFFBOFFS I believe is then used as a 32-bit offset to that value.
> 
> The TRM (up to T196) suggests that DMATRFMOFFS is a 16-bit offset. Is that a TRM bug, or am I missing something?
> 
> I suppose it should be fine to just reserve the carveout from the bottom of the IOVA space and be done with it.
> 

DMATRFMOFFS is an offset to the Falcon IMEM, so 16 bits is enough to 
cover that.

Mikko
Dmitry Osipenko Jan. 24, 2019, 1:51 p.m. UTC | #9
24.01.2019 16:27, Mikko Perttunen wrote:
> On 24.1.2019 15.15, Dmitry Osipenko wrote:
>> 24.01.2019 13:24, Mikko Perttunen wrote:
>>> On 23.1.2019 21.42, Dmitry Osipenko wrote:
>>>> 23.01.2019 18:55, Dmitry Osipenko wrote:
>>>>> 23.01.2019 17:04, Thierry Reding wrote:
>>>>>> On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
>>>>>>> 23.01.2019 12:39, Thierry Reding wrote:
>>>>>>>> From: Thierry Reding <treding@nvidia.com>
>>>>>>>>
>>>>>>>> On Tegra186 and later, the ARM SMMU provides an input address space that
>>>>>>>> is 48 bits wide. However, memory clients can only address up to 40 bits.
>>>>>>>> If the geometry is used as-is, allocations of IOVA space can end up in a
>>>>>>>> region that cannot be addressed by the memory clients.
>>>>>>>>
>>>>>>>> To fix this, restrict the IOVA space to the DMA mask of the host1x
>>>>>>>> device. Note that, technically, the IOVA space needs to be restricted to
>>>>>>>> the intersection of the DMA masks for all clients that are attached to
>>>>>>>> the IOMMU domain. In practice using the DMA mask of the host1x device is
>>>>>>>> sufficient because all host1x clients share the same DMA mask.
>>>>>>>>
>>>>>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>>>>>>> ---
>>>>>>>>    drivers/gpu/drm/tegra/drm.c | 5 +++--
>>>>>>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>>>>>>>> index 271c7a5fc954..0c5f1e6a0446 100644
>>>>>>>> --- a/drivers/gpu/drm/tegra/drm.c
>>>>>>>> +++ b/drivers/gpu/drm/tegra/drm.c
>>>>>>>> @@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>>>>>>>  
>>>>>>>>  	if (tegra->domain) {
>>>>>>>>  		u64 carveout_start, carveout_end, gem_start, gem_end;
>>>>>>>> +		u64 dma_mask = dma_get_mask(&device->dev);
>>>>>>>>  		dma_addr_t start, end;
>>>>>>>>  		unsigned long order;
>>>>>>>>  
>>>>>>>> -		start = tegra->domain->geometry.aperture_start;
>>>>>>>> -		end = tegra->domain->geometry.aperture_end;
>>>>>>>> +		start = tegra->domain->geometry.aperture_start & dma_mask;
>>>>>>>> +		end = tegra->domain->geometry.aperture_end & dma_mask;
>>>>>>>>  
>>>>>>>>  		gem_start = start;
>>>>>>>>  		gem_end = end - CARVEOUT_SZ;
>>>>>>>>
>>>>>>>
>>>>>>> Wow, so the IOVA could address >32 bits on later Tegras. AFAIK,
>>>>>>> there is currently no support for properly programming 64-bit
>>>>>>> addresses in the driver code, so wouldn't it make sense to force the
>>>>>>> IOVA mask to 32 bits for now and hope that the second half of the
>>>>>>> address registers happens to be 0x00000000 in HW?
>>>>>>
>>>>>> I think this restriction only applies to display at this point. In
>>>>>> practice you'd be hard put to trigger that case because IOVA memory is
>>>>>> allocated from the bottom, so you'd actually need to use up to 4 GiB of
>>>>>> IOVA space before hitting that.
>>>>>>
>>>>>> That said, I vaguely remember typing up the patch to support writing the
>>>>>> WINBUF_START_ADDR_HI register and friends, but it looks as if that was
>>>>>> never merged.
>>>>>>
>>>>>> I'll try to dig out that patch (or rewrite it, shouldn't be very
>>>>>> difficult) and make it part of this series. I'd rather fix that issue
>>>>>> than arbitrarily restrict the IOVA space, because that's likely to come
>>>>>> back and bite us at some point.
>>>>>
>>>>> Looking at falcon.c, it writes a 32-bit address only. Something needs to be done about that as well; it seems there is a FALCON_DMATRFMOFFS register to facilitate >32-bit addressing.
>>>>
>>>> Although, scratch that about FALCON_DMATRFMOFFS, it must be for something else. Mikko, could you please clarify whether the Falcon can load firmware only from a 32-bit AS?
>>>>
>>>
>>> The DMA base address is set using DMATRFBASE, which requires 256B alignment, meaning 40 bits are available for the address. The DMATRFFBOFFS I believe is then used as a 32-bit offset to that value.
>>
>> The TRM (up to T196) suggests that DMATRFMOFFS is a 16-bit offset. Is that a TRM bug, or am I missing something?
>>
>> I suppose it should be fine to just reserve the carveout from the bottom of the IOVA space and be done with it.
>>
> 
> DMATRFMOFFS is an offset to the Falcon IMEM, so 16 bits is enough to cover that.

Ah, there is a source DMATRFMOFFS (a 32-bit DRAM offset) and a destination DMATRFMOFFS (a 16-bit IMEM offset). So one option is to set up the DMA address correctly; the other is to change the carveout layout and make the IOMMU mandatory for the driver.
Thierry Reding Jan. 24, 2019, 6:08 p.m. UTC | #10
On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
> 23.01.2019 12:39, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > On Tegra186 and later, the ARM SMMU provides an input address space that
> > is 48 bits wide. However, memory clients can only address up to 40 bits.
> > If the geometry is used as-is, allocations of IOVA space can end up in a
> > region that cannot be addressed by the memory clients.
> > 
> > To fix this, restrict the IOVA space to the DMA mask of the host1x
> > device. Note that, technically, the IOVA space needs to be restricted to
> > the intersection of the DMA masks for all clients that are attached to
> > the IOMMU domain. In practice using the DMA mask of the host1x device is
> > sufficient because all host1x clients share the same DMA mask.
> > 
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > ---
> >  drivers/gpu/drm/tegra/drm.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> > index 271c7a5fc954..0c5f1e6a0446 100644
> > --- a/drivers/gpu/drm/tegra/drm.c
> > +++ b/drivers/gpu/drm/tegra/drm.c
> > @@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
> >  
> >  	if (tegra->domain) {
> >  		u64 carveout_start, carveout_end, gem_start, gem_end;
> > +		u64 dma_mask = dma_get_mask(&device->dev);
> >  		dma_addr_t start, end;
> >  		unsigned long order;
> >  
> > -		start = tegra->domain->geometry.aperture_start;
> > -		end = tegra->domain->geometry.aperture_end;
> > +		start = tegra->domain->geometry.aperture_start & dma_mask;
> > +		end = tegra->domain->geometry.aperture_end & dma_mask;
> >  
> >  		gem_start = start;
> >  		gem_end = end - CARVEOUT_SZ;
> > 
> 
> Wow, so the IOVA could address >32 bits on later Tegras. AFAIK,
> there is currently no support for properly programming 64-bit
> addresses in the driver code, so wouldn't it make sense to force the
> IOVA mask to 32 bits for now and hope that the second half of the
> address registers happens to be 0x00000000 in HW?

Actually, we do have proper programming of 40-bit addresses in the
display driver code for Tegra186 and later, see:

	drivers/gpu/drm/tegra/hub.c:
		tegra_shared_plane_atomic_update()

Code to support that on Tegra210 and earlier should be almost identical.
I'll try to add that as well. It's not actually necessary there because
with an SMMU enabled on those chips, the address space will be 32 bits.
I suspect that there could be an issue with running without an SMMU on
Tegra124 (with > 2 GiB of memory and LPAE enabled) or Tegra210. I'll see
if I can produce a case where this actually fails and then add the code
for proper programming there.

In the meantime, I've sent out a v2 of the series that takes care of
properly programming the various bits in host1x required for full 40-bit
addresses on Tegra186 and later.

Thierry

Patch

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 271c7a5fc954..0c5f1e6a0446 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 
 	if (tegra->domain) {
 		u64 carveout_start, carveout_end, gem_start, gem_end;
+		u64 dma_mask = dma_get_mask(&device->dev);
 		dma_addr_t start, end;
 		unsigned long order;
 
-		start = tegra->domain->geometry.aperture_start;
-		end = tegra->domain->geometry.aperture_end;
+		start = tegra->domain->geometry.aperture_start & dma_mask;
+		end = tegra->domain->geometry.aperture_end & dma_mask;
 
 		gem_start = start;
 		gem_end = end - CARVEOUT_SZ;