
[v5,15/21] drm/tegra: Add new UAPI to header

Message ID 20210111130019.3515669-16-mperttunen@nvidia.com
State Rejected
Series [v5,01/21] gpu: host1x: Use different lock classes for each client

Commit Message

Mikko Perttunen Jan. 11, 2021, 1 p.m. UTC
Update the tegra_drm.h UAPI header, adding the new proposed UAPI.
The old staging UAPI is left in for now, with minor modifications
to avoid name collisions.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v4:
* Remove features that are not strictly necessary
* Remove padding/reserved fields in IOCTL structs where
  DRM's zero extension feature allows future expansion
v3:
* Remove timeout field
* Inline the syncpt_incrs array to the submit structure
* Remove WRITE_RELOC (it is now implicit)
---
 include/uapi/drm/tegra_drm.h | 338 ++++++++++++++++++++++++++++++++---
 1 file changed, 311 insertions(+), 27 deletions(-)

Comments

Dmitry Osipenko Jan. 13, 2021, 6:14 p.m. UTC | #1
11.01.2021 16:00, Mikko Perttunen writes:
> +struct drm_tegra_submit_buf {
> +	/**
> +	 * @mapping_id: [in]
> +	 *
> +	 * Identifier of the mapping to use in the submission.
> +	 */
> +	__u32 mapping_id;

I'm now in the process of trying out the UAPI using the grate drivers,
and this is the first obstacle.

Looks like this is not going to work well for older Tegra SoCs, in
particular for T20, which has a small GART.

Given that the usefulness of the partial mapping feature is very
questionable until it is proven with a real userspace, we should start
with dynamic mappings that are done at the time of job submission.

DRM should already have everything necessary for creating and managing
caches of mappings; the grate kernel driver has been using drm_mm_scan
for a long time now for that.

It should be fine to support the static mapping feature, but it should
be done separately with the drm_mm integration, IMO.

What do you think?
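
For reference, the drm_mm_scan eviction pattern referred to above works
roughly as in this minimal sketch; the cache-entry structure and the LRU
list are hypothetical names, only the drm_mm calls are real API, and the
actual grate driver code differs:

#include <drm/drm_mm.h>
#include <linux/list.h>

struct mapping_cache_entry {
	struct drm_mm_node node;	/* IOVA range inside the GART */
	struct list_head lru_link;	/* least-recently-used ordering */
	struct list_head scan_link;	/* temporary link used by the scan */
};

static int evict_for_new_mapping(struct drm_mm *mm, struct list_head *lru,
				 u64 size, u64 align)
{
	struct mapping_cache_entry *entry, *tmp;
	struct drm_mm_scan scan;
	LIST_HEAD(scanned);

	drm_mm_scan_init(&scan, mm, size, align, 0, DRM_MM_INSERT_BEST);

	/* Feed LRU mappings to the scan until a large enough hole forms. */
	list_for_each_entry(entry, lru, lru_link) {
		list_add(&entry->scan_link, &scanned);
		if (drm_mm_scan_add_block(&scan, &entry->node))
			goto found;
	}

	/* No hole found: back all blocks out of the scan and give up. */
	list_for_each_entry_safe(entry, tmp, &scanned, scan_link) {
		drm_mm_scan_remove_block(&scan, &entry->node);
		list_del(&entry->scan_link);
	}
	return -ENOSPC;

found:
	/*
	 * Back blocks out in reverse order; those for which
	 * drm_mm_scan_remove_block() returns true must be unmapped to
	 * make room, after which the caller inserts the new node.
	 */
	list_for_each_entry_safe(entry, tmp, &scanned, scan_link) {
		if (drm_mm_scan_remove_block(&scan, &entry->node))
			drm_mm_remove_node(&entry->node);
		list_del(&entry->scan_link);
	}
	return 0;
}
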
Mikko Perttunen Jan. 13, 2021, 6:56 p.m. UTC | #2
On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
> 11.01.2021 16:00, Mikko Perttunen writes:
>> +struct drm_tegra_submit_buf {
>> +	/**
>> +	 * @mapping_id: [in]
>> +	 *
>> +	 * Identifier of the mapping to use in the submission.
>> +	 */
>> +	__u32 mapping_id;
> 
> I'm now in the process of trying out the UAPI using the grate drivers,
> and this is the first obstacle.
> 
> Looks like this is not going to work well for older Tegra SoCs, in
> particular for T20, which has a small GART.
> 
> Given that the usefulness of the partial mapping feature is very
> questionable until it is proven with a real userspace, we should start
> with dynamic mappings that are done at the time of job submission.
> 
> DRM should already have everything necessary for creating and managing
> caches of mappings; the grate kernel driver has been using drm_mm_scan
> for a long time now for that.
> 
> It should be fine to support the static mapping feature, but it should
> be done separately with the drm_mm integration, IMO.
> 
> What do you think?
> 

Can you elaborate on the requirements to be able to use GART? Are there 
any other reasons this would not work on older chips?

I think we should keep CHANNEL_MAP and mapping_ids, but if e.g. for GART 
we cannot do the mapping immediately at CHANNEL_MAP time, we can just 
treat it as a "registration" call for the GEM object - potentially a 
no-op, like direct physical addressing is. We can then do whatever is 
needed at submit time. This way we can have the best of both worlds.

Note that partial mappings are no longer present in this version of 
the UAPI.

Mikko
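
Concretely, the "registration" idea could look something like the
following on the kernel side. All names here are hypothetical and not
part of the posted series; it only illustrates deferring the IOVA setup
to submit time on GART-limited SoCs:

#include <drm/drm_gem.h>
#include <linux/types.h>

struct tegra_drm_mapping {
	struct drm_gem_object *gem;
	dma_addr_t iova;
	bool pinned;		/* true once an IOVA has been assigned */
};

static int tegra_drm_channel_map(struct tegra_drm_mapping *mapping,
				 struct drm_gem_object *gem)
{
	drm_gem_object_get(gem);
	mapping->gem = gem;

	/* On SoCs with a per-channel IOMMU, map eagerly here instead. */
	mapping->pinned = false;
	return 0;
}

/* Called per job on GART-limited SoCs to resolve the address late. */
static int tegra_drm_submit_pin(struct tegra_drm_mapping *mapping)
{
	if (mapping->pinned)
		return 0;

	/* Find (or evict to make) room in the GART, then map. */
	return tegra_drm_gart_map(mapping);	/* hypothetical helper */
}
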
Dmitry Osipenko Jan. 14, 2021, 8:36 a.m. UTC | #3
13.01.2021 21:56, Mikko Perttunen writes:
> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
>> 11.01.2021 16:00, Mikko Perttunen writes:
>>> +struct drm_tegra_submit_buf {
>>> +    /**
>>> +     * @mapping_id: [in]
>>> +     *
>>> +     * Identifier of the mapping to use in the submission.
>>> +     */
>>> +    __u32 mapping_id;
>>
>> I'm now in the process of trying out the UAPI using the grate drivers,
>> and this is the first obstacle.
>>
>> Looks like this is not going to work well for older Tegra SoCs, in
>> particular for T20, which has a small GART.
>>
>> Given that the usefulness of the partial mapping feature is very
>> questionable until it is proven with a real userspace, we should start
>> with dynamic mappings that are done at the time of job submission.
>>
>> DRM should already have everything necessary for creating and managing
>> caches of mappings; the grate kernel driver has been using drm_mm_scan
>> for a long time now for that.
>>
>> It should be fine to support the static mapping feature, but it should
>> be done separately with the drm_mm integration, IMO.
>>
>> What do you think?
>>
> 
> Can you elaborate on the requirements to be able to use GART? Are there
> any other reasons this would not work on older chips?

We have all DRM devices in a single address space on T30+, hence having
duplicated mappings for each device would be a bit wasteful.

> I think we should keep CHANNEL_MAP and mapping_ids, but if e.g. for GART
> we cannot do the mapping immediately at CHANNEL_MAP time, we can just
> treat it as a "registration" call for the GEM object - potentially a
> no-op, like direct physical addressing is. We can then do whatever is
> needed at submit time. This way we can have the best of both worlds.

I have some thoughts now, but nothing concrete yet. Maybe we will need
to create per-SoC ops for MM.

I'll finish trying out what we currently have to see what else is
missing and then we will decide what to do about it.

> Note that partial mappings are no longer present in this version of
> the UAPI.

Oh, right :) I haven't looked closely at this part of the review yet.
Mikko Perttunen Jan. 14, 2021, 10:34 a.m. UTC | #4
On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
> 13.01.2021 21:56, Mikko Perttunen writes:
>> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
>>> 11.01.2021 16:00, Mikko Perttunen writes:
>>>> +struct drm_tegra_submit_buf {
>>>> +    /**
>>>> +     * @mapping_id: [in]
>>>> +     *
>>>> +     * Identifier of the mapping to use in the submission.
>>>> +     */
>>>> +    __u32 mapping_id;
>>>
>>> I'm now in the process of trying out the UAPI using the grate drivers,
>>> and this is the first obstacle.
>>>
>>> Looks like this is not going to work well for older Tegra SoCs, in
>>> particular for T20, which has a small GART.
>>>
>>> Given that the usefulness of the partial mapping feature is very
>>> questionable until it is proven with a real userspace, we should start
>>> with dynamic mappings that are done at the time of job submission.
>>>
>>> DRM should already have everything necessary for creating and managing
>>> caches of mappings; the grate kernel driver has been using drm_mm_scan
>>> for a long time now for that.
>>>
>>> It should be fine to support the static mapping feature, but it should
>>> be done separately with the drm_mm integration, IMO.
>>>
>>> What do you think?
>>>
>>
>> Can you elaborate on the requirements to be able to use GART? Are there
>> any other reasons this would not work on older chips?
> 
> We have all DRM devices in a single address space on T30+, hence having
> duplicated mappings for each device would be a bit wasteful.

I guess this should be pretty easy to change to only keep one mapping 
per GEM object.

> 
>> I think we should keep CHANNEL_MAP and mapping_ids, but if e.g. for GART
>> we cannot do the mapping immediately at CHANNEL_MAP time, we can just
>> treat it as a "registration" call for the GEM object - potentially a
>> no-op, like direct physical addressing is. We can then do whatever is
>> needed at submit time. This way we can have the best of both worlds.
> 
> I have some thoughts now, but nothing concrete yet. Maybe we will need
> to create per-SoC ops for MM.

Yep, I think some specialized code will be needed, but hopefully it will 
be relatively minor.

> 
> I'll finish trying out what we currently have to see what else is
> missing and then we will decide what to do about it.
> 

Great :)

>> Note that partial mappings are no longer present in this version of
>> the UAPI.
> 
> Oh, right :) I haven't looked closely at this part of the review yet.
> 

Mikko
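
The per-SoC ops being discussed could plausibly take a shape like the
sketch below. This is purely illustrative; no such structure (and no
tegra_drm_context or tegra_drm_mapping type) exists in the posted series:

/* Purely illustrative per-SoC memory-management ops. */
struct tegra_drm_mm_ops {
	/* CHANNEL_MAP: eager IOMMU mapping, or a bare registration. */
	int (*map)(struct tegra_drm_context *context,
		   struct tegra_drm_mapping *mapping);
	/* Submit time: resolve/pin the address (GART, dynamic IOMMU). */
	int (*pin)(struct tegra_drm_context *context,
		   struct tegra_drm_mapping *mapping);
	/* CHANNEL_UNMAP or job completion: release the address range. */
	void (*unmap)(struct tegra_drm_context *context,
		      struct tegra_drm_mapping *mapping);
};
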
Thierry Reding March 23, 2021, 12:30 p.m. UTC | #5
On Thu, Jan 14, 2021 at 12:34:22PM +0200, Mikko Perttunen wrote:
> On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
> > 13.01.2021 21:56, Mikko Perttunen writes:
> > > On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
> > > > 11.01.2021 16:00, Mikko Perttunen writes:
> > > > > +struct drm_tegra_submit_buf {
> > > > > +    /**
> > > > > +     * @mapping_id: [in]
> > > > > +     *
> > > > > +     * Identifier of the mapping to use in the submission.
> > > > > +     */
> > > > > +    __u32 mapping_id;
> > > > 
> > > > I'm now in the process of trying out the UAPI using the grate drivers,
> > > > and this is the first obstacle.
> > > >
> > > > Looks like this is not going to work well for older Tegra SoCs, in
> > > > particular for T20, which has a small GART.
> > > >
> > > > Given that the usefulness of the partial mapping feature is very
> > > > questionable until it is proven with a real userspace, we should start
> > > > with dynamic mappings that are done at the time of job submission.
> > > >
> > > > DRM should already have everything necessary for creating and managing
> > > > caches of mappings; the grate kernel driver has been using drm_mm_scan
> > > > for a long time now for that.
> > > >
> > > > It should be fine to support the static mapping feature, but it should
> > > > be done separately with the drm_mm integration, IMO.
> > > >
> > > > What do you think?
> > > > 
> > > 
> > > Can you elaborate on the requirements to be able to use GART? Are there
> > > any other reasons this would not work on older chips?
> > 
> > We have all DRM devices in a single address space on T30+, hence having
> > duplicated mappings for each device would be a bit wasteful.
> 
> I guess this should be pretty easy to change to only keep one mapping per
> GEM object.

The important point here is the semantics: this IOCTL establishes a
mapping for a given GEM object on a given channel. If the underlying
implementation is such that the mapping doesn't fit into the GART, then
that's an implementation detail that the driver needs to take care of.
Similarly, if multiple devices share a single address space, that's
something the driver already knows and can take advantage of by simply
reusing an existing mapping if one already exists. In both cases the
semantics would be correctly implemented and that's really all that
matters.

Overall this interface seems sound from a high-level point of view and
allows these mappings to be properly created even for the cases we have
where each channel may have a separate address space. It may not be the
optimal interface for all use-cases or any one individual case, but the
very nature of these interfaces is to abstract away certain differences
in order to provide a unified interface to a common programming model.
So there will always be certain tradeoffs.

Thierry
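
The reuse Thierry describes could be as simple as a refcounted lookup
keyed by address space and GEM object; a sketch under assumed names, not
code from the series:

#include <drm/drm_gem.h>
#include <linux/iommu.h>
#include <linux/kref.h>
#include <linux/list.h>

struct tegra_drm_mapping {
	struct kref ref;
	struct iommu_domain *domain;	/* address space of the mapping */
	struct drm_gem_object *gem;
	dma_addr_t iova;
	struct list_head head;
};

static struct tegra_drm_mapping *
mapping_find(struct list_head *mappings, struct iommu_domain *domain,
	     struct drm_gem_object *gem)
{
	struct tegra_drm_mapping *mapping;

	/* CHANNEL_MAP can return an existing mapping instead of a new one. */
	list_for_each_entry(mapping, mappings, head) {
		if (mapping->domain == domain && mapping->gem == gem) {
			kref_get(&mapping->ref);
			return mapping;
		}
	}

	return NULL;	/* caller allocates, maps and registers a new one */
}
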
Dmitry Osipenko March 23, 2021, 2 p.m. UTC | #6
23.03.2021 15:30, Thierry Reding writes:
> On Thu, Jan 14, 2021 at 12:34:22PM +0200, Mikko Perttunen wrote:
>> On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
>>> 13.01.2021 21:56, Mikko Perttunen writes:
>>>> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
>>>>> 11.01.2021 16:00, Mikko Perttunen writes:
>>>>>> +struct drm_tegra_submit_buf {
>>>>>> +    /**
>>>>>> +     * @mapping_id: [in]
>>>>>> +     *
>>>>>> +     * Identifier of the mapping to use in the submission.
>>>>>> +     */
>>>>>> +    __u32 mapping_id;
>>>>>
>>>>> I'm now in the process of trying out the UAPI using the grate drivers,
>>>>> and this is the first obstacle.
>>>>>
>>>>> Looks like this is not going to work well for older Tegra SoCs, in
>>>>> particular for T20, which has a small GART.
>>>>>
>>>>> Given that the usefulness of the partial mapping feature is very
>>>>> questionable until it is proven with a real userspace, we should start
>>>>> with dynamic mappings that are done at the time of job submission.
>>>>>
>>>>> DRM should already have everything necessary for creating and managing
>>>>> caches of mappings; the grate kernel driver has been using drm_mm_scan
>>>>> for a long time now for that.
>>>>>
>>>>> It should be fine to support the static mapping feature, but it should
>>>>> be done separately with the drm_mm integration, IMO.
>>>>>
>>>>> What do you think?
>>>>>
>>>>
>>>> Can you elaborate on the requirements to be able to use GART? Are there
>>>> any other reasons this would not work on older chips?
>>>
>>> We have all DRM devices in a single address space on T30+, hence having
>>> duplicated mappings for each device would be a bit wasteful.
>>
>> I guess this should be pretty easy to change to only keep one mapping per
>> GEM object.
> 
> The important point here is the semantics: this IOCTL establishes a
> mapping for a given GEM object on a given channel. If the underlying
> implementation is such that the mapping doesn't fit into the GART, then
> that's an implementation detail that the driver needs to take care of.
> Similarly, if multiple devices share a single address space, that's
> something the driver already knows and can take advantage of by simply
> reusing an existing mapping if one already exists. In both cases the
> semantics would be correctly implemented and that's really all that
> matters.
> 
> Overall this interface seems sound from a high-level point of view and
> allows these mappings to be properly created even for the cases we have
> where each channel may have a separate address space. It may not be the
> optimal interface for all use-cases or any one individual case, but the
> very nature of these interfaces is to abstract away certain differences
> in order to provide a unified interface to a common programming model.
> So there will always be certain tradeoffs.

For now this IOCTL isn't useful from the userspace perspective on older
SoCs, and I'll need to add a lot of code that won't do anything useful
just to conform to the specific needs of the newer SoCs. Trying to unify
everything into a single API doesn't sound like a good idea at this
point, and I already suggested to Mikko to try out a variant with
separate per-SoC code paths in the next version; the mappings could
then be handled separately by the T186+ paths.
Thierry Reding March 23, 2021, 4:44 p.m. UTC | #7
On Tue, Mar 23, 2021 at 05:00:30PM +0300, Dmitry Osipenko wrote:
> 23.03.2021 15:30, Thierry Reding writes:
> > On Thu, Jan 14, 2021 at 12:34:22PM +0200, Mikko Perttunen wrote:
> >> On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
> >>> 13.01.2021 21:56, Mikko Perttunen writes:
> >>>> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
> >>>>> 11.01.2021 16:00, Mikko Perttunen writes:
> >>>>>> +struct drm_tegra_submit_buf {
> >>>>>> +    /**
> >>>>>> +     * @mapping_id: [in]
> >>>>>> +     *
> >>>>>> +     * Identifier of the mapping to use in the submission.
> >>>>>> +     */
> >>>>>> +    __u32 mapping_id;
> >>>>>
> >>>>> I'm now in the process of trying out the UAPI using the grate drivers,
> >>>>> and this is the first obstacle.
> >>>>>
> >>>>> Looks like this is not going to work well for older Tegra SoCs, in
> >>>>> particular for T20, which has a small GART.
> >>>>>
> >>>>> Given that the usefulness of the partial mapping feature is very
> >>>>> questionable until it is proven with a real userspace, we should start
> >>>>> with dynamic mappings that are done at the time of job submission.
> >>>>>
> >>>>> DRM should already have everything necessary for creating and managing
> >>>>> caches of mappings; the grate kernel driver has been using drm_mm_scan
> >>>>> for a long time now for that.
> >>>>>
> >>>>> It should be fine to support the static mapping feature, but it should
> >>>>> be done separately with the drm_mm integration, IMO.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>
> >>>> Can you elaborate on the requirements to be able to use GART? Are there
> >>>> any other reasons this would not work on older chips?
> >>>
> >>> We have all DRM devices in a single address space on T30+, hence having
> >>> duplicated mappings for each device would be a bit wasteful.
> >>
> >> I guess this should be pretty easy to change to only keep one mapping per
> >> GEM object.
> > 
> > The important point here is the semantics: this IOCTL establishes a
> > mapping for a given GEM object on a given channel. If the underlying
> > implementation is such that the mapping doesn't fit into the GART, then
> > that's an implementation detail that the driver needs to take care of.
> > Similarly, if multiple devices share a single address space, that's
> > something the driver already knows and can take advantage of by simply
> > reusing an existing mapping if one already exists. In both cases the
> > semantics would be correctly implemented and that's really all that
> > matters.
> > 
> > Overall this interface seems sound from a high-level point of view and
> > allows these mappings to be properly created even for the cases we have
> > where each channel may have a separate address space. It may not be the
> > optimal interface for all use-cases or any one individual case, but the
> > very nature of these interfaces is to abstract away certain differences
> > in order to provide a unified interface to a common programming model.
> > So there will always be certain tradeoffs.
> 
> For now this IOCTL isn't useful from the userspace perspective on older
> SoCs, and I'll need to add a lot of code that won't do anything useful
> just to conform to the specific needs of the newer SoCs. Trying to unify
> everything into a single API doesn't sound like a good idea at this
> point, and I already suggested to Mikko to try out a variant with
> separate per-SoC code paths in the next version; the mappings could
> then be handled separately by the T186+ paths.

I'm not sure I understand what you're saying. Obviously the underlying
implementation of this might have to differ depending on SoC generation.
But it sounds like you're suggesting having different UAPIs depending on
SoC generation.

Thierry
Dmitry Osipenko March 23, 2021, 5:32 p.m. UTC | #8
23.03.2021 19:44, Thierry Reding writes:
> On Tue, Mar 23, 2021 at 05:00:30PM +0300, Dmitry Osipenko wrote:
>> 23.03.2021 15:30, Thierry Reding writes:
>>> On Thu, Jan 14, 2021 at 12:34:22PM +0200, Mikko Perttunen wrote:
>>>> On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
>>>>> 13.01.2021 21:56, Mikko Perttunen writes:
>>>>>> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
>>>>>>> 11.01.2021 16:00, Mikko Perttunen writes:
>>>>>>>> +struct drm_tegra_submit_buf {
>>>>>>>> +    /**
>>>>>>>> +     * @mapping_id: [in]
>>>>>>>> +     *
>>>>>>>> +     * Identifier of the mapping to use in the submission.
>>>>>>>> +     */
>>>>>>>> +    __u32 mapping_id;
>>>>>>>
>>>>>>> I'm now in the process of trying out the UAPI using the grate drivers,
>>>>>>> and this is the first obstacle.
>>>>>>>
>>>>>>> Looks like this is not going to work well for older Tegra SoCs, in
>>>>>>> particular for T20, which has a small GART.
>>>>>>>
>>>>>>> Given that the usefulness of the partial mapping feature is very
>>>>>>> questionable until it is proven with a real userspace, we should start
>>>>>>> with dynamic mappings that are done at the time of job submission.
>>>>>>>
>>>>>>> DRM should already have everything necessary for creating and managing
>>>>>>> caches of mappings; the grate kernel driver has been using drm_mm_scan
>>>>>>> for a long time now for that.
>>>>>>>
>>>>>>> It should be fine to support the static mapping feature, but it should
>>>>>>> be done separately with the drm_mm integration, IMO.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>
>>>>>> Can you elaborate on the requirements to be able to use GART? Are there
>>>>>> any other reasons this would not work on older chips?
>>>>>
>>>>> We have all DRM devices in a single address space on T30+, hence having
>>>>> duplicated mappings for each device would be a bit wasteful.
>>>>
>>>> I guess this should be pretty easy to change to only keep one mapping per
>>>> GEM object.
>>>
>>> The important point here is the semantics: this IOCTL establishes a
>>> mapping for a given GEM object on a given channel. If the underlying
>>> implementation is such that the mapping doesn't fit into the GART, then
>>> that's an implementation detail that the driver needs to take care of.
>>> Similarly, if multiple devices share a single address space, that's
>>> something the driver already knows and can take advantage of by simply
>>> reusing an existing mapping if one already exists. In both cases the
>>> semantics would be correctly implemented and that's really all that
>>> matters.
>>>
>>> Overall this interface seems sound from a high-level point of view and
>>> allows these mappings to be properly created even for the cases we have
>>> where each channel may have a separate address space. It may not be the
>>> optimal interface for all use-cases or any one individual case, but the
>>> very nature of these interfaces is to abstract away certain differences
>>> in order to provide a unified interface to a common programming model.
>>> So there will always be certain tradeoffs.
>>
>> For now this IOCTL isn't useful from the userspace perspective on older
>> SoCs, and I'll need to add a lot of code that won't do anything useful
>> just to conform to the specific needs of the newer SoCs. Trying to unify
>> everything into a single API doesn't sound like a good idea at this
>> point, and I already suggested to Mikko to try out a variant with
>> separate per-SoC code paths in the next version; the mappings could
>> then be handled separately by the T186+ paths.
> 
> I'm not sure I understand what you're saying. Obviously the underlying
> implementation of this might have to differ depending on SoC generation.
> But it sounds like you're suggesting having different UAPIs depending on
> SoC generation.

I suggested that this IOCTL shouldn't be mandatory for older SoCs, which
we need to have anyway to preserve the older UAPI. Yes, this makes the
UAPI different, and I want to see what it will look like in the next
version, since the current variant was sub-optimal.
Thierry Reding March 23, 2021, 5:57 p.m. UTC | #9
On Tue, Mar 23, 2021 at 08:32:50PM +0300, Dmitry Osipenko wrote:
> 23.03.2021 19:44, Thierry Reding writes:
> > On Tue, Mar 23, 2021 at 05:00:30PM +0300, Dmitry Osipenko wrote:
> >> 23.03.2021 15:30, Thierry Reding writes:
> >>> On Thu, Jan 14, 2021 at 12:34:22PM +0200, Mikko Perttunen wrote:
> >>>> On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
> >>>>> 13.01.2021 21:56, Mikko Perttunen writes:
> >>>>>> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
> >>>>>>> 11.01.2021 16:00, Mikko Perttunen writes:
> >>>>>>>> +struct drm_tegra_submit_buf {
> >>>>>>>> +    /**
> >>>>>>>> +     * @mapping_id: [in]
> >>>>>>>> +     *
> >>>>>>>> +     * Identifier of the mapping to use in the submission.
> >>>>>>>> +     */
> >>>>>>>> +    __u32 mapping_id;
> >>>>>>>
> >>>>>>> I'm now in the process of trying out the UAPI using the grate drivers,
> >>>>>>> and this is the first obstacle.
> >>>>>>>
> >>>>>>> Looks like this is not going to work well for older Tegra SoCs, in
> >>>>>>> particular for T20, which has a small GART.
> >>>>>>>
> >>>>>>> Given that the usefulness of the partial mapping feature is very
> >>>>>>> questionable until it is proven with a real userspace, we should start
> >>>>>>> with dynamic mappings that are done at the time of job submission.
> >>>>>>>
> >>>>>>> DRM should already have everything necessary for creating and managing
> >>>>>>> caches of mappings; the grate kernel driver has been using drm_mm_scan
> >>>>>>> for a long time now for that.
> >>>>>>>
> >>>>>>> It should be fine to support the static mapping feature, but it should
> >>>>>>> be done separately with the drm_mm integration, IMO.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>
> >>>>>> Can you elaborate on the requirements to be able to use GART? Are there
> >>>>>> any other reasons this would not work on older chips?
> >>>>>
> >>>>> We have all DRM devices in a single address space on T30+, hence having
> >>>>> duplicated mappings for each device would be a bit wasteful.
> >>>>
> >>>> I guess this should be pretty easy to change to only keep one mapping per
> >>>> GEM object.
> >>>
> >>> The important point here is the semantics: this IOCTL establishes a
> >>> mapping for a given GEM object on a given channel. If the underlying
> >>> implementation is such that the mapping doesn't fit into the GART, then
> >>> that's an implementation detail that the driver needs to take care of.
> >>> Similarly, if multiple devices share a single address space, that's
> >>> something the driver already knows and can take advantage of by simply
> >>> reusing an existing mapping if one already exists. In both cases the
> >>> semantics would be correctly implemented and that's really all that
> >>> matters.
> >>>
> >>> Overall this interface seems sound from a high-level point of view and
> >>> allows these mappings to be properly created even for the cases we have
> >>> where each channel may have a separate address space. It may not be the
> >>> optimal interface for all use-cases or any one individual case, but the
> >>> very nature of these interfaces is to abstract away certain differences
> >>> in order to provide a unified interface to a common programming model.
> >>> So there will always be certain tradeoffs.
> >>
> >> For now this IOCTL isn't useful from the userspace perspective on older
> >> SoCs, and I'll need to add a lot of code that won't do anything useful
> >> just to conform to the specific needs of the newer SoCs. Trying to unify
> >> everything into a single API doesn't sound like a good idea at this
> >> point, and I already suggested to Mikko to try out a variant with
> >> separate per-SoC code paths in the next version; the mappings could
> >> then be handled separately by the T186+ paths.
> > 
> > I'm not sure I understand what you're saying. Obviously the underlying
> > implementation of this might have to differ depending on SoC generation.
> > But it sounds like you're suggesting having different UAPIs depending on
> > SoC generation.
> 
> I suggested that this IOCTL shouldn't be mandatory for older SoCs, which
> we need to have anyway to preserve the older UAPI. Yes, this makes the
> UAPI different, and I want to see what it will look like in the next
> version, since the current variant was sub-optimal.

What exactly is sub-optimal about the current variant? And what would an
alternative look like? Like what we have in the old ABI where we pass in
GEM handles directly during submissions?

I can see how this new variant would be a bit more work than the
alternative, but even on older SoCs, wouldn't the explicit mapping be
much better for performance than having to constantly remap GEM objects
for every job?

In general I don't think it's useful to have separate UAPIs for what's
basically the same hardware. I mean from a high-level point of view what
we need to do for each job remains exactly the same whether the job is
executed on Tegra20 or Tegra194. We have to map a bunch of buffers so
that they can be accessed by hardware and then we have a command stream
that references the mappings and does something to the memory that they
represent. The only thing that's different between the SoC generations
is how these mappings are created.

The difference between the old UABI and this is that we create mappings
upfront, and I'm not sure I understand how that could be suboptimal. If
anything it should increase the efficiency of job submissions by
reducing per-submit overhead. It should be fairly easy to compare this
in terms of performance with implicit mappings by running tests against
the old UABI and the new one. If there's a significant impact we may
need to take a closer look.

Yes, this will require a bit of work in userspace to adapt to these
changes, but that is a one-time cost, so I don't think it's wise to
ignore potential performance improvements because we don't want to
update userspace.

In either case, I don't think we're quite done yet. There's still a bit
of work left to do on the userspace side to get a couple of use-cases up
with this new UABI and it's not entirely clear yet what the results will
be. However, we have to move forward somehow or this will end up being
yet another attempt that didn't go anywhere. We were in a similar place
a few years back and I vividly remember how frustrating that was for me
personally to spend all of that time working through this stuff and then
seeing it all go to waste.

So can we please keep going for a little longer while there's still
momentum?

Thierry
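
To make the programming model concrete, a job submission with the new
UAPI would flow roughly as below. This is a sketch against the
structures in the patch; the host1x class value and the origin of the
GEM handle and syncpoint fd are assumptions, and error handling is
omitted:

#include <stdint.h>
#include <sys/ioctl.h>
#include "tegra_drm.h"	/* the header added by this patch */

/* Sketch: open a channel, map one buffer, submit one gather. */
static int submit_one_job(int drm_fd, __u32 gem_handle, int syncpt_fd,
			  __u32 *gather, __u32 num_words)
{
	struct drm_tegra_channel_open open_args = {
		.host1x_class = 0x5d,	/* e.g. VIC; SoC-specific */
	};
	ioctl(drm_fd, DRM_IOCTL_TEGRA_CHANNEL_OPEN, &open_args);

	/* Make the buffer available to the engine; keep the mapping_id. */
	struct drm_tegra_channel_map map_args = {
		.channel_ctx = open_args.channel_ctx,
		.handle = gem_handle,
		.flags = DRM_TEGRA_CHANNEL_MAP_READWRITE,
	};
	ioctl(drm_fd, DRM_IOCTL_TEGRA_CHANNEL_MAP, &map_args);

	/* One relocation: patch the buffer's address into gather word 1. */
	struct drm_tegra_submit_buf buf = {
		.mapping_id = map_args.mapping_id,
		.reloc = {
			.target_offset = 0,
			.gather_offset_words = 1,
			.shift = 0,
		},
	};

	struct drm_tegra_submit_cmd cmd = {
		.type = DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR,
		.gather_uptr = { .words = num_words },
	};

	struct drm_tegra_channel_submit submit = {
		.channel_ctx = open_args.channel_ctx,
		.num_bufs = 1,
		.num_cmds = 1,
		.gather_data_words = num_words,
		.bufs_ptr = (uintptr_t)&buf,
		.cmds_ptr = (uintptr_t)&cmd,
		.gather_data_ptr = (uintptr_t)gather,
		.syncpt_incr = {
			.syncpt_fd = syncpt_fd,
			.num_incrs = 1,
		},
	};
	ioctl(drm_fd, DRM_IOCTL_TEGRA_CHANNEL_SUBMIT, &submit);

	/* submit.syncpt_incr.fence_value is the threshold to wait on. */
	return 0;
}
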

Patch

diff --git a/include/uapi/drm/tegra_drm.h b/include/uapi/drm/tegra_drm.h
index c4df3c3668b3..014bc393c298 100644
--- a/include/uapi/drm/tegra_drm.h
+++ b/include/uapi/drm/tegra_drm.h
@@ -1,24 +1,5 @@ 
-/*
- * Copyright (c) 2012-2013, NVIDIA CORPORATION.  All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- */
+/* SPDX-License-Identifier: MIT */
+/* Copyright (c) 2012-2020 NVIDIA Corporation */
 
 #ifndef _UAPI_TEGRA_DRM_H_
 #define _UAPI_TEGRA_DRM_H_
@@ -29,6 +10,8 @@ 
 extern "C" {
 #endif
 
+/* TegraDRM legacy UAPI. Only enabled with STAGING */
+
 #define DRM_TEGRA_GEM_CREATE_TILED     (1 << 0)
 #define DRM_TEGRA_GEM_CREATE_BOTTOM_UP (1 << 1)
 
@@ -644,13 +627,13 @@  struct drm_tegra_gem_get_flags {
 	__u32 flags;
 };
 
-#define DRM_TEGRA_GEM_CREATE		0x00
-#define DRM_TEGRA_GEM_MMAP		0x01
+#define DRM_TEGRA_GEM_CREATE_LEGACY	0x00
+#define DRM_TEGRA_GEM_MMAP_LEGACY	0x01
 #define DRM_TEGRA_SYNCPT_READ		0x02
 #define DRM_TEGRA_SYNCPT_INCR		0x03
 #define DRM_TEGRA_SYNCPT_WAIT		0x04
-#define DRM_TEGRA_OPEN_CHANNEL		0x05
-#define DRM_TEGRA_CLOSE_CHANNEL		0x06
+#define DRM_TEGRA_OPEN_CHANNEL	        0x05
+#define DRM_TEGRA_CLOSE_CHANNEL	        0x06
 #define DRM_TEGRA_GET_SYNCPT		0x07
 #define DRM_TEGRA_SUBMIT		0x08
 #define DRM_TEGRA_GET_SYNCPT_BASE	0x09
@@ -659,8 +642,8 @@  struct drm_tegra_gem_get_flags {
 #define DRM_TEGRA_GEM_SET_FLAGS		0x0c
 #define DRM_TEGRA_GEM_GET_FLAGS		0x0d
 
-#define DRM_IOCTL_TEGRA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE, struct drm_tegra_gem_create)
-#define DRM_IOCTL_TEGRA_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_MMAP, struct drm_tegra_gem_mmap)
+#define DRM_IOCTL_TEGRA_GEM_CREATE_LEGACY DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE_LEGACY, struct drm_tegra_gem_create)
+#define DRM_IOCTL_TEGRA_GEM_MMAP_LEGACY DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_MMAP_LEGACY, struct drm_tegra_gem_mmap)
 #define DRM_IOCTL_TEGRA_SYNCPT_READ DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_READ, struct drm_tegra_syncpt_read)
 #define DRM_IOCTL_TEGRA_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_INCR, struct drm_tegra_syncpt_incr)
 #define DRM_IOCTL_TEGRA_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_WAIT, struct drm_tegra_syncpt_wait)
@@ -674,6 +657,307 @@  struct drm_tegra_gem_get_flags {
 #define DRM_IOCTL_TEGRA_GEM_SET_FLAGS DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_SET_FLAGS, struct drm_tegra_gem_set_flags)
 #define DRM_IOCTL_TEGRA_GEM_GET_FLAGS DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_GET_FLAGS, struct drm_tegra_gem_get_flags)
 
+/* New TegraDRM UAPI */
+
+struct drm_tegra_channel_open {
+	/**
+	 * @host1x_class: [in]
+	 *
+	 * Host1x class of the engine that will be programmed using this
+	 * channel.
+	 */
+	__u32 host1x_class;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * @channel_ctx: [out]
+	 *
+	 * Opaque identifier corresponding to the opened channel.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @hardware_version: [out]
+	 *
+	 * Version of the engine hardware. This can be used by userspace
+	 * to determine how the engine needs to be programmed.
+	 */
+	__u32 hardware_version;
+};
+
+struct drm_tegra_channel_close {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to close.
+	 */
+	__u32 channel_ctx;
+};
+
+#define DRM_TEGRA_CHANNEL_MAP_READWRITE			(1<<0)
+
+struct drm_tegra_channel_map {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to make memory available to.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @handle: [in]
+	 *
+	 * GEM handle of the memory to map.
+	 */
+	__u32 handle;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * @mapping_id: [out]
+	 *
+	 * Identifier corresponding to the mapping, to be used for
+	 * relocations or unmapping later.
+	 */
+	__u32 mapping_id;
+};
+
+struct drm_tegra_channel_unmap {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Channel identifier of the channel to unmap memory from.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @mapping_id: [in]
+	 *
+	 * Mapping identifier of the memory mapping to unmap.
+	 */
+	__u32 mapping_id;
+};
+
+/* Submission */
+
+/**
+ * Specify that bit 39 of the patched-in address should be set to
+ * trigger layout swizzling between Tegra and non-Tegra Blocklinear
+ * layout on systems that store surfaces in system memory in non-Tegra
+ * Blocklinear layout.
+ */
+#define DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR		(1<<0)
+
+struct drm_tegra_submit_buf {
+	/**
+	 * @mapping_id: [in]
+	 *
+	 * Identifier of the mapping to use in the submission.
+	 */
+	__u32 mapping_id;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * Information for relocation patching. Relocation patching will
+	 * be done if the MAP IOCTL that created `mapping_id` did not
+	 * return an IOVA. If an IOVA was returned, the application is
+	 * responsible for patching the address into the gather.
+	 */
+	struct {
+		/**
+		 * @target_offset: [in]
+		 *
+		 * Offset from the start of the mapping of the data whose
+		 * address is to be patched into the gather.
+		 */
+		__u64 target_offset;
+
+		/**
+		 * @gather_offset_words: [in]
+		 *
+		 * Offset in words from the start of the gather data to
+		 * where the address should be patched into.
+		 */
+		__u32 gather_offset_words;
+
+		/**
+		 * @shift: [in]
+		 *
+		 * Number of bits the address should be shifted right before
+		 * patching in.
+		 */
+		__u32 shift;
+	} reloc;
+};
+
+struct drm_tegra_submit_syncpt_incr {
+	/**
+	 * @syncpt_fd: [in]
+	 *
+	 * Syncpoint file descriptor of the syncpoint that the job will
+	 * increment.
+	 */
+	__s32 syncpt_fd;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * @num_incrs: [in]
+	 *
+	 * Number of times the job will increment this syncpoint.
+	 */
+	__u32 num_incrs;
+
+	/**
+	 * @fence_value: [out]
+	 *
+	 * Value the syncpoint will have once the job has completed all
+	 * its specified syncpoint increments.
+	 *
+	 * Note that the kernel may increment the syncpoint before or after
+	 * the job. These increments are not reflected in this field.
+	 *
+	 * If the job hangs or times out, not all of the increments may
+	 * get executed.
+	 */
+	__u32 fence_value;
+};
+
+/**
+ * Execute `words` words of Host1x opcodes specified in the `gather_data_ptr`
+ * buffer. Each GATHER_UPTR command uses successive words from the buffer.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR		0
+/**
+ * Wait for a syncpoint to reach a value before continuing with further
+ * commands.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_WAIT_SYNCPT		1
+
+struct drm_tegra_submit_cmd_gather_uptr {
+	__u32 words;
+	__u32 reserved[3];
+};
+
+struct drm_tegra_submit_cmd_wait_syncpt {
+	__u32 id;
+	__u32 threshold;
+	__u32 reserved[2];
+};
+
+struct drm_tegra_submit_cmd {
+	/**
+	 * @type: [in]
+	 *
+	 * Command type to execute. One of the DRM_TEGRA_SUBMIT_CMD*
+	 * defines.
+	 */
+	__u32 type;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	union {
+		struct drm_tegra_submit_cmd_gather_uptr gather_uptr;
+		struct drm_tegra_submit_cmd_wait_syncpt wait_syncpt;
+		__u32 reserved[4];
+	};
+};
+
+struct drm_tegra_channel_submit {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to submit this job to.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @num_bufs: [in]
+	 *
+	 * Number of elements in the `bufs_ptr` array.
+	 */
+	__u32 num_bufs;
+
+	/**
+	 * @num_cmds: [in]
+	 *
+	 * Number of elements in the `cmds_ptr` array.
+	 */
+	__u32 num_cmds;
+
+	/**
+	 * @gather_data_words: [in]
+	 *
+	 * Number of 32-bit words in the `gather_data_ptr` array.
+	 */
+	__u32 gather_data_words;
+
+	/**
+	 * @bufs_ptr: [in]
+	 *
+	 * Pointer to an array of drm_tegra_submit_buf structures.
+	 */
+	__u64 bufs_ptr;
+
+	/**
+	 * @cmds_ptr: [in]
+	 *
+	 * Pointer to an array of drm_tegra_submit_cmd structures.
+	 */
+	__u64 cmds_ptr;
+
+	/**
+	 * @gather_data_ptr: [in]
+	 *
+	 * Pointer to an array of Host1x opcodes to be used by GATHER_UPTR
+	 * commands.
+	 */
+	__u64 gather_data_ptr;
+
+	/**
+	 * @syncpt_incr: [in,out]
+	 *
+	 * Information about the syncpoint the job will increment.
+	 */
+	struct drm_tegra_submit_syncpt_incr syncpt_incr;
+};
+
+#define DRM_IOCTL_TEGRA_CHANNEL_OPEN     DRM_IOWR(DRM_COMMAND_BASE + 0x10, struct drm_tegra_channel_open)
+#define DRM_IOCTL_TEGRA_CHANNEL_CLOSE    DRM_IOWR(DRM_COMMAND_BASE + 0x11, struct drm_tegra_channel_close)
+#define DRM_IOCTL_TEGRA_CHANNEL_MAP      DRM_IOWR(DRM_COMMAND_BASE + 0x12, struct drm_tegra_channel_map)
+#define DRM_IOCTL_TEGRA_CHANNEL_UNMAP    DRM_IOWR(DRM_COMMAND_BASE + 0x13, struct drm_tegra_channel_unmap)
+#define DRM_IOCTL_TEGRA_CHANNEL_SUBMIT   DRM_IOWR(DRM_COMMAND_BASE + 0x14, struct drm_tegra_channel_submit)
+
+#define DRM_IOCTL_TEGRA_GEM_CREATE       DRM_IOWR(DRM_COMMAND_BASE + 0x15, struct drm_tegra_gem_create)
+#define DRM_IOCTL_TEGRA_GEM_MMAP         DRM_IOWR(DRM_COMMAND_BASE + 0x16, struct drm_tegra_gem_mmap)
+
 #if defined(__cplusplus)
 }
 #endif
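
As a closing illustration, the relocation fields documented above
combine as in this hypothetical kernel-side helper (not part of the
patch): the mapping's IOVA plus target_offset, shifted right by shift,
is written into the gather at gather_offset_words.

/* (iova + target_offset) >> shift is patched into the gather. */
static void patch_reloc(__u32 *gather, const struct drm_tegra_submit_buf *buf,
			__u64 iova)
{
	__u64 address = (iova + buf->reloc.target_offset) >> buf->reloc.shift;

	gather[buf->reloc.gather_offset_words] = (__u32)address;
}
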