mbox series

[0/4] Update K3 DSP remoteproc driver for C71x DSPs

Message ID 20200325204701.16862-1-s-anna@ti.com
Headers show
Series Update K3 DSP remoteproc driver for C71x DSPs | expand

Message

Suman Anna March 25, 2020, 8:46 p.m. UTC
Hi All,

This series adds support for a new next generation 64-bit TI DSP based on
the TMS320C71x CorePac processor subsystem called the C71x. The support is
enabled through couple of enhancements to the remoteproc core (primarily to
support a 64-bit trace resource entry), and does depend on the K3 DSP
remoteproc driver posted earlier today [1]. 

The loading support leveraged the 64-bit ELF loader support code added by
Clement and already staged on the rproc-next branch. I am posting this
series separate from the C66x series because of the new 64-bit resource
type enhancement needs (patches 2 and 3). I have leveraged the existing
resource types as is by introducing a new version element, and am open to
ideas if it is desired to just define it as a separate resource type.

The C71x DSP boots using firmware segments loaded into the DDR with a 2 MB
aligned address requirement on the boot vectors. There is no support for
internal memory loading, and all internal memories shall be used as fast 
RAMs/scatchpads by the firmware executing on the DSPs. IPC is through the
virtio-rpmsg transport. There is no support for Error Recovery, Power
Management or loading into on-chip SRAMs at present.

Following is the patch summary:
 - Patch 1 updates the K3 DSP bindings for C71x cores
 - Patch 2 introduces a concept of version element into existing resource types
 - Patch 3 adds support for a new 64-bit trace resource entry
 - Patch 4 enhances the K3 DSP remoteproc driver for C71x

regards
Suman

[1] https://patchwork.kernel.org/cover/11458573/

Suman Anna (4):
  dt-bindings: remoteproc: k3-dsp: Update bindings for C71x DSPs
  remoteproc: introduce version element into resource type field
  remoteproc: add support for a new 64-bit trace version
  remoteproc/k3-dsp: Add support for C71x DSPs

 .../bindings/remoteproc/ti,k3-dsp-rproc.yaml  | 78 ++++++++++++++++---
 drivers/remoteproc/remoteproc_core.c          | 65 +++++++++++-----
 drivers/remoteproc/remoteproc_debugfs.c       | 50 ++++++++----
 drivers/remoteproc/ti_k3_dsp_remoteproc.c     | 17 ++++
 include/linux/remoteproc.h                    | 34 +++++++-
 5 files changed, 203 insertions(+), 41 deletions(-)

Comments

Suman Anna April 27, 2020, 7:54 p.m. UTC | #1
On 3/25/20 3:47 PM, Suman Anna wrote:
> The Texas Instrument's K3 J721E SoCs have a newer next-generation
> C71x DSP Subsystem in the MAIN voltage domain in addition to the
> previous generation C66x DSP subsystems. The C71x DSP subsystem is
> based on the TMS320C71x DSP CorePac module. The C71x CPU is a true
> 64-bit machine including 64-bit memory addressing and single-cycle
> 64-bit base arithmetic operations and supports vector signal processing
> providing a significant lift in DSP processing power over C66x DSPs.
> J721E SoCs use a C711 (a one-core 512-bit vector width CPU core) DSP
> that is cache coherent with the A72 Arm cores.
> 
> Each subsystem has one or more Fixed/Floating-Point DSP CPUs, with 32 KB
> of L1P Cache, 48 KB of L1D SRAM that can be configured and partitioned as
> either RAM and/or Cache, and 512 KB of L2 SRAM configurable as either RAM
> and/or Cache. The CorePac also includes a Matrix Multiplication Accelerator
> (MMA), a Stream Engine (SE) and a C71x Memory Management Unit (CMMU), an
> Interrupt Controller (INTC) and a Powerdown Management Unit (PMU) modules.
> 
> Update the existing K3 DSP remoteproc driver to add support for this C71x
> DSP subsystem. The firmware loading support is provided by using the newly
> added 64-bit ELF loader support, and is limited to images using only
> external DDR memory at the moment. The L1D and L2 SRAMs are used as scratch
> memory when using as RAMs, and cannot be used for loadable segments. The
> CMMU is also not supported to begin with, and the driver is designed to
> treat the MMU as if it is in bypass mode.
> 
> Signed-off-by: Suman Anna <s-anna@ti.com>
> ---
>   drivers/remoteproc/ti_k3_dsp_remoteproc.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/remoteproc/ti_k3_dsp_remoteproc.c b/drivers/remoteproc/ti_k3_dsp_remoteproc.c
> index 7b712ef74611..48d26f7d5f48 100644
> --- a/drivers/remoteproc/ti_k3_dsp_remoteproc.c
> +++ b/drivers/remoteproc/ti_k3_dsp_remoteproc.c
> @@ -649,6 +649,9 @@ static int k3_dsp_rproc_probe(struct platform_device *pdev)
>   
>   	rproc->has_iommu = false;
>   	rproc->recovery_disabled = true;
> +	/* C71x is a 64-bit processor, so plug in generic .sanity_check ops */
> +	rproc->ops->sanity_check = rproc_elf_sanity_check;
> +

Will drop this on the next version, this is no longer needed after 
commit e29ff72b7794 ("remoteproc: remove rproc_elf32_sanity_check") 
currently on rproc-next.

regards
Suman


>   	kproc = rproc->priv;
>   	kproc->rproc = rproc;
>   	kproc->dev = dev;
> @@ -789,6 +792,12 @@ static const struct k3_dsp_mem_data c66_mems[] = {
>   	{ .name = "l1dram", .dev_addr = 0xf00000 },
>   };
>   
> +/* C71x cores only have a L1P Cache, there are no L1P SRAMs */
> +static const struct k3_dsp_mem_data c71_mems[] = {
> +	{ .name = "l2sram", .dev_addr = 0x800000 },
> +	{ .name = "l1dram", .dev_addr = 0xe00000 },
> +};
> +
>   static const struct k3_dsp_dev_data c66_data = {
>   	.mems = c66_mems,
>   	.num_mems = ARRAY_SIZE(c66_mems),
> @@ -796,8 +805,16 @@ static const struct k3_dsp_dev_data c66_data = {
>   	.uses_lreset = true,
>   };
>   
> +static const struct k3_dsp_dev_data c71_data = {
> +	.mems = c71_mems,
> +	.num_mems = ARRAY_SIZE(c71_mems),
> +	.boot_align_addr = SZ_2M,
> +	.uses_lreset = false,
> +};
> +
>   static const struct of_device_id k3_dsp_of_match[] = {
>   	{ .compatible = "ti,j721e-c66-dsp", .data = &c66_data, },
> +	{ .compatible = "ti,j721e-c71-dsp", .data = &c71_data, },
>   	{ /* sentinel */ },
>   };
>   MODULE_DEVICE_TABLE(of, k3_dsp_of_match);
>
Suman Anna May 21, 2020, 3:57 p.m. UTC | #2
On 3/25/20 3:46 PM, Suman Anna wrote:
> Hi All,
> 
> This series adds support for a new next generation 64-bit TI DSP based on
> the TMS320C71x CorePac processor subsystem called the C71x. The support is
> enabled through couple of enhancements to the remoteproc core (primarily to
> support a 64-bit trace resource entry), and does depend on the K3 DSP
> remoteproc driver posted earlier today [1].
> 
> The loading support leveraged the 64-bit ELF loader support code added by
> Clement and already staged on the rproc-next branch. I am posting this
> series separate from the C66x series because of the new 64-bit resource
> type enhancement needs (patches 2 and 3). I have leveraged the existing
> resource types as is by introducing a new version element, and am open to
> ideas if it is desired to just define it as a separate resource type.
> 
> The C71x DSP boots using firmware segments loaded into the DDR with a 2 MB
> aligned address requirement on the boot vectors. There is no support for
> internal memory loading, and all internal memories shall be used as fast
> RAMs/scatchpads by the firmware executing on the DSPs. IPC is through the
> virtio-rpmsg transport. There is no support for Error Recovery, Power
> Management or loading into on-chip SRAMs at present.
> 
> Following is the patch summary:
>   - Patch 1 updates the K3 DSP bindings for C71x cores
>   - Patch 2 introduces a concept of version element into existing resource types
>   - Patch 3 adds support for a new 64-bit trace resource entry
>   - Patch 4 enhances the K3 DSP remoteproc driver for C71x

I have separated out the C71 platform driver pieces (patches 1 & 4) and 
posted a v2 for those.

Appreciate any feedback on the core patches (patches 2 & 3) that add the 
minimal 64-bit trace support, as this also sets the direction for 
resource extensions. I can post the next version for those based on 
feedback.

regards
Suman

> 
> regards
> Suman
> 
> [1] https://patchwork.kernel.org/cover/11458573/
> 
> Suman Anna (4):
>    dt-bindings: remoteproc: k3-dsp: Update bindings for C71x DSPs
>    remoteproc: introduce version element into resource type field
>    remoteproc: add support for a new 64-bit trace version
>    remoteproc/k3-dsp: Add support for C71x DSPs
> 
>   .../bindings/remoteproc/ti,k3-dsp-rproc.yaml  | 78 ++++++++++++++++---
>   drivers/remoteproc/remoteproc_core.c          | 65 +++++++++++-----
>   drivers/remoteproc/remoteproc_debugfs.c       | 50 ++++++++----
>   drivers/remoteproc/ti_k3_dsp_remoteproc.c     | 17 ++++
>   include/linux/remoteproc.h                    | 34 +++++++-
>   5 files changed, 203 insertions(+), 41 deletions(-)
>
Bjorn Andersson May 21, 2020, 5:54 p.m. UTC | #3
On Wed 25 Mar 13:46 PDT 2020, Suman Anna wrote:

> The current remoteproc core has supported only 32-bit remote
> processors and as such some of the current resource structures
> may not scale well for 64-bit remote processors, and would
> require new versions of resource types. Each resource is currently
> identified by a 32-bit type field. Introduce the concept of version
> for these resource types by overloading this 32-bit type field
> into two 16-bit version and type fields with the existing resources
> behaving as version 0 thereby providing backward compatibility.
> 
> The version field is passed as an additional argument to each of
> the handler functions, and all the existing handlers are updated
> accordingly. Each specific handler will be updated on a need basis
> when a new version of the resource type is added.
> 

I really would prefer that we add additional types for the new
structures, neither side will be compatible with new versions without
enhancements to their respective implementations anyways.

> An alternate way would be to introduce the new types as completely
> new resource types which would require additional customization of
> the resource handlers based on the 32-bit or 64-bit mode of a remote
> processor, and introduction of an additional mode flag to the rproc
> structure.
> 

What would this "mode" indicate? If it's version 0 or 1?

> Signed-off-by: Suman Anna <s-anna@ti.com>
> ---
>  drivers/remoteproc/remoteproc_core.c    | 25 +++++++++++++++----------
>  drivers/remoteproc/remoteproc_debugfs.c | 17 ++++++++++-------
>  include/linux/remoteproc.h              |  8 +++++++-
>  3 files changed, 32 insertions(+), 18 deletions(-)
> 
[..]
> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> index 77788a4bb94e..526d3cb45e37 100644
> --- a/include/linux/remoteproc.h
> +++ b/include/linux/remoteproc.h
> @@ -86,7 +86,13 @@ struct resource_table {
>   * this header, and it should be parsed according to the resource type.
>   */
>  struct fw_rsc_hdr {
> -	u32 type;
> +	union {
> +		u32 type;
> +		struct {
> +			u16 t;
> +			u16 v;
> +		} st;

I see your "type" is little endian...

Regards,
Bjorn
Bjorn Andersson May 21, 2020, 6:04 p.m. UTC | #4
On Wed 25 Mar 13:47 PDT 2020, Suman Anna wrote:

> Introduce a new trace entry resource structure that accommodates
> a 64-bit device address to support 64-bit processors. This is to
> be used using an overloaded version value of 1 in the upper 32-bits
> of the previous resource type field. The new resource still uses
> 32-bits for the length field (followed by a 32-bit reserved field,
> so can be updated in the future), which is a sufficiently large
> trace buffer size. A 32-bit padding field also had to be added
> to align the device address on a 64-bit boundary, and match the
> usage on the firmware side.
> 
> The remoteproc debugfs logic also has been adjusted accordingly.
> 
> Signed-off-by: Suman Anna <s-anna@ti.com>
> ---
>  drivers/remoteproc/remoteproc_core.c    | 40 ++++++++++++++++++++-----
>  drivers/remoteproc/remoteproc_debugfs.c | 37 ++++++++++++++++++-----
>  include/linux/remoteproc.h              | 26 ++++++++++++++++
>  3 files changed, 87 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 53bc37c508c6..b9a097990862 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -609,21 +609,45 @@ void rproc_vdev_release(struct kref *ref)
>   *
>   * Returns 0 on success, or an appropriate error code otherwise
>   */
> -static int rproc_handle_trace(struct rproc *rproc, struct fw_rsc_trace *rsc,
> +static int rproc_handle_trace(struct rproc *rproc, void *rsc,
>  			      int offset, int avail, u16 ver)
>  {
>  	struct rproc_debug_trace *trace;
>  	struct device *dev = &rproc->dev;
> +	struct fw_rsc_trace *rsc1;
> +	struct fw_rsc_trace2 *rsc2;
>  	char name[15];
> +	size_t rsc_size;
> +	u32 reserved;
> +	u64 da;
> +	u32 len;
> +
> +	if (!ver) {

This looks like a switch to me, but I also do think this looks rather
crude, if you spin off the tail of this function and call it from a
rproc_handle_trace() and rproc_handle_trace64() I believe this would be
cleaner.

> +		rsc1 = (struct fw_rsc_trace *)rsc;
> +		rsc_size = sizeof(*rsc1);
> +		reserved = rsc1->reserved;
> +		da = rsc1->da;
> +		len = rsc1->len;
> +	} else if (ver == 1) {
> +		rsc2 = (struct fw_rsc_trace2 *)rsc;
> +		rsc_size = sizeof(*rsc2);
> +		reserved = rsc2->reserved;
> +		da = rsc2->da;
> +		len = rsc2->len;
> +	} else {
> +		dev_err(dev, "unsupported trace rsc version %d\n", ver);

If we use "type" to describe your 64-bit-da-trace then this sanity check
would have been taken care of by the core.

> +		return -EINVAL;
> +	}
>  
> -	if (sizeof(*rsc) > avail) {
> +	if (rsc_size > avail) {
>  		dev_err(dev, "trace rsc is truncated\n");
>  		return -EINVAL;
>  	}
>  
>  	/* make sure reserved bytes are zeroes */
> -	if (rsc->reserved) {
> -		dev_err(dev, "trace rsc has non zero reserved bytes\n");
> +	if (reserved) {
> +		dev_err(dev, "trace rsc has non zero reserved bytes, value = 0x%x\n",
> +			reserved);
>  		return -EINVAL;
>  	}
>  
> @@ -632,8 +656,8 @@ static int rproc_handle_trace(struct rproc *rproc, struct fw_rsc_trace *rsc,
>  		return -ENOMEM;
>  
>  	/* set the trace buffer dma properties */
> -	trace->trace_mem.len = rsc->len;
> -	trace->trace_mem.da = rsc->da;
> +	trace->trace_mem.len = len;
> +	trace->trace_mem.da = da;
>  
>  	/* set pointer on rproc device */
>  	trace->rproc = rproc;
> @@ -652,8 +676,8 @@ static int rproc_handle_trace(struct rproc *rproc, struct fw_rsc_trace *rsc,
>  
>  	rproc->num_traces++;
>  
> -	dev_dbg(dev, "%s added: da 0x%x, len 0x%x\n",
> -		name, rsc->da, rsc->len);
> +	dev_dbg(dev, "%s added: da 0x%llx, len 0x%x\n",
> +		name, da, len);
>  
>  	return 0;
>  }
> diff --git a/drivers/remoteproc/remoteproc_debugfs.c b/drivers/remoteproc/remoteproc_debugfs.c
> index 3560eed7a360..ff43736db45a 100644
> --- a/drivers/remoteproc/remoteproc_debugfs.c
> +++ b/drivers/remoteproc/remoteproc_debugfs.c
> @@ -192,7 +192,8 @@ static int rproc_rsc_table_show(struct seq_file *seq, void *p)
>  	struct resource_table *table = rproc->table_ptr;
>  	struct fw_rsc_carveout *c;
>  	struct fw_rsc_devmem *d;
> -	struct fw_rsc_trace *t;
> +	struct fw_rsc_trace *t1;
> +	struct fw_rsc_trace2 *t2;
>  	struct fw_rsc_vdev *v;
>  	int i, j;
>  
> @@ -205,6 +206,7 @@ static int rproc_rsc_table_show(struct seq_file *seq, void *p)
>  		int offset = table->offset[i];
>  		struct fw_rsc_hdr *hdr = (void *)table + offset;
>  		void *rsc = (void *)hdr + sizeof(*hdr);
> +		u16 ver = hdr->st.v;
>  
>  		switch (hdr->st.t) {
>  		case RSC_CARVEOUT:
> @@ -230,13 +232,32 @@ static int rproc_rsc_table_show(struct seq_file *seq, void *p)
>  			seq_printf(seq, "  Name %s\n\n", d->name);
>  			break;
>  		case RSC_TRACE:
> -			t = rsc;
> -			seq_printf(seq, "Entry %d is of type %s\n",
> -				   i, types[hdr->st.t]);
> -			seq_printf(seq, "  Device Address 0x%x\n", t->da);
> -			seq_printf(seq, "  Length 0x%x Bytes\n", t->len);
> -			seq_printf(seq, "  Reserved (should be zero) [%d]\n", t->reserved);
> -			seq_printf(seq, "  Name %s\n\n", t->name);
> +			if (ver == 0) {

Again, this is a switch, here in a switch. Just defining a new
RSC_TRACE64 type would reduce the amount of code here...

> +				t1 = rsc;
> +				seq_printf(seq, "Entry %d is version %d of type %s\n",
> +					   i, ver, types[hdr->st.t]);
> +				seq_printf(seq, "  Device Address 0x%x\n",
> +					   t1->da);
> +				seq_printf(seq, "  Length 0x%x Bytes\n",
> +					   t1->len);
> +				seq_printf(seq, "  Reserved (should be zero) [%d]\n",
> +					   t1->reserved);
> +				seq_printf(seq, "  Name %s\n\n", t1->name);
> +			} else if (ver == 1) {
> +				t2 = rsc;
> +				seq_printf(seq, "Entry %d is version %d of type %s\n",
> +					   i, ver, types[hdr->st.t]);
> +				seq_printf(seq, "  Device Address 0x%llx\n",
> +					   t2->da);
> +				seq_printf(seq, "  Length 0x%x Bytes\n",
> +					   t2->len);
> +				seq_printf(seq, "  Reserved (should be zero) [%d]\n",
> +					   t2->reserved);
> +				seq_printf(seq, "  Name %s\n\n", t2->name);
> +			} else {
> +				seq_printf(seq, "Entry %d is an unsupported version %d of type %s\n",
> +					   i, ver, types[hdr->st.t]);
> +			}
>  			break;
>  		case RSC_VDEV:
>  			v = rsc;
> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> index 526d3cb45e37..3b3bea42f8b1 100644
> --- a/include/linux/remoteproc.h
> +++ b/include/linux/remoteproc.h
> @@ -243,6 +243,32 @@ struct fw_rsc_trace {
>  	u8 name[32];
>  } __packed;
>  
> +/**
> + * struct fw_rsc_trace2 - trace buffer declaration supporting 64-bits
> + * @padding: initial padding after type field for aligned 64-bit access
> + * @da: device address (64-bit)
> + * @len: length (in bytes)
> + * @reserved: reserved (must be zero)
> + * @name: human-readable name of the trace buffer
> + *
> + * This resource entry is an enhanced version of the fw_rsc_trace resourec entry
> + * and the provides equivalent functionality but designed for 64-bit remote
> + * processors.
> + *
> + * @da specifies the device address of the buffer, @len specifies
> + * its size, and @name may contain a human readable name of the trace buffer.
> + *
> + * After booting the remote processor, the trace buffers are exposed to the
> + * user via debugfs entries (called trace0, trace1, etc..).
> + */
> +struct fw_rsc_trace2 {

Sounds more like fw_rsc_trace64 to me - in particular since the version
of trace2 is 1...

> +	u32 padding;
> +	u64 da;
> +	u32 len;
> +	u32 reserved;

What's the purpose of this reserved field?

Regards,
Bjorn

> +	u8 name[32];
> +} __packed;
> +
>  /**
>   * struct fw_rsc_vdev_vring - vring descriptor entry
>   * @da: device address
> -- 
> 2.23.0
>
Suman Anna May 21, 2020, 7:06 p.m. UTC | #5
Hi Bjorn,

On 5/21/20 12:54 PM, Bjorn Andersson wrote:
> On Wed 25 Mar 13:46 PDT 2020, Suman Anna wrote:
> 
>> The current remoteproc core has supported only 32-bit remote
>> processors and as such some of the current resource structures
>> may not scale well for 64-bit remote processors, and would
>> require new versions of resource types. Each resource is currently
>> identified by a 32-bit type field. Introduce the concept of version
>> for these resource types by overloading this 32-bit type field
>> into two 16-bit version and type fields with the existing resources
>> behaving as version 0 thereby providing backward compatibility.
>>
>> The version field is passed as an additional argument to each of
>> the handler functions, and all the existing handlers are updated
>> accordingly. Each specific handler will be updated on a need basis
>> when a new version of the resource type is added.
>>
> 
> I really would prefer that we add additional types for the new
> structures, neither side will be compatible with new versions without
> enhancements to their respective implementations anyways.

OK.

> 
>> An alternate way would be to introduce the new types as completely
>> new resource types which would require additional customization of
>> the resource handlers based on the 32-bit or 64-bit mode of a remote
>> processor, and introduction of an additional mode flag to the rproc
>> structure.
>>
> 
> What would this "mode" indicate? If it's version 0 or 1?

No, for indicating if the remoteproc is 32-bit or 64-bit and adjust the 
loading handlers if the resource types need to be segregated accordingly.

> 
>> Signed-off-by: Suman Anna <s-anna@ti.com>
>> ---
>>   drivers/remoteproc/remoteproc_core.c    | 25 +++++++++++++++----------
>>   drivers/remoteproc/remoteproc_debugfs.c | 17 ++++++++++-------
>>   include/linux/remoteproc.h              |  8 +++++++-
>>   3 files changed, 32 insertions(+), 18 deletions(-)
>>
> [..]
>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
>> index 77788a4bb94e..526d3cb45e37 100644
>> --- a/include/linux/remoteproc.h
>> +++ b/include/linux/remoteproc.h
>> @@ -86,7 +86,13 @@ struct resource_table {
>>    * this header, and it should be parsed according to the resource type.
>>    */
>>   struct fw_rsc_hdr {
>> -	u32 type;
>> +	union {
>> +		u32 type;
>> +		struct {
>> +			u16 t;
>> +			u16 v;
>> +		} st;
> 
> I see your "type" is little endian...

Yeah, definitely a draw-back if we want to support big-endian rprocs. Do 
you have any remoteprocs following big-endian? All TI remoteprocs are 
little-endian except for really old ones.

regards
Suman
Bjorn Andersson May 21, 2020, 7:21 p.m. UTC | #6
On Thu 21 May 12:06 PDT 2020, Suman Anna wrote:

> Hi Bjorn,
> 
> On 5/21/20 12:54 PM, Bjorn Andersson wrote:
> > On Wed 25 Mar 13:46 PDT 2020, Suman Anna wrote:
> > 
> > > The current remoteproc core has supported only 32-bit remote
> > > processors and as such some of the current resource structures
> > > may not scale well for 64-bit remote processors, and would
> > > require new versions of resource types. Each resource is currently
> > > identified by a 32-bit type field. Introduce the concept of version
> > > for these resource types by overloading this 32-bit type field
> > > into two 16-bit version and type fields with the existing resources
> > > behaving as version 0 thereby providing backward compatibility.
> > > 
> > > The version field is passed as an additional argument to each of
> > > the handler functions, and all the existing handlers are updated
> > > accordingly. Each specific handler will be updated on a need basis
> > > when a new version of the resource type is added.
> > > 
> > 
> > I really would prefer that we add additional types for the new
> > structures, neither side will be compatible with new versions without
> > enhancements to their respective implementations anyways.
> 
> OK.
> 
> > 
> > > An alternate way would be to introduce the new types as completely
> > > new resource types which would require additional customization of
> > > the resource handlers based on the 32-bit or 64-bit mode of a remote
> > > processor, and introduction of an additional mode flag to the rproc
> > > structure.
> > > 
> > 
> > What would this "mode" indicate? If it's version 0 or 1?
> 
> No, for indicating if the remoteproc is 32-bit or 64-bit and adjust the
> loading handlers if the resource types need to be segregated accordingly.
> 

Sorry, I think I'm misunderstanding something. Wouldn't your 64-bit
remote processor need different firmware from your 32-bit processor
anyways, if you want to support the wider resource? And you would pack
your firmware with the appropriate resource types?


Afaict the bit width of your remote processor, busses or memory is
unrelated to the choice of number of bits used to express things in the
resource table.

Regards,
Bjorn
Suman Anna May 21, 2020, 7:29 p.m. UTC | #7
On 5/21/20 2:21 PM, Bjorn Andersson wrote:
> On Thu 21 May 12:06 PDT 2020, Suman Anna wrote:
> 
>> Hi Bjorn,
>>
>> On 5/21/20 12:54 PM, Bjorn Andersson wrote:
>>> On Wed 25 Mar 13:46 PDT 2020, Suman Anna wrote:
>>>
>>>> The current remoteproc core has supported only 32-bit remote
>>>> processors and as such some of the current resource structures
>>>> may not scale well for 64-bit remote processors, and would
>>>> require new versions of resource types. Each resource is currently
>>>> identified by a 32-bit type field. Introduce the concept of version
>>>> for these resource types by overloading this 32-bit type field
>>>> into two 16-bit version and type fields with the existing resources
>>>> behaving as version 0 thereby providing backward compatibility.
>>>>
>>>> The version field is passed as an additional argument to each of
>>>> the handler functions, and all the existing handlers are updated
>>>> accordingly. Each specific handler will be updated on a need basis
>>>> when a new version of the resource type is added.
>>>>
>>>
>>> I really would prefer that we add additional types for the new
>>> structures, neither side will be compatible with new versions without
>>> enhancements to their respective implementations anyways.
>>
>> OK.
>>
>>>
>>>> An alternate way would be to introduce the new types as completely
>>>> new resource types which would require additional customization of
>>>> the resource handlers based on the 32-bit or 64-bit mode of a remote
>>>> processor, and introduction of an additional mode flag to the rproc
>>>> structure.
>>>>
>>>
>>> What would this "mode" indicate? If it's version 0 or 1?
>>
>> No, for indicating if the remoteproc is 32-bit or 64-bit and adjust the
>> loading handlers if the resource types need to be segregated accordingly.
>>
> 
> Sorry, I think I'm misunderstanding something. Wouldn't your 64-bit
> remote processor need different firmware from your 32-bit processor
> anyways, if you want to support the wider resource? And you would pack
> your firmware with the appropriate resource types?

Yes, that's correct.

> 
> Afaict the bit width of your remote processor, busses or memory is
> unrelated to the choice of number of bits used to express things in the
> resource table.

I would have to add the new resource type to the loading_handlers right, 
so it is a question of whether we want to impose any restrictions in 
remoteproc core or not from supporting a certain resource type (eg: I 
don't expect RSC_TRACE entries on 64-bit processors).

regards
Suman
Bjorn Andersson May 21, 2020, 7:41 p.m. UTC | #8
On Thu 21 May 12:29 PDT 2020, Suman Anna wrote:

> On 5/21/20 2:21 PM, Bjorn Andersson wrote:
> > On Thu 21 May 12:06 PDT 2020, Suman Anna wrote:
> > 
> > > Hi Bjorn,
> > > 
> > > On 5/21/20 12:54 PM, Bjorn Andersson wrote:
> > > > On Wed 25 Mar 13:46 PDT 2020, Suman Anna wrote:
> > > > 
> > > > > The current remoteproc core has supported only 32-bit remote
> > > > > processors and as such some of the current resource structures
> > > > > may not scale well for 64-bit remote processors, and would
> > > > > require new versions of resource types. Each resource is currently
> > > > > identified by a 32-bit type field. Introduce the concept of version
> > > > > for these resource types by overloading this 32-bit type field
> > > > > into two 16-bit version and type fields with the existing resources
> > > > > behaving as version 0 thereby providing backward compatibility.
> > > > > 
> > > > > The version field is passed as an additional argument to each of
> > > > > the handler functions, and all the existing handlers are updated
> > > > > accordingly. Each specific handler will be updated on a need basis
> > > > > when a new version of the resource type is added.
> > > > > 
> > > > 
> > > > I really would prefer that we add additional types for the new
> > > > structures, neither side will be compatible with new versions without
> > > > enhancements to their respective implementations anyways.
> > > 
> > > OK.
> > > 
> > > > 
> > > > > An alternate way would be to introduce the new types as completely
> > > > > new resource types which would require additional customization of
> > > > > the resource handlers based on the 32-bit or 64-bit mode of a remote
> > > > > processor, and introduction of an additional mode flag to the rproc
> > > > > structure.
> > > > > 
> > > > 
> > > > What would this "mode" indicate? If it's version 0 or 1?
> > > 
> > > No, for indicating if the remoteproc is 32-bit or 64-bit and adjust the
> > > loading handlers if the resource types need to be segregated accordingly.
> > > 
> > 
> > Sorry, I think I'm misunderstanding something. Wouldn't your 64-bit
> > remote processor need different firmware from your 32-bit processor
> > anyways, if you want to support the wider resource? And you would pack
> > your firmware with the appropriate resource types?
> 
> Yes, that's correct.
> 
> > 
> > Afaict the bit width of your remote processor, busses or memory is
> > unrelated to the choice of number of bits used to express things in the
> > resource table.
> 
> I would have to add the new resource type to the loading_handlers right, so
> it is a question of whether we want to impose any restrictions in remoteproc
> core or not from supporting a certain resource type (eg: I don't expect
> RSC_TRACE entries on 64-bit processors).
> 

Right, but either you add support for new resource types to the
loading_handlers, or you add the version checks within each handler,
either way you will have to do some work to be compatible with new
versions.

Regarding what resources would be fit for a 64-bit processor probably
relates to many things, in particular the question of what we actually
mean when we say that a coprocessor is 64-bit. So I don't really see a
need for the remoteproc core to prevent someone to design their
system/firmware to have a 64-bit CPU being passed 32-bit addresses.

Regards,
Bjorn
Suman Anna May 21, 2020, 7:42 p.m. UTC | #9
Hi Bjorn,

On 5/21/20 1:04 PM, Bjorn Andersson wrote:
> On Wed 25 Mar 13:47 PDT 2020, Suman Anna wrote:
> 
>> Introduce a new trace entry resource structure that accommodates
>> a 64-bit device address to support 64-bit processors. This is to
>> be used using an overloaded version value of 1 in the upper 32-bits
>> of the previous resource type field. The new resource still uses
>> 32-bits for the length field (followed by a 32-bit reserved field,
>> so can be updated in the future), which is a sufficiently large
>> trace buffer size. A 32-bit padding field also had to be added
>> to align the device address on a 64-bit boundary, and match the
>> usage on the firmware side.
>>
>> The remoteproc debugfs logic also has been adjusted accordingly.
>>
>> Signed-off-by: Suman Anna <s-anna@ti.com>
>> ---
>>   drivers/remoteproc/remoteproc_core.c    | 40 ++++++++++++++++++++-----
>>   drivers/remoteproc/remoteproc_debugfs.c | 37 ++++++++++++++++++-----
>>   include/linux/remoteproc.h              | 26 ++++++++++++++++
>>   3 files changed, 87 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>> index 53bc37c508c6..b9a097990862 100644
>> --- a/drivers/remoteproc/remoteproc_core.c
>> +++ b/drivers/remoteproc/remoteproc_core.c
>> @@ -609,21 +609,45 @@ void rproc_vdev_release(struct kref *ref)
>>    *
>>    * Returns 0 on success, or an appropriate error code otherwise
>>    */
>> -static int rproc_handle_trace(struct rproc *rproc, struct fw_rsc_trace *rsc,
>> +static int rproc_handle_trace(struct rproc *rproc, void *rsc,
>>   			      int offset, int avail, u16 ver)
>>   {
>>   	struct rproc_debug_trace *trace;
>>   	struct device *dev = &rproc->dev;
>> +	struct fw_rsc_trace *rsc1;
>> +	struct fw_rsc_trace2 *rsc2;
>>   	char name[15];
>> +	size_t rsc_size;
>> +	u32 reserved;
>> +	u64 da;
>> +	u32 len;
>> +
>> +	if (!ver) {
> 
> This looks like a switch to me, but I also do think this looks rather
> crude, if you spin off the tail of this function and call it from a
> rproc_handle_trace() and rproc_handle_trace64() I believe this would be
> cleaner.

Yeah, ok. Will refactor for this in v2.

> 
>> +		rsc1 = (struct fw_rsc_trace *)rsc;
>> +		rsc_size = sizeof(*rsc1);
>> +		reserved = rsc1->reserved;
>> +		da = rsc1->da;
>> +		len = rsc1->len;
>> +	} else if (ver == 1) {
>> +		rsc2 = (struct fw_rsc_trace2 *)rsc;
>> +		rsc_size = sizeof(*rsc2);
>> +		reserved = rsc2->reserved;
>> +		da = rsc2->da;
>> +		len = rsc2->len;
>> +	} else {
>> +		dev_err(dev, "unsupported trace rsc version %d\n", ver);
> 
> If we use "type" to describe your 64-bit-da-trace then this sanity check
> would have been taken care of by the core.
> 
>> +		return -EINVAL;
>> +	}
>>   
>> -	if (sizeof(*rsc) > avail) {
>> +	if (rsc_size > avail) {
>>   		dev_err(dev, "trace rsc is truncated\n");
>>   		return -EINVAL;
>>   	}
>>   
>>   	/* make sure reserved bytes are zeroes */
>> -	if (rsc->reserved) {
>> -		dev_err(dev, "trace rsc has non zero reserved bytes\n");
>> +	if (reserved) {
>> +		dev_err(dev, "trace rsc has non zero reserved bytes, value = 0x%x\n",
>> +			reserved);
>>   		return -EINVAL;
>>   	}
>>   
>> @@ -632,8 +656,8 @@ static int rproc_handle_trace(struct rproc *rproc, struct fw_rsc_trace *rsc,
>>   		return -ENOMEM;
>>   
>>   	/* set the trace buffer dma properties */
>> -	trace->trace_mem.len = rsc->len;
>> -	trace->trace_mem.da = rsc->da;
>> +	trace->trace_mem.len = len;
>> +	trace->trace_mem.da = da;
>>   
>>   	/* set pointer on rproc device */
>>   	trace->rproc = rproc;
>> @@ -652,8 +676,8 @@ static int rproc_handle_trace(struct rproc *rproc, struct fw_rsc_trace *rsc,
>>   
>>   	rproc->num_traces++;
>>   
>> -	dev_dbg(dev, "%s added: da 0x%x, len 0x%x\n",
>> -		name, rsc->da, rsc->len);
>> +	dev_dbg(dev, "%s added: da 0x%llx, len 0x%x\n",
>> +		name, da, len);
>>   
>>   	return 0;
>>   }
>> diff --git a/drivers/remoteproc/remoteproc_debugfs.c b/drivers/remoteproc/remoteproc_debugfs.c
>> index 3560eed7a360..ff43736db45a 100644
>> --- a/drivers/remoteproc/remoteproc_debugfs.c
>> +++ b/drivers/remoteproc/remoteproc_debugfs.c
>> @@ -192,7 +192,8 @@ static int rproc_rsc_table_show(struct seq_file *seq, void *p)
>>   	struct resource_table *table = rproc->table_ptr;
>>   	struct fw_rsc_carveout *c;
>>   	struct fw_rsc_devmem *d;
>> -	struct fw_rsc_trace *t;
>> +	struct fw_rsc_trace *t1;
>> +	struct fw_rsc_trace2 *t2;
>>   	struct fw_rsc_vdev *v;
>>   	int i, j;
>>   
>> @@ -205,6 +206,7 @@ static int rproc_rsc_table_show(struct seq_file *seq, void *p)
>>   		int offset = table->offset[i];
>>   		struct fw_rsc_hdr *hdr = (void *)table + offset;
>>   		void *rsc = (void *)hdr + sizeof(*hdr);
>> +		u16 ver = hdr->st.v;
>>   
>>   		switch (hdr->st.t) {
>>   		case RSC_CARVEOUT:
>> @@ -230,13 +232,32 @@ static int rproc_rsc_table_show(struct seq_file *seq, void *p)
>>   			seq_printf(seq, "  Name %s\n\n", d->name);
>>   			break;
>>   		case RSC_TRACE:
>> -			t = rsc;
>> -			seq_printf(seq, "Entry %d is of type %s\n",
>> -				   i, types[hdr->st.t]);
>> -			seq_printf(seq, "  Device Address 0x%x\n", t->da);
>> -			seq_printf(seq, "  Length 0x%x Bytes\n", t->len);
>> -			seq_printf(seq, "  Reserved (should be zero) [%d]\n", t->reserved);
>> -			seq_printf(seq, "  Name %s\n\n", t->name);
>> +			if (ver == 0) {
> 
> Again, this is a switch, here in a switch. Just defining a new
> RSC_TRACE64 type would reduce the amount of code here...

OK.

> 
>> +				t1 = rsc;
>> +				seq_printf(seq, "Entry %d is version %d of type %s\n",
>> +					   i, ver, types[hdr->st.t]);
>> +				seq_printf(seq, "  Device Address 0x%x\n",
>> +					   t1->da);
>> +				seq_printf(seq, "  Length 0x%x Bytes\n",
>> +					   t1->len);
>> +				seq_printf(seq, "  Reserved (should be zero) [%d]\n",
>> +					   t1->reserved);
>> +				seq_printf(seq, "  Name %s\n\n", t1->name);
>> +			} else if (ver == 1) {
>> +				t2 = rsc;
>> +				seq_printf(seq, "Entry %d is version %d of type %s\n",
>> +					   i, ver, types[hdr->st.t]);
>> +				seq_printf(seq, "  Device Address 0x%llx\n",
>> +					   t2->da);
>> +				seq_printf(seq, "  Length 0x%x Bytes\n",
>> +					   t2->len);
>> +				seq_printf(seq, "  Reserved (should be zero) [%d]\n",
>> +					   t2->reserved);
>> +				seq_printf(seq, "  Name %s\n\n", t2->name);
>> +			} else {
>> +				seq_printf(seq, "Entry %d is an unsupported version %d of type %s\n",
>> +					   i, ver, types[hdr->st.t]);
>> +			}
>>   			break;
>>   		case RSC_VDEV:
>>   			v = rsc;
>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
>> index 526d3cb45e37..3b3bea42f8b1 100644
>> --- a/include/linux/remoteproc.h
>> +++ b/include/linux/remoteproc.h
>> @@ -243,6 +243,32 @@ struct fw_rsc_trace {
>>   	u8 name[32];
>>   } __packed;
>>   
>> +/**
>> + * struct fw_rsc_trace2 - trace buffer declaration supporting 64-bits
>> + * @padding: initial padding after type field for aligned 64-bit access
>> + * @da: device address (64-bit)
>> + * @len: length (in bytes)
>> + * @reserved: reserved (must be zero)
>> + * @name: human-readable name of the trace buffer
>> + *
>> + * This resource entry is an enhanced version of the fw_rsc_trace resourec entry
>> + * and the provides equivalent functionality but designed for 64-bit remote
>> + * processors.
>> + *
>> + * @da specifies the device address of the buffer, @len specifies
>> + * its size, and @name may contain a human readable name of the trace buffer.
>> + *
>> + * After booting the remote processor, the trace buffers are exposed to the
>> + * user via debugfs entries (called trace0, trace1, etc..).
>> + */
>> +struct fw_rsc_trace2 {
> 
> Sounds more like fw_rsc_trace64 to me - in particular since the version
> of trace2 is 1...

Yeah, will rename this.

> 
>> +	u32 padding;
>> +	u64 da;
>> +	u32 len;
>> +	u32 reserved;
> 
> What's the purpose of this reserved field?

Partly to make sure the entire resource is aligned on an 8-byte, and 
partly copied over from fw_rsc_trace entry. I guess 32-bits is already 
large enough of a size for trace entries irrespective of 32-bit or 
64-bit traces, so I doubt if we want to make the len field also a u64.

regards
Suman

> 
> Regards,
> Bjorn
> 
>> +	u8 name[32];
>> +} __packed;
>> +
>>   /**
>>    * struct fw_rsc_vdev_vring - vring descriptor entry
>>    * @da: device address
>> -- 
>> 2.23.0
>>
Suman Anna May 21, 2020, 7:52 p.m. UTC | #10
On 5/21/20 2:41 PM, Bjorn Andersson wrote:
> On Thu 21 May 12:29 PDT 2020, Suman Anna wrote:
> 
>> On 5/21/20 2:21 PM, Bjorn Andersson wrote:
>>> On Thu 21 May 12:06 PDT 2020, Suman Anna wrote:
>>>
>>>> Hi Bjorn,
>>>>
>>>> On 5/21/20 12:54 PM, Bjorn Andersson wrote:
>>>>> On Wed 25 Mar 13:46 PDT 2020, Suman Anna wrote:
>>>>>
>>>>>> The current remoteproc core has supported only 32-bit remote
>>>>>> processors and as such some of the current resource structures
>>>>>> may not scale well for 64-bit remote processors, and would
>>>>>> require new versions of resource types. Each resource is currently
>>>>>> identified by a 32-bit type field. Introduce the concept of version
>>>>>> for these resource types by overloading this 32-bit type field
>>>>>> into two 16-bit version and type fields with the existing resources
>>>>>> behaving as version 0 thereby providing backward compatibility.
>>>>>>
>>>>>> The version field is passed as an additional argument to each of
>>>>>> the handler functions, and all the existing handlers are updated
>>>>>> accordingly. Each specific handler will be updated on a need basis
>>>>>> when a new version of the resource type is added.
>>>>>>
>>>>>
>>>>> I really would prefer that we add additional types for the new
>>>>> structures, neither side will be compatible with new versions without
>>>>> enhancements to their respective implementations anyways.
>>>>
>>>> OK.
>>>>
>>>>>
>>>>>> An alternate way would be to introduce the new types as completely
>>>>>> new resource types which would require additional customization of
>>>>>> the resource handlers based on the 32-bit or 64-bit mode of a remote
>>>>>> processor, and introduction of an additional mode flag to the rproc
>>>>>> structure.
>>>>>>
>>>>>
>>>>> What would this "mode" indicate? If it's version 0 or 1?
>>>>
>>>> No, for indicating if the remoteproc is 32-bit or 64-bit and adjust the
>>>> loading handlers if the resource types need to be segregated accordingly.
>>>>
>>>
>>> Sorry, I think I'm misunderstanding something. Wouldn't your 64-bit
>>> remote processor need different firmware from your 32-bit processor
>>> anyways, if you want to support the wider resource? And you would pack
>>> your firmware with the appropriate resource types?
>>
>> Yes, that's correct.
>>
>>>
>>> Afaict the bit width of your remote processor, busses or memory is
>>> unrelated to the choice of number of bits used to express things in the
>>> resource table.
>>
>> I would have to add the new resource type to the loading_handlers right, so
>> it is a question of whether we want to impose any restrictions in remoteproc
>> core or not from supporting a certain resource type (eg: I don't expect
>> RSC_TRACE entries on 64-bit processors).
>>
> 
> Right, but either you add support for new resource types to the
> loading_handlers, or you add the version checks within each handler,
> either way you will have to do some work to be compatible with new
> versions.
> 
> Regarding what resources would be fit for a 64-bit processor probably
> relates to many things, in particular the question of what we actually
> mean when we say that a coprocessor is 64-bit. So I don't really see a
> need for the remoteproc core to prevent someone to design their
> system/firmware to have a 64-bit CPU being passed 32-bit addresses.

OK. In general, I have seen firmware developers get confused w.r.t the 
resource types, that's why I was inclined to go with the restrictive 
checking. Anyway, will rework the support as per the comments.

regards
Suman
Suman Anna May 22, 2020, 4:54 p.m. UTC | #11
On 5/21/20 2:42 PM, Suman Anna wrote:
> Hi Bjorn,
> 
> On 5/21/20 1:04 PM, Bjorn Andersson wrote:
>> On Wed 25 Mar 13:47 PDT 2020, Suman Anna wrote:
>>
>>> Introduce a new trace entry resource structure that accommodates
>>> a 64-bit device address to support 64-bit processors. This is to
>>> be used using an overloaded version value of 1 in the upper 32-bits
>>> of the previous resource type field. The new resource still uses
>>> 32-bits for the length field (followed by a 32-bit reserved field,
>>> so can be updated in the future), which is a sufficiently large
>>> trace buffer size. A 32-bit padding field also had to be added
>>> to align the device address on a 64-bit boundary, and match the
>>> usage on the firmware side.
>>>
>>> The remoteproc debugfs logic also has been adjusted accordingly.
>>>
>>> Signed-off-by: Suman Anna <s-anna@ti.com>
>>> ---
>>>   drivers/remoteproc/remoteproc_core.c    | 40 ++++++++++++++++++++-----
>>>   drivers/remoteproc/remoteproc_debugfs.c | 37 ++++++++++++++++++-----
>>>   include/linux/remoteproc.h              | 26 ++++++++++++++++
>>>   3 files changed, 87 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_core.c 
>>> b/drivers/remoteproc/remoteproc_core.c
>>> index 53bc37c508c6..b9a097990862 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -609,21 +609,45 @@ void rproc_vdev_release(struct kref *ref)
>>>    *
>>>    * Returns 0 on success, or an appropriate error code otherwise
>>>    */
>>> -static int rproc_handle_trace(struct rproc *rproc, struct 
>>> fw_rsc_trace *rsc,
>>> +static int rproc_handle_trace(struct rproc *rproc, void *rsc,
>>>                     int offset, int avail, u16 ver)
>>>   {
>>>       struct rproc_debug_trace *trace;
>>>       struct device *dev = &rproc->dev;
>>> +    struct fw_rsc_trace *rsc1;
>>> +    struct fw_rsc_trace2 *rsc2;
>>>       char name[15];
>>> +    size_t rsc_size;
>>> +    u32 reserved;
>>> +    u64 da;
>>> +    u32 len;
>>> +
>>> +    if (!ver) {
>>
>> This looks like a switch to me, but I also do think this looks rather
>> crude, if you spin off the tail of this function and call it from a
>> rproc_handle_trace() and rproc_handle_trace64() I believe this would be
>> cleaner.
> 
> Yeah, ok. Will refactor for this in v2.
> 
>>
>>> +        rsc1 = (struct fw_rsc_trace *)rsc;
>>> +        rsc_size = sizeof(*rsc1);
>>> +        reserved = rsc1->reserved;
>>> +        da = rsc1->da;
>>> +        len = rsc1->len;
>>> +    } else if (ver == 1) {
>>> +        rsc2 = (struct fw_rsc_trace2 *)rsc;
>>> +        rsc_size = sizeof(*rsc2);
>>> +        reserved = rsc2->reserved;
>>> +        da = rsc2->da;
>>> +        len = rsc2->len;
>>> +    } else {
>>> +        dev_err(dev, "unsupported trace rsc version %d\n", ver);
>>
>> If we use "type" to describe your 64-bit-da-trace then this sanity check
>> would have been taken care of by the core.
>>
>>> +        return -EINVAL;
>>> +    }
>>> -    if (sizeof(*rsc) > avail) {
>>> +    if (rsc_size > avail) {
>>>           dev_err(dev, "trace rsc is truncated\n");
>>>           return -EINVAL;
>>>       }
>>>       /* make sure reserved bytes are zeroes */
>>> -    if (rsc->reserved) {
>>> -        dev_err(dev, "trace rsc has non zero reserved bytes\n");
>>> +    if (reserved) {
>>> +        dev_err(dev, "trace rsc has non zero reserved bytes, value = 
>>> 0x%x\n",
>>> +            reserved);
>>>           return -EINVAL;
>>>       }
>>> @@ -632,8 +656,8 @@ static int rproc_handle_trace(struct rproc 
>>> *rproc, struct fw_rsc_trace *rsc,
>>>           return -ENOMEM;
>>>       /* set the trace buffer dma properties */
>>> -    trace->trace_mem.len = rsc->len;
>>> -    trace->trace_mem.da = rsc->da;
>>> +    trace->trace_mem.len = len;
>>> +    trace->trace_mem.da = da;
>>>       /* set pointer on rproc device */
>>>       trace->rproc = rproc;
>>> @@ -652,8 +676,8 @@ static int rproc_handle_trace(struct rproc 
>>> *rproc, struct fw_rsc_trace *rsc,
>>>       rproc->num_traces++;
>>> -    dev_dbg(dev, "%s added: da 0x%x, len 0x%x\n",
>>> -        name, rsc->da, rsc->len);
>>> +    dev_dbg(dev, "%s added: da 0x%llx, len 0x%x\n",
>>> +        name, da, len);
>>>       return 0;
>>>   }
>>> diff --git a/drivers/remoteproc/remoteproc_debugfs.c 
>>> b/drivers/remoteproc/remoteproc_debugfs.c
>>> index 3560eed7a360..ff43736db45a 100644
>>> --- a/drivers/remoteproc/remoteproc_debugfs.c
>>> +++ b/drivers/remoteproc/remoteproc_debugfs.c
>>> @@ -192,7 +192,8 @@ static int rproc_rsc_table_show(struct seq_file 
>>> *seq, void *p)
>>>       struct resource_table *table = rproc->table_ptr;
>>>       struct fw_rsc_carveout *c;
>>>       struct fw_rsc_devmem *d;
>>> -    struct fw_rsc_trace *t;
>>> +    struct fw_rsc_trace *t1;
>>> +    struct fw_rsc_trace2 *t2;
>>>       struct fw_rsc_vdev *v;
>>>       int i, j;
>>> @@ -205,6 +206,7 @@ static int rproc_rsc_table_show(struct seq_file 
>>> *seq, void *p)
>>>           int offset = table->offset[i];
>>>           struct fw_rsc_hdr *hdr = (void *)table + offset;
>>>           void *rsc = (void *)hdr + sizeof(*hdr);
>>> +        u16 ver = hdr->st.v;
>>>           switch (hdr->st.t) {
>>>           case RSC_CARVEOUT:
>>> @@ -230,13 +232,32 @@ static int rproc_rsc_table_show(struct seq_file 
>>> *seq, void *p)
>>>               seq_printf(seq, "  Name %s\n\n", d->name);
>>>               break;
>>>           case RSC_TRACE:
>>> -            t = rsc;
>>> -            seq_printf(seq, "Entry %d is of type %s\n",
>>> -                   i, types[hdr->st.t]);
>>> -            seq_printf(seq, "  Device Address 0x%x\n", t->da);
>>> -            seq_printf(seq, "  Length 0x%x Bytes\n", t->len);
>>> -            seq_printf(seq, "  Reserved (should be zero) [%d]\n", 
>>> t->reserved);
>>> -            seq_printf(seq, "  Name %s\n\n", t->name);
>>> +            if (ver == 0) {
>>
>> Again, this is a switch, here in a switch. Just defining a new
>> RSC_TRACE64 type would reduce the amount of code here...
> 
> OK.
> 
>>
>>> +                t1 = rsc;
>>> +                seq_printf(seq, "Entry %d is version %d of type %s\n",
>>> +                       i, ver, types[hdr->st.t]);
>>> +                seq_printf(seq, "  Device Address 0x%x\n",
>>> +                       t1->da);
>>> +                seq_printf(seq, "  Length 0x%x Bytes\n",
>>> +                       t1->len);
>>> +                seq_printf(seq, "  Reserved (should be zero) [%d]\n",
>>> +                       t1->reserved);
>>> +                seq_printf(seq, "  Name %s\n\n", t1->name);
>>> +            } else if (ver == 1) {
>>> +                t2 = rsc;
>>> +                seq_printf(seq, "Entry %d is version %d of type %s\n",
>>> +                       i, ver, types[hdr->st.t]);
>>> +                seq_printf(seq, "  Device Address 0x%llx\n",
>>> +                       t2->da);
>>> +                seq_printf(seq, "  Length 0x%x Bytes\n",
>>> +                       t2->len);
>>> +                seq_printf(seq, "  Reserved (should be zero) [%d]\n",
>>> +                       t2->reserved);
>>> +                seq_printf(seq, "  Name %s\n\n", t2->name);
>>> +            } else {
>>> +                seq_printf(seq, "Entry %d is an unsupported version 
>>> %d of type %s\n",
>>> +                       i, ver, types[hdr->st.t]);
>>> +            }
>>>               break;
>>>           case RSC_VDEV:
>>>               v = rsc;
>>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
>>> index 526d3cb45e37..3b3bea42f8b1 100644
>>> --- a/include/linux/remoteproc.h
>>> +++ b/include/linux/remoteproc.h
>>> @@ -243,6 +243,32 @@ struct fw_rsc_trace {
>>>       u8 name[32];
>>>   } __packed;
>>> +/**
>>> + * struct fw_rsc_trace2 - trace buffer declaration supporting 64-bits
>>> + * @padding: initial padding after type field for aligned 64-bit access
>>> + * @da: device address (64-bit)
>>> + * @len: length (in bytes)
>>> + * @reserved: reserved (must be zero)
>>> + * @name: human-readable name of the trace buffer
>>> + *
>>> + * This resource entry is an enhanced version of the fw_rsc_trace 
>>> resourec entry
>>> + * and the provides equivalent functionality but designed for 64-bit 
>>> remote
>>> + * processors.
>>> + *
>>> + * @da specifies the device address of the buffer, @len specifies
>>> + * its size, and @name may contain a human readable name of the 
>>> trace buffer.
>>> + *
>>> + * After booting the remote processor, the trace buffers are exposed 
>>> to the
>>> + * user via debugfs entries (called trace0, trace1, etc..).
>>> + */
>>> +struct fw_rsc_trace2 {
>>
>> Sounds more like fw_rsc_trace64 to me - in particular since the version
>> of trace2 is 1...
> 
> Yeah, will rename this.
> 
>>
>>> +    u32 padding;
>>> +    u64 da;
>>> +    u32 len;
>>> +    u32 reserved;
>>
>> What's the purpose of this reserved field?
> 
> Partly to make sure the entire resource is aligned on an 8-byte, and 
> partly copied over from fw_rsc_trace entry. I guess 32-bits is already 
> large enough of a size for trace entries irrespective of 32-bit or 
> 64-bit traces, so I doubt if we want to make the len field also a u64.

Looking at this again, I can drop both padding and reserved fields, if I 
move the len field before da. Any preferences/comments?

regards
Suman

> 
> regards
> Suman
> 
>>
>> Regards,
>> Bjorn
>>
>>> +    u8 name[32];
>>> +} __packed;
>>> +
>>>   /**
>>>    * struct fw_rsc_vdev_vring - vring descriptor entry
>>>    * @da: device address
>>> -- 
>>> 2.23.0
>>>
>
Bjorn Andersson May 22, 2020, 5:33 p.m. UTC | #12
On Fri 22 May 09:54 PDT 2020, Suman Anna wrote:

> On 5/21/20 2:42 PM, Suman Anna wrote:
> > Hi Bjorn,
> > 
> > On 5/21/20 1:04 PM, Bjorn Andersson wrote:
> > > On Wed 25 Mar 13:47 PDT 2020, Suman Anna wrote:
[..]
> > > > diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
[..]
> > > > +struct fw_rsc_trace2 {
> > > 
> > > Sounds more like fw_rsc_trace64 to me - in particular since the version
> > > of trace2 is 1...
> > 
> > Yeah, will rename this.
> > 
> > > 
> > > > +    u32 padding;
> > > > +    u64 da;
> > > > +    u32 len;
> > > > +    u32 reserved;
> > > 
> > > What's the purpose of this reserved field?
> > 
> > Partly to make sure the entire resource is aligned on an 8-byte, and
> > partly copied over from fw_rsc_trace entry. I guess 32-bits is already
> > large enough of a size for trace entries irrespective of 32-bit or
> > 64-bit traces, so I doubt if we want to make the len field also a u64.
> 
> Looking at this again, I can drop both padding and reserved fields, if I
> move the len field before da. Any preferences/comments?
> 

Sounds good to me.

Thanks,
Bjorn
Clement Leger May 22, 2020, 6:03 p.m. UTC | #13
Hi Suman,

----- On 22 May, 2020, at 19:33, Bjorn Andersson bjorn.andersson@linaro.org wrote:

> On Fri 22 May 09:54 PDT 2020, Suman Anna wrote:
> 
>> On 5/21/20 2:42 PM, Suman Anna wrote:
>> > Hi Bjorn,
>> > 
>> > On 5/21/20 1:04 PM, Bjorn Andersson wrote:
>> > > On Wed 25 Mar 13:47 PDT 2020, Suman Anna wrote:
> [..]
>> > > > diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> [..]
>> > > > +struct fw_rsc_trace2 {
>> > > 
>> > > Sounds more like fw_rsc_trace64 to me - in particular since the version
>> > > of trace2 is 1...
>> > 
>> > Yeah, will rename this.
>> > 
>> > > 
>> > > > +    u32 padding;
>> > > > +    u64 da;
>> > > > +    u32 len;
>> > > > +    u32 reserved;
>> > > 
>> > > What's the purpose of this reserved field?
>> > 
>> > Partly to make sure the entire resource is aligned on an 8-byte, and
>> > partly copied over from fw_rsc_trace entry. I guess 32-bits is already
>> > large enough of a size for trace entries irrespective of 32-bit or
>> > 64-bit traces, so I doubt if we want to make the len field also a u64.
>> 
>> Looking at this again, I can drop both padding and reserved fields, if I
>> move the len field before da. Any preferences/comments?

Not only the in structure alignment matters but also in the resource table.
Since the resource table is often packed (see [1] for instance), if a


[1] https://github.com/OpenAMP/open-amp/blob/master/apps/machine/zynqmp_r5/rsc_table.h

>> 
> 
> Sounds good to me.
> 
> Thanks,
> Bjorn
Clement Leger May 22, 2020, 6:10 p.m. UTC | #14
----- On 22 May, 2020, at 20:03, Clément Leger cleger@kalray.eu wrote:

> Hi Suman,
> 
> ----- On 22 May, 2020, at 19:33, Bjorn Andersson bjorn.andersson@linaro.org
> wrote:
> 
>> On Fri 22 May 09:54 PDT 2020, Suman Anna wrote:
>> 
>>> On 5/21/20 2:42 PM, Suman Anna wrote:
>>> > Hi Bjorn,
>>> > 
>>> > On 5/21/20 1:04 PM, Bjorn Andersson wrote:
>>> > > On Wed 25 Mar 13:47 PDT 2020, Suman Anna wrote:
>> [..]
>>> > > > diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
>> [..]
>>> > > > +struct fw_rsc_trace2 {
>>> > > 
>>> > > Sounds more like fw_rsc_trace64 to me - in particular since the version
>>> > > of trace2 is 1...
>>> > 
>>> > Yeah, will rename this.
>>> > 
>>> > > 
>>> > > > +    u32 padding;
>>> > > > +    u64 da;
>>> > > > +    u32 len;
>>> > > > +    u32 reserved;
>>> > > 
>>> > > What's the purpose of this reserved field?
>>> > 
>>> > Partly to make sure the entire resource is aligned on an 8-byte, and
>>> > partly copied over from fw_rsc_trace entry. I guess 32-bits is already
>>> > large enough of a size for trace entries irrespective of 32-bit or
>>> > 64-bit traces, so I doubt if we want to make the len field also a u64.
>>> 
>>> Looking at this again, I can drop both padding and reserved fields, if I
>>> move the len field before da. Any preferences/comments?
> 
Sorry, my message went a bit too fast... So as I was saying:

Not only the in-structure alignment matters but also in the resource table.
Since the resource table is often packed (see [1] for instance), if a trace
resource is embedded in the resource table after another resource aligned
on 32 bits only, your 64 bits trace field will potentially end up
misaligned.

To overcome this, there is multiple solutions:

- Split the 64 bits fields into 32bits low and high parts:
Since all resources are aligned on 32bits, it will be ok

- Use memcpy_from/to_io when reading/writing such fields
As I said in a previous message this should probably be used since
the memories that are accessed by rproc are io mem (ioremap in almost
all drivers).

Regards,

Clément

[1]  https://github.com/OpenAMP/open-amp/blob/master/apps/machine/zynqmp_r5/rsc_table.h
> 
> 
> 
> 
>>> 
>> 
>> Sounds good to me.
>> 
>> Thanks,
> > Bjorn
Suman Anna May 22, 2020, 6:59 p.m. UTC | #15
Hi Clement,

>  > ----- On 22 May, 2020, at 20:03, Clément Leger cleger@kalray.eu wrote:>
>> Hi Suman,
>>
>> ----- On 22 May, 2020, at 19:33, Bjorn Andersson bjorn.andersson@linaro.org
>> wrote:
>>
>>> On Fri 22 May 09:54 PDT 2020, Suman Anna wrote:
>>>
>>>> On 5/21/20 2:42 PM, Suman Anna wrote:
>>>>> Hi Bjorn,
>>>>>
>>>>> On 5/21/20 1:04 PM, Bjorn Andersson wrote:
>>>>>> On Wed 25 Mar 13:47 PDT 2020, Suman Anna wrote:
>>> [..]
>>>>>>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
>>> [..]
>>>>>>> +struct fw_rsc_trace2 {
>>>>>>
>>>>>> Sounds more like fw_rsc_trace64 to me - in particular since the version
>>>>>> of trace2 is 1...
>>>>>
>>>>> Yeah, will rename this.
>>>>>
>>>>>>
>>>>>>> +    u32 padding;
>>>>>>> +    u64 da;
>>>>>>> +    u32 len;
>>>>>>> +    u32 reserved;
>>>>>>
>>>>>> What's the purpose of this reserved field?
>>>>>
>>>>> Partly to make sure the entire resource is aligned on an 8-byte, and
>>>>> partly copied over from fw_rsc_trace entry. I guess 32-bits is already
>>>>> large enough of a size for trace entries irrespective of 32-bit or
>>>>> 64-bit traces, so I doubt if we want to make the len field also a u64.
>>>>
>>>> Looking at this again, I can drop both padding and reserved fields, if I
>>>> move the len field before da. Any preferences/comments?
>>
> Sorry, my message went a bit too fast... So as I was saying:
> 
> Not only the in-structure alignment matters but also in the resource table.
> Since the resource table is often packed (see [1] for instance), if a trace
> resource is embedded in the resource table after another resource aligned
> on 32 bits only, your 64 bits trace field will potentially end up
> misaligned.

Right. Since one can mix and match the resources of different sizes and 
include them in any order, the onus is going to be on the resource table 
constructors to ensure the inter-resource alignments, if any are 
required. The resource table format allows you to add padding fields in 
between if needed, and the remoteproc core relies on the offsets.

I can only ensure the alignment within this resource structure with 
ready-available access and conversion to/from a 64-bit type, as long as 
the resource is starting on a 64-bit boundary.

> 
> To overcome this, there is multiple solutions:
> 
> - Split the 64 bits fields into 32bits low and high parts:
> Since all resources are aligned on 32bits, it will be ok

Yes, this is one solution. At the same time, this means you need 
additional conversion logic for converting to and from 64-bit field. In 
this particular case, da is the address of the trace buffer pointer on a 
64-bit processor, so we can directly use the address of the trace 
buffer. Guess it is a question of easier translation vs packing the 
resource table as tight as possible.

> 
> - Use memcpy_from/to_io when reading/writing such fields
> As I said in a previous message this should probably be used since
> the memories that are accessed by rproc are io mem (ioremap in almost
> all drivers).

Anything running out of DDR actually doesn't need the io mem semantics, 
so we actually need to be fixing the drivers. Blindly changing the 
current memcpy to memcpy_to_io in the core loader is also not right. Any 
internal memories properties will actually depend on the processor and 
SoC. Eg: The R5 TCM interfaces in general can be treated as normal memories.

regards
Suman

> 
> Regards,
> 
> Clément
> 
> [1]  https://github.com/OpenAMP/open-amp/blob/master/apps/machine/zynqmp_r5/rsc_table.h
>>
>>
>>
>>
>>>>
>>>
>>> Sounds good to me.
>>>
>>> Thanks,
>>> Bjorn
Clement Leger May 22, 2020, 7:28 p.m. UTC | #16
Hi Suman,

----- On 22 May, 2020, at 20:59, s-anna s-anna@ti.com wrote:

> Hi Clement,
> 
>>  > ----- On 22 May, 2020, at 20:03, Clément Leger cleger@kalray.eu wrote:>
>>> Hi Suman,
>>>
>>> ----- On 22 May, 2020, at 19:33, Bjorn Andersson bjorn.andersson@linaro.org
>>> wrote:
>>>
>>>> On Fri 22 May 09:54 PDT 2020, Suman Anna wrote:
>>>>
>>>>> On 5/21/20 2:42 PM, Suman Anna wrote:
>>>>>> Hi Bjorn,
>>>>>>
>>>>>> On 5/21/20 1:04 PM, Bjorn Andersson wrote:
>>>>>>> On Wed 25 Mar 13:47 PDT 2020, Suman Anna wrote:
>>>> [..]
>>>>>>>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
>>>> [..]
>>>>>>>> +struct fw_rsc_trace2 {
>>>>>>>
>>>>>>> Sounds more like fw_rsc_trace64 to me - in particular since the version
>>>>>>> of trace2 is 1...
>>>>>>
>>>>>> Yeah, will rename this.
>>>>>>
>>>>>>>
>>>>>>>> +    u32 padding;
>>>>>>>> +    u64 da;
>>>>>>>> +    u32 len;
>>>>>>>> +    u32 reserved;
>>>>>>>
>>>>>>> What's the purpose of this reserved field?
>>>>>>
>>>>>> Partly to make sure the entire resource is aligned on an 8-byte, and
>>>>>> partly copied over from fw_rsc_trace entry. I guess 32-bits is already
>>>>>> large enough of a size for trace entries irrespective of 32-bit or
>>>>>> 64-bit traces, so I doubt if we want to make the len field also a u64.
>>>>>
>>>>> Looking at this again, I can drop both padding and reserved fields, if I
>>>>> move the len field before da. Any preferences/comments?
>>>
>> Sorry, my message went a bit too fast... So as I was saying:
>> 
>> Not only the in-structure alignment matters but also in the resource table.
>> Since the resource table is often packed (see [1] for instance), if a trace
>> resource is embedded in the resource table after another resource aligned
>> on 32 bits only, your 64 bits trace field will potentially end up
>> misaligned.
> 
> Right. Since one can mix and match the resources of different sizes and
> include them in any order, the onus is going to be on the resource table
> constructors to ensure the inter-resource alignments, if any are
> required. The resource table format allows you to add padding fields in
> between if needed, and the remoteproc core relies on the offsets.

Agreed

> 
> I can only ensure the alignment within this resource structure with
> ready-available access and conversion to/from a 64-bit type, as long as
> the resource is starting on a 64-bit boundary.
> 
>> 
>> To overcome this, there is multiple solutions:
>> 
>> - Split the 64 bits fields into 32bits low and high parts:
>> Since all resources are aligned on 32bits, it will be ok
> 
> Yes, this is one solution. At the same time, this means you need
> additional conversion logic for converting to and from 64-bit field. In
> this particular case, da is the address of the trace buffer pointer on a
> 64-bit processor, so we can directly use the address of the trace
> buffer. Guess it is a question of easier translation vs packing the
> resource table as tight as possible.

You simply have to add two wrapper such as the following:

static inline void rproc_rsc_set_addr(u32 *low, u32 *hi, u64 val)
{
	*low = lower_32_bits(val);
	*hi = upper_32_bits(val);
}

static inline u64 rproc_rsc_get_addr(u32 low, u32 hi)
{
	return ((u64) hi << 32) | low;
}

This is not really difficult to use and will ensure your new trace
resource can be used easily without breaking 32 bits alignment.
Breaking compatibility is an option also and it is probably needed
to document it clearly if it is chosen to do so.

> 
>> 
>> - Use memcpy_from/to_io when reading/writing such fields
>> As I said in a previous message this should probably be used since
>> the memories that are accessed by rproc are io mem (ioremap in almost
>> all drivers).
> 
> Anything running out of DDR actually doesn't need the io mem semantics,
> so we actually need to be fixing the drivers. Blindly changing the
> current memcpy to memcpy_to_io in the core loader is also not right. Any
> internal memories properties will actually depend on the processor and
> SoC. Eg: The R5 TCM interfaces in general can be treated as normal memories.

Agreed, this is most of the case indeed where the memories are actually
accessible directly. But using ioremap potentially creates a mapping that
does not support misaligned accesses and so accesses should be always aligned.
memcpy_from/to_io ensures that.
IMHO, there is probably something to be rework since the drivers are mapping
the memories but the core is accessing this memory, knowing nothing about
how it was mapped.

Regards,

Clément

> 
> regards
> Suman
> 
>> 
>> Regards,
>> 
>> Clément
>> 
>> [1]
>> https://github.com/OpenAMP/open-amp/blob/master/apps/machine/zynqmp_r5/rsc_table.h
>>>
>>>
>>>
>>>
>>>>>
>>>>
>>>> Sounds good to me.
>>>>
>>>> Thanks,
> >>> Bjorn