mbox series

[v2,00/10] Add support for TISCI irqchip drivers

Message ID 20181018154017.7112-1-lokeshvutla@ti.com
Headers show
Series Add support for TISCI irqchip drivers | expand

Message

Lokesh Vutla Oct. 18, 2018, 3:40 p.m. UTC
TISCI abstracts the handling of IRQ routes where interrupt sources
are not directly connected to host interrupt controller. This series
adds support for:
- TISCI commands needed for IRQ configuration
- Interrupt Router(INTR) and Interrupt Aggregator(INTA) drivers

More information on TISCI IRQ management can be found here[1].
Complete TISCI resource management information can be found here[2].
AM65x SoC related TISCI information can be found here[3].
INTR and INTA related information can be found in TRM[4].

[1] http://downloads.ti.com/tisci/esd/latest/2_tisci_msgs/rm/rm_irq.html
[2] http://downloads.ti.com/tisci/esd/latest/2_tisci_msgs/index.html#resource-management-rm
[3] http://downloads.ti.com/tisci/esd/latest/5_soc_doc/index.html#am6-soc-family
[4] http://www.ti.com/lit/pdf/spruid7

Changes since v1:
- Consolidated both TISCI and irqchip drivers as suggested by Marc.
- Each patch contains respective changes.

Grygorii Strashko (1):
  firmware: ti_sci: Add support to get TISCI handle using of_phandle

Lokesh Vutla (8):
  firmware: ti_sci: Add support for RM core ops
  firmware: ti_sci: Add support for IRQ management
  firmware: ti_sci: Add helper apis to manage resources
  dt-bindings: irqchip: Introduce TISCI Interrupt router bindings
  irqchip: ti-sci-intr: Add support for Interrupt Router driver
  dt-bindings: irqchip: Introduce TISCI Interrupt Aggregator bindings
  irqchip: ti-sci-inta: Add support for Interrupt Aggregator driver
  soc: ti: am6: Enable interrupt controller drivers

Peter Ujfalusi (1):
  firmware: ti_sci: Add RM mapping table for am654

 .../bindings/arm/keystone/ti,sci.txt          |   3 +-
 .../interrupt-controller/ti,sci-inta.txt      |  74 ++
 .../interrupt-controller/ti,sci-intr.txt      |  81 ++
 MAINTAINERS                                   |   4 +
 drivers/firmware/ti_sci.c                     | 850 ++++++++++++++++++
 drivers/firmware/ti_sci.h                     | 102 +++
 drivers/irqchip/Kconfig                       |  22 +
 drivers/irqchip/Makefile                      |   2 +
 drivers/irqchip/irq-ti-sci-inta.c             | 613 +++++++++++++
 drivers/irqchip/irq-ti-sci-intr.c             | 302 +++++++
 drivers/soc/ti/Kconfig                        |   3 +
 include/linux/irqchip/irq-ti-sci-inta.h       |  35 +
 include/linux/soc/ti/ti_sci_protocol.h        | 169 ++++
 13 files changed, 2259 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/ti,sci-inta.txt
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/ti,sci-intr.txt
 create mode 100644 drivers/irqchip/irq-ti-sci-inta.c
 create mode 100644 drivers/irqchip/irq-ti-sci-intr.c
 create mode 100644 include/linux/irqchip/irq-ti-sci-inta.h

Comments

Marc Zyngier Oct. 19, 2018, 3:22 p.m. UTC | #1
Hi Lokesh,

On 18/10/18 16:40, Lokesh Vutla wrote:
> Texas Instruments' K3 generation SoCs has an IP Interrupt Aggregator
> which is an interrupt controller that does the following:
> - Converts events to interrupts that can be understood by
>   an interrupt router.
> - Allows for multiplexing of events to interrupts.
> - Allows for grouping of multiple events to a single interrupt.

Aren't the last two points the same thing? Also, can you please define
what an "event" is? What is its semantic? If they look like interrupts,
can we just name them as such?

> 
> Configuration of the interrupt aggregator registers can only be done by
> a system co-processor and the driver needs to send a message to this
> co processor over TISCI protocol.
> 
> Add support for Interrupt Aggregator driver over TISCI protocol.
> 
> Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> ---
> Changes since v1:
> - New patch
> 
>  MAINTAINERS                             |   1 +
>  drivers/irqchip/Kconfig                 |  11 +
>  drivers/irqchip/Makefile                |   1 +
>  drivers/irqchip/irq-ti-sci-inta.c       | 613 ++++++++++++++++++++++++
>  include/linux/irqchip/irq-ti-sci-inta.h |  35 ++
>  5 files changed, 661 insertions(+)
>  create mode 100644 drivers/irqchip/irq-ti-sci-inta.c
>  create mode 100644 include/linux/irqchip/irq-ti-sci-inta.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 8cf1a6b73e6c..35c790ab0ae7 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14689,6 +14689,7 @@ F:	drivers/reset/reset-ti-sci.c
>  F:	Documentation/devicetree/bindings/interrupt-controller/ti,sci-intr.txt
>  F:	Documentation/devicetree/bindings/interrupt-controller/ti,sci-inta.txt
>  F:	drivers/irqchip/irq-ti-sci-intr.c
> +F:	drivers/irqchip/irq-ti-sci-inta.c
>  
>  THANKO'S RAREMONO AM/FM/SW RADIO RECEIVER USB DRIVER
>  M:	Hans Verkuil <hverkuil@xs4all.nl>
> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> index f6620a6bb872..895f6b47dc5b 100644
> --- a/drivers/irqchip/Kconfig
> +++ b/drivers/irqchip/Kconfig
> @@ -385,6 +385,17 @@ config TI_SCI_INTR_IRQCHIP
>  	  If you wish to use interrupt router irq resources managed by the
>  	  TI System Controller, say Y here. Otherwise, say N.
>  
> +config TI_SCI_INTA_IRQCHIP
> +	bool
> +	depends on TI_SCI_PROTOCOL && ARCH_K3
> +	select IRQ_DOMAIN
> +	select IRQ_DOMAIN_HIERARCHY
> +	help
> +	  This enables the irqchip driver support for K3 Interrupt aggregator
> +	  over TI System Control Interface available on some new TI's SoCs.
> +	  If you wish to use interrupt aggregator irq resources managed by the
> +	  TI System Controller, say Y here. Otherwise, say N.
> +
>  endmenu
>  
>  config SIFIVE_PLIC
> diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
> index 44bf65606d60..aede4c1cc4a6 100644
> --- a/drivers/irqchip/Makefile
> +++ b/drivers/irqchip/Makefile
> @@ -90,3 +90,4 @@ obj-$(CONFIG_NDS32)			+= irq-ativic32.o
>  obj-$(CONFIG_QCOM_PDC)			+= qcom-pdc.o
>  obj-$(CONFIG_SIFIVE_PLIC)		+= irq-sifive-plic.o
>  obj-$(CONFIG_TI_SCI_INTR_IRQCHIP)	+= irq-ti-sci-intr.o
> +obj-$(CONFIG_TI_SCI_INTA_IRQCHIP)	+= irq-ti-sci-inta.o
> diff --git a/drivers/irqchip/irq-ti-sci-inta.c b/drivers/irqchip/irq-ti-sci-inta.c
> new file mode 100644
> index 000000000000..ef0a2e8b782c
> --- /dev/null
> +++ b/drivers/irqchip/irq-ti-sci-inta.c
> @@ -0,0 +1,613 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Texas Instruments' K3 Interrupt Aggregator irqchip driver
> + *
> + * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com/
> + *	Lokesh Vutla <lokeshvutla@ti.com>
> + */
> +
> +#include <linux/err.h>
> +#include <linux/io.h>
> +#include <linux/irqchip.h>
> +#include <linux/of_platform.h>
> +#include <linux/of_address.h>
> +#include <linux/of_irq.h>
> +#include <linux/module.h>
> +#include <linux/moduleparam.h>
> +#include <linux/irqdomain.h>
> +#include <linux/soc/ti/ti_sci_protocol.h>
> +
> +#define MAX_EVENTS_PER_VINT	64
> +#define TI_SCI_EVENT_IRQ	BIT(31)
> +
> +#define VINT_ENABLE_CLR_OFFSET	0x18
> +
> +/**
> + * struct ti_sci_inta_irq_domain - Structure representing a TISCI based
> + *				   Interrupt Aggregator IRQ domain.
> + * @sci:	Pointer to TISCI handle
> + * @vint:	TISCI resource pointer representing IA inerrupts.
> + * @global_event:TISCI resource pointer representing global events.
> + * @base:	Base address of the memory mapped IO registers
> + * @ia_id:	TISCI device ID of this Interrupt Aggregator.
> + * @dst_id:	TISCI device ID of the destination irq controller.
> + */
> +struct ti_sci_inta_irq_domain {
> +	const struct ti_sci_handle *sci;
> +	struct ti_sci_resource *vint;
> +	struct ti_sci_resource *global_event;
> +	void __iomem *base;
> +	u16 ia_id;
> +	u16 dst_id;
> +};
> +
> +/**
> + * struct ti_sci_inta_event_desc - Description of an event coming to
> + *				   Interrupt Aggregator.
> + * @global_event:	Global event number corresponding to this event
> + * @src_id:		TISCI device ID of the event source
> + * @src_index:		Event source index within the device.
> + */
> +struct ti_sci_inta_event_desc {
> +	u16 global_event;
> +	u16 src_id;
> +	u16 src_index;
> +};
> +
> +/**
> + * struct ti_sci_inta_vint_desc - Description of a virtual interrupt coming out
> + *				  of Interrupt Aggregator.
> + * @lock:		lock to guard the event map
> + * @event_map:		Bitmap to manage the allocation of events to vint.
> + * @events:		Array of event descriptors assigned to this vint.
> + * @ack_needed:		Event needs to be acked via INTA. This is used when
> + *			HW generating events cannot clear the events by itself.
> + *			Assuming that only events from the same hw block are
> + *			grouped. So all the events attached to vint needs
> + *			an ack or none needs an ack.
> + */
> +struct ti_sci_inta_vint_desc {
> +	raw_spinlock_t lock;
> +	unsigned long *event_map;
> +	struct ti_sci_inta_event_desc events[MAX_EVENTS_PER_VINT];
> +	bool ack_needed;
> +};
> +
> +static void ti_sci_inta_irq_eoi(struct irq_data *data)
> +{
> +	struct ti_sci_inta_irq_domain *inta = data->domain->host_data;
> +	struct ti_sci_inta_vint_desc *vint_desc;
> +	u64 val;
> +	int bit;
> +
> +	vint_desc = irq_data_get_irq_chip_data(data);
> +	if (!vint_desc->ack_needed)
> +		goto out;
> +
> +	for_each_set_bit(bit, vint_desc->event_map, MAX_EVENTS_PER_VINT) {
> +		val = 1 << bit;
> +		__raw_writeq(val, inta->base + data->hwirq * 0x1000 +
> +			     VINT_ENABLE_CLR_OFFSET);
> +	}

If you need an ack callback, why is this part of the eoi? We have
interrupt flows that allow you to combine both, so why don't you use that?

Also, the __raw_writeq call is probably wrong, as it assumes that both
the CPU and the INTA have the same endianness.

> +
> +out:
> +	irq_chip_eoi_parent(data);
> +}
> +
> +static struct irq_chip ti_sci_inta_irq_chip = {
> +	.name			= "INTA",
> +	.irq_eoi		= ti_sci_inta_irq_eoi,
> +	.irq_mask		= irq_chip_mask_parent,
> +	.irq_unmask		= irq_chip_unmask_parent,
> +	.irq_retrigger		= irq_chip_retrigger_hierarchy,
> +	.irq_set_type		= irq_chip_set_type_parent,
> +	.irq_set_affinity	= irq_chip_set_affinity_parent,
> +};
> +
> +/**
> + * ti_sci_inta_irq_domain_translate() - Retrieve hwirq and type from
> + *					IRQ firmware specific handler.
> + * @domain:	Pointer to IRQ domain
> + * @fwspec:	Pointer to IRQ specific firmware structure
> + * @hwirq:	IRQ number identified by hardware
> + * @type:	IRQ type
> + *
> + * Return 0 if all went ok else appropriate error.
> + */
> +static int ti_sci_inta_irq_domain_translate(struct irq_domain *domain,
> +					    struct irq_fwspec *fwspec,
> +					    unsigned long *hwirq,
> +					    unsigned int *type)
> +{
> +	if (is_of_node(fwspec->fwnode)) {
> +		if (fwspec->param_count != 4)
> +			return -EINVAL;
> +
> +		*hwirq = fwspec->param[2];
> +		*type = fwspec->param[3] & IRQ_TYPE_SENSE_MASK;
> +
> +		return 0;
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +/**
> + * ti_sci_free_event_irq() - Free an event from vint
> + * @inta:	Pointer to Interrupt Aggregator IRQ domain
> + * @vint_desc:	Virtual interrupt descriptor containing the event.
> + * @event_index:Event Index within the vint.
> + * @dst_irq:	Destination host irq
> + * @vint:	Interrupt number within interrupt aggregator.
> + */
> +static void ti_sci_free_event_irq(struct ti_sci_inta_irq_domain *inta,
> +				  struct ti_sci_inta_vint_desc *vint_desc,
> +				  u32 event_index, u16 dst_irq, u16 vint)
> +{
> +	struct ti_sci_inta_event_desc *event;
> +	unsigned long flags;
> +
> +	if (event_index >= MAX_EVENTS_PER_VINT)
> +		return;

How can this happen?

> +
> +	event = &vint_desc->events[event_index];
> +	inta->sci->ops.rm_irq_ops.free_event_irq(inta->sci,
> +						 event->src_id,
> +						 event->src_index,
> +						 inta->dst_id,
> +						 dst_irq,
> +						 inta->ia_id, vint,
> +						 event->global_event,
> +						 event_index);
> +
> +	raw_spin_lock_irqsave(&vint_desc->lock, flags);
> +	clear_bit(event_index, vint_desc->event_map);
> +	raw_spin_unlock_irqrestore(&vint_desc->lock, flags);

clear_bit is atomic. Why do you need a spinlock?

> +
> +	ti_sci_release_resource(inta->global_event, event->global_event);
> +}
> +
> +/**
> + * ti_sci_inta_irq_domain_free() - Free an IRQ from the IRQ domain
> + * @domain:	Domain to which the irqs belong
> + * @virq:	base linux virtual IRQ to be freed.
> + * @nr_irqs:	Number of continuous irqs to be freed
> + */
> +static void ti_sci_inta_irq_domain_free(struct irq_domain *domain,
> +					unsigned int virq, unsigned int nr_irqs)
> +{
> +	struct ti_sci_inta_irq_domain *inta = domain->host_data;
> +	struct ti_sci_inta_vint_desc *vint_desc;
> +	struct irq_data *data, *gic_data;
> +	int event_index;
> +
> +	data = irq_domain_get_irq_data(domain, virq);
> +	gic_data = irq_domain_get_irq_data(domain->parent->parent, virq);

That's absolutely horrid...

> +
> +	vint_desc = irq_data_get_irq_chip_data(data);
> +
> +	/* This is the last event in the vint */
> +	event_index = find_first_bit(vint_desc->event_map, MAX_EVENTS_PER_VINT);

What guarantees that you only have a single "event" left here?

> +	ti_sci_free_event_irq(inta, vint_desc, event_index,
> +			      gic_data->hwirq, data->hwirq);
> +	irq_domain_free_irqs_parent(domain, virq, 1);
> +	irq_domain_reset_irq_data(data);
> +	ti_sci_release_resource(inta->vint, data->hwirq);
> +	kfree(vint_desc->event_map);
> +	kfree(vint_desc);
> +}
> +
> +/**
> + * ti_sci_allocate_event_irq() - Allocate an event to a IA vint.
> + * @inta:	Pointer to Interrupt Aggregator IRQ domain
> + * @vint_desc:	Virtual interrupt descriptor to which the event gets attached.
> + * @src_id:	TISCI device id of the event source
> + * @src_index:	Event index with in the device.
> + * @dst_irq:	Destination host irq
> + * @vint:	Interrupt number within interrupt aggregator.
> + *
> + * Return 0 if all went ok else appropriate error value.
> + */
> +static int ti_sci_allocate_event_irq(struct ti_sci_inta_irq_domain *inta,
> +				     struct ti_sci_inta_vint_desc *vint_desc,
> +				     u16 src_id, u16 src_index, u16 dst_irq,
> +				     u16 vint)
> +{
> +	struct ti_sci_inta_event_desc *event;
> +	unsigned long flags;
> +	u32 free_bit;
> +	int err;
> +
> +	raw_spin_lock_irqsave(&vint_desc->lock, flags);
> +	free_bit = find_first_zero_bit(vint_desc->event_map,
> +				       MAX_EVENTS_PER_VINT);
> +	if (free_bit != MAX_EVENTS_PER_VINT)
> +		set_bit(free_bit, vint_desc->event_map);
> +	raw_spin_unlock_irqrestore(&vint_desc->lock, flags);

Why disabling the interrupts? Do you expect to take this lock
concurrently with an interrupt? Why isn't it enough to just have a mutex
instead?

> +
> +	if (free_bit >= MAX_EVENTS_PER_VINT)
> +		return -ENODEV;
> +
> +	event = &vint_desc->events[free_bit];
> +
> +	event->src_id = src_id;
> +	event->src_index = src_index;
> +	event->global_event = ti_sci_get_free_resource(inta->global_event);

Reading patch #5, shouldn't you at least test for the validity of what
this function returns?

> +
> +	err = inta->sci->ops.rm_irq_ops.set_event_irq(inta->sci,
> +						      src_id, src_index,
> +						      inta->dst_id,
> +						      dst_irq,
> +						      inta->ia_id,
> +						      vint,
> +						      event->global_event,
> +						      free_bit);
> +	if (err) {
> +		pr_err("%s: Event allocation failed from src = %d, index = %d, to dst = %d,irq = %d,via ia_id = %d, vint = %d,global event = %d, status_bit = %d\n",
> +		       __func__, src_id, src_index, inta->dst_id, dst_irq,
> +		       inta->ia_id, vint, event->global_event, free_bit);
> +		return err;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * alloc_parent_irq() - Allocate parent irq to Interrupt aggregator
> + * @domain:	IRQ domain corresponding to Interrupt Aggregator
> + * @virq:	Linux virtual IRQ number
> + * @src_id:	TISCI device id of the event source
> + * @src_index:	Event index with in the device.
> + * @vint:	Virtual interrupt number within IA
> + * @flags:	Corresponding IRQ flags
> + *
> + * Return pointer to vint descriptor if all went well else corresponding
> + * error pointer.
> + */
> +static struct ti_sci_inta_vint_desc *alloc_parent_irq(struct irq_domain *domain,

Please rename this function to something less ambiguous (you've prefixed
all functions so far, why not this one?).

> +						      unsigned int virq,
> +						      u32 src_id, u32 src_index,
> +						      u32 vint, u32 flags)
> +{
> +	struct ti_sci_inta_irq_domain *inta = domain->host_data;
> +	struct ti_sci_inta_vint_desc *vint_desc;
> +	struct irq_data *gic_data;
> +	struct irq_fwspec fwspec;
> +	int err;
> +
> +	if (!irq_domain_get_of_node(domain->parent))
> +		return ERR_PTR(-EINVAL);
> +
> +	vint_desc = kzalloc(sizeof(*vint_desc), GFP_KERNEL);
> +	if (!vint_desc)
> +		return ERR_PTR(-ENOMEM);
> +
> +	vint_desc->event_map = kcalloc(BITS_TO_LONGS(MAX_EVENTS_PER_VINT),
> +				       sizeof(*vint_desc->event_map),
> +				       GFP_KERNEL);
> +	if (!vint_desc->event_map) {
> +		kfree(vint_desc);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	fwspec.fwnode = domain->parent->fwnode;
> +	fwspec.param_count = 3;
> +	/* Interrupt parent is Interrupt Router */
> +	fwspec.param[0] = inta->ia_id;
> +	fwspec.param[1] = vint;
> +	fwspec.param[2] = flags | TI_SCI_EVENT_IRQ;

Why isn't that flag an additional parameter instead of mixing stuff
coming from DT and things that are purely internal?

> +
> +	err = irq_domain_alloc_irqs_parent(domain, virq, 1, &fwspec);
> +	if (err)
> +		goto err_irqs;
> +
> +	gic_data = irq_domain_get_irq_data(domain->parent->parent, virq);
> +
> +	raw_spin_lock_init(&vint_desc->lock);
> +
> +	err = ti_sci_allocate_event_irq(inta, vint_desc, src_id, src_index,
> +					gic_data->hwirq, vint);
> +	if (err)
> +		goto err_events;
> +
> +	return vint_desc;
> +
> +err_events:
> +	irq_domain_free_irqs_parent(domain, virq, 1);
> +err_irqs:
> +	ti_sci_release_resource(inta->vint, vint);
> +	kfree(vint_desc);
> +	return ERR_PTR(err);
> +}
> +
> +/**
> + * ti_sci_inta_irq_domain_alloc() - Allocate Interrupt aggregator IRQs
> + * @domain:	Point to the interrupt aggregator IRQ domain
> + * @virq:	Corresponding Linux virtual IRQ number
> + * @nr_irqs:	Continuous irqs to be allocated
> + * @data:	Pointer to firmware specifier
> + *
> + * Return 0 if all went well else appropriate error value.
> + */
> +static int ti_sci_inta_irq_domain_alloc(struct irq_domain *domain,
> +					unsigned int virq, unsigned int nr_irqs,
> +					void *data)
> +{
> +	struct ti_sci_inta_vint_desc *vint_desc;
> +	struct irq_fwspec *fwspec = data;
> +	int err;
> +
> +	vint_desc = alloc_parent_irq(domain, virq, fwspec->param[0],
> +				     fwspec->param[1], fwspec->param[2],
> +				     fwspec->param[3]);

Frankly, what is the point of doing that? Why don't you simply pass the
fwspec?

> +	if (IS_ERR(vint_desc))
> +		return PTR_ERR(vint_desc);
> +
> +	err = irq_domain_set_hwirq_and_chip(domain, virq, fwspec->param[2],
> +					    &ti_sci_inta_irq_chip, vint_desc);
> +
> +	return err;
> +}
> +
> +static const struct irq_domain_ops ti_sci_inta_irq_domain_ops = {
> +	.alloc		= ti_sci_inta_irq_domain_alloc,
> +	.free		= ti_sci_inta_irq_domain_free,
> +	.translate	= ti_sci_inta_irq_domain_translate,
> +};
> +
> +static int ti_sci_inta_irq_domain_probe(struct platform_device *pdev)
> +{
> +	struct irq_domain *parent_domain, *domain;
> +	struct ti_sci_inta_irq_domain *inta;
> +	struct device_node *parent_node;
> +	struct device *dev = &pdev->dev;
> +	struct resource *res;
> +	int ret;
> +
> +	parent_node = of_irq_find_parent(dev_of_node(dev));
> +	if (!parent_node) {
> +		dev_err(dev, "Failed to get IRQ parent node\n");
> +		return -ENODEV;
> +	}
> +
> +	parent_domain = irq_find_host(parent_node);
> +	if (!parent_domain)
> +		return -EPROBE_DEFER;
> +
> +	inta = devm_kzalloc(dev, sizeof(*inta), GFP_KERNEL);
> +	if (!inta)
> +		return -ENOMEM;
> +
> +	inta->sci = devm_ti_sci_get_by_phandle(dev, "ti,sci");
> +	if (IS_ERR(inta->sci)) {
> +		ret = PTR_ERR(inta->sci);
> +		if (ret != -EPROBE_DEFER)
> +			dev_err(dev, "ti,sci read fail %d\n", ret);
> +		inta->sci = NULL;
> +		return ret;
> +	}
> +
> +	ret = of_property_read_u32(dev->of_node, "ti,sci-dev-id",
> +				   (u32 *)&inta->ia_id);
> +	if (ret) {
> +		dev_err(dev, "missing 'ti,sci-dev-id' property\n");
> +		return -EINVAL;
> +	}
> +
> +	inta->vint = devm_ti_sci_get_of_resource(inta->sci, dev,
> +						 inta->ia_id,
> +						 "ti,sci-rm-range-vint");
> +	if (IS_ERR(inta->vint)) {
> +		dev_err(dev, "VINT resource allocation failed\n");
> +		return PTR_ERR(inta->vint);
> +	}
> +
> +	inta->global_event =
> +		devm_ti_sci_get_of_resource(inta->sci, dev,
> +					    inta->ia_id,
> +					    "ti,sci-rm-range-global-event");
> +	if (IS_ERR(inta->global_event)) {
> +		dev_err(dev, "Global event resource allocation failed\n");
> +		return PTR_ERR(inta->global_event);
> +	}
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	inta->base = devm_ioremap_resource(dev, res);
> +	if (IS_ERR(inta->base))
> +		return -ENODEV;
> +
> +	ret = of_property_read_u32(parent_node, "ti,sci-dst-id",
> +				   (u32 *)&inta->dst_id);
> +
> +	domain = irq_domain_add_hierarchy(parent_domain, 0, 0, dev_of_node(dev),
> +					  &ti_sci_inta_irq_domain_ops, inta);
> +	if (!domain) {
> +		dev_err(dev, "Failed to allocate IRQ domain\n");
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
> + * @dev:	Device pointer to source generating the event
> + * @src_id:	TISCI device ID of the event source
> + * @src_index:	Event source index within the device.
> + * @virq:	Linux Virtual IRQ number
> + * @flags:	Corresponding IRQ flags
> + * @ack_needed:	If explicit clearing of event is required.
> + *
> + * Creates a new irq and attaches to IA domain if virq is not specified
> + * else attaches the event to vint corresponding to virq.
> + * When using TISCI within the client drivers, source indexes are always
> + * generated dynamically and cannot be represented in DT. So client
> + * drivers should call this API instead of platform_get_irq().

NAK. Either this fits in the standard model, or we adapt the standard
model to catter for your particular use case. But we don't define a new,
TI specific API.

I have a hunch that if the IDs are generated dynamically, then the model
we use for MSIs would fit this thing. I also want to understand what
this event is, and how drivers get notified that such an event has fired.

So please explain what this is all about, and we'll work out something.
In the meantime, I'll stop here for that particular patch.

Thanks,

	M.
Peter Ujfalusi Oct. 22, 2018, 10:42 a.m. UTC | #2
Lokesh,

On 2018-10-18 18:40, Lokesh Vutla wrote:
> Texas Instruments' K3 generation SoCs has an IP Interrupt Aggregator
> which is an interrupt controller that does the following:
> - Converts events to interrupts that can be understood by
>   an interrupt router.
> - Allows for multiplexing of events to interrupts.
> - Allows for grouping of multiple events to a single interrupt.
> 
> Configuration of the interrupt aggregator registers can only be done by
> a system co-processor and the driver needs to send a message to this
> co processor over TISCI protocol.
> 
> Add support for Interrupt Aggregator driver over TISCI protocol.

Have you compiled this?

...

> diff --git a/drivers/irqchip/irq-ti-sci-inta.c b/drivers/irqchip/irq-ti-sci-inta.c
> new file mode 100644
> index 000000000000..ef0a2e8b782c
> --- /dev/null
> +++ b/drivers/irqchip/irq-ti-sci-inta.c

...

> +/**
> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
> + * @dev:	Device pointer to source generating the event
> + * @src_id:	TISCI device ID of the event source
> + * @src_index:	Event source index within the device.
> + * @virq:	Linux Virtual IRQ number
> + * @flags:	Corresponding IRQ flags
> + * @ack_needed:	If explicit clearing of event is required.
> + *
> + * Creates a new irq and attaches to IA domain if virq is not specified
> + * else attaches the event to vint corresponding to virq.
> + * When using TISCI within the client drivers, source indexes are always
> + * generated dynamically and cannot be represented in DT. So client
> + * drivers should call this API instead of platform_get_irq().
> + *
> + * Return virq if all went well else appropriate error value.
> + */
> +int ti_sci_inta_register_event(struct device *dev, u16 src_id, u16 src_index,
> +			       unsigned int virq, u32 flags, bool ack_needed)
> +{

...

> diff --git a/include/linux/irqchip/irq-ti-sci-inta.h b/include/linux/irqchip/irq-ti-sci-inta.h
> new file mode 100644
> index 000000000000..c078234fda3f
> --- /dev/null
> +++ b/include/linux/irqchip/irq-ti-sci-inta.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Texas Instruments' System Control Interface (TI-SCI) irqchip
> + *
> + * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com/
> + *	Lokesh Vutla <lokeshvutla@ti.com>
> + */
> +
> +#ifndef __INCLUDE_LINUX_IRQCHIP_TI_SCI_INTA_H
> +#define __INCLUDE_LINUX_IRQCHIP_TI_SCI_INTA_H
> +
> +#if IS_ENABLED(CONFIG_TI_SCI_INTA_IRQCHIP)
> +int ti_sci_inta_register_event(struct device *dev, u16 src_id, u16 src_index,
> +			       unsigned int virq, u32 flags);

You are missing the ack_needed

> +int ti_sci_inta_unregister_event(struct device *dev, u16 src_id, u16 src_index,
> +				 unsigned int virq);
> +
> +#else /* CONFIG_TI_SCI_INTA_IRQCHIP */
> +
> +static inline int ti_sci_inta_register_event(struct device *dev, u16 src_id,
> +					     u16 src_index, unsigned int virq,
> +					     u32 flags)

Here as well.

> +{
> +	return -EINVAL;
> +}
> +
> +static inline int ti_sci_inta_unregister_event(struct device *dev, u16 src_id,
> +					       u16 src_index, unsigned int virq)
> +{
> +	return -EINVAL;
> +}
> +
> +#endif /* CONFIG_TI_SCI_INTA_IRQCHIP */
> +
> +#endif /* __INCLUDE_LINUX_IRQCHIP_TI_SCI_INTA_H */
> 

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Peter Ujfalusi Oct. 22, 2018, 10:43 a.m. UTC | #3
On 2018-10-22 13:42, Peter Ujfalusi wrote:
> Lokesh,
> 
> On 2018-10-18 18:40, Lokesh Vutla wrote:
>> Texas Instruments' K3 generation SoCs has an IP Interrupt Aggregator
>> which is an interrupt controller that does the following:
>> - Converts events to interrupts that can be understood by
>>   an interrupt router.
>> - Allows for multiplexing of events to interrupts.
>> - Allows for grouping of multiple events to a single interrupt.
>>
>> Configuration of the interrupt aggregator registers can only be done by
>> a system co-processor and the driver needs to send a message to this
>> co processor over TISCI protocol.
>>
>> Add support for Interrupt Aggregator driver over TISCI protocol.
> 
> Have you compiled this?
> 
> ...
> 
>> diff --git a/drivers/irqchip/irq-ti-sci-inta.c b/drivers/irqchip/irq-ti-sci-inta.c
>> new file mode 100644
>> index 000000000000..ef0a2e8b782c
>> --- /dev/null
>> +++ b/drivers/irqchip/irq-ti-sci-inta.c
> 
> ...
> 
>> +/**
>> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
>> + * @dev:	Device pointer to source generating the event
>> + * @src_id:	TISCI device ID of the event source
>> + * @src_index:	Event source index within the device.
>> + * @virq:	Linux Virtual IRQ number
>> + * @flags:	Corresponding IRQ flags
>> + * @ack_needed:	If explicit clearing of event is required.
>> + *
>> + * Creates a new irq and attaches to IA domain if virq is not specified
>> + * else attaches the event to vint corresponding to virq.
>> + * When using TISCI within the client drivers, source indexes are always
>> + * generated dynamically and cannot be represented in DT. So client
>> + * drivers should call this API instead of platform_get_irq().
>> + *
>> + * Return virq if all went well else appropriate error value.
>> + */
>> +int ti_sci_inta_register_event(struct device *dev, u16 src_id, u16 src_index,
>> +			       unsigned int virq, u32 flags, bool ack_needed)

And can you swap the flags and ack_needed?

>> +{
> 
> ...
> 
>> diff --git a/include/linux/irqchip/irq-ti-sci-inta.h b/include/linux/irqchip/irq-ti-sci-inta.h
>> new file mode 100644
>> index 000000000000..c078234fda3f
>> --- /dev/null
>> +++ b/include/linux/irqchip/irq-ti-sci-inta.h
>> @@ -0,0 +1,35 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Texas Instruments' System Control Interface (TI-SCI) irqchip
>> + *
>> + * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com/
>> + *	Lokesh Vutla <lokeshvutla@ti.com>
>> + */
>> +
>> +#ifndef __INCLUDE_LINUX_IRQCHIP_TI_SCI_INTA_H
>> +#define __INCLUDE_LINUX_IRQCHIP_TI_SCI_INTA_H
>> +
>> +#if IS_ENABLED(CONFIG_TI_SCI_INTA_IRQCHIP)
>> +int ti_sci_inta_register_event(struct device *dev, u16 src_id, u16 src_index,
>> +			       unsigned int virq, u32 flags);
> 
> You are missing the ack_needed
> 
>> +int ti_sci_inta_unregister_event(struct device *dev, u16 src_id, u16 src_index,
>> +				 unsigned int virq);
>> +
>> +#else /* CONFIG_TI_SCI_INTA_IRQCHIP */
>> +
>> +static inline int ti_sci_inta_register_event(struct device *dev, u16 src_id,
>> +					     u16 src_index, unsigned int virq,
>> +					     u32 flags)
> 
> Here as well.
> 
>> +{
>> +	return -EINVAL;
>> +}
>> +
>> +static inline int ti_sci_inta_unregister_event(struct device *dev, u16 src_id,
>> +					       u16 src_index, unsigned int virq)
>> +{
>> +	return -EINVAL;
>> +}
>> +
>> +#endif /* CONFIG_TI_SCI_INTA_IRQCHIP */
>> +
>> +#endif /* __INCLUDE_LINUX_IRQCHIP_TI_SCI_INTA_H */
>>
> 
> - Péter
> 
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Lokesh Vutla Oct. 22, 2018, 2:35 p.m. UTC | #4
Hi Marc,

On Friday 19 October 2018 08:52 PM, Marc Zyngier wrote:
> Hi Lokesh,
> 
> On 18/10/18 16:40, Lokesh Vutla wrote:
>> Texas Instruments' K3 generation SoCs has an IP Interrupt Aggregator
>> which is an interrupt controller that does the following:
>> - Converts events to interrupts that can be understood by
>>    an interrupt router.
>> - Allows for multiplexing of events to interrupts.
>> - Allows for grouping of multiple events to a single interrupt.
> 
> Aren't the last two points the same thing? Also, can you please define
> what an "event" is? What is its semantic? If they look like interrupts,
> can we just name them as such?

Event is actually a message sent by a master via an Event transport lane. Based 
on the id within the message, each message is directed to corresponding 
Interrupt Aggregator(IA). In turn IA raises a corresponding interrupt as 
configured for this event.

> 
>>
>> Configuration of the interrupt aggregator registers can only be done by
>> a system co-processor and the driver needs to send a message to this
>> co processor over TISCI protocol.
>>
>> Add support for Interrupt Aggregator driver over TISCI protocol.
>>
>> Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
>> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
>> ---
>> Changes since v1:
>> - New patch
>>
>>   MAINTAINERS                             |   1 +
>>   drivers/irqchip/Kconfig                 |  11 +
>>   drivers/irqchip/Makefile                |   1 +
>>   drivers/irqchip/irq-ti-sci-inta.c       | 613 ++++++++++++++++++++++++
>>   include/linux/irqchip/irq-ti-sci-inta.h |  35 ++
>>   5 files changed, 661 insertions(+)
>>   create mode 100644 drivers/irqchip/irq-ti-sci-inta.c
>>   create mode 100644 include/linux/irqchip/irq-ti-sci-inta.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 8cf1a6b73e6c..35c790ab0ae7 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -14689,6 +14689,7 @@ F:	drivers/reset/reset-ti-sci.c
>>   F:	Documentation/devicetree/bindings/interrupt-controller/ti,sci-intr.txt
>>   F:	Documentation/devicetree/bindings/interrupt-controller/ti,sci-inta.txt
>>   F:	drivers/irqchip/irq-ti-sci-intr.c
>> +F:	drivers/irqchip/irq-ti-sci-inta.c
>>   
>>   THANKO'S RAREMONO AM/FM/SW RADIO RECEIVER USB DRIVER
>>   M:	Hans Verkuil <hverkuil@xs4all.nl>
>> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
>> index f6620a6bb872..895f6b47dc5b 100644
>> --- a/drivers/irqchip/Kconfig
>> +++ b/drivers/irqchip/Kconfig
>> @@ -385,6 +385,17 @@ config TI_SCI_INTR_IRQCHIP
>>   	  If you wish to use interrupt router irq resources managed by the
>>   	  TI System Controller, say Y here. Otherwise, say N.
>>   
>> +config TI_SCI_INTA_IRQCHIP
>> +	bool
>> +	depends on TI_SCI_PROTOCOL && ARCH_K3
>> +	select IRQ_DOMAIN
>> +	select IRQ_DOMAIN_HIERARCHY
>> +	help
>> +	  This enables the irqchip driver support for K3 Interrupt aggregator
>> +	  over TI System Control Interface available on some new TI's SoCs.
>> +	  If you wish to use interrupt aggregator irq resources managed by the
>> +	  TI System Controller, say Y here. Otherwise, say N.
>> +
>>   endmenu
>>   
>>   config SIFIVE_PLIC
>> diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
>> index 44bf65606d60..aede4c1cc4a6 100644
>> --- a/drivers/irqchip/Makefile
>> +++ b/drivers/irqchip/Makefile
>> @@ -90,3 +90,4 @@ obj-$(CONFIG_NDS32)			+= irq-ativic32.o
>>   obj-$(CONFIG_QCOM_PDC)			+= qcom-pdc.o
>>   obj-$(CONFIG_SIFIVE_PLIC)		+= irq-sifive-plic.o
>>   obj-$(CONFIG_TI_SCI_INTR_IRQCHIP)	+= irq-ti-sci-intr.o
>> +obj-$(CONFIG_TI_SCI_INTA_IRQCHIP)	+= irq-ti-sci-inta.o
>> diff --git a/drivers/irqchip/irq-ti-sci-inta.c b/drivers/irqchip/irq-ti-sci-inta.c
>> new file mode 100644
>> index 000000000000..ef0a2e8b782c
>> --- /dev/null
>> +++ b/drivers/irqchip/irq-ti-sci-inta.c
>> @@ -0,0 +1,613 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Texas Instruments' K3 Interrupt Aggregator irqchip driver
>> + *
>> + * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com/
>> + *	Lokesh Vutla <lokeshvutla@ti.com>
>> + */
>> +
>> +#include <linux/err.h>
>> +#include <linux/io.h>
>> +#include <linux/irqchip.h>
>> +#include <linux/of_platform.h>
>> +#include <linux/of_address.h>
>> +#include <linux/of_irq.h>
>> +#include <linux/module.h>
>> +#include <linux/moduleparam.h>
>> +#include <linux/irqdomain.h>
>> +#include <linux/soc/ti/ti_sci_protocol.h>
>> +
>> +#define MAX_EVENTS_PER_VINT	64
>> +#define TI_SCI_EVENT_IRQ	BIT(31)
>> +
>> +#define VINT_ENABLE_CLR_OFFSET	0x18
>> +
>> +/**
>> + * struct ti_sci_inta_irq_domain - Structure representing a TISCI based
>> + *				   Interrupt Aggregator IRQ domain.
>> + * @sci:	Pointer to TISCI handle
>> + * @vint:	TISCI resource pointer representing IA inerrupts.
>> + * @global_event:TISCI resource pointer representing global events.
>> + * @base:	Base address of the memory mapped IO registers
>> + * @ia_id:	TISCI device ID of this Interrupt Aggregator.
>> + * @dst_id:	TISCI device ID of the destination irq controller.
>> + */
>> +struct ti_sci_inta_irq_domain {
>> +	const struct ti_sci_handle *sci;
>> +	struct ti_sci_resource *vint;
>> +	struct ti_sci_resource *global_event;
>> +	void __iomem *base;
>> +	u16 ia_id;
>> +	u16 dst_id;
>> +};
>> +
>> +/**
>> + * struct ti_sci_inta_event_desc - Description of an event coming to
>> + *				   Interrupt Aggregator.
>> + * @global_event:	Global event number corresponding to this event
>> + * @src_id:		TISCI device ID of the event source
>> + * @src_index:		Event source index within the device.
>> + */
>> +struct ti_sci_inta_event_desc {
>> +	u16 global_event;
>> +	u16 src_id;
>> +	u16 src_index;
>> +};
>> +
>> +/**
>> + * struct ti_sci_inta_vint_desc - Description of a virtual interrupt coming out
>> + *				  of Interrupt Aggregator.
>> + * @lock:		lock to guard the event map
>> + * @event_map:		Bitmap to manage the allocation of events to vint.
>> + * @events:		Array of event descriptors assigned to this vint.
>> + * @ack_needed:		Event needs to be acked via INTA. This is used when
>> + *			HW generating events cannot clear the events by itself.
>> + *			Assuming that only events from the same hw block are
>> + *			grouped. So all the events attached to vint needs
>> + *			an ack or none needs an ack.
>> + */
>> +struct ti_sci_inta_vint_desc {
>> +	raw_spinlock_t lock;
>> +	unsigned long *event_map;
>> +	struct ti_sci_inta_event_desc events[MAX_EVENTS_PER_VINT];
>> +	bool ack_needed;
>> +};
>> +
>> +static void ti_sci_inta_irq_eoi(struct irq_data *data)
>> +{
>> +	struct ti_sci_inta_irq_domain *inta = data->domain->host_data;
>> +	struct ti_sci_inta_vint_desc *vint_desc;
>> +	u64 val;
>> +	int bit;
>> +
>> +	vint_desc = irq_data_get_irq_chip_data(data);
>> +	if (!vint_desc->ack_needed)
>> +		goto out;
>> +
>> +	for_each_set_bit(bit, vint_desc->event_map, MAX_EVENTS_PER_VINT) {
>> +		val = 1 << bit;
>> +		__raw_writeq(val, inta->base + data->hwirq * 0x1000 +
>> +			     VINT_ENABLE_CLR_OFFSET);
>> +	}
> 
> If you need an ack callback, why is this part of the eoi? We have
> interrupt flows that allow you to combine both, so why don't you use that?

Actually I started with ack_irq. But I did not see this callback being triggered 
when interrupt is raised. Then I was suggested to use irq_roi. Will see why 
ack_irq is not being triggered and  update it in next version.

> 
> Also, the __raw_writeq call is probably wrong, as it assumes that both
> the CPU and the INTA have the same endianness.

hmm.. May I know what is the right call to use here?

> 
>> +
>> +out:
>> +	irq_chip_eoi_parent(data);
>> +}
>> +
>> +static struct irq_chip ti_sci_inta_irq_chip = {
>> +	.name			= "INTA",
>> +	.irq_eoi		= ti_sci_inta_irq_eoi,
>> +	.irq_mask		= irq_chip_mask_parent,
>> +	.irq_unmask		= irq_chip_unmask_parent,
>> +	.irq_retrigger		= irq_chip_retrigger_hierarchy,
>> +	.irq_set_type		= irq_chip_set_type_parent,
>> +	.irq_set_affinity	= irq_chip_set_affinity_parent,
>> +};
>> +
>> +/**
>> + * ti_sci_inta_irq_domain_translate() - Retrieve hwirq and type from
>> + *					IRQ firmware specific handler.
>> + * @domain:	Pointer to IRQ domain
>> + * @fwspec:	Pointer to IRQ specific firmware structure
>> + * @hwirq:	IRQ number identified by hardware
>> + * @type:	IRQ type
>> + *
>> + * Return 0 if all went ok else appropriate error.
>> + */
>> +static int ti_sci_inta_irq_domain_translate(struct irq_domain *domain,
>> +					    struct irq_fwspec *fwspec,
>> +					    unsigned long *hwirq,
>> +					    unsigned int *type)
>> +{
>> +	if (is_of_node(fwspec->fwnode)) {
>> +		if (fwspec->param_count != 4)
>> +			return -EINVAL;
>> +
>> +		*hwirq = fwspec->param[2];
>> +		*type = fwspec->param[3] & IRQ_TYPE_SENSE_MASK;
>> +
>> +		return 0;
>> +	}
>> +
>> +	return -EINVAL;
>> +}
>> +
>> +/**
>> + * ti_sci_free_event_irq() - Free an event from vint
>> + * @inta:	Pointer to Interrupt Aggregator IRQ domain
>> + * @vint_desc:	Virtual interrupt descriptor containing the event.
>> + * @event_index:Event Index within the vint.
>> + * @dst_irq:	Destination host irq
>> + * @vint:	Interrupt number within interrupt aggregator.
>> + */
>> +static void ti_sci_free_event_irq(struct ti_sci_inta_irq_domain *inta,
>> +				  struct ti_sci_inta_vint_desc *vint_desc,
>> +				  u32 event_index, u16 dst_irq, u16 vint)
>> +{
>> +	struct ti_sci_inta_event_desc *event;
>> +	unsigned long flags;
>> +
>> +	if (event_index >= MAX_EVENTS_PER_VINT)
>> +		return;
> 
> How can this happen?
> 
>> +
>> +	event = &vint_desc->events[event_index];
>> +	inta->sci->ops.rm_irq_ops.free_event_irq(inta->sci,
>> +						 event->src_id,
>> +						 event->src_index,
>> +						 inta->dst_id,
>> +						 dst_irq,
>> +						 inta->ia_id, vint,
>> +						 event->global_event,
>> +						 event_index);
>> +
>> +	raw_spin_lock_irqsave(&vint_desc->lock, flags);
>> +	clear_bit(event_index, vint_desc->event_map);
>> +	raw_spin_unlock_irqrestore(&vint_desc->lock, flags);
> 
> clear_bit is atomic. Why do you need a spinlock?

will drop the spinlock guard here.

> 
>> +
>> +	ti_sci_release_resource(inta->global_event, event->global_event);
>> +}
>> +
>> +/**
>> + * ti_sci_inta_irq_domain_free() - Free an IRQ from the IRQ domain
>> + * @domain:	Domain to which the irqs belong
>> + * @virq:	base linux virtual IRQ to be freed.
>> + * @nr_irqs:	Number of continuous irqs to be freed
>> + */
>> +static void ti_sci_inta_irq_domain_free(struct irq_domain *domain,
>> +					unsigned int virq, unsigned int nr_irqs)
>> +{
>> +	struct ti_sci_inta_irq_domain *inta = domain->host_data;
>> +	struct ti_sci_inta_vint_desc *vint_desc;
>> +	struct irq_data *data, *gic_data;
>> +	int event_index;
>> +
>> +	data = irq_domain_get_irq_data(domain, virq);
>> +	gic_data = irq_domain_get_irq_data(domain->parent->parent, virq);
> 
> That's absolutely horrid...

I agree. But I need to get GIC irq for sending TISCI message. Can you suggest a 
better way of doing it?

> 
>> +
>> +	vint_desc = irq_data_get_irq_chip_data(data);
>> +
>> +	/* This is the last event in the vint */
>> +	event_index = find_first_bit(vint_desc->event_map, MAX_EVENTS_PER_VINT);
> 
> What guarantees that you only have a single "event" left here?

As per the current implementation, ti_sci_inta_irq_domain_free() gets called 
only by irq_dispose_mapping. irq_dispose_mapping() will be called from 
ti_sci_inta_unregister_event() only if it is the last event attached to vint.

> 
>> +	ti_sci_free_event_irq(inta, vint_desc, event_index,
>> +			      gic_data->hwirq, data->hwirq);
>> +	irq_domain_free_irqs_parent(domain, virq, 1);
>> +	irq_domain_reset_irq_data(data);
>> +	ti_sci_release_resource(inta->vint, data->hwirq);
>> +	kfree(vint_desc->event_map);
>> +	kfree(vint_desc);
>> +}
>> +
>> +/**
>> + * ti_sci_allocate_event_irq() - Allocate an event to a IA vint.
>> + * @inta:	Pointer to Interrupt Aggregator IRQ domain
>> + * @vint_desc:	Virtual interrupt descriptor to which the event gets attached.
>> + * @src_id:	TISCI device id of the event source
>> + * @src_index:	Event index with in the device.
>> + * @dst_irq:	Destination host irq
>> + * @vint:	Interrupt number within interrupt aggregator.
>> + *
>> + * Return 0 if all went ok else appropriate error value.
>> + */
>> +static int ti_sci_allocate_event_irq(struct ti_sci_inta_irq_domain *inta,
>> +				     struct ti_sci_inta_vint_desc *vint_desc,
>> +				     u16 src_id, u16 src_index, u16 dst_irq,
>> +				     u16 vint)
>> +{
>> +	struct ti_sci_inta_event_desc *event;
>> +	unsigned long flags;
>> +	u32 free_bit;
>> +	int err;
>> +
>> +	raw_spin_lock_irqsave(&vint_desc->lock, flags);
>> +	free_bit = find_first_zero_bit(vint_desc->event_map,
>> +				       MAX_EVENTS_PER_VINT);
>> +	if (free_bit != MAX_EVENTS_PER_VINT)
>> +		set_bit(free_bit, vint_desc->event_map);
>> +	raw_spin_unlock_irqrestore(&vint_desc->lock, flags);
> 
> Why disabling the interrupts? Do you expect to take this lock
> concurrently with an interrupt? Why isn't it enough to just have a mutex
> instead?

I have thought about this while coding. We are attaching multiple events to the 
same interrupt. Technically the events from different IPs can be attached to the 
same interrupt or events from the same IP can be registered at different times. 
So I thought it is possible that when an event is being allocated to an 
interrupt, an event can be raised that belongs to the same interrupt.

> 
>> +
>> +	if (free_bit >= MAX_EVENTS_PER_VINT)
>> +		return -ENODEV;
>> +
>> +	event = &vint_desc->events[free_bit];
>> +
>> +	event->src_id = src_id;
>> +	event->src_index = src_index;
>> +	event->global_event = ti_sci_get_free_resource(inta->global_event);
> 
> Reading patch #5, shouldn't you at least test for the validity of what
> this function returns?

the below call will anyway fail and report the invlaid global_event. But you are 
right, will check for validity of global_event in my next version.

> 
>> +
>> +	err = inta->sci->ops.rm_irq_ops.set_event_irq(inta->sci,
>> +						      src_id, src_index,
>> +						      inta->dst_id,
>> +						      dst_irq,
>> +						      inta->ia_id,
>> +						      vint,
>> +						      event->global_event,
>> +						      free_bit);
>> +	if (err) {
>> +		pr_err("%s: Event allocation failed from src = %d, index = %d, to dst = %d,irq = %d,via ia_id = %d, vint = %d,global event = %d, status_bit = %d\n",
>> +		       __func__, src_id, src_index, inta->dst_id, dst_irq,
>> +		       inta->ia_id, vint, event->global_event, free_bit);
>> +		return err;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * alloc_parent_irq() - Allocate parent irq to Interrupt aggregator
>> + * @domain:	IRQ domain corresponding to Interrupt Aggregator
>> + * @virq:	Linux virtual IRQ number
>> + * @src_id:	TISCI device id of the event source
>> + * @src_index:	Event index with in the device.
>> + * @vint:	Virtual interrupt number within IA
>> + * @flags:	Corresponding IRQ flags
>> + *
>> + * Return pointer to vint descriptor if all went well else corresponding
>> + * error pointer.
>> + */
>> +static struct ti_sci_inta_vint_desc *alloc_parent_irq(struct irq_domain *domain,
> 
> Please rename this function to something less ambiguous (you've prefixed
> all functions so far, why not this one?).

Will fix it in my next version.

> 
>> +						      unsigned int virq,
>> +						      u32 src_id, u32 src_index,
>> +						      u32 vint, u32 flags)
>> +{
>> +	struct ti_sci_inta_irq_domain *inta = domain->host_data;
>> +	struct ti_sci_inta_vint_desc *vint_desc;
>> +	struct irq_data *gic_data;
>> +	struct irq_fwspec fwspec;
>> +	int err;
>> +
>> +	if (!irq_domain_get_of_node(domain->parent))
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	vint_desc = kzalloc(sizeof(*vint_desc), GFP_KERNEL);
>> +	if (!vint_desc)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	vint_desc->event_map = kcalloc(BITS_TO_LONGS(MAX_EVENTS_PER_VINT),
>> +				       sizeof(*vint_desc->event_map),
>> +				       GFP_KERNEL);
>> +	if (!vint_desc->event_map) {
>> +		kfree(vint_desc);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	fwspec.fwnode = domain->parent->fwnode;
>> +	fwspec.param_count = 3;
>> +	/* Interrupt parent is Interrupt Router */
>> +	fwspec.param[0] = inta->ia_id;
>> +	fwspec.param[1] = vint;
>> +	fwspec.param[2] = flags | TI_SCI_EVENT_IRQ;
> 
> Why isn't that flag an additional parameter instead of mixing stuff
> coming from DT and things that are purely internal?

Since this is a single bit I tried to optimize the number of fields passing to 
the parent IRQ. Will update the INTR driver to take 4 fields in next version.

> 
>> +
>> +	err = irq_domain_alloc_irqs_parent(domain, virq, 1, &fwspec);
>> +	if (err)
>> +		goto err_irqs;
>> +
>> +	gic_data = irq_domain_get_irq_data(domain->parent->parent, virq);
>> +
>> +	raw_spin_lock_init(&vint_desc->lock);
>> +
>> +	err = ti_sci_allocate_event_irq(inta, vint_desc, src_id, src_index,
>> +					gic_data->hwirq, vint);
>> +	if (err)
>> +		goto err_events;
>> +
>> +	return vint_desc;
>> +
>> +err_events:
>> +	irq_domain_free_irqs_parent(domain, virq, 1);
>> +err_irqs:
>> +	ti_sci_release_resource(inta->vint, vint);
>> +	kfree(vint_desc);
>> +	return ERR_PTR(err);
>> +}
>> +
>> +/**
>> + * ti_sci_inta_irq_domain_alloc() - Allocate Interrupt aggregator IRQs
>> + * @domain:	Point to the interrupt aggregator IRQ domain
>> + * @virq:	Corresponding Linux virtual IRQ number
>> + * @nr_irqs:	Continuous irqs to be allocated
>> + * @data:	Pointer to firmware specifier
>> + *
>> + * Return 0 if all went well else appropriate error value.
>> + */
>> +static int ti_sci_inta_irq_domain_alloc(struct irq_domain *domain,
>> +					unsigned int virq, unsigned int nr_irqs,
>> +					void *data)
>> +{
>> +	struct ti_sci_inta_vint_desc *vint_desc;
>> +	struct irq_fwspec *fwspec = data;
>> +	int err;
>> +
>> +	vint_desc = alloc_parent_irq(domain, virq, fwspec->param[0],
>> +				     fwspec->param[1], fwspec->param[2],
>> +				     fwspec->param[3]);
> 
> Frankly, what is the point of doing that? Why don't you simply pass the
> fwspec?

okay, will fix in next version.

> 
>> +	if (IS_ERR(vint_desc))
>> +		return PTR_ERR(vint_desc);
>> +
>> +	err = irq_domain_set_hwirq_and_chip(domain, virq, fwspec->param[2],
>> +					    &ti_sci_inta_irq_chip, vint_desc);
>> +
>> +	return err;
>> +}
>> +
>> +static const struct irq_domain_ops ti_sci_inta_irq_domain_ops = {
>> +	.alloc		= ti_sci_inta_irq_domain_alloc,
>> +	.free		= ti_sci_inta_irq_domain_free,
>> +	.translate	= ti_sci_inta_irq_domain_translate,
>> +};
>> +
>> +static int ti_sci_inta_irq_domain_probe(struct platform_device *pdev)
>> +{
>> +	struct irq_domain *parent_domain, *domain;
>> +	struct ti_sci_inta_irq_domain *inta;
>> +	struct device_node *parent_node;
>> +	struct device *dev = &pdev->dev;
>> +	struct resource *res;
>> +	int ret;
>> +
>> +	parent_node = of_irq_find_parent(dev_of_node(dev));
>> +	if (!parent_node) {
>> +		dev_err(dev, "Failed to get IRQ parent node\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	parent_domain = irq_find_host(parent_node);
>> +	if (!parent_domain)
>> +		return -EPROBE_DEFER;
>> +
>> +	inta = devm_kzalloc(dev, sizeof(*inta), GFP_KERNEL);
>> +	if (!inta)
>> +		return -ENOMEM;
>> +
>> +	inta->sci = devm_ti_sci_get_by_phandle(dev, "ti,sci");
>> +	if (IS_ERR(inta->sci)) {
>> +		ret = PTR_ERR(inta->sci);
>> +		if (ret != -EPROBE_DEFER)
>> +			dev_err(dev, "ti,sci read fail %d\n", ret);
>> +		inta->sci = NULL;
>> +		return ret;
>> +	}
>> +
>> +	ret = of_property_read_u32(dev->of_node, "ti,sci-dev-id",
>> +				   (u32 *)&inta->ia_id);
>> +	if (ret) {
>> +		dev_err(dev, "missing 'ti,sci-dev-id' property\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	inta->vint = devm_ti_sci_get_of_resource(inta->sci, dev,
>> +						 inta->ia_id,
>> +						 "ti,sci-rm-range-vint");
>> +	if (IS_ERR(inta->vint)) {
>> +		dev_err(dev, "VINT resource allocation failed\n");
>> +		return PTR_ERR(inta->vint);
>> +	}
>> +
>> +	inta->global_event =
>> +		devm_ti_sci_get_of_resource(inta->sci, dev,
>> +					    inta->ia_id,
>> +					    "ti,sci-rm-range-global-event");
>> +	if (IS_ERR(inta->global_event)) {
>> +		dev_err(dev, "Global event resource allocation failed\n");
>> +		return PTR_ERR(inta->global_event);
>> +	}
>> +
>> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> +	inta->base = devm_ioremap_resource(dev, res);
>> +	if (IS_ERR(inta->base))
>> +		return -ENODEV;
>> +
>> +	ret = of_property_read_u32(parent_node, "ti,sci-dst-id",
>> +				   (u32 *)&inta->dst_id);
>> +
>> +	domain = irq_domain_add_hierarchy(parent_domain, 0, 0, dev_of_node(dev),
>> +					  &ti_sci_inta_irq_domain_ops, inta);
>> +	if (!domain) {
>> +		dev_err(dev, "Failed to allocate IRQ domain\n");
>> +		return -ENOMEM;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
>> + * @dev:	Device pointer to source generating the event
>> + * @src_id:	TISCI device ID of the event source
>> + * @src_index:	Event source index within the device.
>> + * @virq:	Linux Virtual IRQ number
>> + * @flags:	Corresponding IRQ flags
>> + * @ack_needed:	If explicit clearing of event is required.
>> + *
>> + * Creates a new irq and attaches to IA domain if virq is not specified
>> + * else attaches the event to vint corresponding to virq.
>> + * When using TISCI within the client drivers, source indexes are always
>> + * generated dynamically and cannot be represented in DT. So client
>> + * drivers should call this API instead of platform_get_irq().
> 
> NAK. Either this fits in the standard model, or we adapt the standard
> model to catter for your particular use case. But we don't define a new,
> TI specific API.
> 
> I have a hunch that if the IDs are generated dynamically, then the model
> we use for MSIs would fit this thing. I also want to understand what

hmm..I haven't thought about using MSI. Will try to explore it. But the "struct 
msi_msg" is not applicable in this case as device does not write to a specific 
location.

> this event is, and how drivers get notified that such an event has fired.

As said above, Event is a message being sent by a device using a hardware 
protocol. This message is sent over an Event Transport Lane(ETL) that 
understands this protocol. Based on the message ETL re directs the message to a 
specificed target(In our case it is interrupt Aggregator).

 From a client drivers(that generates this event) prespective, the following 
needs to be done:
- Get an index that is free and allocate it to a particular task.
- Request INTA driver to assign an irq for this index.
- do a request_irq baseed on the return value from the above step.

In case of grouping events, the client drivers has its own mechanism to identify 
the index that caused an interrupt(at least that is the case for the existing user).

More details can be found in TRM section 10.2.7 Interrupt Aggregator (INTR_AGGR) 
chapter[1]

[1] http://www.ti.com/lit/pdf/spruid7


> 
> So please explain what this is all about, and we'll work out something.
> In the meantime, I'll stop here for that particular patch.
> 

Thanks a lot for the detailed review.

Regards,
Lokesh

> Thanks,
> 
> 	M.
>
Santosh Shilimkar Oct. 22, 2018, 8:39 p.m. UTC | #5
On 10/18/2018 8:40 AM, Lokesh Vutla wrote:
> TISCI abstracts the handling of IRQ routes where interrupt sources
> are not directly connected to host interrupt controller. This series
> adds support for:
> - TISCI commands needed for IRQ configuration
> - Interrupt Router(INTR) and Interrupt Aggregator(INTA) drivers
> 
> More information on TISCI IRQ management can be found here[1].
> Complete TISCI resource management information can be found here[2].
> AM65x SoC related TISCI information can be found here[3].
> INTR and INTA related information can be found in TRM[4].
> 
I didn't read the specs but from what you described in
INTA and INTR bindings, does the flow of IRQs like below ?

Device IRQ(e.g USB) -->INTR-->INTA--->HOST IRQ controller(GIC)

The confusing part is aggregator and can multiplex as well
as grouping but seems like its event grouping or multiplexing
than actual device IRQ grouping or multi-plexsing.

What am I missing ?

Regards,S
antosh
Lokesh Vutla Oct. 23, 2018, 8:17 a.m. UTC | #6
Hi Santosh,

On Tuesday 23 October 2018 02:09 AM, Santosh Shilimkar wrote:
> On 10/18/2018 8:40 AM, Lokesh Vutla wrote:
>> TISCI abstracts the handling of IRQ routes where interrupt sources
>> are not directly connected to host interrupt controller. This series
>> adds support for:
>> - TISCI commands needed for IRQ configuration
>> - Interrupt Router(INTR) and Interrupt Aggregator(INTA) drivers
>>
>> More information on TISCI IRQ management can be found here[1].
>> Complete TISCI resource management information can be found here[2].
>> AM65x SoC related TISCI information can be found here[3].
>> INTR and INTA related information can be found in TRM[4].
>>
> I didn't read the specs but from what you described in
> INTA and INTR bindings, does the flow of IRQs like below ?
> 
> Device IRQ(e.g USB) -->INTR-->INTA--->HOST IRQ controller(GIC)

Not all devices in SoC are connected to INTA. Only the devices that are capable 
of generating events are connected to INTA. And INTA is connected to INTR.

So there are three ways in which IRQ can flow in AM65x SoC:
1) Device directly connected to GIC
	- Device IRQ --> GIC
	- (Most legacy peripherals like MMC, UART falls in this case)
2) Device connected to INTR.
	- Device IRQ --> INTR --> GIC
	- This is cases where you want to mux IRQs. Used for GPIOs and Mailboxes
	- (This is somewhat similar to crossbar on DRA7 devices)
3) Devices connected to INTA.
	- Device Event --> INTA --> INTR --> GIC
	- Used for DMA and networking devices.

Events are messages based on a hw protocol, sent by a master over a dedicated 
Event transport lane. Events are highly precise that no under/over flow of data 
transfer occurs at source/destination regardless of distance and latency. So 
this is mostly preferred in DMA and networking usecases. Now Interrupt 
Aggregator(IA) has the logic to converts these events to Interrupts.

Thanks and regards
Lokesh
Marc Zyngier Oct. 23, 2018, 8:27 a.m. UTC | #7
On Tue, 23 Oct 2018 09:17:56 +0100,
Lokesh Vutla <lokeshvutla@ti.com> wrote:
> 
> Hi Santosh,
> 
> On Tuesday 23 October 2018 02:09 AM, Santosh Shilimkar wrote:
> > On 10/18/2018 8:40 AM, Lokesh Vutla wrote:
> >> TISCI abstracts the handling of IRQ routes where interrupt sources
> >> are not directly connected to host interrupt controller. This series
> >> adds support for:
> >> - TISCI commands needed for IRQ configuration
> >> - Interrupt Router(INTR) and Interrupt Aggregator(INTA) drivers
> >> 
> >> More information on TISCI IRQ management can be found here[1].
> >> Complete TISCI resource management information can be found here[2].
> >> AM65x SoC related TISCI information can be found here[3].
> >> INTR and INTA related information can be found in TRM[4].
> >> 
> > I didn't read the specs but from what you described in
> > INTA and INTR bindings, does the flow of IRQs like below ?
> > 
> > Device IRQ(e.g USB) -->INTR-->INTA--->HOST IRQ controller(GIC)
> 
> Not all devices in SoC are connected to INTA. Only the devices that
> are capable of generating events are connected to INTA. And INTA is
> connected to INTR.
> 
> So there are three ways in which IRQ can flow in AM65x SoC:
> 1) Device directly connected to GIC
> 	- Device IRQ --> GIC
> 	- (Most legacy peripherals like MMC, UART falls in this case)
> 2) Device connected to INTR.
> 	- Device IRQ --> INTR --> GIC
> 	- This is cases where you want to mux IRQs. Used for GPIOs and Mailboxes
> 	- (This is somewhat similar to crossbar on DRA7 devices)
> 3) Devices connected to INTA.
> 	- Device Event --> INTA --> INTR --> GIC
> 	- Used for DMA and networking devices.
> 
> Events are messages based on a hw protocol, sent by a master over a
> dedicated Event transport lane. Events are highly precise that no
> under/over flow of data transfer occurs at source/destination
> regardless of distance and latency. So this is mostly preferred in DMA
> and networking usecases. Now Interrupt Aggregator(IA) has the logic to
> converts these events to Interrupts.

Can we stop with these events already? What you describe here *is* an
interrupt. The fact that you have some other dedicated infrastructure
in your SoC is an implementation detail that doesn't concern the
kernel at all.

So this should be modelled as an interrupt, and not have its own
special interface at all.

Thanks,

	M.
Marc Zyngier Oct. 23, 2018, 1:50 p.m. UTC | #8
Hi Lokesh,

On Mon, 22 Oct 2018 15:35:41 +0100,
Lokesh Vutla <lokeshvutla@ti.com> wrote:
> 
> Hi Marc,
> 
> On Friday 19 October 2018 08:52 PM, Marc Zyngier wrote:
> > Hi Lokesh,
> > 
> > On 18/10/18 16:40, Lokesh Vutla wrote:
> >> Texas Instruments' K3 generation SoCs has an IP Interrupt Aggregator
> >> which is an interrupt controller that does the following:
> >> - Converts events to interrupts that can be understood by
> >>    an interrupt router.
> >> - Allows for multiplexing of events to interrupts.
> >> - Allows for grouping of multiple events to a single interrupt.
> > 
> > Aren't the last two points the same thing? Also, can you please define
> > what an "event" is? What is its semantic? If they look like interrupts,
> > can we just name them as such?
> 
> Event is actually a message sent by a master via an Event transport
> lane. Based on the id within the message, each message is directed to
> corresponding Interrupt Aggregator(IA). In turn IA raises a
> corresponding interrupt as configured for this event.

Ergo, this is an interrupt, and there is nothing more to it. HW folks
may want to give it a sexy name, but as far as SW is concerned, it has
the properties of an interrupt and should be modelled as such.

[...]

> >> +	for_each_set_bit(bit, vint_desc->event_map, MAX_EVENTS_PER_VINT) {
> >> +		val = 1 << bit;
> >> +		__raw_writeq(val, inta->base + data->hwirq * 0x1000 +
> >> +			     VINT_ENABLE_CLR_OFFSET);
> >> +	}
> > 
> > If you need an ack callback, why is this part of the eoi? We have
> > interrupt flows that allow you to combine both, so why don't you use that?
> 
> Actually I started with ack_irq. But I did not see this callback being
> triggered when interrupt is raised. Then I was suggested to use
> irq_roi. Will see why ack_irq is not being triggered and  update it in
> next version.

It is probably because you're not using the right interrupt flow.

> > Also, the __raw_writeq call is probably wrong, as it assumes that both
> > the CPU and the INTA have the same endianness.
> 
> hmm.. May I know what is the right call to use here?

writeq_relaxed is most probably what you want. I assume this code will
never run on a 32bit platform, right?

[...]

> >> +/**
> >> + * ti_sci_inta_irq_domain_free() - Free an IRQ from the IRQ domain
> >> + * @domain:	Domain to which the irqs belong
> >> + * @virq:	base linux virtual IRQ to be freed.
> >> + * @nr_irqs:	Number of continuous irqs to be freed
> >> + */
> >> +static void ti_sci_inta_irq_domain_free(struct irq_domain *domain,
> >> +					unsigned int virq, unsigned int nr_irqs)
> >> +{
> >> +	struct ti_sci_inta_irq_domain *inta = domain->host_data;
> >> +	struct ti_sci_inta_vint_desc *vint_desc;
> >> +	struct irq_data *data, *gic_data;
> >> +	int event_index;
> >> +
> >> +	data = irq_domain_get_irq_data(domain, virq);
> >> +	gic_data = irq_domain_get_irq_data(domain->parent->parent, virq);
> > 
> > That's absolutely horrid...
> 
> I agree. But I need to get GIC irq for sending TISCI message. Can you
> suggest a better way of doing it?

I'd say "fix the firmware to have a layered approach". But I guess
that's not an option, right?

[...]

> >> +/**
> >> + * ti_sci_allocate_event_irq() - Allocate an event to a IA vint.
> >> + * @inta:	Pointer to Interrupt Aggregator IRQ domain
> >> + * @vint_desc:	Virtual interrupt descriptor to which the event gets attached.
> >> + * @src_id:	TISCI device id of the event source
> >> + * @src_index:	Event index with in the device.
> >> + * @dst_irq:	Destination host irq
> >> + * @vint:	Interrupt number within interrupt aggregator.
> >> + *
> >> + * Return 0 if all went ok else appropriate error value.
> >> + */
> >> +static int ti_sci_allocate_event_irq(struct ti_sci_inta_irq_domain *inta,
> >> +				     struct ti_sci_inta_vint_desc *vint_desc,
> >> +				     u16 src_id, u16 src_index, u16 dst_irq,
> >> +				     u16 vint)
> >> +{
> >> +	struct ti_sci_inta_event_desc *event;
> >> +	unsigned long flags;
> >> +	u32 free_bit;
> >> +	int err;
> >> +
> >> +	raw_spin_lock_irqsave(&vint_desc->lock, flags);
> >> +	free_bit = find_first_zero_bit(vint_desc->event_map,
> >> +				       MAX_EVENTS_PER_VINT);
> >> +	if (free_bit != MAX_EVENTS_PER_VINT)
> >> +		set_bit(free_bit, vint_desc->event_map);
> >> +	raw_spin_unlock_irqrestore(&vint_desc->lock, flags);
> > 
> > Why disabling the interrupts? Do you expect to take this lock
> > concurrently with an interrupt? Why isn't it enough to just have a mutex
> > instead?
> 
> I have thought about this while coding. We are attaching multiple
> events to the same interrupt. Technically the events from different
> IPs can be attached to the same interrupt or events from the same IP
> can be registered at different times. So I thought it is possible that
> when an event is being allocated to an interrupt, an event can be
> raised that belongs to the same interrupt.

I strongly dispute this. Events are interrupts, and we're not
requesting an interrupt from an interrupt handler. That would be just
crazy.

[...]

> >> +/**
> >> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
> >> + * @dev:	Device pointer to source generating the event
> >> + * @src_id:	TISCI device ID of the event source
> >> + * @src_index:	Event source index within the device.
> >> + * @virq:	Linux Virtual IRQ number
> >> + * @flags:	Corresponding IRQ flags
> >> + * @ack_needed:	If explicit clearing of event is required.
> >> + *
> >> + * Creates a new irq and attaches to IA domain if virq is not specified
> >> + * else attaches the event to vint corresponding to virq.
> >> + * When using TISCI within the client drivers, source indexes are always
> >> + * generated dynamically and cannot be represented in DT. So client
> >> + * drivers should call this API instead of platform_get_irq().
> > 
> > NAK. Either this fits in the standard model, or we adapt the standard
> > model to catter for your particular use case. But we don't define a new,
> > TI specific API.
> > 
> > I have a hunch that if the IDs are generated dynamically, then the model
> > we use for MSIs would fit this thing. I also want to understand what
> 
> hmm..I haven't thought about using MSI. Will try to explore it. But
> the "struct msi_msg" is not applicable in this case as device does not
> write to a specific location.

It doesn't need to. You can perfectly ignore the address field and
only be concerned with the data. We already have MSI users that do not
need programming of the doorbell address, just the data.

> 
> > this event is, and how drivers get notified that such an event has fired.
> 
> As said above, Event is a message being sent by a device using a
> hardware protocol. This message is sent over an Event Transport
> Lane(ETL) that understands this protocol. Based on the message ETL re
> directs the message to a specificed target(In our case it is interrupt
> Aggregator).
> 
> From a client drivers(that generates this event) prespective, the
> following needs to be done:
> - Get an index that is free and allocate it to a particular task.
> - Request INTA driver to assign an irq for this index.
> - do a request_irq baseed on the return value from the above step.

All of that can be done in the using the current MSI framework. You
can either implement your own bus framework or use the platform MSI
stuff. You can then rewrite the INTA driver to be what it really is,
an interrupt multiplexer.

> In case of grouping events, the client drivers has its own mechanism
> to identify the index that caused an interrupt(at least that is the
> case for the existing user).

This simply isn't acceptable. Each event must be the result of a
single interrupt allocation from the point of view of the driver. If
events are shared, they should be modelled as a shared interrupt.

Overall, I'm extremely concerned that you're reinventing the wheel and
coming up with a new "concept" that seems incredibly similar to what
we already have everywhere else, just offering an incompatible
API. This means that your drivers become specialised for your new API,
and this isn't going to fly.

I can only urge you to reconsider the way you provide these events,
and make sure that you use the existing API to its full potential. If
something is not up to the task, we can then fix it in core code.

Thanks,

	M.
Santosh Shilimkar Oct. 23, 2018, 5:34 p.m. UTC | #9
On 10/23/2018 1:17 AM, Lokesh Vutla wrote:
> Hi Santosh,
> 
> On Tuesday 23 October 2018 02:09 AM, Santosh Shilimkar wrote:
>> On 10/18/2018 8:40 AM, Lokesh Vutla wrote:
>>> TISCI abstracts the handling of IRQ routes where interrupt sources
>>> are not directly connected to host interrupt controller. This series
>>> adds support for:
>>> - TISCI commands needed for IRQ configuration
>>> - Interrupt Router(INTR) and Interrupt Aggregator(INTA) drivers
>>>
>>> More information on TISCI IRQ management can be found here[1].
>>> Complete TISCI resource management information can be found here[2].
>>> AM65x SoC related TISCI information can be found here[3].
>>> INTR and INTA related information can be found in TRM[4].
>>>
>> I didn't read the specs but from what you described in
>> INTA and INTR bindings, does the flow of IRQs like below ?
>>
>> Device IRQ(e.g USB) -->INTR-->INTA--->HOST IRQ controller(GIC)
> 
> Not all devices in SoC are connected to INTA. Only the devices that are 
> capable of generating events are connected to INTA. And INTA is 
> connected to INTR.
> 
> So there are three ways in which IRQ can flow in AM65x SoC:
> 1) Device directly connected to GIC
>      - Device IRQ --> GIC
>      - (Most legacy peripherals like MMC, UART falls in this case)
> 2) Device connected to INTR.
>      - Device IRQ --> INTR --> GIC
>      - This is cases where you want to mux IRQs. Used for GPIOs and 
> Mailboxes
>      - (This is somewhat similar to crossbar on DRA7 devices)
> 3) Devices connected to INTA.
>      - Device Event --> INTA --> INTR --> GIC
>      - Used for DMA and networking devices.
> 
> Events are messages based on a hw protocol, sent by a master over a 
> dedicated Event transport lane. Events are highly precise that no 
> under/over flow of data transfer occurs at source/destination regardless 
> of distance and latency. So this is mostly preferred in DMA and 
> networking usecases. Now Interrupt Aggregator(IA) has the logic to 
> converts these events to Interrupts.
> 
This helps but none of the kernel doc you added, makes this clear so
perhaps you want to add this info to make that clear for reviewers
as well as for future reference.

Now regarding the events, no matter how they are routed/processed
within SOC, they are essentially interrupts so I do agree with
Marc's other comment.

Thanks for explanation again !!

regards,
Santosh
Lokesh Vutla Oct. 26, 2018, 6:39 a.m. UTC | #10
Hi Santosh,

On Tuesday 23 October 2018 11:04 PM, Santosh Shilimkar wrote:
> On 10/23/2018 1:17 AM, Lokesh Vutla wrote:
>> Hi Santosh,
>>
>> On Tuesday 23 October 2018 02:09 AM, Santosh Shilimkar wrote:
>>> On 10/18/2018 8:40 AM, Lokesh Vutla wrote:
>>>> TISCI abstracts the handling of IRQ routes where interrupt sources
>>>> are not directly connected to host interrupt controller. This series
>>>> adds support for:
>>>> - TISCI commands needed for IRQ configuration
>>>> - Interrupt Router(INTR) and Interrupt Aggregator(INTA) drivers
>>>>
>>>> More information on TISCI IRQ management can be found here[1].
>>>> Complete TISCI resource management information can be found here[2].
>>>> AM65x SoC related TISCI information can be found here[3].
>>>> INTR and INTA related information can be found in TRM[4].
>>>>
>>> I didn't read the specs but from what you described in
>>> INTA and INTR bindings, does the flow of IRQs like below ?
>>>
>>> Device IRQ(e.g USB) -->INTR-->INTA--->HOST IRQ controller(GIC)
>>
>> Not all devices in SoC are connected to INTA. Only the devices that are
>> capable of generating events are connected to INTA. And INTA is
>> connected to INTR.
>>
>> So there are three ways in which IRQ can flow in AM65x SoC:
>> 1) Device directly connected to GIC
>>       - Device IRQ --> GIC
>>       - (Most legacy peripherals like MMC, UART falls in this case)
>> 2) Device connected to INTR.
>>       - Device IRQ --> INTR --> GIC
>>       - This is cases where you want to mux IRQs. Used for GPIOs and
>> Mailboxes
>>       - (This is somewhat similar to crossbar on DRA7 devices)
>> 3) Devices connected to INTA.
>>       - Device Event --> INTA --> INTR --> GIC
>>       - Used for DMA and networking devices.
>>
>> Events are messages based on a hw protocol, sent by a master over a
>> dedicated Event transport lane. Events are highly precise that no
>> under/over flow of data transfer occurs at source/destination regardless
>> of distance and latency. So this is mostly preferred in DMA and
>> networking usecases. Now Interrupt Aggregator(IA) has the logic to
>> converts these events to Interrupts.
>>
> This helps but none of the kernel doc you added, makes this clear so
> perhaps you want to add this info to make that clear for reviewers
> as well as for future reference.

Sure will add it.

> 
> Now regarding the events, no matter how they are routed/processed
> within SOC, they are essentially interrupts so I do agree with
> Marc's other comment.

Agreed. Marc suggested to use MSI in this scenario. Currently working in that 
direction. Will repost the series once it is done.

Thanks and regards,
Lokesh

> 
> Thanks for explanation again !!
> 
> regards,
> Santosh
>
Lokesh Vutla Oct. 26, 2018, 6:39 a.m. UTC | #11
Hi Marc,

On Tuesday 23 October 2018 07:20 PM, Marc Zyngier wrote:
> Hi Lokesh,
> 
> On Mon, 22 Oct 2018 15:35:41 +0100,
> Lokesh Vutla <lokeshvutla@ti.com> wrote:
>>
>> Hi Marc,
>>
>> On Friday 19 October 2018 08:52 PM, Marc Zyngier wrote:
>>> Hi Lokesh,
>>>
>>> On 18/10/18 16:40, Lokesh Vutla wrote:
>>>> Texas Instruments' K3 generation SoCs has an IP Interrupt Aggregator
>>>> which is an interrupt controller that does the following:
>>>> - Converts events to interrupts that can be understood by
>>>>     an interrupt router.
>>>> - Allows for multiplexing of events to interrupts.
>>>> - Allows for grouping of multiple events to a single interrupt.
>>>
>>> Aren't the last two points the same thing? Also, can you please define
>>> what an "event" is? What is its semantic? If they look like interrupts,
>>> can we just name them as such?
>>
>> Event is actually a message sent by a master via an Event transport
>> lane. Based on the id within the message, each message is directed to
>> corresponding Interrupt Aggregator(IA). In turn IA raises a
>> corresponding interrupt as configured for this event.
> 
> Ergo, this is an interrupt, and there is nothing more to it. HW folks
> may want to give it a sexy name, but as far as SW is concerned, it has
> the properties of an interrupt and should be modelled as such.
> 
> [...]
> 
>>>> +	for_each_set_bit(bit, vint_desc->event_map, MAX_EVENTS_PER_VINT) {
>>>> +		val = 1 << bit;
>>>> +		__raw_writeq(val, inta->base + data->hwirq * 0x1000 +
>>>> +			     VINT_ENABLE_CLR_OFFSET);
>>>> +	}
>>>
>>> If you need an ack callback, why is this part of the eoi? We have
>>> interrupt flows that allow you to combine both, so why don't you use that?
>>
>> Actually I started with ack_irq. But I did not see this callback being
>> triggered when interrupt is raised. Then I was suggested to use
>> irq_roi. Will see why ack_irq is not being triggered and  update it in
>> next version.
> 
> It is probably because you're not using the right interrupt flow.
> 
>>> Also, the __raw_writeq call is probably wrong, as it assumes that both
>>> the CPU and the INTA have the same endianness.
>>
>> hmm.. May I know what is the right call to use here?
> 
> writeq_relaxed is most probably what you want. I assume this code will
> never run on a 32bit platform, right?
> 
> [...]
> 
>>>> +/**
>>>> + * ti_sci_inta_irq_domain_free() - Free an IRQ from the IRQ domain
>>>> + * @domain:	Domain to which the irqs belong
>>>> + * @virq:	base linux virtual IRQ to be freed.
>>>> + * @nr_irqs:	Number of continuous irqs to be freed
>>>> + */
>>>> +static void ti_sci_inta_irq_domain_free(struct irq_domain *domain,
>>>> +					unsigned int virq, unsigned int nr_irqs)
>>>> +{
>>>> +	struct ti_sci_inta_irq_domain *inta = domain->host_data;
>>>> +	struct ti_sci_inta_vint_desc *vint_desc;
>>>> +	struct irq_data *data, *gic_data;
>>>> +	int event_index;
>>>> +
>>>> +	data = irq_domain_get_irq_data(domain, virq);
>>>> +	gic_data = irq_domain_get_irq_data(domain->parent->parent, virq);
>>>
>>> That's absolutely horrid...
>>
>> I agree. But I need to get GIC irq for sending TISCI message. Can you
>> suggest a better way of doing it?
> 
> I'd say "fix the firmware to have a layered approach". But I guess
> that's not an option, right?

yeah, we cannot change the APIs now.

> 
> [...]
> 
>>>> +/**
>>>> + * ti_sci_allocate_event_irq() - Allocate an event to a IA vint.
>>>> + * @inta:	Pointer to Interrupt Aggregator IRQ domain
>>>> + * @vint_desc:	Virtual interrupt descriptor to which the event gets attached.
>>>> + * @src_id:	TISCI device id of the event source
>>>> + * @src_index:	Event index with in the device.
>>>> + * @dst_irq:	Destination host irq
>>>> + * @vint:	Interrupt number within interrupt aggregator.
>>>> + *
>>>> + * Return 0 if all went ok else appropriate error value.
>>>> + */
>>>> +static int ti_sci_allocate_event_irq(struct ti_sci_inta_irq_domain *inta,
>>>> +				     struct ti_sci_inta_vint_desc *vint_desc,
>>>> +				     u16 src_id, u16 src_index, u16 dst_irq,
>>>> +				     u16 vint)
>>>> +{
>>>> +	struct ti_sci_inta_event_desc *event;
>>>> +	unsigned long flags;
>>>> +	u32 free_bit;
>>>> +	int err;
>>>> +
>>>> +	raw_spin_lock_irqsave(&vint_desc->lock, flags);
>>>> +	free_bit = find_first_zero_bit(vint_desc->event_map,
>>>> +				       MAX_EVENTS_PER_VINT);
>>>> +	if (free_bit != MAX_EVENTS_PER_VINT)
>>>> +		set_bit(free_bit, vint_desc->event_map);
>>>> +	raw_spin_unlock_irqrestore(&vint_desc->lock, flags);
>>>
>>> Why disabling the interrupts? Do you expect to take this lock
>>> concurrently with an interrupt? Why isn't it enough to just have a mutex
>>> instead?
>>
>> I have thought about this while coding. We are attaching multiple
>> events to the same interrupt. Technically the events from different
>> IPs can be attached to the same interrupt or events from the same IP
>> can be registered at different times. So I thought it is possible that
>> when an event is being allocated to an interrupt, an event can be
>> raised that belongs to the same interrupt.
> 
> I strongly dispute this. Events are interrupts, and we're not
> requesting an interrupt from an interrupt handler. That would be just
> crazy.

okay, will use mutex instead.

> 
> [...]
> 
>>>> +/**
>>>> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
>>>> + * @dev:	Device pointer to source generating the event
>>>> + * @src_id:	TISCI device ID of the event source
>>>> + * @src_index:	Event source index within the device.
>>>> + * @virq:	Linux Virtual IRQ number
>>>> + * @flags:	Corresponding IRQ flags
>>>> + * @ack_needed:	If explicit clearing of event is required.
>>>> + *
>>>> + * Creates a new irq and attaches to IA domain if virq is not specified
>>>> + * else attaches the event to vint corresponding to virq.
>>>> + * When using TISCI within the client drivers, source indexes are always
>>>> + * generated dynamically and cannot be represented in DT. So client
>>>> + * drivers should call this API instead of platform_get_irq().
>>>
>>> NAK. Either this fits in the standard model, or we adapt the standard
>>> model to catter for your particular use case. But we don't define a new,
>>> TI specific API.
>>>
>>> I have a hunch that if the IDs are generated dynamically, then the model
>>> we use for MSIs would fit this thing. I also want to understand what
>>
>> hmm..I haven't thought about using MSI. Will try to explore it. But
>> the "struct msi_msg" is not applicable in this case as device does not
>> write to a specific location.
> 
> It doesn't need to. You can perfectly ignore the address field and
> only be concerned with the data. We already have MSI users that do not
> need programming of the doorbell address, just the data.

Okay. I am reworking towards using MSI for this case. Will post the series once 
it is done.

Once again, Thanks for the clear explanation.

Thanks and regards,
Lokesh

> 
>>
>>> this event is, and how drivers get notified that such an event has fired.
>>
>> As said above, Event is a message being sent by a device using a
>> hardware protocol. This message is sent over an Event Transport
>> Lane(ETL) that understands this protocol. Based on the message ETL re
>> directs the message to a specificed target(In our case it is interrupt
>> Aggregator).
>>
>>  From a client drivers(that generates this event) prespective, the
>> following needs to be done:
>> - Get an index that is free and allocate it to a particular task.
>> - Request INTA driver to assign an irq for this index.
>> - do a request_irq baseed on the return value from the above step.
> 
> All of that can be done in the using the current MSI framework. You
> can either implement your own bus framework or use the platform MSI
> stuff. You can then rewrite the INTA driver to be what it really is,
> an interrupt multiplexer.
> 
>> In case of grouping events, the client drivers has its own mechanism
>> to identify the index that caused an interrupt(at least that is the
>> case for the existing user).
> 
> This simply isn't acceptable. Each event must be the result of a
> single interrupt allocation from the point of view of the driver. If
> events are shared, they should be modelled as a shared interrupt.
> 
> Overall, I'm extremely concerned that you're reinventing the wheel and
> coming up with a new "concept" that seems incredibly similar to what
> we already have everywhere else, just offering an incompatible
> API. This means that your drivers become specialised for your new API,
> and this isn't going to fly.
> 
> I can only urge you to reconsider the way you provide these events,
> and make sure that you use the existing API to its full potential. If
> something is not up to the task, we can then fix it in core code.
> 
> Thanks,
> 
> 	M.
>
Lokesh Vutla Oct. 26, 2018, 8:19 p.m. UTC | #12
Hi Marc,

[..snip..]
>> [...]
>>
>>>>> +/**
>>>>> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
>>>>> + * @dev:	Device pointer to source generating the event
>>>>> + * @src_id:	TISCI device ID of the event source
>>>>> + * @src_index:	Event source index within the device.
>>>>> + * @virq:	Linux Virtual IRQ number
>>>>> + * @flags:	Corresponding IRQ flags
>>>>> + * @ack_needed:	If explicit clearing of event is required.
>>>>> + *
>>>>> + * Creates a new irq and attaches to IA domain if virq is not specified
>>>>> + * else attaches the event to vint corresponding to virq.
>>>>> + * When using TISCI within the client drivers, source indexes are always
>>>>> + * generated dynamically and cannot be represented in DT. So client
>>>>> + * drivers should call this API instead of platform_get_irq().
>>>>
>>>> NAK. Either this fits in the standard model, or we adapt the standard
>>>> model to catter for your particular use case. But we don't define a new,
>>>> TI specific API.
>>>>
>>>> I have a hunch that if the IDs are generated dynamically, then the model
>>>> we use for MSIs would fit this thing. I also want to understand what
>>>
>>> hmm..I haven't thought about using MSI. Will try to explore it. But
>>> the "struct msi_msg" is not applicable in this case as device does not
>>> write to a specific location.
>>
>> It doesn't need to. You can perfectly ignore the address field and
>> only be concerned with the data. We already have MSI users that do not
>> need programming of the doorbell address, just the data.
> 

Just one more clarification.

First let me explain the IRQ routes a bit deeply. As I said earlier there are 
three ways in which IRQ can flow in AM65x SoC
1) Device directly connected to GIC
	- Device IRQ --> GIC
2) Device connected to INTR.
	- Device IRQ --> INTR --> GIC
3) Devices connected to INTA.
	- Device IRQ --> INTA --> INTR --> GIC

1 and 2 are straight forward and we use DT for IRQ representation. Coming to 3 
the trickier part is that Input to INTA and output from INTA and dynamically 
managed. To be more specific:
- By hardware design there are certain set of physical global events(interrupts) 
attached to an INTA. Out of which a certain range are assigned to the current 
linux host that can be queried from system-controller.
- Similarly out of all the INTA outputs(referenced as vints) a certain range can 
be used by the current linux host.


So for configuring an IRQ route in case 3, the following steps are needed:
- Device id and device resource index for which the interrupt is needed
- A free event id from the range assigned to the INTA in this host context
- A free vint from the range assigned to the INTA in this host context
- A free gic IRQ from the range assigned to the INTR in this host context.

With the above information, linux should send a message to system-controller 
using TISCI protocol. After policing the given information, system-controller 
does the following:
- Attaches the interrupt(INTA input) to the device resource index
- Muxes the interrupt(INTA input) to corresponding vint(INTA output)
- Muxes the vint(INTR input) to GIC irq(INTR output).

For grouping of interrupts, the same vint number is to be passed to 
system-controller for all the requests.

Keeping all the above in mind, I see the following as software IRQ Domain Hierarchy:

1) INTA multi MSI --> 2)INTA  -->3) MSI --> 4) INTR  -->5) GIC

INTA driver has to set a chained IRQ using virq allocated from its parent MSI. 
This is to differentiate the grouped interrupts within INTA.

Inorder to cover the above two MSI domains, a new bus driver has to be created 
as I couldn't find a fit with the existing bus drivers.

Does the above approach make sense? Please correct me if i am wrong.

Thanks and regards,
Lokesh
Marc Zyngier Oct. 28, 2018, 1:31 p.m. UTC | #13
Hi Lokesh,

On Fri, 26 Oct 2018 21:19:41 +0100,
Lokesh Vutla <lokeshvutla@ti.com> wrote:
> 
> Hi Marc,
> 
> [..snip..]
> >> [...]
> >> 
> >>>>> +/**
> >>>>> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
> >>>>> + * @dev:	Device pointer to source generating the event
> >>>>> + * @src_id:	TISCI device ID of the event source
> >>>>> + * @src_index:	Event source index within the device.
> >>>>> + * @virq:	Linux Virtual IRQ number
> >>>>> + * @flags:	Corresponding IRQ flags
> >>>>> + * @ack_needed:	If explicit clearing of event is required.
> >>>>> + *
> >>>>> + * Creates a new irq and attaches to IA domain if virq is not specified
> >>>>> + * else attaches the event to vint corresponding to virq.
> >>>>> + * When using TISCI within the client drivers, source indexes are always
> >>>>> + * generated dynamically and cannot be represented in DT. So client
> >>>>> + * drivers should call this API instead of platform_get_irq().
> >>>> 
> >>>> NAK. Either this fits in the standard model, or we adapt the standard
> >>>> model to catter for your particular use case. But we don't define a new,
> >>>> TI specific API.
> >>>> 
> >>>> I have a hunch that if the IDs are generated dynamically, then the model
> >>>> we use for MSIs would fit this thing. I also want to understand what
> >>> 
> >>> hmm..I haven't thought about using MSI. Will try to explore it. But
> >>> the "struct msi_msg" is not applicable in this case as device does not
> >>> write to a specific location.
> >> 
> >> It doesn't need to. You can perfectly ignore the address field and
> >> only be concerned with the data. We already have MSI users that do not
> >> need programming of the doorbell address, just the data.
> > 
> 
> Just one more clarification.
> 
> First let me explain the IRQ routes a bit deeply. As I said earlier
> there are three ways in which IRQ can flow in AM65x SoC
> 1) Device directly connected to GIC
> 	- Device IRQ --> GIC
> 2) Device connected to INTR.
> 	- Device IRQ --> INTR --> GIC
> 3) Devices connected to INTA.
> 	- Device IRQ --> INTA --> INTR --> GIC
> 
> 1 and 2 are straight forward and we use DT for IRQ
> representation. Coming to 3 the trickier part is that Input to INTA
> and output from INTA and dynamically managed. To be more specific:
> - By hardware design there are certain set of physical global
> events(interrupts) attached to an INTA. Out of which a certain range
> are assigned to the current linux host that can be queried from
> system-controller.
> - Similarly out of all the INTA outputs(referenced as vints) a certain
> range can be used by the current linux host.
> 
> 
> So for configuring an IRQ route in case 3, the following steps are needed:
> - Device id and device resource index for which the interrupt is needed

THat is no different from a PCI device for example, where we need the
requester ID and the number of the interrupt in the MSI-X table.

> - A free event id from the range assigned to the INTA in this host context
> - A free vint from the range assigned to the INTA in this host context
> - A free gic IRQ from the range assigned to the INTR in this host context.

>From what I understand of the driver, at least some of that is under
the responsibility of the firmware, right? Or is the driver under
control of all three parameters? To be honest, it doesn't really
matter, as the as far as the kernel is concerned, the irqchip drivers
are free to deal with the routing anyway they want.

> With the above information, linux should send a message to
> system-controller using TISCI protocol. After policing the given
> information, system-controller does the following:
> - Attaches the interrupt(INTA input) to the device resource index
> - Muxes the interrupt(INTA input) to corresponding vint(INTA output)
> - Muxes the vint(INTR input) to GIC irq(INTR output).

Isn't there a 1:1 mapping between *used* INTR inputs and outputs?
Since INTR is a router, there is no real muxing. I assume that the
third point above is just a copy-paste error.

> 
> For grouping of interrupts, the same vint number is to be passed to
> system-controller for all the requests.
> 
> Keeping all the above in mind, I see the following as software IRQ
> Domain Hierarchy:
> 
> 1) INTA multi MSI --> 2)INTA  -->3) MSI --> 4) INTR  -->5) GIC
> 
> INTA driver has to set a chained IRQ using virq allocated from its
> parent MSI. This is to differentiate the grouped interrupts within
> INTA.
> 
> Inorder to cover the above two MSI domains, a new bus driver has to be
> created as I couldn't find a fit with the existing bus drivers.
> 
> Does the above approach make sense? Please correct me if i am wrong.

I think this can be further simplified, as you seem to assume that
dynamic allocation implies MSI. This is not the case. You can
perfectly use dynamically allocated interrupts and still not use MSIs.

INTA is indeed a chained interrupt controller, as it may mux several
inputs onto a single output. But the output of INTA is not an MSI. It
is just a regular interrupt that can allocated when the first mapping
gets established.

Also, INTA shouldn't offer any "multi-MSI". This is a PCI-specific
concept that doesn't translate on any other type of bus. What you want
is something that should behave like MSI-X for its allocation part,
where each MSI gets allocated independently.

Hierarchy-wise, you should end-up with something like this:

       TISCI-MSI       Chained-intr       SPI
Device ---------> INTA ------------> INTR ---> GIC

As for the bus, you have two choices:

- You create a new one altogether. See drivers/bus/fsl-mc for
  an example of something totally over the top. This implies that all
  your devices are following the exact same programming model for more
  than just interrupts.

- You use the platform-MSI framework to build your interrupt
  infrastructure, and you don't have to implement much more than
  that.

Hope this helps,

	M.
Lokesh Vutla Oct. 29, 2018, 1:04 p.m. UTC | #14
Hi Marc,

On Sunday 28 October 2018 07:01 PM, Marc Zyngier wrote:
> Hi Lokesh,
> 
> On Fri, 26 Oct 2018 21:19:41 +0100,
> Lokesh Vutla <lokeshvutla@ti.com> wrote:
>>
>> Hi Marc,
>>
>> [..snip..]
>>>> [...]
>>>>
>>>>>>> +/**
>>>>>>> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
>>>>>>> + * @dev:	Device pointer to source generating the event
>>>>>>> + * @src_id:	TISCI device ID of the event source
>>>>>>> + * @src_index:	Event source index within the device.
>>>>>>> + * @virq:	Linux Virtual IRQ number
>>>>>>> + * @flags:	Corresponding IRQ flags
>>>>>>> + * @ack_needed:	If explicit clearing of event is required.
>>>>>>> + *
>>>>>>> + * Creates a new irq and attaches to IA domain if virq is not specified
>>>>>>> + * else attaches the event to vint corresponding to virq.
>>>>>>> + * When using TISCI within the client drivers, source indexes are always
>>>>>>> + * generated dynamically and cannot be represented in DT. So client
>>>>>>> + * drivers should call this API instead of platform_get_irq().
>>>>>>
>>>>>> NAK. Either this fits in the standard model, or we adapt the standard
>>>>>> model to catter for your particular use case. But we don't define a new,
>>>>>> TI specific API.
>>>>>>
>>>>>> I have a hunch that if the IDs are generated dynamically, then the model
>>>>>> we use for MSIs would fit this thing. I also want to understand what
>>>>>
>>>>> hmm..I haven't thought about using MSI. Will try to explore it. But
>>>>> the "struct msi_msg" is not applicable in this case as device does not
>>>>> write to a specific location.
>>>>
>>>> It doesn't need to. You can perfectly ignore the address field and
>>>> only be concerned with the data. We already have MSI users that do not
>>>> need programming of the doorbell address, just the data.
>>>
>>
>> Just one more clarification.
>>
>> First let me explain the IRQ routes a bit deeply. As I said earlier
>> there are three ways in which IRQ can flow in AM65x SoC
>> 1) Device directly connected to GIC
>> 	- Device IRQ --> GIC
>> 2) Device connected to INTR.
>> 	- Device IRQ --> INTR --> GIC
>> 3) Devices connected to INTA.
>> 	- Device IRQ --> INTA --> INTR --> GIC
>>
>> 1 and 2 are straight forward and we use DT for IRQ
>> representation. Coming to 3 the trickier part is that Input to INTA
>> and output from INTA and dynamically managed. To be more specific:
>> - By hardware design there are certain set of physical global
>> events(interrupts) attached to an INTA. Out of which a certain range
>> are assigned to the current linux host that can be queried from
>> system-controller.
>> - Similarly out of all the INTA outputs(referenced as vints) a certain
>> range can be used by the current linux host.
>>
>>
>> So for configuring an IRQ route in case 3, the following steps are needed:
>> - Device id and device resource index for which the interrupt is needed
> 
> THat is no different from a PCI device for example, where we need the
> requester ID and the number of the interrupt in the MSI-X table.
> 
>> - A free event id from the range assigned to the INTA in this host context
>> - A free vint from the range assigned to the INTA in this host context
>> - A free gic IRQ from the range assigned to the INTR in this host context.
> 
>  From what I understand of the driver, at least some of that is under
> the responsibility of the firmware, right? Or is the driver under
> control of all three parameters? To be honest, it doesn't really

Driver should control all three parameters.

> matter, as the as far as the kernel is concerned, the irqchip drivers
> are free to deal with the routing anyway they want.

Correct, that's my understanding as well.

> 
>> With the above information, linux should send a message to
>> system-controller using TISCI protocol. After policing the given
>> information, system-controller does the following:
>> - Attaches the interrupt(INTA input) to the device resource index
>> - Muxes the interrupt(INTA input) to corresponding vint(INTA output)
>> - Muxes the vint(INTR input) to GIC irq(INTR output).
> 
> Isn't there a 1:1 mapping between *used* INTR inputs and outputs?
> Since INTR is a router, there is no real muxing. I assume that the
> third point above is just a copy-paste error.

Right, my bad. INTR is just a router and no read muxing.

> 
>>
>> For grouping of interrupts, the same vint number is to be passed to
>> system-controller for all the requests.
>>
>> Keeping all the above in mind, I see the following as software IRQ
>> Domain Hierarchy:
>>
>> 1) INTA multi MSI --> 2)INTA  -->3) MSI --> 4) INTR  -->5) GIC
>>
>> INTA driver has to set a chained IRQ using virq allocated from its
>> parent MSI. This is to differentiate the grouped interrupts within
>> INTA.
>>
>> Inorder to cover the above two MSI domains, a new bus driver has to be
>> created as I couldn't find a fit with the existing bus drivers.
>>
>> Does the above approach make sense? Please correct me if i am wrong.
> 
> I think this can be further simplified, as you seem to assume that
> dynamic allocation implies MSI. This is not the case. You can
> perfectly use dynamically allocated interrupts and still not use MSIs.
> 
> INTA is indeed a chained interrupt controller, as it may mux several
> inputs onto a single output. But the output of INTA is not an MSI. It
> is just a regular interrupt that can allocated when the first mapping
> gets established.

okay. I guess it can just be done using irq_create_fwspec_mapping().

> 
> Also, INTA shouldn't offer any "multi-MSI". This is a PCI-specific
> concept that doesn't translate on any other type of bus. What you want
> is something that should behave like MSI-X for its allocation part,
> where each MSI gets allocated independently.
> 
> Hierarchy-wise, you should end-up with something like this:
> 
>         TISCI-MSI       Chained-intr       SPI
> Device ---------> INTA ------------> INTR ---> GIC

makes sense. Thanks for the clarification. Will re work the driver using this 
approach and post it.

Thanks and regards,
Lokesh

> 
> As for the bus, you have two choices:
> 
> - You create a new one altogether. See drivers/bus/fsl-mc for
>    an example of something totally over the top. This implies that all
>    your devices are following the exact same programming model for more
>    than just interrupts.
> 
> - You use the platform-MSI framework to build your interrupt
>    infrastructure, and you don't have to implement much more than
>    that.
> 
> Hope this helps,
> 
> 	M.
>
Grygorii Strashko Oct. 31, 2018, 4:39 p.m. UTC | #15
Hi Marc,

On 10/23/18 8:50 AM, Marc Zyngier wrote:
> On Mon, 22 Oct 2018 15:35:41 +0100,
> Lokesh Vutla <lokeshvutla@ti.com> wrote:
>> On Friday 19 October 2018 08:52 PM, Marc Zyngier wrote:
>>> On 18/10/18 16:40, Lokesh Vutla wrote:
>>>> Texas Instruments' K3 generation SoCs has an IP Interrupt Aggregator
>>>> which is an interrupt controller that does the following:
>>>> - Converts events to interrupts that can be understood by
>>>>     an interrupt router.
>>>> - Allows for multiplexing of events to interrupts.
>>>> - Allows for grouping of multiple events to a single interrupt.
>>>
>>> Aren't the last two points the same thing? Also, can you please define
>>> what an "event" is? What is its semantic? If they look like interrupts,
>>> can we just name them as such?
>>
>> Event is actually a message sent by a master via an Event transport
>> lane. Based on the id within the message, each message is directed to
>> corresponding Interrupt Aggregator(IA). In turn IA raises a
>> corresponding interrupt as configured for this event.
> 
> Ergo, this is an interrupt, and there is nothing more to it. HW folks
> may want to give it a sexy name, but as far as SW is concerned, it has
> the properties of an interrupt and should be modelled as such.
> 
> [...]
> 
>>>> +	for_each_set_bit(bit, vint_desc->event_map, MAX_EVENTS_PER_VINT) {
>>>> +		val = 1 << bit;
>>>> +		__raw_writeq(val, inta->base + data->hwirq * 0x1000 +
>>>> +			     VINT_ENABLE_CLR_OFFSET);
>>>> +	}
>>>
>>> If you need an ack callback, why is this part of the eoi? We have
>>> interrupt flows that allow you to combine both, so why don't you use that?
>>
>> Actually I started with ack_irq. But I did not see this callback being
>> triggered when interrupt is raised. Then I was suggested to use
>> irq_roi. Will see why ack_irq is not being triggered and  update it in
>> next version.
> 
> It is probably because you're not using the right interrupt flow.
> 
>>> Also, the __raw_writeq call is probably wrong, as it assumes that both
>>> the CPU and the INTA have the same endianness.
>>
>> hmm.. May I know what is the right call to use here?
> 
> writeq_relaxed is most probably what you want. I assume this code will
> never run on a 32bit platform, right?
> 
> [...]
> 
>>>> +/**
>>>> + * ti_sci_inta_irq_domain_free() - Free an IRQ from the IRQ domain
>>>> + * @domain:	Domain to which the irqs belong
>>>> + * @virq:	base linux virtual IRQ to be freed.
>>>> + * @nr_irqs:	Number of continuous irqs to be freed
>>>> + */
>>>> +static void ti_sci_inta_irq_domain_free(struct irq_domain *domain,
>>>> +					unsigned int virq, unsigned int nr_irqs)
>>>> +{
>>>> +	struct ti_sci_inta_irq_domain *inta = domain->host_data;
>>>> +	struct ti_sci_inta_vint_desc *vint_desc;
>>>> +	struct irq_data *data, *gic_data;
>>>> +	int event_index;
>>>> +
>>>> +	data = irq_domain_get_irq_data(domain, virq);
>>>> +	gic_data = irq_domain_get_irq_data(domain->parent->parent, virq);
>>>
>>> That's absolutely horrid...
>>
>> I agree. But I need to get GIC irq for sending TISCI message. Can you
>> suggest a better way of doing it?
> 
> I'd say "fix the firmware to have a layered approach". But I guess
> that's not an option, right?
> 
> [...]
> 
>>>> +/**
>>>> + * ti_sci_allocate_event_irq() - Allocate an event to a IA vint.
>>>> + * @inta:	Pointer to Interrupt Aggregator IRQ domain
>>>> + * @vint_desc:	Virtual interrupt descriptor to which the event gets attached.
>>>> + * @src_id:	TISCI device id of the event source
>>>> + * @src_index:	Event index with in the device.
>>>> + * @dst_irq:	Destination host irq
>>>> + * @vint:	Interrupt number within interrupt aggregator.
>>>> + *
>>>> + * Return 0 if all went ok else appropriate error value.
>>>> + */
>>>> +static int ti_sci_allocate_event_irq(struct ti_sci_inta_irq_domain *inta,
>>>> +				     struct ti_sci_inta_vint_desc *vint_desc,
>>>> +				     u16 src_id, u16 src_index, u16 dst_irq,
>>>> +				     u16 vint)
>>>> +{
>>>> +	struct ti_sci_inta_event_desc *event;
>>>> +	unsigned long flags;
>>>> +	u32 free_bit;
>>>> +	int err;
>>>> +
>>>> +	raw_spin_lock_irqsave(&vint_desc->lock, flags);
>>>> +	free_bit = find_first_zero_bit(vint_desc->event_map,
>>>> +				       MAX_EVENTS_PER_VINT);
>>>> +	if (free_bit != MAX_EVENTS_PER_VINT)
>>>> +		set_bit(free_bit, vint_desc->event_map);
>>>> +	raw_spin_unlock_irqrestore(&vint_desc->lock, flags);
>>>
>>> Why disabling the interrupts? Do you expect to take this lock
>>> concurrently with an interrupt? Why isn't it enough to just have a mutex
>>> instead?
>>
>> I have thought about this while coding. We are attaching multiple
>> events to the same interrupt. Technically the events from different
>> IPs can be attached to the same interrupt or events from the same IP
>> can be registered at different times. So I thought it is possible that
>> when an event is being allocated to an interrupt, an event can be
>> raised that belongs to the same interrupt.
> 
> I strongly dispute this. Events are interrupts, and we're not
> requesting an interrupt from an interrupt handler. That would be just
> crazy.
> 
> [...]
> 
>>>> +/**
>>>> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
>>>> + * @dev:	Device pointer to source generating the event
>>>> + * @src_id:	TISCI device ID of the event source
>>>> + * @src_index:	Event source index within the device.
>>>> + * @virq:	Linux Virtual IRQ number
>>>> + * @flags:	Corresponding IRQ flags
>>>> + * @ack_needed:	If explicit clearing of event is required.
>>>> + *
>>>> + * Creates a new irq and attaches to IA domain if virq is not specified
>>>> + * else attaches the event to vint corresponding to virq.
>>>> + * When using TISCI within the client drivers, source indexes are always
>>>> + * generated dynamically and cannot be represented in DT. So client
>>>> + * drivers should call this API instead of platform_get_irq().
>>>
>>> NAK. Either this fits in the standard model, or we adapt the standard
>>> model to catter for your particular use case. But we don't define a new,
>>> TI specific API.
>>>
>>> I have a hunch that if the IDs are generated dynamically, then the model
>>> we use for MSIs would fit this thing. I also want to understand what
>>
>> hmm..I haven't thought about using MSI. Will try to explore it. But
>> the "struct msi_msg" is not applicable in this case as device does not
>> write to a specific location.
> 
> It doesn't need to. You can perfectly ignore the address field and
> only be concerned with the data. We already have MSI users that do not
> need programming of the doorbell address, just the data.
> 
>>
>>> this event is, and how drivers get notified that such an event has fired.
>>
>> As said above, Event is a message being sent by a device using a
>> hardware protocol. This message is sent over an Event Transport
>> Lane(ETL) that understands this protocol. Based on the message ETL re
>> directs the message to a specificed target(In our case it is interrupt
>> Aggregator).
>>
>>  From a client drivers(that generates this event) prespective, the
>> following needs to be done:
>> - Get an index that is free and allocate it to a particular task.
>> - Request INTA driver to assign an irq for this index.
>> - do a request_irq baseed on the return value from the above step.
> 
> All of that can be done in the using the current MSI framework. You
> can either implement your own bus framework or use the platform MSI
> stuff. You can then rewrite the INTA driver to be what it really is,
> an interrupt multiplexer.
> 
>> In case of grouping events, the client drivers has its own mechanism
>> to identify the index that caused an interrupt(at least that is the
>> case for the existing user).
> 
> This simply isn't acceptable. Each event must be the result of a
> single interrupt allocation from the point of view of the driver. If
> events are shared, they should be modelled as a shared interrupt.
> 
> Overall, I'm extremely concerned that you're reinventing the wheel and
> coming up with a new "concept" that seems incredibly similar to what
> we already have everywhere else, just offering an incompatible
> API. This means that your drivers become specialised for your new API,
> and this isn't going to fly.
> 
> I can only urge you to reconsider the way you provide these events,
> and make sure that you use the existing API to its full potential. If
> something is not up to the task, we can then fix it in core code.

I'd try to provide some additional information here.
(Sry, I'll still use term "events")

As Lokesh explained in other mail on K3 SoC everything is generic and most
of resources allocated dynamicaly:
- generic DMA channels
- generic HW rings (used by DMA channel)
- generic events (assigned to the rings) and muxed to different cores/hosts

So, when some driver would like to perform DMA transaction It's
required to build (configure) DMA channel by allocating different type of
resources and link them together to get finally working Data Movement path
(situation complicated by ti-sci firmware which policies resources between cores/hosts):
- get UDMA channel from available range
- get HW rings and attach them to the UDMA channel
- get event, assign it to the ring and mux it to the core/host through IA->IR-> chain
   (and this step is done by ti_sci_inta_register_event() - no DT as everything is dynamic).

Next, how this is working now - ti_sci_inta_register_event():
- first call does similar things as regular DT irq mapping (end up calling irq_create_fwspec_mapping()
   and builds IRQ chain as below:
   linux_virq = ti_sci_inta_register_event(dev, <ringacc tisci_dev_id>,
				<ringacc id>, 0, IRQF_TRIGGER_HIGH, false);

              +---------------------+
              |         IA          |
+--------+   |            +------+ |        +--------+         +------+
| ring 1 +----->evtA+----->VintX +---------->   IR   +--------->  GIC +-->
+--------+   |            +------+ |        +--------+         +------+  Linux IRQ Y
    evtA      |                     |
              |                     |
              +---------------------+

- second call updates only IA input part while keeping other parts of IRQ chain the same
   if valid <linux_virq> passed as input parameter:
   linux_virq = ti_sci_inta_register_event(dev, <ringacc tisci_dev_id>,
				<ringacc id>, linux_virq, IRQF_TRIGGER_HIGH, false);
              +---------------------+
              |         IA          |
+--------+   |            +------+ |        +--------+         +------+
| ring 1 +----->evtA+--^-->VintX +---------->   IR   +--------->  GIC +-->
+--------+   |         |  +------+ |        +--------+         +------+  Linux IRQ Y
              |         |           |
+--------+   |         |           |
| ring 2 +----->evtB+--+           |
+--------+   |                     |
              +---------------------+

   As per above, irq-ti-sci-inta and tisci fw creates shared IRQ on HW layer by attaching
   events to already established IA->IR->GIC IRQ chain. Any Rings events will trigger
   Linux IRQ Y line and keep it active until Rings are not empty.

Now why this was done this way?
Note. I'm not saying this is right, but it is the way we've done it as of now. And I hope MSI
will help to move forward, but I'm not very familiar with it.

The consumer of this approach is K3 Networking driver, first of all, and
this approach allows to eliminate runtime overhead in Networking hot path and
provides possibility to implement driver's specific queues/rings handling policies
- like round-robin vs priority.

CPSW networking driver doesn't need to know exact ring generated IRQ - it
need to know if there is packet for processing, so current IRQ handling sequence we have (simplified):
- any ring evt -> IA -> IR -> GIC -> Linux IRQ Y
   handle_fasteoi_irq() -> cpsw_irq_handler -> disable_irq() -> napi_schedule()
   ...
   soft_irq() -> cpsw_poll():
	- [1] for each ring from Hi prio to Low prio
	      [2] get packet
	      [3] if (packet) process packet & goto [2]
	          else goto [1]
	       if (no more packets) goto [4]
	  [4] enable_irq()

As can be seen there is no intermediate IRQ dispatchers on IA/IR levels and no IRQs-per-rings,
and NAPI poll cycle allows to implement driver's specific rings handling policy.

Next: depending on the use case following optimizations are possible:
1) throughput: split all TX (or RX) rings on X groups, where X = num_cpus
and allocate assign IRQ to each group for Networking XPS/RPS/RSS.
For example, CPSW2G has 8 TX channels and so 8 completion rings, 4 CPUs:
  rings[0,1] -(IA/IR) - Linux IRQ 1
  rings[2,3] -(IA/IR) - Linux IRQ 2
  rings[4,5] -(IA/IR) - Linux IRQ 3
  rings[6,7] -(IA/IR) - Linux IRQ 4
each Linux IRQ assigned to separate CPU.

2) min latency:
  Ring X is used by RT application for TX/RX some traffic (using AF_XDP sockets for example)
  Ring X can be assigned with separate IRQ while other rings still grouped to
  produce 1 IRQ
  rings[0,6] - (IA/IR) - Linux IRQ 1
  rings[7] - (IA/IR) - Linux IRQ 2
  Linux IRQ 2 assigned to separate CPU where RT application is running.

Hope above will help to clarify some K3 AM6 IRQ generation questions and
find the way to move forward.

Thanks a lot for you review and comments.
Marc Zyngier Oct. 31, 2018, 6:21 p.m. UTC | #16
Hi Grygorii,

On 31/10/18 16:39, Grygorii Strashko wrote:

[...]

> I'd try to provide some additional information here.
> (Sry, I'll still use term "events")
> 
> As Lokesh explained in other mail on K3 SoC everything is generic and most
> of resources allocated dynamicaly:
> - generic DMA channels
> - generic HW rings (used by DMA channel)
> - generic events (assigned to the rings) and muxed to different cores/hosts
> 
> So, when some driver would like to perform DMA transaction It's
> required to build (configure) DMA channel by allocating different type of
> resources and link them together to get finally working Data Movement path
> (situation complicated by ti-sci firmware which policies resources between cores/hosts):
> - get UDMA channel from available range
> - get HW rings and attach them to the UDMA channel
> - get event, assign it to the ring and mux it to the core/host through IA->IR-> chain
>    (and this step is done by ti_sci_inta_register_event() - no DT as everything is dynamic).
> 
> Next, how this is working now - ti_sci_inta_register_event():
> - first call does similar things as regular DT irq mapping (end up calling irq_create_fwspec_mapping()
>    and builds IRQ chain as below:
>    linux_virq = ti_sci_inta_register_event(dev, <ringacc tisci_dev_id>,
> 				<ringacc id>, 0, IRQF_TRIGGER_HIGH, false);
> 
>               +---------------------+
>               |         IA          |
> +--------+   |            +------+ |        +--------+         +------+
> | ring 1 +----->evtA+----->VintX +---------->   IR   +--------->  GIC +-->
> +--------+   |            +------+ |        +--------+         +------+  Linux IRQ Y
>     evtA      |                     |
>               |                     |
>               +---------------------+
> 
> - second call updates only IA input part while keeping other parts of IRQ chain the same
>    if valid <linux_virq> passed as input parameter:
>    linux_virq = ti_sci_inta_register_event(dev, <ringacc tisci_dev_id>,
> 				<ringacc id>, linux_virq, IRQF_TRIGGER_HIGH, false);
>               +---------------------+
>               |         IA          |
> +--------+   |            +------+ |        +--------+         +------+
> | ring 1 +----->evtA+--^-->VintX +---------->   IR   +--------->  GIC +-->
> +--------+   |         |  +------+ |        +--------+         +------+  Linux IRQ Y
>               |         |           |
> +--------+   |         |           |
> | ring 2 +----->evtB+--+           |
> +--------+   |                     |
>               +---------------------+

This is basically equivalent requesting a bunch of MSIs for a single
device, and obtaining a set of corresponding interrupts. The fact that
you end-up muxing them in the IA block is an implementation detail.

> 
>    As per above, irq-ti-sci-inta and tisci fw creates shared IRQ on HW layer by attaching
>    events to already established IA->IR->GIC IRQ chain. Any Rings events will trigger
>    Linux IRQ Y line and keep it active until Rings are not empty.
> 
> Now why this was done this way?
> Note. I'm not saying this is right, but it is the way we've done it as of now. And I hope MSI
> will help to move forward, but I'm not very familiar with it.
> 
> The consumer of this approach is K3 Networking driver, first of all, and
> this approach allows to eliminate runtime overhead in Networking hot path and
> provides possibility to implement driver's specific queues/rings handling policies
> - like round-robin vs priority.
> 
> CPSW networking driver doesn't need to know exact ring generated IRQ - it

Well, to fit the Linux model, you'll have to know. Events needs to be
signalled as individual IRQs.

> need to know if there is packet for processing, so current IRQ handling sequence we have (simplified):
> - any ring evt -> IA -> IR -> GIC -> Linux IRQ Y
>    handle_fasteoi_irq() -> cpsw_irq_handler -> disable_irq() -> napi_schedule()

Here, disable_irq() will only affect a single "event".

>    ...
>    soft_irq() -> cpsw_poll():
> 	- [1] for each ring from Hi prio to Low prio
> 	      [2] get packet
> 	      [3] if (packet) process packet & goto [2]
> 	          else goto [1]
> 	       if (no more packets) goto [4]
> 	  [4] enable_irq()
> 
> As can be seen there is no intermediate IRQ dispatchers on IA/IR levels and no IRQs-per-rings,
> and NAPI poll cycle allows to implement driver's specific rings handling policy.
> 
> Next: depending on the use case following optimizations are possible:
> 1) throughput: split all TX (or RX) rings on X groups, where X = num_cpus
> and allocate assign IRQ to each group for Networking XPS/RPS/RSS.
> For example, CPSW2G has 8 TX channels and so 8 completion rings, 4 CPUs:
>   rings[0,1] -(IA/IR) - Linux IRQ 1
>   rings[2,3] -(IA/IR) - Linux IRQ 2
>   rings[4,5] -(IA/IR) - Linux IRQ 3
>   rings[6,7] -(IA/IR) - Linux IRQ 4
> each Linux IRQ assigned to separate CPU.

What you call "Linux IRQ" is what ends up being generated at the GIC
level, and isn't the interrupt the driver will get. It will get an
interrupt number which represent a single event. We absolutely need to
maintain this 1:1 mapping between event and driver-visible interrupts.
Whatever happens between the scenes is none of the driver problem.

In your "one interrupt, multiple events" paradigm, the whole IA thing
would be conceptually part of your networking IP. I don't believe this
is the case, and trawling the documentation seems to confirm this view.

> 2) min latency:
>   Ring X is used by RT application for TX/RX some traffic (using AF_XDP sockets for example)
>   Ring X can be assigned with separate IRQ while other rings still grouped to
>   produce 1 IRQ
>   rings[0,6] - (IA/IR) - Linux IRQ 1
>   rings[7] - (IA/IR) - Linux IRQ 2
>   Linux IRQ 2 assigned to separate CPU where RT application is running.
> 
> Hope above will help to clarify some K3 AM6 IRQ generation questions and
> find the way to move forward.

Well, I'm convinced that we do not want a networking driver to be tied
to an interrupt architecture, and that the two should be completely
independent. But that's my own opinion. I can only see two solutions
moving forward:

1) You make the IA a real interrupt controller that exposes real
interrupts (one per event), and write your networking driver
independently of the underlying interrupt architecture.

2) you make the IA an integral part of your network driver, not exposing
anything outside of it, and limiting the interactions with the IR
*through the standard IRQ API*. You duplicate this knowledge throughout
the other client drivers.

I believe that (2) would be a massive design mistake as it locks the
driver to a single of the HW (and potentially a single revision of the
firmware) while (1) gives you the required level of flexibility by
hiding the whole event "concept" at a single location.

Yes, (1) makes you rewrite your existing, out of tree drivers. Oh well...

Thanks,
	
	M.
Santosh Shilimkar Oct. 31, 2018, 6:38 p.m. UTC | #17
On 10/31/2018 11:21 AM, Marc Zyngier wrote:
> Hi Grygorii,
> 

[...]

> 
> Well, I'm convinced that we do not want a networking driver to be tied
> to an interrupt architecture, and that the two should be completely
> independent. But that's my own opinion. I can only see two solutions
> moving forward:
> 
> 1) You make the IA a real interrupt controller that exposes real
> interrupts (one per event), and write your networking driver
> independently of the underlying interrupt architecture.
> 
> 2) you make the IA an integral part of your network driver, not exposing
> anything outside of it, and limiting the interactions with the IR
> *through the standard IRQ API*. You duplicate this knowledge throughout
> the other client drivers.
> 
> I believe that (2) would be a massive design mistake as it locks the
> driver to a single of the HW (and potentially a single revision of the
> firmware) while (1) gives you the required level of flexibility by
> hiding the whole event "concept" at a single location.
> 
> Yes, (1) makes you rewrite your existing, out of tree drivers. Oh well...
> 
My preference is also not tie the network driver with IA. BTW, this is
very standard functionality with other network drivers too. And this
is handled using MSI-X.

So strong NO for 1) from me as well.

regards,
Santosh
Marc Zyngier Oct. 31, 2018, 6:42 p.m. UTC | #18
On 31/10/18 18:38, Santosh Shilimkar wrote:
> On 10/31/2018 11:21 AM, Marc Zyngier wrote:
>> Hi Grygorii,
>>
> 
> [...]
> 
>>
>> Well, I'm convinced that we do not want a networking driver to be tied
>> to an interrupt architecture, and that the two should be completely
>> independent. But that's my own opinion. I can only see two solutions
>> moving forward:
>>
>> 1) You make the IA a real interrupt controller that exposes real
>> interrupts (one per event), and write your networking driver
>> independently of the underlying interrupt architecture.
>>
>> 2) you make the IA an integral part of your network driver, not exposing
>> anything outside of it, and limiting the interactions with the IR
>> *through the standard IRQ API*. You duplicate this knowledge throughout
>> the other client drivers.
>>
>> I believe that (2) would be a massive design mistake as it locks the
>> driver to a single of the HW (and potentially a single revision of the
>> firmware) while (1) gives you the required level of flexibility by
>> hiding the whole event "concept" at a single location.
>>
>> Yes, (1) makes you rewrite your existing, out of tree drivers. Oh well...
>>
> My preference is also not tie the network driver with IA. BTW, this is
> very standard functionality with other network drivers too. And this
> is handled using MSI-X.
> 
> So strong NO for 1) from me as well.

Err. Are you opposing to (1) or (2)? From the above, I cannot really
tell... ;-)

	M.
Santosh Shilimkar Oct. 31, 2018, 6:48 p.m. UTC | #19
On 10/31/2018 11:42 AM, Marc Zyngier wrote:
> On 31/10/18 18:38, Santosh Shilimkar wrote:
>> On 10/31/2018 11:21 AM, Marc Zyngier wrote:
>>> Hi Grygorii,
>>>
>>
>> [...]
>>
>>>
>>> Well, I'm convinced that we do not want a networking driver to be tied
>>> to an interrupt architecture, and that the two should be completely
>>> independent. But that's my own opinion. I can only see two solutions
>>> moving forward:
>>>
>>> 1) You make the IA a real interrupt controller that exposes real
>>> interrupts (one per event), and write your networking driver
>>> independently of the underlying interrupt architecture.
>>>
>>> 2) you make the IA an integral part of your network driver, not exposing
>>> anything outside of it, and limiting the interactions with the IR
>>> *through the standard IRQ API*. You duplicate this knowledge throughout
>>> the other client drivers.
>>>
>>> I believe that (2) would be a massive design mistake as it locks the
>>> driver to a single of the HW (and potentially a single revision of the
>>> firmware) while (1) gives you the required level of flexibility by
>>> hiding the whole event "concept" at a single location.
>>>
>>> Yes, (1) makes you rewrite your existing, out of tree drivers. Oh well...
>>>
>> My preference is also not tie the network driver with IA. BTW, this is
>> very standard functionality with other network drivers too. And this
>> is handled using MSI-X.
>>
>> So strong NO for 1) from me as well.
> 
> Err. Are you opposing to (1) or (2)? From the above, I cannot really
> tell... ;-)
> 
I mixed it up, sorry. I meant NO for (2), i.e No for making IA part of
the network driver.

Regards,
Santosh
Grygorii Strashko Oct. 31, 2018, 8:33 p.m. UTC | #20
On 10/31/18 1:21 PM, Marc Zyngier wrote:
> Hi Grygorii,
> 
> On 31/10/18 16:39, Grygorii Strashko wrote:
> 
> [...]
> 
>> I'd try to provide some additional information here.
>> (Sry, I'll still use term "events")
>>
>> As Lokesh explained in other mail on K3 SoC everything is generic and most
>> of resources allocated dynamicaly:
>> - generic DMA channels
>> - generic HW rings (used by DMA channel)
>> - generic events (assigned to the rings) and muxed to different cores/hosts
>>
>> So, when some driver would like to perform DMA transaction It's
>> required to build (configure) DMA channel by allocating different type of
>> resources and link them together to get finally working Data Movement path
>> (situation complicated by ti-sci firmware which policies resources between cores/hosts):
>> - get UDMA channel from available range
>> - get HW rings and attach them to the UDMA channel
>> - get event, assign it to the ring and mux it to the core/host through IA->IR-> chain
>>     (and this step is done by ti_sci_inta_register_event() - no DT as everything is dynamic).
>>
>> Next, how this is working now - ti_sci_inta_register_event():
>> - first call does similar things as regular DT irq mapping (end up calling irq_create_fwspec_mapping()
>>     and builds IRQ chain as below:
>>     linux_virq = ti_sci_inta_register_event(dev, <ringacc tisci_dev_id>,
>> 				<ringacc id>, 0, IRQF_TRIGGER_HIGH, false);
>>
>>                +---------------------+
>>                |         IA          |
>> +--------+   |            +------+ |        +--------+         +------+
>> | ring 1 +----->evtA+----->VintX +---------->   IR   +--------->  GIC +-->
>> +--------+   |            +------+ |        +--------+         +------+  Linux IRQ Y
>>      evtA      |                     |
>>                |                     |
>>                +---------------------+
>>
>> - second call updates only IA input part while keeping other parts of IRQ chain the same
>>     if valid <linux_virq> passed as input parameter:
>>     linux_virq = ti_sci_inta_register_event(dev, <ringacc tisci_dev_id>,
>> 				<ringacc id>, linux_virq, IRQF_TRIGGER_HIGH, false);
>>                +---------------------+
>>                |         IA          |
>> +--------+   |            +------+ |        +--------+         +------+
>> | ring 1 +----->evtA+--^-->VintX +---------->   IR   +--------->  GIC +-->
>> +--------+   |         |  +------+ |        +--------+         +------+  Linux IRQ Y
>>                |         |           |
>> +--------+   |         |           |
>> | ring 2 +----->evtB+--+           |
>> +--------+   |                     |
>>                +---------------------+
> 
> This is basically equivalent requesting a bunch of MSIs for a single
> device, and obtaining a set of corresponding interrupts. The fact that
> you end-up muxing them in the IA block is an implementation detail.
> 
>>
>>     As per above, irq-ti-sci-inta and tisci fw creates shared IRQ on HW layer by attaching
>>     events to already established IA->IR->GIC IRQ chain. Any Rings events will trigger
>>     Linux IRQ Y line and keep it active until Rings are not empty.
>>
>> Now why this was done this way?
>> Note. I'm not saying this is right, but it is the way we've done it as of now. And I hope MSI
>> will help to move forward, but I'm not very familiar with it.
>>
>> The consumer of this approach is K3 Networking driver, first of all, and
>> this approach allows to eliminate runtime overhead in Networking hot path and
>> provides possibility to implement driver's specific queues/rings handling policies
>> - like round-robin vs priority.
>>
>> CPSW networking driver doesn't need to know exact ring generated IRQ - it
> 
> Well, to fit the Linux model, you'll have to know. Events needs to be
> signalled as individual IRQs.

"
NAK. Either this fits in the standard model, or we adapt the standard
model to catter for your particular use case. But we don't define a new,
TI specific API.
"



> 
>> need to know if there is packet for processing, so current IRQ handling sequence we have (simplified):
>> - any ring evt -> IA -> IR -> GIC -> Linux IRQ Y
>>     handle_fasteoi_irq() -> cpsw_irq_handler -> disable_irq() -> napi_schedule()
> 
> Here, disable_irq() will only affect a single "event".

No. It will disable "Linux IRQ Y". On IA level there is no mask/unmask/ack functions for ring's events.
sum of rings events keeps "Linux IRQ Y" line physically active until all rings are serviced - empty.
once ring empty - corresponding event auto cleared.

> 
>>     ...
>>     soft_irq() -> cpsw_poll():
>> 	- [1] for each ring from Hi prio to Low prio
>> 	      [2] get packet
>> 	      [3] if (packet) process packet & goto [2]
>> 	          else goto [1]
>> 	       if (no more packets) goto [4]
>> 	  [4] enable_irq()
>>
>> As can be seen there is no intermediate IRQ dispatchers on IA/IR levels and no IRQs-per-rings,
>> and NAPI poll cycle allows to implement driver's specific rings handling policy.
>>
>> Next: depending on the use case following optimizations are possible:
>> 1) throughput: split all TX (or RX) rings on X groups, where X = num_cpus
>> and allocate assign IRQ to each group for Networking XPS/RPS/RSS.
>> For example, CPSW2G has 8 TX channels and so 8 completion rings, 4 CPUs:
>>    rings[0,1] -(IA/IR) - Linux IRQ 1
>>    rings[2,3] -(IA/IR) - Linux IRQ 2
>>    rings[4,5] -(IA/IR) - Linux IRQ 3
>>    rings[6,7] -(IA/IR) - Linux IRQ 4
>> each Linux IRQ assigned to separate CPU.
> 
> What you call "Linux IRQ" is what ends up being generated at the GIC
> level, and isn't the interrupt the driver will get. It will get an
> interrupt number which represent a single event. 

In current implementation the interrupt controller driver will not know what event was generated
when this ti_sci_inta_register_event() is used as this is responsibility of consumer driver
which required to get only one notification - packet received (GIC->Linux IRQ).

We need this to build fast IRQ handling path for networking -
GIC->Linux IRQ considered exclusive and no other event can be assigned to it except
as by using ti_sci_inta_register_event().
I think, it can be considered the same way as "reserved memory" - it exist, but linux
knows nothing about it, while consumer drivers still can have access to it.

ti_sci_inta_register_event() does mostly the same - it steals set of events from IA,
preforms some muxing inside HW and makes one GIC->Linux IRQ visible to Linux IRQ framework
- from Linux point of view allocated GIC->Linux IRQ is just regular irq and It
doesn't know internals (while consumer driver does).

We can't split or divide it on networking/non networking part due to fact that
all resources are dynamic, so ti-sci-inta + FW manages/own resources - and
ti_sci_inta_register_event() is entry point for IRQ resources allocation from
available ranges.

It's no too much different from OMAP CPSW device - just legacy CPSW IRQ generation
schema statically implemented in HW: 8 TX/RX CPPI channels, which are representing
8 linked lists of CPPI descriptor. Any change of linked lists state triggers
local CPSW CPPI IRQ which are summed to generate one (and only one) GIC->Linux IRQ.
It's responsibility of CPSW networking driver to handle CPPI channels in correct order.
And there is no chained IRQ controller implemented simply because of (a) runtime overhead
and (b) impossibility to implement priority handling.

Difference is that CPSW CPPI IRQs need to be asked while K3 AM6 it happens automatically
- and in K3 AM6 HW need to be configured dynamically based on allocated resources.

We absolutely need to
> maintain this 1:1 mapping between event and driver-visible interrupts.
> Whatever happens between the scenes is none of the driver problem.
> 
> In your "one interrupt, multiple events" paradigm, the whole IA thing
> would be conceptually part of your networking IP. I don't believe this
> is the case, and trawling the documentation seems to confirm this view.

not exactly - ti-sci-inta expected to work this way only when ti_sci_inta_register_event() is used.
Other allocation will follow standard Linux approach by "maintaining "1:1 mapping".

> 
>> 2) min latency:
>>    Ring X is used by RT application for TX/RX some traffic (using AF_XDP sockets for example)
>>    Ring X can be assigned with separate IRQ while other rings still grouped to
>>    produce 1 IRQ
>>    rings[0,6] - (IA/IR) - Linux IRQ 1
>>    rings[7] - (IA/IR) - Linux IRQ 2
>>    Linux IRQ 2 assigned to separate CPU where RT application is running.
>>
>> Hope above will help to clarify some K3 AM6 IRQ generation questions and
>> find the way to move forward.
> 
> Well, I'm convinced that we do not want a networking driver to be tied
> to an interrupt architecture, and that the two should be completely
> independent. But that's my own opinion. I can only see two solutions
> moving forward:
> 
> 1) You make the IA a real interrupt controller that exposes real
> interrupts (one per event), and write your networking driver
> independently of the underlying interrupt architecture.

And that's actually what is implemented IA is real interrupt controller which produces "1:1 mapping",
but it provides possibility to steal and mux IRQ event for non-standard purposes
- networking/ipc. IA is resource owner in this case as there is no way to preallocate/assign
resources statically.

> 
> 2) you make the IA an integral part of your network driver, not exposing
> anything outside of it, and limiting the interactions with the IR
> *through the standard IRQ API*. You duplicate this knowledge throughout
> the other client drivers.
> 
> I believe that (2) would be a massive design mistake as it locks the
> driver to a single of the HW (and potentially a single revision of the
> firmware) while (1) gives you the required level of flexibility by
> hiding the whole event "concept" at a single location.
> 
> Yes, (1) makes you rewrite your existing, out of tree drivers. Oh well...
Peter Ujfalusi Nov. 1, 2018, 7:55 a.m. UTC | #21
Lokesh,

On 10/29/18 3:04 PM, Lokesh Vutla wrote:
>>> With the above information, linux should send a message to
>>> system-controller using TISCI protocol. After policing the given
>>> information, system-controller does the following:
>>> - Attaches the interrupt(INTA input) to the device resource index
>>> - Muxes the interrupt(INTA input) to corresponding vint(INTA output)
>>> - Muxes the vint(INTR input) to GIC irq(INTR output).
>>
>> Isn't there a 1:1 mapping between *used* INTR inputs and outputs?
>> Since INTR is a router, there is no real muxing. I assume that the
>> third point above is just a copy-paste error.
> 
> Right, my bad. INTR is just a router and no read muxing.

INTR can mux M interrupt inputs to N interrupt outputs.
One selects which interrupt input is outputted on the given interrupt
output.
It is perfectly valid (but not sane) to select the same interrupt input
to be routed to _all_ interrupt output for example.

Not sure if we are going to use this for anything but 1:1 mapping, but
might worth keeping in mind...


- Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Marc Zyngier Nov. 1, 2018, 9 a.m. UTC | #22
On Thu, 01 Nov 2018 07:55:12 +0000,
Peter Ujfalusi <peter.ujfalusi@ti.com> wrote:
> 
> Lokesh,
> 
> On 10/29/18 3:04 PM, Lokesh Vutla wrote:
> >>> With the above information, linux should send a message to
> >>> system-controller using TISCI protocol. After policing the given
> >>> information, system-controller does the following:
> >>> - Attaches the interrupt(INTA input) to the device resource index
> >>> - Muxes the interrupt(INTA input) to corresponding vint(INTA output)
> >>> - Muxes the vint(INTR input) to GIC irq(INTR output).
> >>
> >> Isn't there a 1:1 mapping between *used* INTR inputs and outputs?
> >> Since INTR is a router, there is no real muxing. I assume that the
> >> third point above is just a copy-paste error.
> > 
> > Right, my bad. INTR is just a router and no read muxing.
> 
> INTR can mux M interrupt inputs to N interrupt outputs.
> One selects which interrupt input is outputted on the given interrupt
> output.
> It is perfectly valid (but not sane) to select the same interrupt input
> to be routed to _all_ interrupt output for example.
> 
> Not sure if we are going to use this for anything but 1:1 mapping, but
> might worth keeping in mind...

It's not obvious how you'd use this "feature". Interrupt replicator,
should one of the output be tied to another part of the system? Or
maybe that's just the result of reusing some generic block...

	M.
Peter Ujfalusi Nov. 1, 2018, 9:09 a.m. UTC | #23
Hi Marc,

On 10/31/18 8:21 PM, Marc Zyngier wrote:
> Well, I'm convinced that we do not want a networking driver to be tied
> to an interrupt architecture, and that the two should be completely
> independent. But that's my own opinion. I can only see two solutions
> moving forward:
> 
> 1) You make the IA a real interrupt controller that exposes real
> interrupts (one per event), and write your networking driver
> independently of the underlying interrupt architecture.
> 
> 2) you make the IA an integral part of your network driver, not exposing
> anything outside of it, and limiting the interactions with the IR
> *through the standard IRQ API*. You duplicate this knowledge throughout
> the other client drivers.
> 
> I believe that (2) would be a massive design mistake as it locks the
> driver to a single of the HW (and potentially a single revision of the
> firmware) while (1) gives you the required level of flexibility by
> hiding the whole event "concept" at a single location.
> 
> Yes, (1) makes you rewrite your existing, out of tree drivers. Oh well...

We need to have generic support for INTA within the NAVSS for Linux. The
UDMA (also part of the NAVSS) is used as system DMA and for the
DMAengine driver we need to have ability to handle the events from Rings
and from the UDMA itself.

Option 2 is not really an option as other components need to configure
INTA to get interrupts from the Events flying within NAVSS.

In the past with Keystone II we did have similar PacketDMA engine but it
was dedicated to service networking and crypto. For all other
peripherals we had EDMA as generic system DMA.

With AM654 this is no longer the case as we no longer have EDMA and the
NAVSS is tasked to service all peripherals, like networking and the ones
we used to use EDMA.

> 
> Thanks,
> 	
> 	M.
> 

- Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Peter Ujfalusi Nov. 1, 2018, 9:14 a.m. UTC | #24
Hi Marc,

On 11/1/18 11:00 AM, Marc Zyngier wrote:
> On Thu, 01 Nov 2018 07:55:12 +0000,
> Peter Ujfalusi <peter.ujfalusi@ti.com> wrote:
>>
>> Lokesh,
>>
>> On 10/29/18 3:04 PM, Lokesh Vutla wrote:
>>>>> With the above information, linux should send a message to
>>>>> system-controller using TISCI protocol. After policing the given
>>>>> information, system-controller does the following:
>>>>> - Attaches the interrupt(INTA input) to the device resource index
>>>>> - Muxes the interrupt(INTA input) to corresponding vint(INTA output)
>>>>> - Muxes the vint(INTR input) to GIC irq(INTR output).
>>>>
>>>> Isn't there a 1:1 mapping between *used* INTR inputs and outputs?
>>>> Since INTR is a router, there is no real muxing. I assume that the
>>>> third point above is just a copy-paste error.
>>>
>>> Right, my bad. INTR is just a router and no read muxing.
>>
>> INTR can mux M interrupt inputs to N interrupt outputs.
>> One selects which interrupt input is outputted on the given interrupt
>> output.
>> It is perfectly valid (but not sane) to select the same interrupt input
>> to be routed to _all_ interrupt output for example.
>>
>> Not sure if we are going to use this for anything but 1:1 mapping, but
>> might worth keeping in mind...
> 
> It's not obvious how you'd use this "feature". Interrupt replicator,
> should one of the output be tied to another part of the system? Or
> maybe that's just the result of reusing some generic block...

I think the intention is that different virtualized OS would got
assigned with different range of NAVSS GIC irqs and there might be a
case when more than one virtualized environment need to get a GIC irq
for the same virt. Timer interrupts comes to mind first, but there could
be other cases when the same virt should trigger on multiple GIC line.

- Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Marc Zyngier Nov. 1, 2018, 2:52 p.m. UTC | #25
On 31/10/18 20:33, Grygorii Strashko wrote:
> 
> 
> On 10/31/18 1:21 PM, Marc Zyngier wrote:
>> Hi Grygorii,
>>
>> On 31/10/18 16:39, Grygorii Strashko wrote:
>>
>> [...]
>>
>>> I'd try to provide some additional information here.
>>> (Sry, I'll still use term "events")
>>>
>>> As Lokesh explained in other mail on K3 SoC everything is generic and most
>>> of resources allocated dynamicaly:
>>> - generic DMA channels
>>> - generic HW rings (used by DMA channel)
>>> - generic events (assigned to the rings) and muxed to different cores/hosts
>>>
>>> So, when some driver would like to perform DMA transaction It's
>>> required to build (configure) DMA channel by allocating different type of
>>> resources and link them together to get finally working Data Movement path
>>> (situation complicated by ti-sci firmware which policies resources between cores/hosts):
>>> - get UDMA channel from available range
>>> - get HW rings and attach them to the UDMA channel
>>> - get event, assign it to the ring and mux it to the core/host through IA->IR-> chain
>>>     (and this step is done by ti_sci_inta_register_event() - no DT as everything is dynamic).
>>>
>>> Next, how this is working now - ti_sci_inta_register_event():
>>> - first call does similar things as regular DT irq mapping (end up calling irq_create_fwspec_mapping()
>>>     and builds IRQ chain as below:
>>>     linux_virq = ti_sci_inta_register_event(dev, <ringacc tisci_dev_id>,
>>> 				<ringacc id>, 0, IRQF_TRIGGER_HIGH, false);
>>>
>>>                +---------------------+
>>>                |         IA          |
>>> +--------+   |            +------+ |        +--------+         +------+
>>> | ring 1 +----->evtA+----->VintX +---------->   IR   +--------->  GIC +-->
>>> +--------+   |            +------+ |        +--------+         +------+  Linux IRQ Y
>>>      evtA      |                     |
>>>                |                     |
>>>                +---------------------+
>>>
>>> - second call updates only IA input part while keeping other parts of IRQ chain the same
>>>     if valid <linux_virq> passed as input parameter:
>>>     linux_virq = ti_sci_inta_register_event(dev, <ringacc tisci_dev_id>,
>>> 				<ringacc id>, linux_virq, IRQF_TRIGGER_HIGH, false);
>>>                +---------------------+
>>>                |         IA          |
>>> +--------+   |            +------+ |        +--------+         +------+
>>> | ring 1 +----->evtA+--^-->VintX +---------->   IR   +--------->  GIC +-->
>>> +--------+   |         |  +------+ |        +--------+         +------+  Linux IRQ Y
>>>                |         |           |
>>> +--------+   |         |           |
>>> | ring 2 +----->evtB+--+           |
>>> +--------+   |                     |
>>>                +---------------------+
>>
>> This is basically equivalent requesting a bunch of MSIs for a single
>> device, and obtaining a set of corresponding interrupts. The fact that
>> you end-up muxing them in the IA block is an implementation detail.
>>
>>>
>>>     As per above, irq-ti-sci-inta and tisci fw creates shared IRQ on HW layer by attaching
>>>     events to already established IA->IR->GIC IRQ chain. Any Rings events will trigger
>>>     Linux IRQ Y line and keep it active until Rings are not empty.
>>>
>>> Now why this was done this way?
>>> Note. I'm not saying this is right, but it is the way we've done it as of now. And I hope MSI
>>> will help to move forward, but I'm not very familiar with it.
>>>
>>> The consumer of this approach is K3 Networking driver, first of all, and
>>> this approach allows to eliminate runtime overhead in Networking hot path and
>>> provides possibility to implement driver's specific queues/rings handling policies
>>> - like round-robin vs priority.
>>>
>>> CPSW networking driver doesn't need to know exact ring generated IRQ - it
>>
>> Well, to fit the Linux model, you'll have to know. Events needs to be
>> signalled as individual IRQs.
> 
> "
> NAK. Either this fits in the standard model, or we adapt the standard
> model to catter for your particular use case. But we don't define a new,
> TI specific API.
> "

And I stand by what I've written.

>>> need to know if there is packet for processing, so current IRQ handling sequence we have (simplified):
>>> - any ring evt -> IA -> IR -> GIC -> Linux IRQ Y
>>>     handle_fasteoi_irq() -> cpsw_irq_handler -> disable_irq() -> napi_schedule()
>>
>> Here, disable_irq() will only affect a single "event".
> 
> No. It will disable "Linux IRQ Y". On IA level there is no mask/unmask/ack functions for ring's events.
> sum of rings events keeps "Linux IRQ Y" line physically active until all rings are serviced - empty.
> once ring empty - corresponding event auto cleared.

You're missing the point I'm trying to make: either you can and we fit
it into the Linux model of an interrupt controller, or this thing is not
an interrupt controller at all. Either event can be individually masked,
and this can be modelled as an interrupt controller, and they cannot.

So if the IA cannot be represented as an interrupt controller and is so
specific to a particular device, move it out of the interrupt subsystem
and keep it private to your networking device.

Thanks,

	M.
Grygorii Strashko Nov. 1, 2018, 3:36 p.m. UTC | #26
On 11/1/18 9:52 AM, Marc Zyngier wrote:
> On 31/10/18 20:33, Grygorii Strashko wrote:

>>
>> "
>> NAK. Either this fits in the standard model, or we adapt the standard
>> model to catter for your particular use case. But we don't define a new,
>> TI specific API.
>> "
> 
> And I stand by what I've written.
> 

Sry, this was added by mistake during copy-pasting/editing.
Lokesh Vutla Nov. 5, 2018, 8:08 a.m. UTC | #27
Hi Marc,

On Monday 29 October 2018 06:34 PM, Lokesh Vutla wrote:
> Hi Marc,
> 
> On Sunday 28 October 2018 07:01 PM, Marc Zyngier wrote:
>> Hi Lokesh,
>>
>> On Fri, 26 Oct 2018 21:19:41 +0100,
>> Lokesh Vutla <lokeshvutla@ti.com> wrote:
>>>
>>> Hi Marc,
>>>
>>> [..snip..]
>>>>> [...]
>>>>>
>>>>>>>> +/**
>>>>>>>> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
>>>>>>>> + * @dev:	Device pointer to source generating the event
>>>>>>>> + * @src_id:	TISCI device ID of the event source
>>>>>>>> + * @src_index:	Event source index within the device.
>>>>>>>> + * @virq:	Linux Virtual IRQ number
>>>>>>>> + * @flags:	Corresponding IRQ flags
>>>>>>>> + * @ack_needed:	If explicit clearing of event is required.
>>>>>>>> + *
>>>>>>>> + * Creates a new irq and attaches to IA domain if virq is not specified
>>>>>>>> + * else attaches the event to vint corresponding to virq.
>>>>>>>> + * When using TISCI within the client drivers, source indexes are always
>>>>>>>> + * generated dynamically and cannot be represented in DT. So client
>>>>>>>> + * drivers should call this API instead of platform_get_irq().
>>>>>>>
>>>>>>> NAK. Either this fits in the standard model, or we adapt the standard
>>>>>>> model to catter for your particular use case. But we don't define a new,
>>>>>>> TI specific API.
>>>>>>>
>>>>>>> I have a hunch that if the IDs are generated dynamically, then the model
>>>>>>> we use for MSIs would fit this thing. I also want to understand what
>>>>>>
>>>>>> hmm..I haven't thought about using MSI. Will try to explore it. But
>>>>>> the "struct msi_msg" is not applicable in this case as device does not
>>>>>> write to a specific location.
>>>>>
>>>>> It doesn't need to. You can perfectly ignore the address field and
>>>>> only be concerned with the data. We already have MSI users that do not
>>>>> need programming of the doorbell address, just the data.
>>>>
>>>
>>> Just one more clarification.
>>>
>>> First let me explain the IRQ routes a bit deeply. As I said earlier
>>> there are three ways in which IRQ can flow in AM65x SoC
>>> 1) Device directly connected to GIC
>>> 	- Device IRQ --> GIC
>>> 2) Device connected to INTR.
>>> 	- Device IRQ --> INTR --> GIC
>>> 3) Devices connected to INTA.
>>> 	- Device IRQ --> INTA --> INTR --> GIC
>>>
>>> 1 and 2 are straight forward and we use DT for IRQ
>>> representation. Coming to 3 the trickier part is that Input to INTA
>>> and output from INTA and dynamically managed. To be more specific:
>>> - By hardware design there are certain set of physical global
>>> events(interrupts) attached to an INTA. Out of which a certain range
>>> are assigned to the current linux host that can be queried from
>>> system-controller.
>>> - Similarly out of all the INTA outputs(referenced as vints) a certain
>>> range can be used by the current linux host.
>>>
>>>
>>> So for configuring an IRQ route in case 3, the following steps are needed:
>>> - Device id and device resource index for which the interrupt is needed
>>
>> THat is no different from a PCI device for example, where we need the
>> requester ID and the number of the interrupt in the MSI-X table.
>>
>>> - A free event id from the range assigned to the INTA in this host context
>>> - A free vint from the range assigned to the INTA in this host context
>>> - A free gic IRQ from the range assigned to the INTR in this host context.
>>
>>   From what I understand of the driver, at least some of that is under
>> the responsibility of the firmware, right? Or is the driver under
>> control of all three parameters? To be honest, it doesn't really
> 
> Driver should control all three parameters.
> 
>> matter, as the as far as the kernel is concerned, the irqchip drivers
>> are free to deal with the routing anyway they want.
> 
> Correct, that's my understanding as well.
> 
>>
>>> With the above information, linux should send a message to
>>> system-controller using TISCI protocol. After policing the given
>>> information, system-controller does the following:
>>> - Attaches the interrupt(INTA input) to the device resource index
>>> - Muxes the interrupt(INTA input) to corresponding vint(INTA output)
>>> - Muxes the vint(INTR input) to GIC irq(INTR output).
>>
>> Isn't there a 1:1 mapping between *used* INTR inputs and outputs?
>> Since INTR is a router, there is no real muxing. I assume that the
>> third point above is just a copy-paste error.
> 
> Right, my bad. INTR is just a router and no read muxing.
> 
>>
>>>
>>> For grouping of interrupts, the same vint number is to be passed to
>>> system-controller for all the requests.
>>>
>>> Keeping all the above in mind, I see the following as software IRQ
>>> Domain Hierarchy:
>>>
>>> 1) INTA multi MSI --> 2)INTA  -->3) MSI --> 4) INTR  -->5) GIC
>>>
>>> INTA driver has to set a chained IRQ using virq allocated from its
>>> parent MSI. This is to differentiate the grouped interrupts within
>>> INTA.
>>>
>>> Inorder to cover the above two MSI domains, a new bus driver has to be
>>> created as I couldn't find a fit with the existing bus drivers.
>>>
>>> Does the above approach make sense? Please correct me if i am wrong.
>>
>> I think this can be further simplified, as you seem to assume that
>> dynamic allocation implies MSI. This is not the case. You can
>> perfectly use dynamically allocated interrupts and still not use MSIs.
>>
>> INTA is indeed a chained interrupt controller, as it may mux several
>> inputs onto a single output. But the output of INTA is not an MSI. It
>> is just a regular interrupt that can allocated when the first mapping
>> gets established.
> 
> okay. I guess it can just be done using irq_create_fwspec_mapping().
> 

I am facing an issue with this approach as I am trying to call 
irq_create_fwspec_mapping() from alloc callback of INTA driver. During 
allocation the function call flow looks like:

inta_msi_domain_alloc_irqs()
	msi_domain_alloc_irqs()
		__irq_domain_alloc_irqs()
			*mutex_lock(&irq_domain_mutex);*
			irq_domain_alloc_irqs_hierarchy()
				ti_sci_inta_irq_domain_alloc()
					if (first event in group)
							irq_create_fwspec_mapping()
								irq_find_matching_fwspec()
									*mutex_lock(&irq_domain_mutex);*
									

The mutex_lock is called again if INTR IRQ gets allocated in alloc callback of 
INTA driver. So I am clearly calling irq_create_fwspec_mapping() from a wrong place.

To avoid this problem, some other msi domain ops should be used to allocate INTR 
irq. I see msi_prepare() might be a good place to allocate parent interrupt for 
each group. But there is no similar callback available while freeing. Is it okay 
to add msi_unprepare callback?

Also I would like to get the parent(INTR) IRQ to get established first as we 
need the gic_irq, vint_id while configuring the IRQ route using system-controller.

Any help on avoiding this problem?

Thanks and regards,
Lokesh
Marc Zyngier Nov. 5, 2018, 3:36 p.m. UTC | #28
On 05/11/18 08:08, Lokesh Vutla wrote:
> Hi Marc,
> 
> On Monday 29 October 2018 06:34 PM, Lokesh Vutla wrote:
>> Hi Marc,
>>
>> On Sunday 28 October 2018 07:01 PM, Marc Zyngier wrote:
>>> Hi Lokesh,
>>>
>>> On Fri, 26 Oct 2018 21:19:41 +0100,
>>> Lokesh Vutla <lokeshvutla@ti.com> wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> [..snip..]
>>>>>> [...]
>>>>>>
>>>>>>>>> +/**
>>>>>>>>> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
>>>>>>>>> + * @dev:	Device pointer to source generating the event
>>>>>>>>> + * @src_id:	TISCI device ID of the event source
>>>>>>>>> + * @src_index:	Event source index within the device.
>>>>>>>>> + * @virq:	Linux Virtual IRQ number
>>>>>>>>> + * @flags:	Corresponding IRQ flags
>>>>>>>>> + * @ack_needed:	If explicit clearing of event is required.
>>>>>>>>> + *
>>>>>>>>> + * Creates a new irq and attaches to IA domain if virq is not specified
>>>>>>>>> + * else attaches the event to vint corresponding to virq.
>>>>>>>>> + * When using TISCI within the client drivers, source indexes are always
>>>>>>>>> + * generated dynamically and cannot be represented in DT. So client
>>>>>>>>> + * drivers should call this API instead of platform_get_irq().
>>>>>>>>
>>>>>>>> NAK. Either this fits in the standard model, or we adapt the standard
>>>>>>>> model to catter for your particular use case. But we don't define a new,
>>>>>>>> TI specific API.
>>>>>>>>
>>>>>>>> I have a hunch that if the IDs are generated dynamically, then the model
>>>>>>>> we use for MSIs would fit this thing. I also want to understand what
>>>>>>>
>>>>>>> hmm..I haven't thought about using MSI. Will try to explore it. But
>>>>>>> the "struct msi_msg" is not applicable in this case as device does not
>>>>>>> write to a specific location.
>>>>>>
>>>>>> It doesn't need to. You can perfectly ignore the address field and
>>>>>> only be concerned with the data. We already have MSI users that do not
>>>>>> need programming of the doorbell address, just the data.
>>>>>
>>>>
>>>> Just one more clarification.
>>>>
>>>> First let me explain the IRQ routes a bit deeply. As I said earlier
>>>> there are three ways in which IRQ can flow in AM65x SoC
>>>> 1) Device directly connected to GIC
>>>> 	- Device IRQ --> GIC
>>>> 2) Device connected to INTR.
>>>> 	- Device IRQ --> INTR --> GIC
>>>> 3) Devices connected to INTA.
>>>> 	- Device IRQ --> INTA --> INTR --> GIC
>>>>
>>>> 1 and 2 are straight forward and we use DT for IRQ
>>>> representation. Coming to 3 the trickier part is that Input to INTA
>>>> and output from INTA and dynamically managed. To be more specific:
>>>> - By hardware design there are certain set of physical global
>>>> events(interrupts) attached to an INTA. Out of which a certain range
>>>> are assigned to the current linux host that can be queried from
>>>> system-controller.
>>>> - Similarly out of all the INTA outputs(referenced as vints) a certain
>>>> range can be used by the current linux host.
>>>>
>>>>
>>>> So for configuring an IRQ route in case 3, the following steps are needed:
>>>> - Device id and device resource index for which the interrupt is needed
>>>
>>> THat is no different from a PCI device for example, where we need the
>>> requester ID and the number of the interrupt in the MSI-X table.
>>>
>>>> - A free event id from the range assigned to the INTA in this host context
>>>> - A free vint from the range assigned to the INTA in this host context
>>>> - A free gic IRQ from the range assigned to the INTR in this host context.
>>>
>>>   From what I understand of the driver, at least some of that is under
>>> the responsibility of the firmware, right? Or is the driver under
>>> control of all three parameters? To be honest, it doesn't really
>>
>> Driver should control all three parameters.
>>
>>> matter, as the as far as the kernel is concerned, the irqchip drivers
>>> are free to deal with the routing anyway they want.
>>
>> Correct, that's my understanding as well.
>>
>>>
>>>> With the above information, linux should send a message to
>>>> system-controller using TISCI protocol. After policing the given
>>>> information, system-controller does the following:
>>>> - Attaches the interrupt(INTA input) to the device resource index
>>>> - Muxes the interrupt(INTA input) to corresponding vint(INTA output)
>>>> - Muxes the vint(INTR input) to GIC irq(INTR output).
>>>
>>> Isn't there a 1:1 mapping between *used* INTR inputs and outputs?
>>> Since INTR is a router, there is no real muxing. I assume that the
>>> third point above is just a copy-paste error.
>>
>> Right, my bad. INTR is just a router and no read muxing.
>>
>>>
>>>>
>>>> For grouping of interrupts, the same vint number is to be passed to
>>>> system-controller for all the requests.
>>>>
>>>> Keeping all the above in mind, I see the following as software IRQ
>>>> Domain Hierarchy:
>>>>
>>>> 1) INTA multi MSI --> 2)INTA  -->3) MSI --> 4) INTR  -->5) GIC
>>>>
>>>> INTA driver has to set a chained IRQ using virq allocated from its
>>>> parent MSI. This is to differentiate the grouped interrupts within
>>>> INTA.
>>>>
>>>> Inorder to cover the above two MSI domains, a new bus driver has to be
>>>> created as I couldn't find a fit with the existing bus drivers.
>>>>
>>>> Does the above approach make sense? Please correct me if i am wrong.
>>>
>>> I think this can be further simplified, as you seem to assume that
>>> dynamic allocation implies MSI. This is not the case. You can
>>> perfectly use dynamically allocated interrupts and still not use MSIs.
>>>
>>> INTA is indeed a chained interrupt controller, as it may mux several
>>> inputs onto a single output. But the output of INTA is not an MSI. It
>>> is just a regular interrupt that can allocated when the first mapping
>>> gets established.
>>
>> okay. I guess it can just be done using irq_create_fwspec_mapping().
>>
> 
> I am facing an issue with this approach as I am trying to call 
> irq_create_fwspec_mapping() from alloc callback of INTA driver. During 
> allocation the function call flow looks like:
> 
> inta_msi_domain_alloc_irqs()
> 	msi_domain_alloc_irqs()
> 		__irq_domain_alloc_irqs()
> 			*mutex_lock(&irq_domain_mutex);*
> 			irq_domain_alloc_irqs_hierarchy()
> 				ti_sci_inta_irq_domain_alloc()
> 					if (first event in group)
> 							irq_create_fwspec_mapping()
> 								irq_find_matching_fwspec()
> 									*mutex_lock(&irq_domain_mutex);*
> 									
> 
> The mutex_lock is called again if INTR IRQ gets allocated in alloc callback of 
> INTA driver. So I am clearly calling irq_create_fwspec_mapping() from a wrong place.

The real issue is that you're are calling irq_create_fwspec_mapping at
all. This is only supposed to be called by the high level code, not an
irqchip driver in the middle of its own allocation.

The right API to use is irq_domain_alloc_irqs_parent(), which calls into
the parent domain allocation. See the multiple uses in the tree already.

> 
> To avoid this problem, some other msi domain ops should be used to allocate INTR 
> irq. I see msi_prepare() might be a good place to allocate parent interrupt for 
> each group. But there is no similar callback available while freeing. Is it okay 
> to add msi_unprepare callback?
> 
> Also I would like to get the parent(INTR) IRQ to get established first as we 
> need the gic_irq, vint_id while configuring the IRQ route using system-controller.
> 
> Any help on avoiding this problem?

I think you need to fix the above first before we try and twist the API
around.

Thanks,

	M.
Lokesh Vutla Nov. 5, 2018, 4:20 p.m. UTC | #29
Hi Marc,

On Monday 05 November 2018 09:06 PM, Marc Zyngier wrote:
> On 05/11/18 08:08, Lokesh Vutla wrote:
>> Hi Marc,
>>
>> On Monday 29 October 2018 06:34 PM, Lokesh Vutla wrote:
>>> Hi Marc,
>>>
>>> On Sunday 28 October 2018 07:01 PM, Marc Zyngier wrote:
>>>> Hi Lokesh,
>>>>
>>>> On Fri, 26 Oct 2018 21:19:41 +0100,
>>>> Lokesh Vutla <lokeshvutla@ti.com> wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> [..snip..]
>>>>>>> [...]
>>>>>>>
>>>>>>>>>> +/**
>>>>>>>>>> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
>>>>>>>>>> + * @dev:	Device pointer to source generating the event
>>>>>>>>>> + * @src_id:	TISCI device ID of the event source
>>>>>>>>>> + * @src_index:	Event source index within the device.
>>>>>>>>>> + * @virq:	Linux Virtual IRQ number
>>>>>>>>>> + * @flags:	Corresponding IRQ flags
>>>>>>>>>> + * @ack_needed:	If explicit clearing of event is required.
>>>>>>>>>> + *
>>>>>>>>>> + * Creates a new irq and attaches to IA domain if virq is not specified
>>>>>>>>>> + * else attaches the event to vint corresponding to virq.
>>>>>>>>>> + * When using TISCI within the client drivers, source indexes are always
>>>>>>>>>> + * generated dynamically and cannot be represented in DT. So client
>>>>>>>>>> + * drivers should call this API instead of platform_get_irq().
>>>>>>>>>
>>>>>>>>> NAK. Either this fits in the standard model, or we adapt the standard
>>>>>>>>> model to catter for your particular use case. But we don't define a new,
>>>>>>>>> TI specific API.
>>>>>>>>>
>>>>>>>>> I have a hunch that if the IDs are generated dynamically, then the model
>>>>>>>>> we use for MSIs would fit this thing. I also want to understand what
>>>>>>>>
>>>>>>>> hmm..I haven't thought about using MSI. Will try to explore it. But
>>>>>>>> the "struct msi_msg" is not applicable in this case as device does not
>>>>>>>> write to a specific location.
>>>>>>>
>>>>>>> It doesn't need to. You can perfectly ignore the address field and
>>>>>>> only be concerned with the data. We already have MSI users that do not
>>>>>>> need programming of the doorbell address, just the data.
>>>>>>
>>>>>
>>>>> Just one more clarification.
>>>>>
>>>>> First let me explain the IRQ routes a bit deeply. As I said earlier
>>>>> there are three ways in which IRQ can flow in AM65x SoC
>>>>> 1) Device directly connected to GIC
>>>>> 	- Device IRQ --> GIC
>>>>> 2) Device connected to INTR.
>>>>> 	- Device IRQ --> INTR --> GIC
>>>>> 3) Devices connected to INTA.
>>>>> 	- Device IRQ --> INTA --> INTR --> GIC
>>>>>
>>>>> 1 and 2 are straight forward and we use DT for IRQ
>>>>> representation. Coming to 3 the trickier part is that Input to INTA
>>>>> and output from INTA and dynamically managed. To be more specific:
>>>>> - By hardware design there are certain set of physical global
>>>>> events(interrupts) attached to an INTA. Out of which a certain range
>>>>> are assigned to the current linux host that can be queried from
>>>>> system-controller.
>>>>> - Similarly out of all the INTA outputs(referenced as vints) a certain
>>>>> range can be used by the current linux host.
>>>>>
>>>>>
>>>>> So for configuring an IRQ route in case 3, the following steps are needed:
>>>>> - Device id and device resource index for which the interrupt is needed
>>>>
>>>> THat is no different from a PCI device for example, where we need the
>>>> requester ID and the number of the interrupt in the MSI-X table.
>>>>
>>>>> - A free event id from the range assigned to the INTA in this host context
>>>>> - A free vint from the range assigned to the INTA in this host context
>>>>> - A free gic IRQ from the range assigned to the INTR in this host context.
>>>>
>>>>    From what I understand of the driver, at least some of that is under
>>>> the responsibility of the firmware, right? Or is the driver under
>>>> control of all three parameters? To be honest, it doesn't really
>>>
>>> Driver should control all three parameters.
>>>
>>>> matter, as the as far as the kernel is concerned, the irqchip drivers
>>>> are free to deal with the routing anyway they want.
>>>
>>> Correct, that's my understanding as well.
>>>
>>>>
>>>>> With the above information, linux should send a message to
>>>>> system-controller using TISCI protocol. After policing the given
>>>>> information, system-controller does the following:
>>>>> - Attaches the interrupt(INTA input) to the device resource index
>>>>> - Muxes the interrupt(INTA input) to corresponding vint(INTA output)
>>>>> - Muxes the vint(INTR input) to GIC irq(INTR output).
>>>>
>>>> Isn't there a 1:1 mapping between *used* INTR inputs and outputs?
>>>> Since INTR is a router, there is no real muxing. I assume that the
>>>> third point above is just a copy-paste error.
>>>
>>> Right, my bad. INTR is just a router and no read muxing.
>>>
>>>>
>>>>>
>>>>> For grouping of interrupts, the same vint number is to be passed to
>>>>> system-controller for all the requests.
>>>>>
>>>>> Keeping all the above in mind, I see the following as software IRQ
>>>>> Domain Hierarchy:
>>>>>
>>>>> 1) INTA multi MSI --> 2)INTA  -->3) MSI --> 4) INTR  -->5) GIC
>>>>>
>>>>> INTA driver has to set a chained IRQ using virq allocated from its
>>>>> parent MSI. This is to differentiate the grouped interrupts within
>>>>> INTA.
>>>>>
>>>>> Inorder to cover the above two MSI domains, a new bus driver has to be
>>>>> created as I couldn't find a fit with the existing bus drivers.
>>>>>
>>>>> Does the above approach make sense? Please correct me if i am wrong.
>>>>
>>>> I think this can be further simplified, as you seem to assume that
>>>> dynamic allocation implies MSI. This is not the case. You can
>>>> perfectly use dynamically allocated interrupts and still not use MSIs.
>>>>
>>>> INTA is indeed a chained interrupt controller, as it may mux several
>>>> inputs onto a single output. But the output of INTA is not an MSI. It
>>>> is just a regular interrupt that can allocated when the first mapping
>>>> gets established.
>>>
>>> okay. I guess it can just be done using irq_create_fwspec_mapping().
>>>
>>
>> I am facing an issue with this approach as I am trying to call
>> irq_create_fwspec_mapping() from alloc callback of INTA driver. During
>> allocation the function call flow looks like:
>>
>> inta_msi_domain_alloc_irqs()
>> 	msi_domain_alloc_irqs()
>> 		__irq_domain_alloc_irqs()
>> 			*mutex_lock(&irq_domain_mutex);*
>> 			irq_domain_alloc_irqs_hierarchy()
>> 				ti_sci_inta_irq_domain_alloc()
>> 					if (first event in group)
>> 							irq_create_fwspec_mapping()
>> 								irq_find_matching_fwspec()
>> 									*mutex_lock(&irq_domain_mutex);*
>> 									
>>
>> The mutex_lock is called again if INTR IRQ gets allocated in alloc callback of
>> INTA driver. So I am clearly calling irq_create_fwspec_mapping() from a wrong place.
> 
> The real issue is that you're are calling irq_create_fwspec_mapping at
> all. This is only supposed to be called by the high level code, not an
> irqchip driver in the middle of its own allocation.
> 
> The right API to use is irq_domain_alloc_irqs_parent(), which calls into
> the parent domain allocation. See the multiple uses in the tree already.

But irq_domain_alloc_irqs_parent() doesn't create a new IRQ mapping. Or your 
suggestion is that when first event mapping gets established in the group, use 
the same Linux virq number to allocate the parent interrupts?

If yes, then wouldn't it be a problem during free irqs? client driver should 
make sure that first event that gets allocated in the group should be the last 
to be freed.

Thanks and regards,
Lokesh
Marc Zyngier Nov. 5, 2018, 4:44 p.m. UTC | #30
On 05/11/18 16:20, Lokesh Vutla wrote:
> Hi Marc,
> 
> On Monday 05 November 2018 09:06 PM, Marc Zyngier wrote:
>> On 05/11/18 08:08, Lokesh Vutla wrote:
>>> Hi Marc,
>>>
>>> On Monday 29 October 2018 06:34 PM, Lokesh Vutla wrote:
>>>> Hi Marc,
>>>>
>>>> On Sunday 28 October 2018 07:01 PM, Marc Zyngier wrote:
>>>>> Hi Lokesh,
>>>>>
>>>>> On Fri, 26 Oct 2018 21:19:41 +0100,
>>>>> Lokesh Vutla <lokeshvutla@ti.com> wrote:
>>>>>>
>>>>>> Hi Marc,
>>>>>>
>>>>>> [..snip..]
>>>>>>>> [...]
>>>>>>>>
>>>>>>>>>>> +/**
>>>>>>>>>>> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
>>>>>>>>>>> + * @dev:	Device pointer to source generating the event
>>>>>>>>>>> + * @src_id:	TISCI device ID of the event source
>>>>>>>>>>> + * @src_index:	Event source index within the device.
>>>>>>>>>>> + * @virq:	Linux Virtual IRQ number
>>>>>>>>>>> + * @flags:	Corresponding IRQ flags
>>>>>>>>>>> + * @ack_needed:	If explicit clearing of event is required.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Creates a new irq and attaches to IA domain if virq is not specified
>>>>>>>>>>> + * else attaches the event to vint corresponding to virq.
>>>>>>>>>>> + * When using TISCI within the client drivers, source indexes are always
>>>>>>>>>>> + * generated dynamically and cannot be represented in DT. So client
>>>>>>>>>>> + * drivers should call this API instead of platform_get_irq().
>>>>>>>>>>
>>>>>>>>>> NAK. Either this fits in the standard model, or we adapt the standard
>>>>>>>>>> model to catter for your particular use case. But we don't define a new,
>>>>>>>>>> TI specific API.
>>>>>>>>>>
>>>>>>>>>> I have a hunch that if the IDs are generated dynamically, then the model
>>>>>>>>>> we use for MSIs would fit this thing. I also want to understand what
>>>>>>>>>
>>>>>>>>> hmm..I haven't thought about using MSI. Will try to explore it. But
>>>>>>>>> the "struct msi_msg" is not applicable in this case as device does not
>>>>>>>>> write to a specific location.
>>>>>>>>
>>>>>>>> It doesn't need to. You can perfectly ignore the address field and
>>>>>>>> only be concerned with the data. We already have MSI users that do not
>>>>>>>> need programming of the doorbell address, just the data.
>>>>>>>
>>>>>>
>>>>>> Just one more clarification.
>>>>>>
>>>>>> First let me explain the IRQ routes a bit deeply. As I said earlier
>>>>>> there are three ways in which IRQ can flow in AM65x SoC
>>>>>> 1) Device directly connected to GIC
>>>>>> 	- Device IRQ --> GIC
>>>>>> 2) Device connected to INTR.
>>>>>> 	- Device IRQ --> INTR --> GIC
>>>>>> 3) Devices connected to INTA.
>>>>>> 	- Device IRQ --> INTA --> INTR --> GIC
>>>>>>
>>>>>> 1 and 2 are straight forward and we use DT for IRQ
>>>>>> representation. Coming to 3 the trickier part is that Input to INTA
>>>>>> and output from INTA and dynamically managed. To be more specific:
>>>>>> - By hardware design there are certain set of physical global
>>>>>> events(interrupts) attached to an INTA. Out of which a certain range
>>>>>> are assigned to the current linux host that can be queried from
>>>>>> system-controller.
>>>>>> - Similarly out of all the INTA outputs(referenced as vints) a certain
>>>>>> range can be used by the current linux host.
>>>>>>
>>>>>>
>>>>>> So for configuring an IRQ route in case 3, the following steps are needed:
>>>>>> - Device id and device resource index for which the interrupt is needed
>>>>>
>>>>> THat is no different from a PCI device for example, where we need the
>>>>> requester ID and the number of the interrupt in the MSI-X table.
>>>>>
>>>>>> - A free event id from the range assigned to the INTA in this host context
>>>>>> - A free vint from the range assigned to the INTA in this host context
>>>>>> - A free gic IRQ from the range assigned to the INTR in this host context.
>>>>>
>>>>>    From what I understand of the driver, at least some of that is under
>>>>> the responsibility of the firmware, right? Or is the driver under
>>>>> control of all three parameters? To be honest, it doesn't really
>>>>
>>>> Driver should control all three parameters.
>>>>
>>>>> matter, as the as far as the kernel is concerned, the irqchip drivers
>>>>> are free to deal with the routing anyway they want.
>>>>
>>>> Correct, that's my understanding as well.
>>>>
>>>>>
>>>>>> With the above information, linux should send a message to
>>>>>> system-controller using TISCI protocol. After policing the given
>>>>>> information, system-controller does the following:
>>>>>> - Attaches the interrupt(INTA input) to the device resource index
>>>>>> - Muxes the interrupt(INTA input) to corresponding vint(INTA output)
>>>>>> - Muxes the vint(INTR input) to GIC irq(INTR output).
>>>>>
>>>>> Isn't there a 1:1 mapping between *used* INTR inputs and outputs?
>>>>> Since INTR is a router, there is no real muxing. I assume that the
>>>>> third point above is just a copy-paste error.
>>>>
>>>> Right, my bad. INTR is just a router and no read muxing.
>>>>
>>>>>
>>>>>>
>>>>>> For grouping of interrupts, the same vint number is to be passed to
>>>>>> system-controller for all the requests.
>>>>>>
>>>>>> Keeping all the above in mind, I see the following as software IRQ
>>>>>> Domain Hierarchy:
>>>>>>
>>>>>> 1) INTA multi MSI --> 2)INTA  -->3) MSI --> 4) INTR  -->5) GIC
>>>>>>
>>>>>> INTA driver has to set a chained IRQ using virq allocated from its
>>>>>> parent MSI. This is to differentiate the grouped interrupts within
>>>>>> INTA.
>>>>>>
>>>>>> Inorder to cover the above two MSI domains, a new bus driver has to be
>>>>>> created as I couldn't find a fit with the existing bus drivers.
>>>>>>
>>>>>> Does the above approach make sense? Please correct me if i am wrong.
>>>>>
>>>>> I think this can be further simplified, as you seem to assume that
>>>>> dynamic allocation implies MSI. This is not the case. You can
>>>>> perfectly use dynamically allocated interrupts and still not use MSIs.
>>>>>
>>>>> INTA is indeed a chained interrupt controller, as it may mux several
>>>>> inputs onto a single output. But the output of INTA is not an MSI. It
>>>>> is just a regular interrupt that can allocated when the first mapping
>>>>> gets established.
>>>>
>>>> okay. I guess it can just be done using irq_create_fwspec_mapping().
>>>>
>>>
>>> I am facing an issue with this approach as I am trying to call
>>> irq_create_fwspec_mapping() from alloc callback of INTA driver. During
>>> allocation the function call flow looks like:
>>>
>>> inta_msi_domain_alloc_irqs()
>>> 	msi_domain_alloc_irqs()
>>> 		__irq_domain_alloc_irqs()
>>> 			*mutex_lock(&irq_domain_mutex);*
>>> 			irq_domain_alloc_irqs_hierarchy()
>>> 				ti_sci_inta_irq_domain_alloc()
>>> 					if (first event in group)
>>> 							irq_create_fwspec_mapping()
>>> 								irq_find_matching_fwspec()
>>> 									*mutex_lock(&irq_domain_mutex);*
>>> 									
>>>
>>> The mutex_lock is called again if INTR IRQ gets allocated in alloc callback of
>>> INTA driver. So I am clearly calling irq_create_fwspec_mapping() from a wrong place.
>>
>> The real issue is that you're are calling irq_create_fwspec_mapping at
>> all. This is only supposed to be called by the high level code, not an
>> irqchip driver in the middle of its own allocation.
>>
>> The right API to use is irq_domain_alloc_irqs_parent(), which calls into
>> the parent domain allocation. See the multiple uses in the tree already.
> 
> But irq_domain_alloc_irqs_parent() doesn't create a new IRQ mapping. Or your 
> suggestion is that when first event mapping gets established in the group, use 
> the same Linux virq number to allocate the parent interrupts?

I had already forgotten that your INTA is the multiplexer in your system.

No, using the same virq is completely wrong, as you must have a unique
irq for each of the outputs lines of your INTA.

One solution would be to pre-allocate all the interrupts for the output
lines at probe time, so that you don't have to do much when the INTA
irqs get allocated.

Thanks,

	M.
Lokesh Vutla Nov. 5, 2018, 5:56 p.m. UTC | #31
Hi Marc,

On 11/5/2018 10:14 PM, Marc Zyngier wrote:
> On 05/11/18 16:20, Lokesh Vutla wrote:
>> Hi Marc,
>>
>> On Monday 05 November 2018 09:06 PM, Marc Zyngier wrote:
>>> On 05/11/18 08:08, Lokesh Vutla wrote:
>>>> Hi Marc,
>>>>
>>>> On Monday 29 October 2018 06:34 PM, Lokesh Vutla wrote:
>>>>> Hi Marc,
>>>>>
>>>>> On Sunday 28 October 2018 07:01 PM, Marc Zyngier wrote:
>>>>>> Hi Lokesh,
>>>>>>
>>>>>> On Fri, 26 Oct 2018 21:19:41 +0100,
>>>>>> Lokesh Vutla <lokeshvutla@ti.com> wrote:
>>>>>>>
>>>>>>> Hi Marc,
>>>>>>>
>>>>>>> [..snip..]
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * ti_sci_inta_register_event() - Register a event to an interrupt aggregator
>>>>>>>>>>>> + * @dev:	Device pointer to source generating the event
>>>>>>>>>>>> + * @src_id:	TISCI device ID of the event source
>>>>>>>>>>>> + * @src_index:	Event source index within the device.
>>>>>>>>>>>> + * @virq:	Linux Virtual IRQ number
>>>>>>>>>>>> + * @flags:	Corresponding IRQ flags
>>>>>>>>>>>> + * @ack_needed:	If explicit clearing of event is required.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Creates a new irq and attaches to IA domain if virq is not specified
>>>>>>>>>>>> + * else attaches the event to vint corresponding to virq.
>>>>>>>>>>>> + * When using TISCI within the client drivers, source indexes are always
>>>>>>>>>>>> + * generated dynamically and cannot be represented in DT. So client
>>>>>>>>>>>> + * drivers should call this API instead of platform_get_irq().
>>>>>>>>>>>
>>>>>>>>>>> NAK. Either this fits in the standard model, or we adapt the standard
>>>>>>>>>>> model to catter for your particular use case. But we don't define a new,
>>>>>>>>>>> TI specific API.
>>>>>>>>>>>
>>>>>>>>>>> I have a hunch that if the IDs are generated dynamically, then the model
>>>>>>>>>>> we use for MSIs would fit this thing. I also want to understand what
>>>>>>>>>>
>>>>>>>>>> hmm..I haven't thought about using MSI. Will try to explore it. But
>>>>>>>>>> the "struct msi_msg" is not applicable in this case as device does not
>>>>>>>>>> write to a specific location.
>>>>>>>>>
>>>>>>>>> It doesn't need to. You can perfectly ignore the address field and
>>>>>>>>> only be concerned with the data. We already have MSI users that do not
>>>>>>>>> need programming of the doorbell address, just the data.
>>>>>>>>
>>>>>>>
>>>>>>> Just one more clarification.
>>>>>>>
>>>>>>> First let me explain the IRQ routes a bit deeply. As I said earlier
>>>>>>> there are three ways in which IRQ can flow in AM65x SoC
>>>>>>> 1) Device directly connected to GIC
>>>>>>> 	- Device IRQ --> GIC
>>>>>>> 2) Device connected to INTR.
>>>>>>> 	- Device IRQ --> INTR --> GIC
>>>>>>> 3) Devices connected to INTA.
>>>>>>> 	- Device IRQ --> INTA --> INTR --> GIC
>>>>>>>
>>>>>>> 1 and 2 are straight forward and we use DT for IRQ
>>>>>>> representation. Coming to 3 the trickier part is that Input to INTA
>>>>>>> and output from INTA and dynamically managed. To be more specific:
>>>>>>> - By hardware design there are certain set of physical global
>>>>>>> events(interrupts) attached to an INTA. Out of which a certain range
>>>>>>> are assigned to the current linux host that can be queried from
>>>>>>> system-controller.
>>>>>>> - Similarly out of all the INTA outputs(referenced as vints) a certain
>>>>>>> range can be used by the current linux host.
>>>>>>>
>>>>>>>
>>>>>>> So for configuring an IRQ route in case 3, the following steps are needed:
>>>>>>> - Device id and device resource index for which the interrupt is needed
>>>>>>
>>>>>> THat is no different from a PCI device for example, where we need the
>>>>>> requester ID and the number of the interrupt in the MSI-X table.
>>>>>>
>>>>>>> - A free event id from the range assigned to the INTA in this host context
>>>>>>> - A free vint from the range assigned to the INTA in this host context
>>>>>>> - A free gic IRQ from the range assigned to the INTR in this host context.
>>>>>>
>>>>>>    From what I understand of the driver, at least some of that is under
>>>>>> the responsibility of the firmware, right? Or is the driver under
>>>>>> control of all three parameters? To be honest, it doesn't really
>>>>>
>>>>> Driver should control all three parameters.
>>>>>
>>>>>> matter, as the as far as the kernel is concerned, the irqchip drivers
>>>>>> are free to deal with the routing anyway they want.
>>>>>
>>>>> Correct, that's my understanding as well.
>>>>>
>>>>>>
>>>>>>> With the above information, linux should send a message to
>>>>>>> system-controller using TISCI protocol. After policing the given
>>>>>>> information, system-controller does the following:
>>>>>>> - Attaches the interrupt(INTA input) to the device resource index
>>>>>>> - Muxes the interrupt(INTA input) to corresponding vint(INTA output)
>>>>>>> - Muxes the vint(INTR input) to GIC irq(INTR output).
>>>>>>
>>>>>> Isn't there a 1:1 mapping between *used* INTR inputs and outputs?
>>>>>> Since INTR is a router, there is no real muxing. I assume that the
>>>>>> third point above is just a copy-paste error.
>>>>>
>>>>> Right, my bad. INTR is just a router and no read muxing.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> For grouping of interrupts, the same vint number is to be passed to
>>>>>>> system-controller for all the requests.
>>>>>>>
>>>>>>> Keeping all the above in mind, I see the following as software IRQ
>>>>>>> Domain Hierarchy:
>>>>>>>
>>>>>>> 1) INTA multi MSI --> 2)INTA  -->3) MSI --> 4) INTR  -->5) GIC
>>>>>>>
>>>>>>> INTA driver has to set a chained IRQ using virq allocated from its
>>>>>>> parent MSI. This is to differentiate the grouped interrupts within
>>>>>>> INTA.
>>>>>>>
>>>>>>> Inorder to cover the above two MSI domains, a new bus driver has to be
>>>>>>> created as I couldn't find a fit with the existing bus drivers.
>>>>>>>
>>>>>>> Does the above approach make sense? Please correct me if i am wrong.
>>>>>>
>>>>>> I think this can be further simplified, as you seem to assume that
>>>>>> dynamic allocation implies MSI. This is not the case. You can
>>>>>> perfectly use dynamically allocated interrupts and still not use MSIs.
>>>>>>
>>>>>> INTA is indeed a chained interrupt controller, as it may mux several
>>>>>> inputs onto a single output. But the output of INTA is not an MSI. It
>>>>>> is just a regular interrupt that can allocated when the first mapping
>>>>>> gets established.
>>>>>
>>>>> okay. I guess it can just be done using irq_create_fwspec_mapping().
>>>>>
>>>>
>>>> I am facing an issue with this approach as I am trying to call
>>>> irq_create_fwspec_mapping() from alloc callback of INTA driver. During
>>>> allocation the function call flow looks like:
>>>>
>>>> inta_msi_domain_alloc_irqs()
>>>> 	msi_domain_alloc_irqs()
>>>> 		__irq_domain_alloc_irqs()
>>>> 			*mutex_lock(&irq_domain_mutex);*
>>>> 			irq_domain_alloc_irqs_hierarchy()
>>>> 				ti_sci_inta_irq_domain_alloc()
>>>> 					if (first event in group)
>>>> 							irq_create_fwspec_mapping()
>>>> 								irq_find_matching_fwspec()
>>>> 									*mutex_lock(&irq_domain_mutex);*
>>>> 									
>>>>
>>>> The mutex_lock is called again if INTR IRQ gets allocated in alloc callback of
>>>> INTA driver. So I am clearly calling irq_create_fwspec_mapping() from a wrong place.
>>>
>>> The real issue is that you're are calling irq_create_fwspec_mapping at
>>> all. This is only supposed to be called by the high level code, not an
>>> irqchip driver in the middle of its own allocation.
>>>
>>> The right API to use is irq_domain_alloc_irqs_parent(), which calls into
>>> the parent domain allocation. See the multiple uses in the tree already.
>>
>> But irq_domain_alloc_irqs_parent() doesn't create a new IRQ mapping. Or your 
>> suggestion is that when first event mapping gets established in the group, use 
>> the same Linux virq number to allocate the parent interrupts?
> 
> I had already forgotten that your INTA is the multiplexer in your system.
> 
> No, using the same virq is completely wrong, as you must have a unique
> irq for each of the outputs lines of your INTA.
> 
> One solution would be to pre-allocate all the interrupts for the output
> lines at probe time, so that you don't have to do much when the INTA
> irqs get allocated.

Yes, that is one possibility but all the while I am trying to avoid
that. Because the number of INTA outputs can be greater than gic IRQs
assigned to this IP.

Thanks and regards,
Lokesh