mbox series

[0/9] arm64: tegra: Prevent early SMMU faults

Message ID 20210325130332.778208-1-thierry.reding@gmail.com
Headers show
Series arm64: tegra: Prevent early SMMU faults | expand

Message

Thierry Reding March 25, 2021, 1:03 p.m. UTC
From: Thierry Reding <treding@nvidia.com>

Hi,

this is a set of patches that is the result of earlier discussions
regarding early identity mappings that are needed to avoid SMMU faults
during early boot.

The goal here is to avoid early identity mappings altogether and instead
postpone the need for the identity mappings to when devices are attached
to the SMMU. This works by making the SMMU driver coordinate with the
memory controller driver on when to start enforcing SMMU translations.
This makes Tegra behave in a more standard way and pushes the code to
deal with the Tegra-specific programming into the NVIDIA SMMU
implementation.

Patches 1 and 2 are preparatory work that is used in patch 3 to provide
a mechanism to program SID overrides at runtime. Patches 4 and 5 create
the fundamentals in the SMMU driver to support this and also make this
functionality available on Tegra186. Patch 6 hooks the ARM SMMU up to
the memory controller so that the memory overrides can be programmed at
the right time.

Patch 7 extends this mechanism to Tegra186 and patches 8-9 enable all of
this through device tree updates.

The end result is that various peripherals will have SMMU enabled, while
the display controllers will keep using passthrough, as initially set up
by firmware. Once the device tree bindings have been accepted and the
SMMU driver has been updated to create identity mappings for the display
controllers, they can be hooked up to the SMMU and the code in this
series will automatically program the SID overrides to enable SMMU
translations at the right time.

Thierry

Thierry Reding (9):
  memory: tegra: Move internal data structures into separate header
  memory: tegra: Add memory client IDs to tables
  memory: tegra: Implement SID override programming
  iommu/arm-smmu: Implement ->probe_finalize()
  iommu/arm-smmu: tegra: Detect number of instances at runtime
  iommu/arm-smmu: tegra: Implement SID override programming
  iommu/arm-smmu: Use Tegra implementation on Tegra186
  arm64: tegra: Hook up memory controller to SMMU on Tegra186
  arm64: tegra: Enable SMMU support on Tegra194

 arch/arm64/boot/dts/nvidia/tegra186.dtsi     |   2 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi     |  86 ++++++
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c   |   3 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c |  81 ++++--
 drivers/iommu/arm/arm-smmu/arm-smmu.c        |  17 ++
 drivers/iommu/arm/arm-smmu/arm-smmu.h        |   1 +
 drivers/iommu/tegra-gart.c                   |   2 +-
 drivers/iommu/tegra-smmu.c                   |   2 +-
 drivers/memory/tegra/mc.h                    |   2 +-
 drivers/memory/tegra/tegra186.c              | 288 ++++++++++++++++++-
 include/soc/tegra/mc-internal.h              |  62 ++++
 include/soc/tegra/mc.h                       |  60 +---
 12 files changed, 529 insertions(+), 77 deletions(-)
 create mode 100644 include/soc/tegra/mc-internal.h

Comments

Dmitry Osipenko March 26, 2021, 3:29 p.m. UTC | #1
25.03.2021 16:03, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> Hi,
> 
> this is a set of patches that is the result of earlier discussions
> regarding early identity mappings that are needed to avoid SMMU faults
> during early boot.
> 
> The goal here is to avoid early identity mappings altogether and instead
> postpone the need for the identity mappings to when devices are attached
> to the SMMU. This works by making the SMMU driver coordinate with the
> memory controller driver on when to start enforcing SMMU translations.
> This makes Tegra behave in a more standard way and pushes the code to
> deal with the Tegra-specific programming into the NVIDIA SMMU
> implementation.

It is an interesting idea which inspired me to try to apply a somewhat similar thing to Tegra SMMU driver by holding the SMMU ASID enable-bit until display driver allows to toggle it. This means that we will need an extra small tegra-specific SMMU API function, but it should be okay.

I typed a patch and seems it's working good, I'll prepare a proper patch if you like it.

What do you think about this:

diff --git a/drivers/gpu/drm/grate/dc.c b/drivers/gpu/drm/grate/dc.c
index 45a41586f153..8874cfba40a1 100644
--- a/drivers/gpu/drm/grate/dc.c
+++ b/drivers/gpu/drm/grate/dc.c
@@ -17,6 +17,7 @@
 #include <linux/reset.h>
 
 #include <soc/tegra/common.h>
+#include <soc/tegra/mc.h>
 #include <soc/tegra/pmc.h>
 
 #include <drm/drm_atomic.h>
@@ -2640,6 +2641,11 @@ static int tegra_dc_init(struct host1x_client *client)
 		return err;
 	}
 
+	if (dc->soc->sync_smmu) {
+		struct iommu_domain *domain = iommu_get_domain_for_dev(dc->dev);
+		tegra_smmu_sync_domain(domain, dc->dev);
+	}
+
 	if (dc->soc->wgrps)
 		primary = tegra_dc_add_shared_planes(drm, dc);
 	else
@@ -2824,6 +2830,7 @@ static const struct tegra_dc_soc_info tegra20_dc_soc_info = {
 	.has_win_b_vfilter_mem_client = true,
 	.has_win_c_without_vert_filter = true,
 	.plane_tiled_memory_bandwidth_x2 = false,
+	.sync_smmu = false,
 };
 
 static const struct tegra_dc_soc_info tegra30_dc_soc_info = {
@@ -2846,6 +2853,7 @@ static const struct tegra_dc_soc_info tegra30_dc_soc_info = {
 	.has_win_b_vfilter_mem_client = true,
 	.has_win_c_without_vert_filter = false,
 	.plane_tiled_memory_bandwidth_x2 = true,
+	.sync_smmu = true,
 };
 
 static const struct tegra_dc_soc_info tegra114_dc_soc_info = {
@@ -2868,6 +2876,7 @@ static const struct tegra_dc_soc_info tegra114_dc_soc_info = {
 	.has_win_b_vfilter_mem_client = false,
 	.has_win_c_without_vert_filter = false,
 	.plane_tiled_memory_bandwidth_x2 = true,
+	.sync_smmu = true,
 };
 
 static const struct tegra_dc_soc_info tegra124_dc_soc_info = {
@@ -2890,6 +2899,7 @@ static const struct tegra_dc_soc_info tegra124_dc_soc_info = {
 	.has_win_b_vfilter_mem_client = false,
 	.has_win_c_without_vert_filter = false,
 	.plane_tiled_memory_bandwidth_x2 = false,
+	.sync_smmu = true,
 };
 
 static const struct tegra_dc_soc_info tegra210_dc_soc_info = {
@@ -2912,6 +2922,7 @@ static const struct tegra_dc_soc_info tegra210_dc_soc_info = {
 	.has_win_b_vfilter_mem_client = false,
 	.has_win_c_without_vert_filter = false,
 	.plane_tiled_memory_bandwidth_x2 = false,
+	.sync_smmu = true,
 };
 
 static const struct tegra_windowgroup_soc tegra186_dc_wgrps[] = {
@@ -2961,6 +2972,7 @@ static const struct tegra_dc_soc_info tegra186_dc_soc_info = {
 	.wgrps = tegra186_dc_wgrps,
 	.num_wgrps = ARRAY_SIZE(tegra186_dc_wgrps),
 	.plane_tiled_memory_bandwidth_x2 = false,
+	.sync_smmu = false,
 };
 
 static const struct tegra_windowgroup_soc tegra194_dc_wgrps[] = {
@@ -3010,6 +3022,7 @@ static const struct tegra_dc_soc_info tegra194_dc_soc_info = {
 	.wgrps = tegra194_dc_wgrps,
 	.num_wgrps = ARRAY_SIZE(tegra194_dc_wgrps),
 	.plane_tiled_memory_bandwidth_x2 = false,
+	.sync_smmu = false,
 };
 
 static const struct of_device_id tegra_dc_of_match[] = {
diff --git a/drivers/gpu/drm/grate/dc.h b/drivers/gpu/drm/grate/dc.h
index 316a56131cf1..e0057bf7be99 100644
--- a/drivers/gpu/drm/grate/dc.h
+++ b/drivers/gpu/drm/grate/dc.h
@@ -91,6 +91,7 @@ struct tegra_dc_soc_info {
 	bool has_win_b_vfilter_mem_client;
 	bool has_win_c_without_vert_filter;
 	bool plane_tiled_memory_bandwidth_x2;
+	bool sync_smmu;
 };
 
 struct tegra_dc {
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 602aab98c079..e750b1844a88 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -47,6 +47,9 @@ struct tegra_smmu {
 	struct dentry *debugfs;
 
 	struct iommu_device iommu;	/* IOMMU Core code handle */
+
+	bool display_synced[2];
+	bool display_enabled[2];
 };
 
 struct tegra_smmu_as {
@@ -78,6 +81,10 @@ static inline u32 smmu_readl(struct tegra_smmu *smmu, unsigned long offset)
 	return readl(smmu->regs + offset);
 }
 
+/* all Tegra SoCs use the same group IDs for displays */
+#define SMMU_SWGROUP_DC		1
+#define SMMU_SWGROUP_DCB	2
+
 #define SMMU_CONFIG 0x010
 #define  SMMU_CONFIG_ENABLE (1 << 0)
 
@@ -253,6 +260,20 @@ static inline void smmu_flush(struct tegra_smmu *smmu)
 	smmu_readl(smmu, SMMU_PTB_ASID);
 }
 
+static int smmu_swgroup_to_display_id(unsigned int swgroup)
+{
+	switch (swgroup) {
+	case SMMU_SWGROUP_DC:
+		return 0;
+
+	case SMMU_SWGROUP_DCB:
+		return 1;
+
+	default:
+		return -1;
+	}
+}
+
 static int tegra_smmu_alloc_asid(struct tegra_smmu *smmu, unsigned int *idp)
 {
 	unsigned long id;
@@ -352,10 +373,21 @@ tegra_smmu_find_swgroup(struct tegra_smmu *smmu, unsigned int swgroup)
 static void tegra_smmu_enable(struct tegra_smmu *smmu, unsigned int swgroup,
 			      unsigned int asid)
 {
+	const int disp_id = smmu_swgroup_to_display_id(swgroup);
 	const struct tegra_smmu_swgroup *group;
 	unsigned int i;
 	u32 value;
 
+	if (disp_id >= 0) {
+		smmu->display_enabled[disp_id] = true;
+
+		if (!smmu->display_synced[disp_id]) {
+			pr_debug("%s deferred for swgroup %u\n",
+				 __func__, swgroup);
+			return;
+		}
+	}
+
 	group = tegra_smmu_find_swgroup(smmu, swgroup);
 	if (group) {
 		value = smmu_readl(smmu, group->reg);
@@ -385,10 +417,14 @@ static void tegra_smmu_enable(struct tegra_smmu *smmu, unsigned int swgroup,
 static void tegra_smmu_disable(struct tegra_smmu *smmu, unsigned int swgroup,
 			       unsigned int asid)
 {
+	const int disp_id = smmu_swgroup_to_display_id(swgroup);
 	const struct tegra_smmu_swgroup *group;
 	unsigned int i;
 	u32 value;
 
+	if (disp_id >= 0)
+		smmu->display_enabled[disp_id] = false;
+
 	group = tegra_smmu_find_swgroup(smmu, swgroup);
 	if (group) {
 		value = smmu_readl(smmu, group->reg);
@@ -410,6 +446,32 @@ static void tegra_smmu_disable(struct tegra_smmu *smmu, unsigned int swgroup,
 	}
 }
 
+void tegra_smmu_sync_domain(struct iommu_domain *domain, struct device *dev)
+{
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+	struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
+	unsigned int index;
+
+	if (!fwspec || !domain)
+		return;
+
+	for (index = 0; index < fwspec->num_ids; index++) {
+		const unsigned int swgroup = fwspec->ids[index];
+		const int disp_id = smmu_swgroup_to_display_id(swgroup);
+
+		if (disp_id < 0 || smmu->display_synced[disp_id])
+			continue;
+
+		smmu->display_synced[disp_id] = true;
+
+		if (!smmu->display_enabled[disp_id])
+			continue;
+
+		tegra_smmu_enable(smmu, swgroup, to_smmu_as(domain)->id);
+	}
+}
+EXPORT_SYMBOL_GPL(tegra_smmu_sync_domain);
+
 static int tegra_smmu_as_prepare(struct tegra_smmu *smmu,
 				 struct tegra_smmu_as *as)
 {
diff --git a/include/soc/tegra/mc.h b/include/soc/tegra/mc.h
index cfd3b35e23e5..ac1f3226b2ac 100644
--- a/include/soc/tegra/mc.h
+++ b/include/soc/tegra/mc.h
@@ -15,6 +15,7 @@
 
 struct clk;
 struct device;
+struct iommu_domain;
 struct page;
 
 struct tegra_smmu_enable {
@@ -88,6 +89,7 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
 				    const struct tegra_smmu_soc *soc,
 				    struct tegra_mc *mc);
 void tegra_smmu_remove(struct tegra_smmu *smmu);
+void tegra_smmu_sync_domain(struct iommu_domain *domain, struct device *dev);
 #else
 static inline struct tegra_smmu *
 tegra_smmu_probe(struct device *dev, const struct tegra_smmu_soc *soc,
@@ -99,6 +101,11 @@ tegra_smmu_probe(struct device *dev, const struct tegra_smmu_soc *soc,
 static inline void tegra_smmu_remove(struct tegra_smmu *smmu)
 {
 }
+
+static inline void tegra_smmu_sync_domain(struct iommu_domain *domain,
+					  struct device *dev)
+{
+}
 #endif
 
 #ifdef CONFIG_TEGRA_IOMMU_GART
Thierry Reding March 26, 2021, 4:35 p.m. UTC | #2
On Fri, Mar 26, 2021 at 06:29:28PM +0300, Dmitry Osipenko wrote:
> 25.03.2021 16:03, Thierry Reding пишет:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > Hi,
> > 
> > this is a set of patches that is the result of earlier discussions
> > regarding early identity mappings that are needed to avoid SMMU faults
> > during early boot.
> > 
> > The goal here is to avoid early identity mappings altogether and instead
> > postpone the need for the identity mappings to when devices are attached
> > to the SMMU. This works by making the SMMU driver coordinate with the
> > memory controller driver on when to start enforcing SMMU translations.
> > This makes Tegra behave in a more standard way and pushes the code to
> > deal with the Tegra-specific programming into the NVIDIA SMMU
> > implementation.
> 
> It is an interesting idea which inspired me to try to apply a somewhat similar thing to Tegra SMMU driver by holding the SMMU ASID enable-bit until display driver allows to toggle it. This means that we will need an extra small tegra-specific SMMU API function, but it should be okay.
> 
> I typed a patch and seems it's working good, I'll prepare a proper patch if you like it.

That would actually be working around the problem that this patch was
supposed to prepare for. The reason for this current patch series is to
make sure SMMU translation isn't enabled until a device has actually
been attached to the SMMU. Once it has been attached, the assumption is
that any identity mappings will have been created.

One Tegra SMMU that shouldn't be a problem because translations aren't
enabled until device attach time. So in other words this patch set is to
get Tegra186 and later to parity with earlier chips from this point of
view.

I think the problem that you're trying to work around is better solved
by establishing these identity mappings. I do have patches to implement
this for Tegra210 and earlier, though they may require additional work
if you have bootloaders that don't use standard DT bindings for passing
information about the framebuffer to the kernel.

Thierry
Dmitry Osipenko March 26, 2021, 4:55 p.m. UTC | #3
26.03.2021 19:35, Thierry Reding пишет:
> On Fri, Mar 26, 2021 at 06:29:28PM +0300, Dmitry Osipenko wrote:
>> 25.03.2021 16:03, Thierry Reding пишет:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> Hi,
>>>
>>> this is a set of patches that is the result of earlier discussions
>>> regarding early identity mappings that are needed to avoid SMMU faults
>>> during early boot.
>>>
>>> The goal here is to avoid early identity mappings altogether and instead
>>> postpone the need for the identity mappings to when devices are attached
>>> to the SMMU. This works by making the SMMU driver coordinate with the
>>> memory controller driver on when to start enforcing SMMU translations.
>>> This makes Tegra behave in a more standard way and pushes the code to
>>> deal with the Tegra-specific programming into the NVIDIA SMMU
>>> implementation.
>>
>> It is an interesting idea which inspired me to try to apply a somewhat similar thing to Tegra SMMU driver by holding the SMMU ASID enable-bit until display driver allows to toggle it. This means that we will need an extra small tegra-specific SMMU API function, but it should be okay.
>>
>> I typed a patch and seems it's working good, I'll prepare a proper patch if you like it.
> 
> That would actually be working around the problem that this patch was
> supposed to prepare for. The reason for this current patch series is to
> make sure SMMU translation isn't enabled until a device has actually
> been attached to the SMMU. Once it has been attached, the assumption is
> that any identity mappings will have been created.
> 
> One Tegra SMMU that shouldn't be a problem because translations aren't
> enabled until device attach time. So in other words this patch set is to
> get Tegra186 and later to parity with earlier chips from this point of
> view.
> 
> I think the problem that you're trying to work around is better solved
> by establishing these identity mappings. I do have patches to implement
> this for Tegra210 and earlier, though they may require additional work
> if you have bootloaders that don't use standard DT bindings for passing
> information about the framebuffer to the kernel.

I'm not sure what else reasonable could be done without upgrading to a
very specific version of firmware, which definitely isn't a variant for
older devices which have a wild variety of bootloaders, customized
use-cases and etc.

We could add a kludge that I'm suggesting as a universal fallback
solution, it should work well for all cases that I care about.

So we could have the variant with identity mappings, and if mapping
isn't provided, then fall back to the kludge.
Dmitry Osipenko March 26, 2021, 10:05 p.m. UTC | #4
26.03.2021 19:55, Dmitry Osipenko пишет:
> 26.03.2021 19:35, Thierry Reding пишет:
>> On Fri, Mar 26, 2021 at 06:29:28PM +0300, Dmitry Osipenko wrote:
>>> 25.03.2021 16:03, Thierry Reding пишет:
>>>> From: Thierry Reding <treding@nvidia.com>
>>>>
>>>> Hi,
>>>>
>>>> this is a set of patches that is the result of earlier discussions
>>>> regarding early identity mappings that are needed to avoid SMMU faults
>>>> during early boot.
>>>>
>>>> The goal here is to avoid early identity mappings altogether and instead
>>>> postpone the need for the identity mappings to when devices are attached
>>>> to the SMMU. This works by making the SMMU driver coordinate with the
>>>> memory controller driver on when to start enforcing SMMU translations.
>>>> This makes Tegra behave in a more standard way and pushes the code to
>>>> deal with the Tegra-specific programming into the NVIDIA SMMU
>>>> implementation.
>>>
>>> It is an interesting idea which inspired me to try to apply a somewhat similar thing to Tegra SMMU driver by holding the SMMU ASID enable-bit until display driver allows to toggle it. This means that we will need an extra small tegra-specific SMMU API function, but it should be okay.
>>>
>>> I typed a patch and seems it's working good, I'll prepare a proper patch if you like it.
>>
>> That would actually be working around the problem that this patch was
>> supposed to prepare for. The reason for this current patch series is to
>> make sure SMMU translation isn't enabled until a device has actually
>> been attached to the SMMU. Once it has been attached, the assumption is
>> that any identity mappings will have been created.
>>
>> One Tegra SMMU that shouldn't be a problem because translations aren't
>> enabled until device attach time. So in other words this patch set is to
>> get Tegra186 and later to parity with earlier chips from this point of
>> view.
>>
>> I think the problem that you're trying to work around is better solved
>> by establishing these identity mappings. I do have patches to implement
>> this for Tegra210 and earlier, though they may require additional work
>> if you have bootloaders that don't use standard DT bindings for passing
>> information about the framebuffer to the kernel.
> 
> I'm not sure what else reasonable could be done without upgrading to a
> very specific version of firmware, which definitely isn't a variant for
> older devices which have a wild variety of bootloaders, customized
> use-cases and etc.
> 
> We could add a kludge that I'm suggesting as a universal fallback
> solution, it should work well for all cases that I care about.
> 
> So we could have the variant with identity mappings, and if mapping
> isn't provided, then fall back to the kludge.
> 

I tried a slightly different variant of the kludge by holding the ASID's
enable till the first mapping is created for the display clients and
IOMMU_DOMAIN_DMA now works properly (no EMEM errors on boot and etc) and
without a need to change the DC driver.

I also tried to remove the arm_iommu_detach_device() from the VDE driver
and we now have 3 implicit domains in use (DRM, HX, VDE[wasted]) + 1
explicit (VDE) on T30, which works okay for today. So technically we
could support the IOMMU_DOMAIN_DMA with a couple small changes right now
or at least revert the hacks that were needed for Nyan.

But in order to enable IOMMU_DOMAIN_DMA properly, we will need to do
something about the DMA mappings first in the DRM driver and I also
found that implicit IOMMU somehow doesn't work for host1x driver at all,
so this needs to be fixed too.