diff mbox series

PCI: tegra: limit MSI target address to 32-bit

Message ID 1509991387-15951-1-git-send-email-vidyas@nvidia.com
State Deferred
Headers show
Series PCI: tegra: limit MSI target address to 32-bit | expand

Commit Message

Vidya Sagar Nov. 6, 2017, 6:03 p.m. UTC
limits MSI target address to only 32-bit region to enable
some of the PCIe end points where only 32-bit MSIs
are supported work properly.
One example being Marvel SATA controller

Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
 drivers/pci/host/pci-tegra.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Bjorn Helgaas Nov. 8, 2017, 9:25 p.m. UTC | #1
[+cc Michal, Sören, Simon]

On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> limits MSI target address to only 32-bit region to enable
> some of the PCIe end points where only 32-bit MSIs
> are supported work properly.
> One example being Marvel SATA controller
> 
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
>  drivers/pci/host/pci-tegra.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> index 1987fec1f126..03d3dcdd06c2 100644
> --- a/drivers/pci/host/pci-tegra.c
> +++ b/drivers/pci/host/pci-tegra.c
> @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>  	}
>  
>  	/* setup AFI/FPCI range */
> -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> +	msi->pages = __get_free_pages(GFP_DMA, 0);
>  	msi->phys = virt_to_phys((void *)msi->pages);

Should this be GFP_DMA32?  See the comment above the GFP_DMA
definition.

Should we be using virt_to_phys() here?  Where exactly is the result
("msi->phys") used, i.e., what bus will that address appear on?  If it
appears on the PCI side, this should probably use something like
pcibios_resource_to_bus().

Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a similar
problem?  They both use GFP_KERNEL, then virt_to_phys(), then write the
result of virt_to_phys() using a 32-bit register write.

>  	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
> -- 
> 2.7.4
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vidya Sagar Nov. 9, 2017, 7:18 a.m. UTC | #2
On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> [+cc Michal, Sören, Simon]
>
> On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
>> limits MSI target address to only 32-bit region to enable
>> some of the PCIe end points where only 32-bit MSIs
>> are supported work properly.
>> One example being Marvel SATA controller
>>
>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>> ---
>>   drivers/pci/host/pci-tegra.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
>> index 1987fec1f126..03d3dcdd06c2 100644
>> --- a/drivers/pci/host/pci-tegra.c
>> +++ b/drivers/pci/host/pci-tegra.c
>> @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>>   	}
>>   
>>   	/* setup AFI/FPCI range */
>> -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
>> +	msi->pages = __get_free_pages(GFP_DMA, 0);
>>   	msi->phys = virt_to_phys((void *)msi->pages);
> Should this be GFP_DMA32?  See the comment above the GFP_DMA
> definition.
looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
is the correct one to use, but, even with that I got >32-bit addresses.
GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
I didn't dig into it to find out why is this the case.
> Should we be using virt_to_phys() here?  Where exactly is the result
> ("msi->phys") used, i.e., what bus will that address appear on?  If it
> appears on the PCI side, this should probably use something like
> pcibios_resource_to_bus().
This address is written to two places.
First, into host's internal register to let it know that when an incoming
memory write comes with this address, raise an MSI interrupt instead of 
forwarding it to
memory subsystem.
Second, into 'Message Address' field of 'Message Address Register for 
MSI' register in
end point's configuration space (this is done by MSI framework) for end 
point to know
which address to be used to generate MSI interrupt.
> Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a similar
> problem?  They both use GFP_KERNEL, then virt_to_phys(), then write the
> result of virt_to_phys() using a 32-bit register write.
Well, if those systems deal with 64-bit addresses and when an end point 
is connected which supports
only 32-bit MSI addresses, this problem will surface when 
__get_free_pages() returns an address that
translates to a >32-bit address after virt_to_phys() call on it.
>
>>   	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
>> -- 
>> 2.7.4
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Nov. 9, 2017, 6:14 p.m. UTC | #3
[+cc Lorenzo]

On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
> On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> >On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> >>limits MSI target address to only 32-bit region to enable
> >>some of the PCIe end points where only 32-bit MSIs
> >>are supported work properly.
> >>One example being Marvel SATA controller
> >>
> >>Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> >>---
> >>  drivers/pci/host/pci-tegra.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> >>index 1987fec1f126..03d3dcdd06c2 100644
> >>--- a/drivers/pci/host/pci-tegra.c
> >>+++ b/drivers/pci/host/pci-tegra.c
> >>@@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> >>  	}
> >>  	/* setup AFI/FPCI range */
> >>-	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> >>+	msi->pages = __get_free_pages(GFP_DMA, 0);
> >>  	msi->phys = virt_to_phys((void *)msi->pages);
>
> >Should this be GFP_DMA32?  See the comment above the GFP_DMA
> >definition.
>
> looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
> is the correct one to use, but, even with that I got >32-bit addresses.
> GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
> I didn't dig into it to find out why is this the case.

This sounds worth looking into (but maybe we don't need the
__get_free_pages() at all; see below).  Maybe there's some underlying
bug.  My laptop shows this, which looks like it might be related:

  Zone ranges:
    DMA      [mem 0x0000000000001000-0x0000000000ffffff]
    DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
    Normal   [mem 0x0000000100000000-0x00000004217fffff]
    Device   empty

What does your machine show?

> >Should we be using virt_to_phys() here?  Where exactly is the
> >result ("msi->phys") used, i.e., what bus will that address appear
> >on?  If it appears on the PCI side, this should probably use
> >something like pcibios_resource_to_bus().
>
> This address is written to two places.  First, into host's internal
> register to let it know that when an incoming memory write comes
> with this address, raise an MSI interrupt instead of forwarding it
> to memory subsystem.  Second, into 'Message Address' field of
> 'Message Address Register for MSI' register in end point's
> configuration space (this is done by MSI framework) for end point to
> know which address to be used to generate MSI interrupt.

Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
And this commit [3] sounds like it describes a similar hardware
situation with Tegra where the host bridge intercepts the MSI target
address, so writes to it never reach system memory.  That means that
Tegra doesn't need to allocate system memory at all.

Is your system similar?  Can you just statically allocate a little bus
address space, use that for the MSI target address, and skip the
__get_free_pages()?

> >Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> >similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> >then write the result of virt_to_phys() using a 32-bit register
> >write.
>
> Well, if those systems deal with 64-bit addresses and when an end
> point is connected which supports only 32-bit MSI addresses, this
> problem will surface when __get_free_pages() returns an address that
> translates to a >32-bit address after virt_to_phys() call on it.

I'd like to hear from the R-Car and Xilinx folks about (1) whether
there's a potential issue with truncating a 64-bit address, and
(2) whether that hardware works like Tegra, where the MSI write never
reaches memory so we don't actually need to allocate a page.

If all we need is to allocate a little bus address space for the MSI
target, I'd like to figure out a better way to do that than
__get_free_pages().  The current code seems a little buggy, and
it's getting propagated through several drivers.

> >>  	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);

[1] https://lkml.kernel.org/r/20170824134451.GA31858@bhelgaas-glaptop.roam.corp.google.com
[2] https://lkml.kernel.org/r/86efs3wesi.fsf@arm.com
[3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vidya Sagar Nov. 9, 2017, 6:41 p.m. UTC | #4
On Thursday 09 November 2017 11:44 PM, Bjorn Helgaas wrote:
> [+cc Lorenzo]
>
> On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
>> On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
>>> On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
>>>> limits MSI target address to only 32-bit region to enable
>>>> some of the PCIe end points where only 32-bit MSIs
>>>> are supported work properly.
>>>> One example being Marvel SATA controller
>>>>
>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>> ---
>>>>   drivers/pci/host/pci-tegra.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
>>>> index 1987fec1f126..03d3dcdd06c2 100644
>>>> --- a/drivers/pci/host/pci-tegra.c
>>>> +++ b/drivers/pci/host/pci-tegra.c
>>>> @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>>>>   	}
>>>>   	/* setup AFI/FPCI range */
>>>> -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
>>>> +	msi->pages = __get_free_pages(GFP_DMA, 0);
>>>>   	msi->phys = virt_to_phys((void *)msi->pages);
>>> Should this be GFP_DMA32?  See the comment above the GFP_DMA
>>> definition.
>> looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
>> is the correct one to use, but, even with that I got >32-bit addresses.
>> GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
>> I didn't dig into it to find out why is this the case.
> This sounds worth looking into (but maybe we don't need the
> __get_free_pages() at all; see below).  Maybe there's some underlying
> bug.  My laptop shows this, which looks like it might be related:
>
>    Zone ranges:
>      DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>      DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>      Normal   [mem 0x0000000100000000-0x00000004217fffff]
>      Device   empty
>
> What does your machine show?
I see following in my linux box
  Zone ranges:
    DMA      [mem 0x0000000000001000-0x0000000000ffffff]
    DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
    Normal   [mem 0x0000000100000000-0x000000106effffff]
    Device   empty

and following in my T210 Tegra platform
Zone ranges:
   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
   Normal   [mem 0x0000000100000000-0x000000017fffffff]

>
>>> Should we be using virt_to_phys() here?  Where exactly is the
>>> result ("msi->phys") used, i.e., what bus will that address appear
>>> on?  If it appears on the PCI side, this should probably use
>>> something like pcibios_resource_to_bus().
>> This address is written to two places.  First, into host's internal
>> register to let it know that when an incoming memory write comes
>> with this address, raise an MSI interrupt instead of forwarding it
>> to memory subsystem.  Second, into 'Message Address' field of
>> 'Message Address Register for MSI' register in end point's
>> configuration space (this is done by MSI framework) for end point to
>> know which address to be used to generate MSI interrupt.
> Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> And this commit [3] sounds like it describes a similar hardware
> situation with Tegra where the host bridge intercepts the MSI target
> address, so writes to it never reach system memory.  That means that
> Tegra doesn't need to allocate system memory at all.
>
> Is your system similar?  Can you just statically allocate a little bus
> address space, use that for the MSI target address, and skip the
> __get_free_pages()?
It is the same system where MSI memory writes don't really make it to
system memory. But, the addresses mentioned in that patch don't work.
we are working with hardware folks on understanding the issue better.
Meanwhile, using GFP_DMA is giving consistent results and thought this
can be used as a stop gap solution before we figure out a better bus
address to be used as MSI target address.
>>> Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
>>> similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
>>> then write the result of virt_to_phys() using a 32-bit register
>>> write.
>> Well, if those systems deal with 64-bit addresses and when an end
>> point is connected which supports only 32-bit MSI addresses, this
>> problem will surface when __get_free_pages() returns an address that
>> translates to a >32-bit address after virt_to_phys() call on it.
> I'd like to hear from the R-Car and Xilinx folks about (1) whether
> there's a potential issue with truncating a 64-bit address, and
> (2) whether that hardware works like Tegra, where the MSI write never
> reaches memory so we don't actually need to allocate a page.
>
> If all we need is to allocate a little bus address space for the MSI
> target, I'd like to figure out a better way to do that than
> __get_free_pages().  The current code seems a little buggy, and
> it's getting propagated through several drivers.
>
>>>>   	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
> [1] https://lkml.kernel.org/r/20170824134451.GA31858@bhelgaas-glaptop.roam.corp.google.com
> [2] https://lkml.kernel.org/r/86efs3wesi.fsf@arm.com
> [3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9

--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Subrahmanya Lingappa Nov. 10, 2017, 12:47 a.m. UTC | #5
Bjorn,

On 11/9/2017 11:44 PM, Bjorn Helgaas wrote:
> [+cc Lorenzo]
>
> On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
>> On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
>>> On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
>>>> limits MSI target address to only 32-bit region to enable
>>>> some of the PCIe end points where only 32-bit MSIs
>>>> are supported work properly.
>>>> One example being Marvel SATA controller
>>>>
>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>> ---
>>>>  drivers/pci/host/pci-tegra.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
>>>> index 1987fec1f126..03d3dcdd06c2 100644
>>>> --- a/drivers/pci/host/pci-tegra.c
>>>> +++ b/drivers/pci/host/pci-tegra.c
>>>> @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>>>>  	}
>>>>  	/* setup AFI/FPCI range */
>>>> -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
>>>> +	msi->pages = __get_free_pages(GFP_DMA, 0);
>>>>  	msi->phys = virt_to_phys((void *)msi->pages);
>>
>>> Should this be GFP_DMA32?  See the comment above the GFP_DMA
>>> definition.
>>
>> looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
>> is the correct one to use, but, even with that I got >32-bit addresses.
>> GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
>> I didn't dig into it to find out why is this the case.
>
> This sounds worth looking into (but maybe we don't need the
> __get_free_pages() at all; see below).  Maybe there's some underlying
> bug.  My laptop shows this, which looks like it might be related:
>
>   Zone ranges:
>     DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>     DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>     Normal   [mem 0x0000000100000000-0x00000004217fffff]
>     Device   empty
>
> What does your machine show?
>
>>> Should we be using virt_to_phys() here?  Where exactly is the
>>> result ("msi->phys") used, i.e., what bus will that address appear
>>> on?  If it appears on the PCI side, this should probably use
>>> something like pcibios_resource_to_bus().
>>
>> This address is written to two places.  First, into host's internal
>> register to let it know that when an incoming memory write comes
>> with this address, raise an MSI interrupt instead of forwarding it
>> to memory subsystem.  Second, into 'Message Address' field of
>> 'Message Address Register for MSI' register in end point's
>> configuration space (this is done by MSI framework) for end point to
>> know which address to be used to generate MSI interrupt.
>
> Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> And this commit [3] sounds like it describes a similar hardware
> situation with Tegra where the host bridge intercepts the MSI target
> address, so writes to it never reach system memory.  That means that
> Tegra doesn't need to allocate system memory at all.
>
> Is your system similar?  Can you just statically allocate a little bus
> address space, use that for the MSI target address, and skip the
> __get_free_pages()?
>
>>> Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
>>> similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
>>> then write the result of virt_to_phys() using a 32-bit register
>>> write.
>>
>> Well, if those systems deal with 64-bit addresses and when an end
>> point is connected which supports only 32-bit MSI addresses, this
>> problem will surface when __get_free_pages() returns an address that
>> translates to a >32-bit address after virt_to_phys() call on it.
>
> I'd like to hear from the R-Car and Xilinx folks about (1) whether
> there's a potential issue with truncating a 64-bit address, and
> (2) whether that hardware works like Tegra, where the MSI write never
> reaches memory so we don't actually need to allocate a page.
>

it is so, at least drivers/pci/host/pcie-xilinx.c driver, MSI data is 
never read from the msi_pages physical
memory allocated, data is always intercepted, caught and fed from one of 
the HW FIFO register XILINX_PCIE_REG_RPIFR2. Here too __get_free_pages 
was used to allocate msi_pages just to get a real memory address to 
program in the MSI address in PCIe endpoints config register.

MSI handling code from pcie-xilinx.c

	if (status & (XILINX_PCIE_INTR_INTX | XILINX_PCIE_INTR_MSI)) {
		.
		.

		/* Decode the IRQ number */
		if (val & XILINX_PCIE_RPIFR1_MSI_INTR) {
			val = pcie_read(port, XILINX_PCIE_REG_RPIFR2) &
				XILINX_PCIE_RPIFR2_MSG_DATA;
		} else {
			val = (val & XILINX_PCIE_RPIFR1_INTR_MASK) >>
				XILINX_PCIE_RPIFR1_INTR_SHIFT;
			val = irq_find_mapping(port->leg_domain, val);
		}
		
		.
		.
		
		/* Handle the interrupt */
		if (IS_ENABLED(CONFIG_PCI_MSI) ||
		    !(val & XILINX_PCIE_RPIFR1_MSI_INTR))
			generic_handle_irq(val);
	}

> If all we need is to allocate a little bus address space for the MSI
> target, I'd like to figure out a better way to do that than
> __get_free_pages().  The current code seems a little buggy, and
> it's getting propagated through several drivers.
>
>>>>  	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
>
> [1] https://lkml.kernel.org/r/20170824134451.GA31858@bhelgaas-glaptop.roam.corp.google.com
> [2] https://lkml.kernel.org/r/86efs3wesi.fsf@arm.com
> [3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9
>
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thierry Reding Nov. 10, 2017, 9:37 a.m. UTC | #6
On Fri, Nov 10, 2017 at 12:11:05AM +0530, Vidya Sagar wrote:
> 
> 
> On Thursday 09 November 2017 11:44 PM, Bjorn Helgaas wrote:
> > [+cc Lorenzo]
> > 
> > On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
> > > On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> > > > On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> > > > > limits MSI target address to only 32-bit region to enable
> > > > > some of the PCIe end points where only 32-bit MSIs
> > > > > are supported work properly.
> > > > > One example being Marvel SATA controller
> > > > > 
> > > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > > > ---
> > > > >   drivers/pci/host/pci-tegra.c | 2 +-
> > > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> > > > > index 1987fec1f126..03d3dcdd06c2 100644
> > > > > --- a/drivers/pci/host/pci-tegra.c
> > > > > +++ b/drivers/pci/host/pci-tegra.c
> > > > > @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> > > > >   	}
> > > > >   	/* setup AFI/FPCI range */
> > > > > -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> > > > > +	msi->pages = __get_free_pages(GFP_DMA, 0);
> > > > >   	msi->phys = virt_to_phys((void *)msi->pages);
> > > > Should this be GFP_DMA32?  See the comment above the GFP_DMA
> > > > definition.
> > > looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
> > > is the correct one to use, but, even with that I got >32-bit addresses.
> > > GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
> > > I didn't dig into it to find out why is this the case.
> > This sounds worth looking into (but maybe we don't need the
> > __get_free_pages() at all; see below).  Maybe there's some underlying
> > bug.  My laptop shows this, which looks like it might be related:
> > 
> >    Zone ranges:
> >      DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> >      DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
> >      Normal   [mem 0x0000000100000000-0x00000004217fffff]
> >      Device   empty
> > 
> > What does your machine show?
> I see following in my linux box
>  Zone ranges:
>    DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>    DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>    Normal   [mem 0x0000000100000000-0x000000106effffff]
>    Device   empty
> 
> and following in my T210 Tegra platform
> Zone ranges:
>   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
>   Normal   [mem 0x0000000100000000-0x000000017fffffff]

This seems to be happening because 64-bit ARM doesn't have the
ZONE_DMA32 Kconfig option, which seems to cause the DMA32 zone
to default to the normal zone (see include/linux/gfp.h).

That's very confusing in conjunction with the kerneldoc comment
for GFP_DMA32 because it isn't actually guaranteed to give you
32-bit addresses for !ZONE_DMA32.

Cc'ing Arnd who knows about these things.

Thierry
Thierry Reding Nov. 10, 2017, 9:44 a.m. UTC | #7
On Fri, Nov 10, 2017 at 06:17:16AM +0530, subrahmanya_lingappa wrote:
> Bjorn,
> 
> On 11/9/2017 11:44 PM, Bjorn Helgaas wrote:
> > [+cc Lorenzo]
> > 
> > On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
> > > On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> > > > On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> > > > > limits MSI target address to only 32-bit region to enable
> > > > > some of the PCIe end points where only 32-bit MSIs
> > > > > are supported work properly.
> > > > > One example being Marvel SATA controller
> > > > > 
> > > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > > > ---
> > > > >  drivers/pci/host/pci-tegra.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> > > > > index 1987fec1f126..03d3dcdd06c2 100644
> > > > > --- a/drivers/pci/host/pci-tegra.c
> > > > > +++ b/drivers/pci/host/pci-tegra.c
> > > > > @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> > > > >  	}
> > > > >  	/* setup AFI/FPCI range */
> > > > > -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> > > > > +	msi->pages = __get_free_pages(GFP_DMA, 0);
> > > > >  	msi->phys = virt_to_phys((void *)msi->pages);
> > > 
> > > > Should this be GFP_DMA32?  See the comment above the GFP_DMA
> > > > definition.
> > > 
> > > looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
> > > is the correct one to use, but, even with that I got >32-bit addresses.
> > > GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
> > > I didn't dig into it to find out why is this the case.
> > 
> > This sounds worth looking into (but maybe we don't need the
> > __get_free_pages() at all; see below).  Maybe there's some underlying
> > bug.  My laptop shows this, which looks like it might be related:
> > 
> >   Zone ranges:
> >     DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> >     DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
> >     Normal   [mem 0x0000000100000000-0x00000004217fffff]
> >     Device   empty
> > 
> > What does your machine show?
> > 
> > > > Should we be using virt_to_phys() here?  Where exactly is the
> > > > result ("msi->phys") used, i.e., what bus will that address appear
> > > > on?  If it appears on the PCI side, this should probably use
> > > > something like pcibios_resource_to_bus().
> > > 
> > > This address is written to two places.  First, into host's internal
> > > register to let it know that when an incoming memory write comes
> > > with this address, raise an MSI interrupt instead of forwarding it
> > > to memory subsystem.  Second, into 'Message Address' field of
> > > 'Message Address Register for MSI' register in end point's
> > > configuration space (this is done by MSI framework) for end point to
> > > know which address to be used to generate MSI interrupt.
> > 
> > Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> > And this commit [3] sounds like it describes a similar hardware
> > situation with Tegra where the host bridge intercepts the MSI target
> > address, so writes to it never reach system memory.  That means that
> > Tegra doesn't need to allocate system memory at all.
> > 
> > Is your system similar?  Can you just statically allocate a little bus
> > address space, use that for the MSI target address, and skip the
> > __get_free_pages()?
> > 
> > > > Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> > > > similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> > > > then write the result of virt_to_phys() using a 32-bit register
> > > > write.
> > > 
> > > Well, if those systems deal with 64-bit addresses and when an end
> > > point is connected which supports only 32-bit MSI addresses, this
> > > problem will surface when __get_free_pages() returns an address that
> > > translates to a >32-bit address after virt_to_phys() call on it.
> > 
> > I'd like to hear from the R-Car and Xilinx folks about (1) whether
> > there's a potential issue with truncating a 64-bit address, and
> > (2) whether that hardware works like Tegra, where the MSI write never
> > reaches memory so we don't actually need to allocate a page.
> > 
> 
> it is so, at least drivers/pci/host/pcie-xilinx.c driver, MSI data is never
> read from the msi_pages physical
> memory allocated, data is always intercepted, caught and fed from one of the
> HW FIFO register XILINX_PCIE_REG_RPIFR2. Here too __get_free_pages was used
> to allocate msi_pages just to get a real memory address to program in the
> MSI address in PCIe endpoints config register.
> 
> MSI handling code from pcie-xilinx.c
> 
> 	if (status & (XILINX_PCIE_INTR_INTX | XILINX_PCIE_INTR_MSI)) {
> 		.
> 		.
> 
> 		/* Decode the IRQ number */
> 		if (val & XILINX_PCIE_RPIFR1_MSI_INTR) {
> 			val = pcie_read(port, XILINX_PCIE_REG_RPIFR2) &
> 				XILINX_PCIE_RPIFR2_MSG_DATA;
> 		} else {
> 			val = (val & XILINX_PCIE_RPIFR1_INTR_MASK) >>
> 				XILINX_PCIE_RPIFR1_INTR_SHIFT;
> 			val = irq_find_mapping(port->leg_domain, val);
> 		}
> 		
> 		.
> 		.
> 		
> 		/* Handle the interrupt */
> 		if (IS_ENABLED(CONFIG_PCI_MSI) ||
> 		    !(val & XILINX_PCIE_RPIFR1_MSI_INTR))
> 			generic_handle_irq(val);
> 	}

A side-note: this looks like a bug to me. In case of an MSI, the val
variable will have been overwritten by the MSI decode branch above, so
it seems to me like the conditional down here isn't guaranteed to work
and depends on what MSI was actually received?

Don't you need to store the interrupt status in a variable separate from
the value that you're using to find which interrupt was received?

Thierry
Lorenzo Pieralisi Nov. 10, 2017, 11:22 a.m. UTC | #8
[+Robin]

On Thu, Nov 09, 2017 at 12:14:35PM -0600, Bjorn Helgaas wrote:
> [+cc Lorenzo]
> 
> On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
> > On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> > >On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> > >>limits MSI target address to only 32-bit region to enable
> > >>some of the PCIe end points where only 32-bit MSIs
> > >>are supported work properly.
> > >>One example being Marvel SATA controller
> > >>
> > >>Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > >>---
> > >>  drivers/pci/host/pci-tegra.c | 2 +-
> > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >>
> > >>diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> > >>index 1987fec1f126..03d3dcdd06c2 100644
> > >>--- a/drivers/pci/host/pci-tegra.c
> > >>+++ b/drivers/pci/host/pci-tegra.c
> > >>@@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> > >>  	}
> > >>  	/* setup AFI/FPCI range */
> > >>-	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> > >>+	msi->pages = __get_free_pages(GFP_DMA, 0);
> > >>  	msi->phys = virt_to_phys((void *)msi->pages);
> >
> > >Should this be GFP_DMA32?  See the comment above the GFP_DMA
> > >definition.
> >
> > looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
> > is the correct one to use, but, even with that I got >32-bit addresses.
> > GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
> > I didn't dig into it to find out why is this the case.
> 
> This sounds worth looking into (but maybe we don't need the
> __get_free_pages() at all; see below).  Maybe there's some underlying
> bug.  My laptop shows this, which looks like it might be related:
> 
>   Zone ranges:
>     DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>     DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>     Normal   [mem 0x0000000100000000-0x00000004217fffff]
>     Device   empty
> 
> What does your machine show?
> 
> > >Should we be using virt_to_phys() here?  Where exactly is the
> > >result ("msi->phys") used, i.e., what bus will that address appear
> > >on?  If it appears on the PCI side, this should probably use
> > >something like pcibios_resource_to_bus().

I had a quick chat with Robin (CC'ed, who has dealt/is dealing with most
of the work this thread relates to) and I think that as things stand,
for MSI physical addresses, an offset between PCI->host address shift is
not really contemplated (ie 1:1 host<->pci is assumed).

> > This address is written to two places.  First, into host's internal
> > register to let it know that when an incoming memory write comes
> > with this address, raise an MSI interrupt instead of forwarding it
> > to memory subsystem.  Second, into 'Message Address' field of
> > 'Message Address Register for MSI' register in end point's
> > configuration space (this is done by MSI framework) for end point to
> > know which address to be used to generate MSI interrupt.
> 
> Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> And this commit [3] sounds like it describes a similar hardware
> situation with Tegra where the host bridge intercepts the MSI target
> address, so writes to it never reach system memory.  That means that
> Tegra doesn't need to allocate system memory at all.
> 
> Is your system similar?  Can you just statically allocate a little bus
> address space, use that for the MSI target address, and skip the
> __get_free_pages()?

IIUC, all these host bridges need is a physical address that is routed
upstream to the host bridge by the PCIe tree (ie it is not part of the
host bridge windows), as long as the host bridge intercepts it (and
endpoints do _not_ reuse it for something else) that should be fine, as
it has been pointed out in this thread, allocating a page is a solution
- there may be others (which are likely to be platform specific).

> > >Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> > >similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> > >then write the result of virt_to_phys() using a 32-bit register
> > >write.
> >
> > Well, if those systems deal with 64-bit addresses and when an end
> > point is connected which supports only 32-bit MSI addresses, this
> > problem will surface when __get_free_pages() returns an address that
> > translates to a >32-bit address after virt_to_phys() call on it.
> 
> I'd like to hear from the R-Car and Xilinx folks about (1) whether
> there's a potential issue with truncating a 64-bit address, and
> (2) whether that hardware works like Tegra, where the MSI write never
> reaches memory so we don't actually need to allocate a page.
> 
> If all we need is to allocate a little bus address space for the MSI
> target, I'd like to figure out a better way to do that than
> __get_free_pages().  The current code seems a little buggy, and
> it's getting propagated through several drivers.

I will look into this with Robin's help.

Thanks,
Lorenzo

> > >>  	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
> 
> [1] https://lkml.kernel.org/r/20170824134451.GA31858@bhelgaas-glaptop.roam.corp.google.com
> [2] https://lkml.kernel.org/r/86efs3wesi.fsf@arm.com
> [3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Nov. 10, 2017, 11:57 a.m. UTC | #9
On Fri, Nov 10, 2017 at 10:37 AM, Thierry Reding <treding@nvidia.com> wrote:
> On Fri, Nov 10, 2017 at 12:11:05AM +0530, Vidya Sagar wrote:
>> On Thursday 09 November 2017 11:44 PM, Bjorn Helgaas wrote:
>> > [+cc Lorenzo]
>> > On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
>> > > On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
>> > > > On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
>> > > > > limits MSI target address to only 32-bit region to enable
>> > > > > some of the PCIe end points where only 32-bit MSIs
>> > > > > are supported work properly.
>> > > > > One example being Marvel SATA controller
>> > > > >
>> > > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>> > > > > ---
>> > > > >   drivers/pci/host/pci-tegra.c | 2 +-
>> > > > >   1 file changed, 1 insertion(+), 1 deletion(-)
>> > > > >
>> > > > > diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
>> > > > > index 1987fec1f126..03d3dcdd06c2 100644
>> > > > > --- a/drivers/pci/host/pci-tegra.c
>> > > > > +++ b/drivers/pci/host/pci-tegra.c
>> > > > > @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>> > > > >       }
>> > > > >       /* setup AFI/FPCI range */
>> > > > > -     msi->pages = __get_free_pages(GFP_KERNEL, 0);
>> > > > > +     msi->pages = __get_free_pages(GFP_DMA, 0);
>> > > > >       msi->phys = virt_to_phys((void *)msi->pages);
>> > > > Should this be GFP_DMA32?  See the comment above the GFP_DMA
>> > > > definition.
>> > > looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
>> > > is the correct one to use, but, even with that I got >32-bit addresses.
>> > > GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
>> > > I didn't dig into it to find out why is this the case.
>> > This sounds worth looking into (but maybe we don't need the
>> > __get_free_pages() at all; see below).  Maybe there's some underlying
>> > bug.  My laptop shows this, which looks like it might be related:
>> >
>> >    Zone ranges:
>> >      DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>> >      DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>> >      Normal   [mem 0x0000000100000000-0x00000004217fffff]
>> >      Device   empty
>> >
>> > What does your machine show?
>> I see following in my linux box
>>  Zone ranges:
>>    DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>>    DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>>    Normal   [mem 0x0000000100000000-0x000000106effffff]
>>    Device   empty
>>
>> and following in my T210 Tegra platform
>> Zone ranges:
>>   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
>>   Normal   [mem 0x0000000100000000-0x000000017fffffff]
>
> This seems to be happening because 64-bit ARM doesn't have the
> ZONE_DMA32 Kconfig option, which seems to cause the DMA32 zone
> to default to the normal zone (see include/linux/gfp.h).
>
> That's very confusing in conjunction with the kerneldoc comment
> for GFP_DMA32 because it isn't actually guaranteed to give you
> 32-bit addresses for !ZONE_DMA32.
>
> Cc'ing Arnd who knows about these things.

I don't have a good answer unfortunately, but I agree there is clearly
something wrong with the behavior, AFAICT, ZONE_DMA32 should
really do what the name says.

When you look at the commit that introduced ZONE_DMA32, you
see that it used to be different, see commit a2f1b4249007
("[PATCH] x86_64: Add 4GB DMA32 zone"). ZONE_DMA32 used
to fall back to ZONE_DMA, but this got changed shortly after,
still in 2006.

      Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Robin Murphy Nov. 10, 2017, 12:04 p.m. UTC | #10
On 10/11/17 11:22, Lorenzo Pieralisi wrote:
> [+Robin]
> 
> On Thu, Nov 09, 2017 at 12:14:35PM -0600, Bjorn Helgaas wrote:
>> [+cc Lorenzo]
>>
>> On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
>>> On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
>>>> On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
>>>>> limits MSI target address to only 32-bit region to enable
>>>>> some of the PCIe end points where only 32-bit MSIs
>>>>> are supported work properly.
>>>>> One example being Marvel SATA controller
>>>>>
>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>>> ---
>>>>>   drivers/pci/host/pci-tegra.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
>>>>> index 1987fec1f126..03d3dcdd06c2 100644
>>>>> --- a/drivers/pci/host/pci-tegra.c
>>>>> +++ b/drivers/pci/host/pci-tegra.c
>>>>> @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>>>>>   	}
>>>>>   	/* setup AFI/FPCI range */
>>>>> -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
>>>>> +	msi->pages = __get_free_pages(GFP_DMA, 0);
>>>>>   	msi->phys = virt_to_phys((void *)msi->pages);
>>>
>>>> Should this be GFP_DMA32?  See the comment above the GFP_DMA
>>>> definition.
>>>
>>> looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
>>> is the correct one to use, but, even with that I got >32-bit addresses.
>>> GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
>>> I didn't dig into it to find out why is this the case.
>>
>> This sounds worth looking into (but maybe we don't need the
>> __get_free_pages() at all; see below).  Maybe there's some underlying
>> bug.  My laptop shows this, which looks like it might be related:
>>
>>    Zone ranges:
>>      DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>>      DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>>      Normal   [mem 0x0000000100000000-0x00000004217fffff]
>>      Device   empty
>>
>> What does your machine show?

The great thing about ZONE_DMA is that it has completely different 
meanings per platform. ZONE_DMA32 is even worse as it's more or less 
just an x86 thing, and isn't implemented (thus does nothing) on most 
other architectures, including ARM/arm64 where Tegra is relevant.

Fun.

>>>> Should we be using virt_to_phys() here?  Where exactly is the
>>>> result ("msi->phys") used, i.e., what bus will that address appear
>>>> on?  If it appears on the PCI side, this should probably use
>>>> something like pcibios_resource_to_bus().
> 
> I had a quick chat with Robin (CC'ed, who has dealt/is dealing with most
> of the work this thread relates to) and I think that as things stand,
> for MSI physical addresses, an offset between PCI->host address shift is
> not really contemplated (ie 1:1 host<->pci is assumed).

I think the most correct way to generate an address would actually be 
via the DMA mapping API (i.e. dma_map_page() with the host bridge's PCI 
device), which would take any relevant offsets into account. It's a bit 
funky, but there's some method in the madness.

>>> This address is written to two places.  First, into host's internal
>>> register to let it know that when an incoming memory write comes
>>> with this address, raise an MSI interrupt instead of forwarding it
>>> to memory subsystem.  Second, into 'Message Address' field of
>>> 'Message Address Register for MSI' register in end point's
>>> configuration space (this is done by MSI framework) for end point to
>>> know which address to be used to generate MSI interrupt.
>>
>> Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
>> And this commit [3] sounds like it describes a similar hardware
>> situation with Tegra where the host bridge intercepts the MSI target
>> address, so writes to it never reach system memory.  That means that
>> Tegra doesn't need to allocate system memory at all.
>>
>> Is your system similar?  Can you just statically allocate a little bus
>> address space, use that for the MSI target address, and skip the
>> __get_free_pages()?
> 
> IIUC, all these host bridges need is a physical address that is routed
> upstream to the host bridge by the PCIe tree (ie it is not part of the
> host bridge windows), as long as the host bridge intercepts it (and
> endpoints do _not_ reuse it for something else) that should be fine, as
> it has been pointed out in this thread, allocating a page is a solution
> - there may be others (which are likely to be platform specific).
> 
>>>> Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
>>>> similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
>>>> then write the result of virt_to_phys() using a 32-bit register
>>>> write.
>>>
>>> Well, if those systems deal with 64-bit addresses and when an end
>>> point is connected which supports only 32-bit MSI addresses, this
>>> problem will surface when __get_free_pages() returns an address that
>>> translates to a >32-bit address after virt_to_phys() call on it.
>>
>> I'd like to hear from the R-Car and Xilinx folks about (1) whether
>> there's a potential issue with truncating a 64-bit address, and
>> (2) whether that hardware works like Tegra, where the MSI write never
>> reaches memory so we don't actually need to allocate a page.
>>
>> If all we need is to allocate a little bus address space for the MSI
>> target, I'd like to figure out a better way to do that than
>> __get_free_pages().  The current code seems a little buggy, and
>> it's getting propagated through several drivers.

The really neat version is to take a known non-memory physical address 
like the host controller's own MMIO region, which has no legitimate 
reason to ever be used as a DMA address. pcie-mediatek almost gets this 
right, but by using virt_to_phys() on an ioremapped address they end up 
with nonsense rather than the correct address (although realistically 
you would have to be extremely unlucky for said nonsense to collide with 
a real DMA address given to a PCI endpoint later). Following on from 
above, dma_map_resource() would be the foolproof way to get that right.

Robin.

> 
> I will look into this with Robin's help.
> 
> Thanks,
> Lorenzo
> 
>>>>>   	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
>>
>> [1] https://lkml.kernel.org/r/20170824134451.GA31858@bhelgaas-glaptop.roam.corp.google.com
>> [2] https://lkml.kernel.org/r/86efs3wesi.fsf@arm.com
>> [3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thierry Reding Nov. 10, 2017, 1:07 p.m. UTC | #11
On Fri, Nov 10, 2017 at 12:04:21PM +0000, Robin Murphy wrote:
> On 10/11/17 11:22, Lorenzo Pieralisi wrote:
> > [+Robin]
> > 
> > On Thu, Nov 09, 2017 at 12:14:35PM -0600, Bjorn Helgaas wrote:
> > > [+cc Lorenzo]
> > > 
> > > On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
> > > > On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> > > > > On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> > > > > > limits MSI target address to only 32-bit region to enable
> > > > > > some of the PCIe end points where only 32-bit MSIs
> > > > > > are supported work properly.
> > > > > > One example being Marvel SATA controller
> > > > > > 
> > > > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > > > > ---
> > > > > >   drivers/pci/host/pci-tegra.c | 2 +-
> > > > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> > > > > > index 1987fec1f126..03d3dcdd06c2 100644
> > > > > > --- a/drivers/pci/host/pci-tegra.c
> > > > > > +++ b/drivers/pci/host/pci-tegra.c
> > > > > > @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> > > > > >   	}
> > > > > >   	/* setup AFI/FPCI range */
> > > > > > -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> > > > > > +	msi->pages = __get_free_pages(GFP_DMA, 0);
> > > > > >   	msi->phys = virt_to_phys((void *)msi->pages);
> > > > 
> > > > > Should this be GFP_DMA32?  See the comment above the GFP_DMA
> > > > > definition.
> > > > 
> > > > looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
> > > > is the correct one to use, but, even with that I got >32-bit addresses.
> > > > GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
> > > > I didn't dig into it to find out why is this the case.
> > > 
> > > This sounds worth looking into (but maybe we don't need the
> > > __get_free_pages() at all; see below).  Maybe there's some underlying
> > > bug.  My laptop shows this, which looks like it might be related:
> > > 
> > >    Zone ranges:
> > >      DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> > >      DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
> > >      Normal   [mem 0x0000000100000000-0x00000004217fffff]
> > >      Device   empty
> > > 
> > > What does your machine show?
> 
> The great thing about ZONE_DMA is that it has completely different meanings
> per platform. ZONE_DMA32 is even worse as it's more or less just an x86
> thing, and isn't implemented (thus does nothing) on most other
> architectures, including ARM/arm64 where Tegra is relevant.
> 
> Fun.
> 
> > > > > Should we be using virt_to_phys() here?  Where exactly is the
> > > > > result ("msi->phys") used, i.e., what bus will that address appear
> > > > > on?  If it appears on the PCI side, this should probably use
> > > > > something like pcibios_resource_to_bus().
> > 
> > I had a quick chat with Robin (CC'ed, who has dealt/is dealing with most
> > of the work this thread relates to) and I think that as things stand,
> > for MSI physical addresses, an offset between PCI->host address shift is
> > not really contemplated (ie 1:1 host<->pci is assumed).
> 
> I think the most correct way to generate an address would actually be via
> the DMA mapping API (i.e. dma_map_page() with the host bridge's PCI device),
> which would take any relevant offsets into account. It's a bit funky, but
> there's some method in the madness.
> 
> > > > This address is written to two places.  First, into host's internal
> > > > register to let it know that when an incoming memory write comes
> > > > with this address, raise an MSI interrupt instead of forwarding it
> > > > to memory subsystem.  Second, into 'Message Address' field of
> > > > 'Message Address Register for MSI' register in end point's
> > > > configuration space (this is done by MSI framework) for end point to
> > > > know which address to be used to generate MSI interrupt.
> > > 
> > > Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> > > And this commit [3] sounds like it describes a similar hardware
> > > situation with Tegra where the host bridge intercepts the MSI target
> > > address, so writes to it never reach system memory.  That means that
> > > Tegra doesn't need to allocate system memory at all.
> > > 
> > > Is your system similar?  Can you just statically allocate a little bus
> > > address space, use that for the MSI target address, and skip the
> > > __get_free_pages()?
> > 
> > IIUC, all these host bridges need is a physical address that is routed
> > upstream to the host bridge by the PCIe tree (ie it is not part of the
> > host bridge windows), as long as the host bridge intercepts it (and
> > endpoints do _not_ reuse it for something else) that should be fine, as
> > it has been pointed out in this thread, allocating a page is a solution
> > - there may be others (which are likely to be platform specific).
> > 
> > > > > Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> > > > > similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> > > > > then write the result of virt_to_phys() using a 32-bit register
> > > > > write.
> > > > 
> > > > Well, if those systems deal with 64-bit addresses and when an end
> > > > point is connected which supports only 32-bit MSI addresses, this
> > > > problem will surface when __get_free_pages() returns an address that
> > > > translates to a >32-bit address after virt_to_phys() call on it.
> > > 
> > > I'd like to hear from the R-Car and Xilinx folks about (1) whether
> > > there's a potential issue with truncating a 64-bit address, and
> > > (2) whether that hardware works like Tegra, where the MSI write never
> > > reaches memory so we don't actually need to allocate a page.
> > > 
> > > If all we need is to allocate a little bus address space for the MSI
> > > target, I'd like to figure out a better way to do that than
> > > __get_free_pages().  The current code seems a little buggy, and
> > > it's getting propagated through several drivers.
> 
> The really neat version is to take a known non-memory physical address like
> the host controller's own MMIO region, which has no legitimate reason to
> ever be used as a DMA address. pcie-mediatek almost gets this right, but by
> using virt_to_phys() on an ioremapped address they end up with nonsense
> rather than the correct address (although realistically you would have to be
> extremely unlucky for said nonsense to collide with a real DMA address given
> to a PCI endpoint later). Following on from above, dma_map_resource() would
> be the foolproof way to get that right.

Yes, that was our intention as well. Our initial plan was to use an
address from the PCI aperture within Tegra that wasn't being used for
any other purpose. However, we ran into some odd corner cases where this
wasn't working as expected. As a temporary solution we wanted to move to
GFP_DMA32 (or GFP_DMA) in order to support 32-bit only MSI endpoints.

Eventually we'll want to get rid of the allocation altogether, we just
need to find a set of values that work reliably. In the meantime, it
looks as though GFP_DMA would be the right solution as long as we have
to stick with __get_free_pages().

Alternatively, since we've already verified that the MSI writes are
never committed to memory, we could choose some random address pointing
to system memory as well, but I'm reluctant to do that because it could
end up being confusing for users (and developers) to see some random
address showing up somewhere. A physical address such as the beginning
of system memory should always work and might be unique enough to
indicate that it is special.

Thierry
Lorenzo Pieralisi Nov. 13, 2017, 11:06 a.m. UTC | #12
On Fri, Nov 10, 2017 at 12:04:21PM +0000, Robin Murphy wrote:

[...]

> >>>>Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> >>>>similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> >>>>then write the result of virt_to_phys() using a 32-bit register
> >>>>write.
> >>>
> >>>Well, if those systems deal with 64-bit addresses and when an end
> >>>point is connected which supports only 32-bit MSI addresses, this
> >>>problem will surface when __get_free_pages() returns an address that
> >>>translates to a >32-bit address after virt_to_phys() call on it.
> >>
> >>I'd like to hear from the R-Car and Xilinx folks about (1) whether
> >>there's a potential issue with truncating a 64-bit address, and
> >>(2) whether that hardware works like Tegra, where the MSI write never
> >>reaches memory so we don't actually need to allocate a page.
> >>
> >>If all we need is to allocate a little bus address space for the MSI
> >>target, I'd like to figure out a better way to do that than
> >>__get_free_pages().  The current code seems a little buggy, and
> >>it's getting propagated through several drivers.
> 
> The really neat version is to take a known non-memory physical
> address like the host controller's own MMIO region, which has no
> legitimate reason to ever be used as a DMA address.

True - that could be safe enough.

What about IOVAs though ? I suspect we should reserve the MSI address
range - it looks like there is something missing here.

> pcie-mediatek almost gets this right, but by using virt_to_phys() on
> an ioremapped address they end up with nonsense rather than the
> correct address

That definitely needs fixing given that it works by chance.

> (although realistically you would have to be extremely unlucky for
> said nonsense to collide with a real DMA address given to a PCI
> endpoint later). Following on from above, dma_map_resource() would
> be the foolproof way to get that right.

That's how it should be done then lest it trickles into other drivers
requiring a similar set-up.

Thanks,
Lorenzo
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Pieralisi Nov. 20, 2017, 5:07 p.m. UTC | #13
[+ryder]

On Fri, Nov 10, 2017 at 02:07:05PM +0100, Thierry Reding wrote:

[...]

> > The really neat version is to take a known non-memory physical address like
> > the host controller's own MMIO region, which has no legitimate reason to
> > ever be used as a DMA address. pcie-mediatek almost gets this right, but by
> > using virt_to_phys() on an ioremapped address they end up with nonsense
> > rather than the correct address (although realistically you would have to be
> > extremely unlucky for said nonsense to collide with a real DMA address given
> > to a PCI endpoint later). Following on from above, dma_map_resource() would
> > be the foolproof way to get that right.
> 
> Yes, that was our intention as well. Our initial plan was to use an
> address from the PCI aperture within Tegra that wasn't being used for
> any other purpose.

Hi Thierry,

to wrap up this thread, why an address from the PCI aperture and
not a host bridge register ?

I CC'ed Ryder so that he can explain to me please what:

PCIE_MSI_VECTOR and PCIE_IMSI_ADDR offsets in:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/host/pcie-mediatek.c?h=v4.14

register space represent (IIUC the driver uses:

virt_to_phys(port->base + PCIE_MSI_VECTOR);

as MSI doorbell address, which is wrong anyway as Robin explained, just
want to understand how that register was chosen - it is never written or
read in the driver so it is hard to figure that out) to understand
whether the approach can be replicated instead of relying on GFP_DMA
for pages allocation.

Thanks,
Lorenzo

> However, we ran into some odd corner cases where this wasn't working
> as expected. As a temporary solution we wanted to move to GFP_DMA32
> (or GFP_DMA) in order to support 32-bit only MSI endpoints.
> 
> Eventually we'll want to get rid of the allocation altogether, we just
> need to find a set of values that work reliably. In the meantime, it
> looks as though GFP_DMA would be the right solution as long as we have
> to stick with __get_free_pages().
> 
> Alternatively, since we've already verified that the MSI writes are
> never committed to memory, we could choose some random address pointing
> to system memory as well, but I'm reluctant to do that because it could
> end up being confusing for users (and developers) to see some random
> address showing up somewhere. A physical address such as the beginning
> of system memory should always work and might be unique enough to
> indicate that it is special.
> 
> Thierry
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Pieralisi March 16, 2018, 5:23 p.m. UTC | #14
On Fri, Nov 10, 2017 at 12:11:05AM +0530, Vidya Sagar wrote:

[...]

> >>>>--- a/drivers/pci/host/pci-tegra.c
> >>>>+++ b/drivers/pci/host/pci-tegra.c
> >>>>@@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> >>>>  	}
> >>>>  	/* setup AFI/FPCI range */
> >>>>-	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> >>>>+	msi->pages = __get_free_pages(GFP_DMA, 0);
> >>>>  	msi->phys = virt_to_phys((void *)msi->pages);
> >>>Should this be GFP_DMA32?  See the comment above the GFP_DMA
> >>>definition.
> >>looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
> >>is the correct one to use, but, even with that I got >32-bit addresses.
> >>GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
> >>I didn't dig into it to find out why is this the case.
> >This sounds worth looking into (but maybe we don't need the
> >__get_free_pages() at all; see below).  Maybe there's some underlying
> >bug.  My laptop shows this, which looks like it might be related:
> >
> >   Zone ranges:
> >     DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> >     DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
> >     Normal   [mem 0x0000000100000000-0x00000004217fffff]
> >     Device   empty
> >
> >What does your machine show?
> I see following in my linux box
>  Zone ranges:
>    DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>    DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>    Normal   [mem 0x0000000100000000-0x000000106effffff]
>    Device   empty
> 
> and following in my T210 Tegra platform
> Zone ranges:
>   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
>   Normal   [mem 0x0000000100000000-0x000000017fffffff]
> 
> >
> >>>Should we be using virt_to_phys() here?  Where exactly is the
> >>>result ("msi->phys") used, i.e., what bus will that address appear
> >>>on?  If it appears on the PCI side, this should probably use
> >>>something like pcibios_resource_to_bus().
> >>This address is written to two places.  First, into host's internal
> >>register to let it know that when an incoming memory write comes
> >>with this address, raise an MSI interrupt instead of forwarding it
> >>to memory subsystem.  Second, into 'Message Address' field of
> >>'Message Address Register for MSI' register in end point's
> >>configuration space (this is done by MSI framework) for end point to
> >>know which address to be used to generate MSI interrupt.
> >Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> >And this commit [3] sounds like it describes a similar hardware
> >situation with Tegra where the host bridge intercepts the MSI target
> >address, so writes to it never reach system memory.  That means that
> >Tegra doesn't need to allocate system memory at all.
> >
> >Is your system similar?  Can you just statically allocate a little bus
> >address space, use that for the MSI target address, and skip the
> >__get_free_pages()?
> It is the same system where MSI memory writes don't really make it to
> system memory. But, the addresses mentioned in that patch don't work.
> we are working with hardware folks on understanding the issue better.
> Meanwhile, using GFP_DMA is giving consistent results and thought this
> can be used as a stop gap solution before we figure out a better bus
> address to be used as MSI target address.

Vidya,

I will mark this as superseded - by Thierry's patch:

https://patchwork.ozlabs.org/patch/848569/

even though we have not reached a real conclusion yet
on that patch - we should take this discussion from the
thread above.

Thank you,
Lorenzo

> >>>Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> >>>similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> >>>then write the result of virt_to_phys() using a 32-bit register
> >>>write.
> >>Well, if those systems deal with 64-bit addresses and when an end
> >>point is connected which supports only 32-bit MSI addresses, this
> >>problem will surface when __get_free_pages() returns an address that
> >>translates to a >32-bit address after virt_to_phys() call on it.
> >I'd like to hear from the R-Car and Xilinx folks about (1) whether
> >there's a potential issue with truncating a 64-bit address, and
> >(2) whether that hardware works like Tegra, where the MSI write never
> >reaches memory so we don't actually need to allocate a page.
> >
> >If all we need is to allocate a little bus address space for the MSI
> >target, I'd like to figure out a better way to do that than
> >__get_free_pages().  The current code seems a little buggy, and
> >it's getting propagated through several drivers.
> >
> >>>>  	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
> >[1] https://lkml.kernel.org/r/20170824134451.GA31858@bhelgaas-glaptop.roam.corp.google.com
> >[2] https://lkml.kernel.org/r/86efs3wesi.fsf@arm.com
> >[3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox series

Patch

diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
index 1987fec1f126..03d3dcdd06c2 100644
--- a/drivers/pci/host/pci-tegra.c
+++ b/drivers/pci/host/pci-tegra.c
@@ -1531,7 +1531,7 @@  static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
 	}
 
 	/* setup AFI/FPCI range */
-	msi->pages = __get_free_pages(GFP_KERNEL, 0);
+	msi->pages = __get_free_pages(GFP_DMA, 0);
 	msi->phys = virt_to_phys((void *)msi->pages);
 
 	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);