mbox series

[GIT,PULL] arm64 fix for 5.14

Message ID 20210826131747.GE26318@willie-the-truck
State New
Headers show
Series [GIT,PULL] arm64 fix for 5.14 | expand

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-fixes

Message

Will Deacon Aug. 26, 2021, 1:17 p.m. UTC
Hi Linus,

Please pull this single arm64 fix for 5.14. We received a report this week
that the generic version of pfn_valid(), which we switched to this merge
window in 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), interacts
badly with dma_map_resource() due to the following check:

        /* Don't allow RAM to be mapped */
        if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
                return DMA_MAPPING_ERROR;

Since the ongoing saga to determine the semantics of pfn_valid() is
unlikely to be resolved this week (does it indicate valid memory, or
just the presence of a struct page, or whether that struct page has been
initialised?), just revert back to our old version of pfn_valid() for
5.14.

Thanks,

Will

--->8

The following changes since commit bde8fff82e4a4b0f000dbf4d5eadab2079be0b56:

  arm64: initialize all of CNTHCTL_EL2 (2021-08-19 10:02:10 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-fixes

for you to fetch changes up to 3eb9cdffb39701743973382860f214026f4d7825:

  Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID" (2021-08-25 11:33:24 +0100)

----------------------------------------------------------------
arm64 fix for 5.14

- Fix dma_map_resource() by reverting back to old pfn_valid() code

----------------------------------------------------------------
Will Deacon (1):
      Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"

 arch/arm64/Kconfig            |  1 +
 arch/arm64/include/asm/page.h |  1 +
 arch/arm64/mm/init.c          | 37 +++++++++++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+)

Comments

Linus Torvalds Aug. 26, 2021, 6:41 p.m. UTC | #1
On Thu, Aug 26, 2021 at 6:17 AM Will Deacon <will@kernel.org> wrote:
>
> Please pull this single arm64 fix for 5.14.

Pulled.

But adding Christoph to the cc, since I do think the eventual fix
needs to be in the DMA mapping code:

> We received a report this week
> that the generic version of pfn_valid(), which we switched to this merge
> window in 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), interacts
> badly with dma_map_resource() due to the following check:
>
>         /* Don't allow RAM to be mapped */
>         if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
>                 return DMA_MAPPING_ERROR;
>
> Since the ongoing saga to determine the semantics of pfn_valid() is
> unlikely to be resolved this week (does it indicate valid memory, or
> just the presence of a struct page, or whether that struct page has been
> initialised?), just revert back to our old version of pfn_valid() for
> 5.14.

I think that's the right thing for now, but yeah, that condition for
WARN_ON_ONCE() seems very questionable.

"pfn_valid()" is more about whether you can do a "pfn_to_page()" lookup on it.

II get the feeling that the dma-mapping code should allow pages that
are PageReserved() to be mapped - they aren't "ram" in the kernel
sense.

Perhaps also make sure it's not the zero page (which is
PageReserved(), but most definitely RAM).

In a PC world that would be (for example) the legacy PCI space at
0xa0000-0xfffff, but I could easily imagine other platforms having
other situations.

              Linus
pr-tracker-bot@kernel.org Aug. 26, 2021, 6:52 p.m. UTC | #2
The pull request you sent on Thu, 26 Aug 2021 14:17:48 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-fixes

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/1a6d80ff2419e8ad627b4bf4775a8b4c70af535d

Thank you!
Christoph Hellwig Aug. 27, 2021, 7:40 a.m. UTC | #3
On Thu, Aug 26, 2021 at 11:41:34AM -0700, Linus Torvalds wrote:
> "pfn_valid()" is more about whether you can do a "pfn_to_page()" lookup on it.
> 
> II get the feeling that the dma-mapping code should allow pages that
> are PageReserved() to be mapped - they aren't "ram" in the kernel
> sense.
> 
> Perhaps also make sure it's not the zero page (which is
> PageReserved(), but most definitely RAM).
> 
> In a PC world that would be (for example) the legacy PCI space at
> 0xa0000-0xfffff, but I could easily imagine other platforms having
> other situations.

So what would be the correct check for "this is not actually page backed
normal RAM"?
Linus Torvalds Aug. 27, 2021, 5:03 p.m. UTC | #4
On Fri, Aug 27, 2021 at 12:40 AM Christoph Hellwig <hch@lst.de> wrote:
>
> > In a PC world that would be (for example) the legacy PCI space at
> > 0xa0000-0xfffff, but I could easily imagine other platforms having
> > other situations.
>
> So what would be the correct check for "this is not actually page backed
> normal RAM"?

It would probably be interesting to have the arm people explain the
call chain for the warning that caused that revert, so we'd have a
very concrete example of the situation that goes wrong, but taking a
wild stab at it, the code might be something like

            /* Don't allow RAM to be mapped */
            if (WARN_ON_ONCE(phys_addr_is_ram(phys_addr)))
                    return DMA_MAPPING_ERROR;

and then having something like

  static inline bool phys_addr_is_ram(phys_addr_t phys_addr)
  {
        unsigned long pfn = PHYS_PFN(phys_addr);

        if (!pfn_valid(pfn))
                return false;
        return is_zero_pfn(pfn) || !PageReserved(pfn_to_page(pfn));
  }

might be close to right.

The ARM code actually uses that complex pfn_to_section_nr() and
memblock_is_memory() etc. That seems a bit of an overkill, since the
memblock code should have translated all that into being reserved.

But again, I don't actually know exactly what triggered the issue on
ARM, so the above is just my "this seems to be a more proper check"
suggestion.

Will?

                   Linus
Christoph Hellwig Aug. 27, 2021, 5:10 p.m. UTC | #5
On Fri, Aug 27, 2021 at 10:03:29AM -0700, Linus Torvalds wrote:
> The ARM code actually uses that complex pfn_to_section_nr() and
> memblock_is_memory() etc. That seems a bit of an overkill, since the
> memblock code should have translated all that into being reserved.
> 
> But again, I don't actually know exactly what triggered the issue on
> ARM, so the above is just my "this seems to be a more proper check"
> suggestion.

They CCed me on their earlier discussion, but I did not catch up on it
until you responded to the pull request  If I understood it correct it
was about a platform device mapping a MMIO region (like a PCI bar),
but something about section alignment cause pfn_valid to mistrigger.
Linus Torvalds Aug. 27, 2021, 5:16 p.m. UTC | #6
On Fri, Aug 27, 2021 at 10:10 AM Christoph Hellwig <hch@lst.de> wrote:
>
> They CCed me on their earlier discussion, but I did not catch up on it
> until you responded to the pull request  If I understood it correct it
> was about a platform device mapping a MMIO region (like a PCI bar),
> but something about section alignment cause pfn_valid to mistrigger.

Yeah, so I can easily see the maxpfn numbers can easily end up being
rounded up to a whole memory section etc.

I think my suggested solution should JustWork(tm) - exactly because if
the area is then in that "this pfn is valid" area, it will
double-check the actual underlying page.

That said, I think x86 avoids the problem another way - by just making
sure max_pfn is exact. That works too, as long as there are no holes
in the RAM map that might be used for PCI BAR's.

So I think arm could fix it that way too, depending on their memory layout.

(x86 has that traditional hole at A0000-FFFFF, but it's special enough
that it presumably is never an issue).

                  Linus
Will Deacon Aug. 31, 2021, 1:31 p.m. UTC | #7
[+David]

On Fri, Aug 27, 2021 at 10:16:27AM -0700, Linus Torvalds wrote:
> On Fri, Aug 27, 2021 at 10:10 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > They CCed me on their earlier discussion, but I did not catch up on it
> > until you responded to the pull request  If I understood it correct it
> > was about a platform device mapping a MMIO region (like a PCI bar),
> > but something about section alignment cause pfn_valid to mistrigger.
> 
> Yeah, so I can easily see the maxpfn numbers can easily end up being
> rounded up to a whole memory section etc.
> 
> I think my suggested solution should JustWork(tm) - exactly because if
> the area is then in that "this pfn is valid" area, it will
> double-check the actual underlying page.

I think the pitfall there is that the 'struct page' might well exist,
but isn't necessarily initialised with anything meaningful. I remember
seeing something like that in the past (I think for "no-map" memory) and
David's reply here:

https://lore.kernel.org/r/aff3942e-b9ce-5bae-8214-0e5d89cd071c@redhat.com

hints that there are still gotchas with looking at the page flags for
pages if the memory is offline or ZONE_DEVICE.

Don't get me wrong, I'd really like to use the generic code here as I
think it would help to set expectations around what pfn_valid() actually
means, I'm just less keen on the try-it-and-see-what-breaks approach
given how sensitive it is to the layout of the physical memory map.

> That said, I think x86 avoids the problem another way - by just making
> sure max_pfn is exact. That works too, as long as there are no holes
> in the RAM map that might be used for PCI BAR's.
> 
> So I think arm could fix it that way too, depending on their memory layout.

The physical memory map is the wild west, unfortunately. It's one of the
things where everybody does something different and it's very common to see
disjoint banks of memory placed seemingly randomly around.

Will
Catalin Marinas Aug. 31, 2021, 4:56 p.m. UTC | #8
On Fri, Aug 27, 2021 at 10:16:27AM -0700, Linus Torvalds wrote:
> On Fri, Aug 27, 2021 at 10:10 AM Christoph Hellwig <hch@lst.de> wrote:
> > They CCed me on their earlier discussion, but I did not catch up on it
> > until you responded to the pull request  If I understood it correct it
> > was about a platform device mapping a MMIO region (like a PCI bar),
> > but something about section alignment cause pfn_valid to mistrigger.
> 
> Yeah, so I can easily see the maxpfn numbers can easily end up being
> rounded up to a whole memory section etc.
> 
> I think my suggested solution should JustWork(tm) - exactly because if
> the area is then in that "this pfn is valid" area, it will
> double-check the actual underlying page.
> 
> That said, I think x86 avoids the problem another way - by just making
> sure max_pfn is exact. That works too, as long as there are no holes
> in the RAM map that might be used for PCI BAR's.
> 
> So I think arm could fix it that way too, depending on their memory layout.

The suggested solution in the original thread was to change the generic
DMA code to use memblock_is_memory() instead of pfn_valid():

https://lore.kernel.org/lkml/b720e7c8-ca44-0a25-480b-05bf49d03c35@redhat.com/

Given how late we discovered this in the -rc cycle, the decision was to
revert the pfn_valid() patch. We'll re-instate it at some point but
someone needs to sanity check the other pfn_valid() call sites and the
expected semantics.
David Hildenbrand Aug. 31, 2021, 7:16 p.m. UTC | #9
On 31.08.21 15:31, Will Deacon wrote:
> [+David]
> 
> On Fri, Aug 27, 2021 at 10:16:27AM -0700, Linus Torvalds wrote:
>> On Fri, Aug 27, 2021 at 10:10 AM Christoph Hellwig <hch@lst.de> wrote:
>>>
>>> They CCed me on their earlier discussion, but I did not catch up on it
>>> until you responded to the pull request  If I understood it correct it
>>> was about a platform device mapping a MMIO region (like a PCI bar),
>>> but something about section alignment cause pfn_valid to mistrigger.
>>
>> Yeah, so I can easily see the maxpfn numbers can easily end up being
>> rounded up to a whole memory section etc.
>>
>> I think my suggested solution should JustWork(tm) - exactly because if
>> the area is then in that "this pfn is valid" area, it will
>> double-check the actual underlying page.
> 
> I think the pitfall there is that the 'struct page' might well exist,
> but isn't necessarily initialised with anything meaningful. I remember
> seeing something like that in the past (I think for "no-map" memory) and
> David's reply here:
> 
> https://lore.kernel.org/r/aff3942e-b9ce-5bae-8214-0e5d89cd071c@redhat.com
> 
> hints that there are still gotchas with looking at the page flags for
> pages if the memory is offline or ZONE_DEVICE.
> 
> Don't get me wrong, I'd really like to use the generic code here as I
> think it would help to set expectations around what pfn_valid() actually
> means, I'm just less keen on the try-it-and-see-what-breaks approach
> given how sensitive it is to the layout of the physical memory map.
> 
>> That said, I think x86 avoids the problem another way - by just making
>> sure max_pfn is exact. That works too, as long as there are no holes
>> in the RAM map that might be used for PCI BAR's.
>>
>> So I think arm could fix it that way too, depending on their memory layout.
> 
> The physical memory map is the wild west, unfortunately. It's one of the
> things where everybody does something different and it's very common to see
> disjoint banks of memory placed seemingly randomly around.

The resource tree is usually the best place to really identify what's 
system RAM and what's not IIRC. memblock should work on applicable archs 
as well. Identifying ZONE_DEVICE ranges reliably is a different story ...